zotero / zotero-bits

CSL-related community feedback for Zotero

Data Set #22

Closed rmzelle closed 1 year ago

rmzelle commented 13 years ago

There have been requests for a "Data Set" item type, but the idea is still rough.

https://github.com/ajlyon/zotero-bits/wiki/DatasetType

Related CSL schema ticket: https://github.com/citation-style-language/schema/issues/74

erazlogo commented 13 years ago

What about the archival collection type? Here are the related discussions and tickets:

https://www.zotero.org/trac/ticket/661
https://www.zotero.org/trac/ticket/1023
http://forums.zotero.org/discussion/2981/styling-archival-material/
http://forums.zotero.org/discussion/391/1/hierarchical-item-relationships/#Item_48

These are now included in library catalogs and can be imported into Zotero. Supporting them would be a great aid in research: archives need to be cited in bibliographies, and this would need to be implemented in hierarchical item types anyway.

avram commented 13 years ago

Elena: Are you suggesting that archival collections and data sets could share a new type? I see some conceptual connection between the two, but I think they differ enough in presentation (data sets are often more like articles in citation, right?) that they'll need to be treated differently.

Could you draft a proposal for archival collections as a type and create a new issue for them?

avram commented 13 years ago

The RIS translator has the comment "// TODO: DATA, MUSIC". That is, we would like to support datasets when importing from RIS. Maybe someone can describe what EndNote puts in a dataset item?

adam3smith commented 13 years ago

Dataset should probably have a type/genre/medium field, since some styles like APSA call for a label such as "computer file". We also need to think about the distinction between producer and distributor; see e.g. http://www.lib.ncsu.edu/data/citingdatasets.html. I think archive could probably be used for the distributor, in which case all the archive fields should be present.

One issue with datasets is that while they can, in general, be accommodated within other item types, this is often not done consistently. I think I noted somewhere on the forums that they're sometimes treated like articles, sometimes like monographs, and sometimes like a third, hybrid category (e.g. neither italics nor quotation marks).

mfenner commented 12 years ago

DataCite has done some work on the metadata for datasets: http://www.cdlib.org/cdlinfo/2011/01/24/datacite-metadata-scheme-is-published/. Their required fields are Identifier, Creator, Title, Publisher, PublicationYear.
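As a rough illustration of what checking those five required fields could look like, here is a minimal Python sketch. The field names are taken from the list above; the function name, the dict-based record shape, and the sample record are assumptions made for illustration and are not part of DataCite or Zotero.

```python
# Required DataCite fields as listed in the comment above.
REQUIRED_FIELDS = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_required(record: dict) -> list:
    """Return the required field names that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record, for illustration only (not a real dataset).
example = {
    "Identifier": "10.1234/example",
    "Creator": "Doe, Jane",
    "Title": "Example Survey Data",
    "PublicationYear": "2011",
}

print(missing_required(example))  # ['Publisher']
```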

avram commented 12 years ago

That looks like a great set of fields (including the optional ones). Any objections to taking this as our model? Will it cover sufficiently broad use cases?

adam3smith commented 12 years ago

In general, yes. As I note above, we do need a distributor field in addition to the publisher field for some styles (e.g. ICQMR). The archive field works for that, but we do need to make sure it's included.

mfenner commented 12 years ago

Another useful data citation resource was published by the Digital Curation Centre earlier this week: http://www.dcc.ac.uk/resources/how-guides/cite-datasets. The guide also includes a section on elements of a data citation: author, publication date, title, edition, version, feature name and URI, resource type, publisher, unique numeric footprint, identifier, location. The most important ones are author, title, date, location.
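To make the four "most important" elements concrete, here is a minimal Python sketch that assembles them into a single string. The function name, the ordering, and the punctuation are assumptions for illustration only; they do not reproduce the DCC guide's recommended format or any particular citation style.

```python
def minimal_citation(author: str, date: str, title: str, location: str) -> str:
    """Join the four core elements into one line (illustrative layout only)."""
    return f"{author} ({date}). {title}. {location}"


# Hypothetical values, for illustration only.
print(minimal_citation(
    "Doe, Jane",
    "2011",
    "Example Survey Data",
    "https://example.org/data/1",
))
```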

mfenner commented 12 years ago

I think we should use the DataCite metadata description as the model, for a variety of reasons, especially because they already have more than one million datasets in their database and people will be citing them.

avram commented 12 years ago

Any thoughts on what we'll need from CSL to make this work?

rmzelle commented 12 years ago

I just noticed that datacite.org has started to offer citeproc JSON, but they use (the invalid) "misc" as item type, since CSL doesn't have a suitable item-type. See http://datacite.org/node/63
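For illustration, here is a small Python sketch of how a consumer might rewrite such an item if a "dataset" type were available in CSL. The item below is hypothetical and is not actual DataCite output; the function name is also an assumption.

```python
import json


def retype_dataset(item: dict) -> dict:
    """Return a copy of a citeproc-JSON item with type "misc" changed to "dataset"."""
    fixed = dict(item)  # shallow copy so the caller's item is left unchanged
    if fixed.get("type") == "misc":
        fixed["type"] = "dataset"
    return fixed


# Hypothetical citeproc-JSON item, for illustration only.
item = {
    "id": "example-1",
    "type": "misc",
    "title": "Example Survey Data",
    "publisher": "Example Repository",
    "issued": {"date-parts": [[2011]]},
}

print(json.dumps(retype_dataset(item), indent=2))
```

Blindly remapping every "misc" item is only reasonable when the source is known to contain datasets exclusively.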

rmzelle commented 12 years ago

Ignoring the discussion about which fields are needed to handle datasets, is there already enough of a consensus that adding a "dataset" CSL item type is a good idea? @bdarcus greenlighted it in 2009 (http://forums.zotero.org/discussion/4771/item-types/?Focus=25845#Comment_25845), and I'm in support of it as well.

mfenner commented 12 years ago

I would really like to see a dataset CSL item type. Another step to make it easier to add dataset citations to tools handling references.

pdurbin commented 7 years ago

There's a related discussion at https://github.com/IQSS/dataverse/pull/3828#issuecomment-310395228 and I'd like to thank @adam3smith for testing Zotero with Dataverse! We'll keep an eye on this issue.

tangofil commented 3 years ago

I do have use for type: Dataset.

The fields needed are well described in http://www.dlib.org/dlib/january11/starr/01starr.html

"When the DataCite Consortium was founded in 2009, the development of a DataCite metadata scheme was an early priority." The Metadata Working Group did spend a few years creating the scheme, so I think we should just use their suggestion, unless there is something newer.

Social scientists use many datasets that should be cited. Sometimes they are surveys collected in a particular location between two dates (i.e. the source does not change), sometimes they are data from public sources like Eurostat, where new data is added at regular intervals (so the source does change).

adam3smith commented 3 years ago

This has been implemented in CSL since version 1.0.1. It is currently available in Zotero using a workaround and will be available in Zotero as a regular item type in the future. No need for further explanations.

philippemiron commented 1 year ago

Is there any info on when it will be included? This issue is 10 years old.

adam3smith commented 1 year ago

Zotero never does ETAs, but they've been making changes to the data model, so probably not too far out (I'd guess months, but that's just a guess)

adam3smith commented 1 year ago

@dstillman since you're working on standard already (and it'd make a lot of people in my line of work happy if we got this into Zotero), could I advocate for including this in the next type update:

Here are the proposed fields: Name: Dataset (there is some discussion here, with "Data", "Data set," "Dataset" and several other contenders, but I think Dataset makes the most sense).

I've checked this against the latest iteration of the DataCite Metadata schema and it hits all relevant fields that could possibly be cited. I think it's worth keeping Archive in there for historical data that's not in a repository but in a physical archive.

bwiernik commented 1 year ago

Do we want to label it Medium or Format (which is what is currently used for Audio Recording, Film, Video Recording)? I don't have a strong preference for one over the other.

adam3smith commented 1 year ago

It's Format in DataCite, so if we're using that elsewhere already, let's stick with that.

bwiernik commented 1 year ago

Great, let's go with Format

HughP commented 1 year ago

"Do we want to label it Medium or Format (which is what is currently used for Audio Recording, Film, Video Recording)? I don't have a strong preference for one over the other."

@bwiernick Audio and Film should be cited and referenced as either film or audio units; OR collections. So Medium for dataset here makes sense to me. Frankly datasets should not be audio or video content. There is a difference between a dataset and a collection. People need to be citing these groups of audio and video artifacts as collections.

adam3smith commented 1 year ago

"Frankly datasets should not be audio or video content. There is a difference between a dataset and a collection. People need to be citing these groups of audio and video artifacts as collections."

The vast majority of people working on data and data infrastructure would disagree with that (there are data repositories specifically dedicated to video data), but that's also not the point here, so we don't need to solve it.

The reason bwiernik mentioned that 'Format' is used for video-type item types is that it describes the format in which video content is delivered (e.g., DVD, CD-ROM, Blu-ray), which is reasonably similar to the types of formats cited for data delivery (where still relevant), such as DVD or CD-ROM, so it makes sense to use the same variable. I honestly think either option would have been fine, but, as I said, aligning this with the DataCite metadata terminology probably makes sense.