Open bellerophons-pegasus opened 3 years ago
I think this is a good idea, and since @dataset is not widely used yet, this is a good opportunity to enhance its data model. Please do suggest which fields would be appropriate.
Apologies for the long silence.
After looking through examples from other repositories and recommendations listed in #880 I would suggest this data model:
1. For a whole dataset (@dataset) the record can already be compiled with the mandatory and the allowed fields, e.g.:
@dataset{<key>,
title = {<title>},
date = {<date, ISO>},
publisher = {<publisher>},
eprint = {<persistent uri>}, (or DOI)
eprinttype = {<persistent eprint/uri type, e.g. hdl>},
url = {<url>},
urldate = {<date of access, ISO>},
editor = {<editors>},
author = {<creators/authors>},
(also editora, editoratype etc. can be used if necessary)
version = {<version>},
language = {<language>},
keywords = {<keywords>},
abstract = {<abstract/description>}
}
The only thing I miss here is a field for a hash, e.g. a SHA-1 checksum. For now the note field could hold it, but in the long run I think a dedicated field would be better.
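To make the template above more concrete, here is a sketch of a filled-in record. All values are invented placeholders (the names, handle, and URL are hypothetical and do not refer to a real dataset), and storing the checksum in note is only a stopgap until a dedicated field exists.

```bibtex
@dataset{example-corpus,
  title      = {Example Survey Corpus},
  author     = {Doe, Jane and Roe, Richard},
  editor     = {Poe, Alex},
  date       = {2019-05-14},
  publisher  = {Example Data Repository},
  version    = {1.2},
  eprint     = {00.00000/0000-0000-0000-0},
  eprinttype = {hdl},
  url        = {https://repository.example.org/datasets/example-corpus},
  urldate    = {2019-06-01},
  language   = {english},
  keywords   = {survey, corpus},
  abstract   = {Invented description of an invented dataset.},
  note       = {sha1:98e2c729d79c410b8e1bfd8d46517dbf3c2e49ab}
}
```

Whether and how note is printed depends on the citation style, which is one more reason a dedicated field would be preferable.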
2. For a part of a dataset, @incollection can be used as follows:
@incollection{<key>,
title = {<title>},
date = {<date, ISO>},
publisher = {<publisher>},
eprint = {<persistent uri>}, (or DOI)
eprinttype = {<persistent eprint/uri type, e.g. hdl>},
url = {<url>},
urldate = {<date of access, ISO>},
author = {<creators/authors>},
(also editora, editoratype etc. can be used if necessary)
version = {<version>},
language = {<language>},
booktitle = {<title of containing dataset>},
bookauthor = {<authors of containing dataset>},
editor = {<editors of containing dataset>},
keywords = {<keywords>},
abstract = {<abstract/description>}
}
Again, a field for a hash would be nice. Also, if the subdataset does not have a dedicated PID or URI, a field indicating how to query for the subset would be required. This relates to R8 and R9 of the recommendations by the Data Citation WG of the RDA (Rauber, A., Asmi, A., Uytvanck, D. van, & Pröll, S. (2015). Data citation of evolving data: Recommendations of the Working Group on Data Citation (WGDC). DOI: 10.15497/RDA00016).
Using booktitle and bookauthor to store the information of the containing dataset feels a bit odd, but I think it should work. What I am not sure about is whether editor, which for @incollection refers to the containing collection/dataset, is also needed for the subdataset. What I know for sure from my own use case is that we will need a way to denote editora both for a part of a dataset and for the containing dataset. The two might even differ from each other.
I am attaching a file with some examples compiled from the repository I work with. examples.txt
Since we already have @dataset I think it would be a bit odd to use @incollection (which is really in a @collection) for something that is "in a @dataset". So at first glance I agree that @indataset would make more sense.
But I'm wondering how exactly we should pull this off. You already mentioned that common field names like booktitle don't quite feel right and that we might get into trouble with the role of editor etc. I'm also worried that an @indataset entry does not necessarily have the same straightforward connection to its parent @dataset as, say, an @incollection has to its parent @collection (I'm guessing one could have several 'nesting levels').
We already discussed UNF and friends in #880, and at the time I wasn't too sure how useful and widely used it would be, but if enough people think it is useful we might as well add something like fingerprint, uniqueid or some such to the data model now. (It might be interesting not only for datasets, as with UNFs, but also for software and the Software Heritage ID.) If we implement this like eprint, we could add a fingerprinttype field upon which we could branch the representation if required. [What I would like to avoid, though, here and generally, is adding all sorts of overly specific fields and entry types to the standard data model/styles that are only useful to a very small audience. I realise that with some things we might have a bit of a chicken-or-egg problem: certain things might not be popular yet because they are not properly supported by the software yet.]
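For illustration, an entry using the eprint-like split described above might look as follows. This is only a sketch: fingerprint and fingerprinttype are the hypothetical field names floated in this comment, they are not part of the standard biblatex data model, and they would have to be declared there before any style could print them. The title, author, and date are invented placeholders.

```bibtex
@dataset{example-split,
  title           = {Example Survey Corpus},
  author          = {Doe, Jane},
  date            = {2019-05-14},
  fingerprint     = {98e2c729d79c410b8e1bfd8d46517dbf3c2e49ab},
  fingerprinttype = {sha1}
}
```

Keeping the scheme in its own field mirrors eprint/eprinttype, so a style could decide per scheme how to label or link the value.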
Yes, in the long term a dedicated type for subsets of datasets like @indataset might be the way to go. But I think this would also require some new field names.
Also, this does not have to be solved right now. I pointed to this issue over at Dataverse to get more attention and maybe already get another 'user' on board, to overcome the chicken-or-egg issue. The next thing I'm going to do to get more attention and input for @indataset is to see what the RDA has to say on this.
Regarding adding a field like fingerprint: I would very much welcome this already. But implementing it with fingerprinttype might be overkill, because a notation like sha1:98e2c729d79c410b8e1bfd8d46517dbf3c2e49ab or UNF:3:DaYlT6QSX9r0D50ye+tXpA== suffices for the purpose of a reference.
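As a sketch of this single-field variant, the scheme prefix simply stays inside the value. The field name fingerprint is still hypothetical (it is not defined by biblatex), and the entry keys, titles, and dates are invented placeholders; only the two fingerprint values are the samples quoted above.

```bibtex
@dataset{example-sha1,
  title       = {Example Survey Corpus},
  date        = {2019-05-14},
  fingerprint = {sha1:98e2c729d79c410b8e1bfd8d46517dbf3c2e49ab}
}

@dataset{example-unf,
  title       = {Example Tabular Data},
  date        = {2019-05-14},
  fingerprint = {UNF:3:DaYlT6QSX9r0D50ye+tXpA==}
}
```

The trade-off is that a style can only print the value verbatim; it cannot branch on the scheme without parsing the prefix.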
The entry type @dataset is already used in repositories such as Zenodo (example), and it seems it will also be adopted by Dataverse (according to this issue). More information on the entry type @dataset was already given in #880.
However, recommendations on data citation also advise that when only parts of a dataset are used, those parts should be cited specifically. Dataverse already provides citation examples for files and their containing collections/datasets, e.g. DOI:10.7910/DVN/EDQQ4O/FKJNCC. There, the contained file has the entry type @incollection in BibTeX format.
While @incollection works, I do not think it is ideal, as many citation styles explicitly want special treatment for datasets. Since subsets (e.g. a subfolder) of datasets are still datasets, representing them with @incollection would hinder that separate treatment.
Thus, I believe that @dataset should be expanded, or a whole new entry type @indataset, analogous to @incollection, should be introduced.
In the coming days I could provide suggestions for both cases (expanding @dataset and introducing @indataset) if this is something that might be added to BibLaTeX.