Closed by colinpmillar 6 years ago
I like the BibTeX. You might need to explain how the reference is generated from the input, as the two aren't quite the same on the face of it.
I think this is a nice discussion: https://nennius.wordpress.com/2013/11/27/using-bibtex-for-dataset-citation/
The idea is for us to make similar use of the following fields:
@misc{anudc:4896,
  author = {Claire O'Brien},
  title = {{Impact of Colonoscopy Bowel Preparation on Intestinal Microbiota}},
  doi = {10.4225/13/511C71F8612C3},
  howpublished = {\url{http://dx.doi.org/10.4225/13/511C71F8612C3}},
  note = {Accessed: 2010-09-30}
}
I think this is quite a good first thing to run with. @ices-taf/project-admins, is this something you would be happy to push forward with and test out on one of our stock assessments?
On a second note, I think it makes sense to store these in a file, citations.bib, as then it has the correct file extension to be recognised by bibliographic database software such as EndNote, Zotero etc.
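To make the idea of reading fields out of a citations.bib entry concrete, here is a minimal sketch in Python. It is not an agreed implementation: the function name parse_bibtex_entry is hypothetical, it handles a single brace-delimited entry only, and a real project would more likely rely on an existing BibTeX parser.

```python
# A single entry as it might appear in citations.bib (raw string because of \url).
ENTRY = r"""
@misc{anudc:4896,
  author = {Claire O'Brien},
  title = {{Impact of Colonoscopy Bowel Preparation on Intestinal Microbiota}},
  doi = {10.4225/13/511C71F8612C3},
  howpublished = {\url{http://dx.doi.org/10.4225/13/511C71F8612C3}},
  note = {Accessed: 2010-09-30}
}
"""


def parse_bibtex_entry(text):
    """Parse one brace-delimited BibTeX entry into (type, key, fields).

    Hypothetical helper for illustration: it assumes every field value is
    wrapped in braces and does not handle quoted values or @string macros.
    """
    text = text.strip()
    if not text.startswith("@"):
        raise ValueError("entry must start with '@'")
    open_brace = text.index("{")
    entry_type = text[1:open_brace].strip().lower()
    # Everything between the outermost braces of the entry.
    body = text[open_brace + 1 : text.rindex("}")]
    key, _, rest = body.partition(",")
    fields = {}
    i = 0
    while i < len(rest):
        eq = rest.find("=", i)
        if eq == -1:
            break
        name = rest[i:eq].strip(" ,\n\t").lower()
        start = rest.index("{", eq)
        # Walk forward until the brace opened at `start` is closed again.
        depth = 0
        j = start
        while j < len(rest):
            if rest[j] == "{":
                depth += 1
            elif rest[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        fields[name] = rest[start + 1 : j]
        i = j + 1
    return entry_type, key.strip(), fields


entry_type, key, fields = parse_bibtex_entry(ENTRY)
```

The returned fields dictionary keeps inner braces as written, so the title still carries its protective braces; stripping them would be a separate presentation step.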
looks good - what were you thinking we put for author of the dataset?
The hosting database/Institute?
This is a bit of an issue: author is a publishing-world handle which doesn't work as well in the data domain. In metadata about data, there is usually a handle for the originator of the data AND the distributor (the former being the owner and the latter being the collating/disseminating body). In most cases, for example, ICES is the distributor but not the owner. Let's talk about this. So to be safe, if we have to use one handle it should be 'Distributor'.
Would avoid 'owner' at any time; 'distributor' seems to be the handle to use then. Can we just leave out the 'author' part? Or is that cheating...
Just for info, Marine Scotland have used the author label, e.g. https://data.marine.gov.scot/dataset/salmon-and-sea-trout-fishery-statistics-2017-season-reported-catch-and-effort-method, and give it as "Marine Scotland".
But we should talk about this at the next meeting.
I wonder if we can constrain the elements of the citation records to be only what we need for
Then, for datasets with no doi, we can offer (or enforce?) the option of providing a meta-data record for that data set?
For now I get the sense we can move forward with a test implementation and finalise the details in the coming weeks. This would allow us to move on writing functions that automatically get the data sets during the bootstrapping phase. Thanks!
with the Marine Scotland example, it makes sense, i have no problem asserting ICES as author on outputs but on inputs (to TAF) it's a bit shaky. Next meeting it is :)
Well – it’s not till the 1st of Nov – but I will try and book a time in two weeks' time, before the end of the 1st sprint deadline ☺
I’ll find a time and send round an invite later today
I agree we can move forward with the bootstrap tests now, and decide the citation file formats later.
The BibTeX format, used in LaTeX documents, is mainly tailored for citing traditional publications, such as articles, reports, and books. I have tried using BibTeX in the past when preparing manuscripts and have found it a bit unwieldy. Some of the shortcomings of BibTeX have been addressed by another contender called BibLaTeX, which improves support for European alphabets and adds new fields such as url and urldate.
Maintenance of BibTeX has been slow in the last two decades, so url is not yet defined as a field, but many LaTeX users have saved web addresses under the generic note field, as supported by several BibTeX tools. Others have used the howpublished field. The Wikibooks page https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management goes as far as saying that "Recently, BibTeX has been succeeded by BibLaTeX".
I'm not really recommending for or against BibTeX or BibLaTeX, but perhaps suggesting that we:
The only field we need for the current test phase is URL. For example, the North Sea herring 2018 assessment used three R packages that are not standard CRAN packages. The URLs are:
FLCore https://github.com/flr/R/raw/7b25d8f4/src/contrib/FLCore_2.6.6.tar.gz
stockassessment https://codeload.github.com/fishfollower/SAM/legacy.tar.gz/25b3591
FLSAM https://codeload.github.com/flr/FLSAM/legacy.tar.gz/7e078fa
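As an illustration of how small a URL-only record can be, the sketch below reads "name URL" pairs like the three above into a dictionary and rejects anything that is not a web address. The one-pair-per-line layout and the function name parse_url_records are assumptions made for this example, not a format agreed in this thread.

```python
from urllib.parse import urlparse

# The three package sources listed above, one "name URL" pair per line.
PACKAGE_URLS = """
FLCore https://github.com/flr/R/raw/7b25d8f4/src/contrib/FLCore_2.6.6.tar.gz
stockassessment https://codeload.github.com/fishfollower/SAM/legacy.tar.gz/25b3591
FLSAM https://codeload.github.com/flr/FLSAM/legacy.tar.gz/7e078fa
"""


def parse_url_records(text):
    """Return {name: url} from whitespace-separated 'name URL' lines.

    Hypothetical helper: raises ValueError if a URL lacks an http(s)
    scheme or a host, so typos are caught before any download is tried.
    """
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, url = line.split(None, 1)
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a web address: {url!r}")
        records[name] = url
    return records


records = parse_url_records(PACKAGE_URLS)
```

Because each URL pins a commit or a versioned tarball, the dictionary alone is enough to identify exactly which package source was used.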
Add a data access field to the citation?
http://vocab.ices.dk/?ref=1435
To help identify private data
Closing as we are agreed on a format similar to bibtex.
implementation: Citations file: write an R function to parse and download data #243
Summary
This could be done simply via a list of web addresses and textual citations including version information, a little like BibTeX citations:
https://en.wikipedia.org/wiki/BibTeX
bootstrap/upload folder.
bootstrap/data folder by an icesTAF function.
An example following BibTeX format for some SAG data would be something like:
which would result in the reference:
ICES Stock Assessment Database. Copenhagen, Denmark. ICES. [accessed date]. http://standardgraphs.ices.dk
and would download an xml file of the stock assessment data for the stock with assessmentkey = 9941 into the bootstrap/data folder.
Notes
The same could be achieved with JSON format for each data set, and we could use some standard metadata fields.
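As a rough sketch of the JSON route, one record per data set could drive both the textual reference and the download into bootstrap/data. Everything named here is an assumption for illustration: the field names (name, title, publisher, address, url, file), the file name sag_9941.xml, and the helpers format_reference and download_record. The url is the site address from the reference in the Summary; the actual query for a given assessmentkey is not shown in this thread and is not guessed here.

```python
import json
import os
import tempfile
from urllib.request import urlopen

# Hypothetical record: field names and file name are illustrative only,
# not an agreed TAF metadata schema.
RECORD_JSON = """
{
  "name": "sag_9941",
  "title": "ICES Stock Assessment Database",
  "publisher": "ICES",
  "address": "Copenhagen, Denmark",
  "url": "http://standardgraphs.ices.dk",
  "file": "sag_9941.xml"
}
"""


def format_reference(record, accessed="accessed date"):
    """Render a record as a one-line textual reference."""
    parts = [record["title"], record["address"], record["publisher"], f"[{accessed}]"]
    return ". ".join(parts) + ". " + record["url"]


def read_url(url):
    """Default fetcher: return the raw bytes served at `url`."""
    with urlopen(url) as response:
        return response.read()


def download_record(record, folder=os.path.join("bootstrap", "data"), fetch=read_url):
    """Save the resource described by `record` into `folder`; return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, record["file"])
    with open(path, "wb") as handle:
        handle.write(fetch(record["url"]))
    return path


record = json.loads(RECORD_JSON)
reference = format_reference(record)

# Demonstrate without network access by swapping in a stand-in fetcher.
with tempfile.TemporaryDirectory() as tmp:
    saved = download_record(record, folder=tmp, fetch=lambda url: b"<xml/>")
    with open(saved, "rb") as handle:
        demo_bytes = handle.read()
```

Passing the fetcher as an argument keeps the download step testable offline; in normal use the default read_url would be left in place.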
Tasks required
Links to other issues?
Dependencies file: how do we define what packages are required #217