ices-taf / doc

Community documentation for the TAF project
http://taf.ices.dk

Citations file: how do we define and reference what data is used #239

Closed colinpmillar closed 6 years ago

colinpmillar commented 6 years ago

Summary

This could be done simply via a list of web addresses and textual citations including version information, a little like BibTeX citations:

https://en.wikipedia.org/wiki/BibTeX

An example following bibtex format for some SAG data would be something like:

@Misc{sag,
 title     = "ICES Stock Assessment Database",
 publisher = "ICES",
 year      =  2018,
 address   = "Copenhagen, Denmark",
 uri = "https://sg.ices.dk/download/XMLDownload.ashx?assessmentkey=9941"
}

which would result in the reference:

ICES Stock Assessment Database. Copenhagen, Denmark. ICES. [accessed date]. http://standardgraphs.ices.dk

and would download an xml file of the stock assessment data for the stock with assessmentkey = 9941 into the bootstrap/data folder
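
As a rough illustration of how the entry above could be turned into a reference and a download target, here is a minimal Python sketch. It is not the TAF implementation: the function names `parse_entry`, `format_reference` and `target_path`, the order of the fields in the reference, the `.xml` file name built from the entry key, and the access date passed in are all assumptions made for the example. The parser only handles simple quoted or numeric values, not nested braces.

```python
import re

ENTRY = '''
@Misc{sag,
 title     = "ICES Stock Assessment Database",
 publisher = "ICES",
 year      =  2018,
 address   = "Copenhagen, Denmark",
 uri = "https://sg.ices.dk/download/XMLDownload.ashx?assessmentkey=9941"
}
'''

def parse_entry(text):
    """Parse one simple BibTeX-like entry into a dict of strings."""
    head = re.search(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", text)
    if head is None:
        raise ValueError("no entry header found")
    fields = {"type": head.group(1).lower(), "key": head.group(2)}
    # Each field is either name = "quoted value" or name = bare_number.
    pattern = r'(\w+)\s*=\s*(?:"([^"]*)"|(\d+))'
    for name, quoted, bare in re.findall(pattern, text[head.end():]):
        fields[name.lower()] = quoted or bare
    return fields

def format_reference(fields, accessed):
    """Build a textual reference; the field order is an assumption."""
    return (f'{fields["title"]}. {fields["address"]}. {fields["publisher"]}. '
            f'[accessed {accessed}]. {fields["uri"]}')

def target_path(fields, folder="bootstrap/data"):
    """Hypothetical naming rule: save the download as <key>.xml."""
    return f'{folder}/{fields["key"]}.xml'

sag = parse_entry(ENTRY)
print(format_reference(sag, "2018-10-01"))  # access date is a placeholder
print(target_path(sag))
```

Note that this sketch puts the `uri` field in the reference, whereas the reference shown above ends with the general Standard Graphs address, so the exact mapping from fields to reference text would still need to be agreed.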

Notes

The same could be achieved with JSON format for each data set, and we could use some standard metadata fields.
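
To make the JSON alternative concrete, the sketch below writes the SAG example as a JSON record and reads it back. The field names simply mirror the BibTeX example; the `REQUIRED` tuple and the `missing_fields` helper are hypothetical and only illustrate how a set of standard fields could be checked.

```python
import json

# The SAG example expressed as a plain record; field names mirror the BibTeX entry.
record = {
    "key": "sag",
    "title": "ICES Stock Assessment Database",
    "publisher": "ICES",
    "year": 2018,
    "address": "Copenhagen, Denmark",
    "uri": "https://sg.ices.dk/download/XMLDownload.ashx?assessmentkey=9941",
}

# Assumption: which fields every data set must provide is not yet decided.
REQUIRED = ("title", "publisher", "year", "uri")

def missing_fields(rec):
    """Return the required field names that are absent from a record."""
    return [name for name in REQUIRED if name not in rec]

# Round trip: a list of records serialised to JSON text and parsed again.
text = json.dumps([record], indent=2)
loaded = json.loads(text)

print(text)
print(missing_fields(loaded[0]))
```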

Tasks required

Links to other issues?

Dependencies file: how do we define what packages are required #217

neil-ices-dk commented 6 years ago

i like the bibtex - might need you to explain how the reference is generated from the input as the two aren't quite the same on the face of it

colinpmillar commented 6 years ago

I think this is a nice discussion: https://nennius.wordpress.com/2013/11/27/using-bibtex-for-dataset-citation/

The idea is for us to make similar use of the following fields:

@misc{anudc:4896,
 author       = {Claire O'Brien},
 title        = {{Impact of Colonoscopy Bowel Preparation on Intestinal Microbiota}},
 doi          = {10.4225/13/511C71F8612C3},
 howpublished = {\url{http://dx.doi.org/10.4225/13/511C71F8612C3}},
 note         = {Accessed: 2010-09-30}
}

I think this is quite a good first thing to run with. Is this something you would be happy pushing forward with, @ices-taf/project-admins, and testing out on one of our stock assessments?

On a second note I think it makes sense to store these in a file:

citations.bib

as then it has the correct file extension to be recognised by bibliographic database software such as EndNote, Zotero etc.

neil-ices-dk commented 6 years ago

looks good - what were you thinking we put for author of the dataset?

Lotte-ices-dk commented 6 years ago

The hosting database/Institute?

neil-ices-dk commented 6 years ago

this is a bit the issue - author is a publishing world handle which doesn't work as well in the data domain. In meta data about data, there is usually a handle for the originator of the data AND the distributor (the former being the owner and the latter being the collating/disseminating body). In most cases for example, ICES is the distributor but not the owner. Let's talk about this. So to be safe, if we have to use one handle it should be 'Distributor'.

Lotte-ices-dk commented 6 years ago

Would avoid 'owner' at any time; 'distributor' seems to be the handle to use then. Can we just leave out the 'author' part? Or is that cheating...

colinpmillar commented 6 years ago

just for info, for Marine Scotland, they have used the author label e.g: https://data.marine.gov.scot/dataset/salmon-and-sea-trout-fishery-statistics-2017-season-reported-catch-and-effort-method

and give it as "Marine Scotland"

But we should talk about this at the next meeting.

I wonder if we can constrain the elements of the citation records to only what we need to:

  1. download the data
  2. create a textual reference
  3. access the fuller meta data elsewhere

Then, for datasets with no doi, we can offer (or enforce?) the option of providing a meta-data record for that data set?
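
One way to picture such a constrained record is the Python sketch below, which groups the fields by the three purposes listed above and flags records that would need a separate metadata record because they have no DOI. The class name, the field names (including `distributor`, which is still under discussion) and the `example.org` address are illustrative assumptions, not an agreed schema.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class CitationRecord:
    key: str
    url: str                            # 1. where to download the data
    title: str                          # 2. used in the textual reference
    distributor: str                    # 2. assumed handle, see discussion
    year: Optional[int] = None          # 2. optional in this sketch
    doi: Optional[str] = None           # 3. pointer to fuller metadata
    metadata_url: Optional[str] = None  # 3. alternative when there is no DOI

    def needs_metadata_record(self) -> bool:
        """True when neither a DOI nor a metadata link is available."""
        return self.doi is None and self.metadata_url is None

sag = CitationRecord(
    key="sag",
    url="https://sg.ices.dk/download/XMLDownload.ashx?assessmentkey=9941",
    title="ICES Stock Assessment Database",
    distributor="ICES",
    year=2018,
)
print(sag.needs_metadata_record())

# Hypothetical metadata address, used only to show the other branch.
sag_with_meta = replace(sag, metadata_url="https://example.org/metadata/sag")
print(sag_with_meta.needs_metadata_record())
```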

For now I get the sense we can move forward with a test implementation and finalise the details in the coming weeks. This would allow us to move on to writing functions that automatically get the data sets during the bootstrapping phase. Thanks!

neil-ices-dk commented 6 years ago

with the Marine Scotland example, it makes sense, i have no problem asserting ICES as author on outputs but on inputs (to TAF) it's a bit shaky. Next meeting it is :)

colinpmillar commented 6 years ago

Well – it’s not till the 1st of Nov – but I will try and book a time in two weeks’ time, before the end of the 1st sprint deadline ☺

I’ll find a time and send round an invite later today

arni-magnusson commented 6 years ago

I agree we can move forward with the bootstrap tests now, and decide the citation file formats later.

The BibTeX format, used in LaTeX documents, is mainly tailored for citing traditional publications, such as articles, reports, and books. I have tried using BibTeX in the past when preparing manuscripts and have found it a bit unwieldy. Some of the shortcomings of BibTeX have been addressed by another contender called BibLaTeX, which improves support for European alphabets and adds new fields such as url and urldate.

Maintenance of BibTeX has been slow in the last two decades, so url is not yet defined as a field, but many LaTeX users have saved web addresses under the generic note field, as supported by several BibTeX tools. Others have used the howpublished field. The Wikibooks page https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management goes as far as saying that "Recently, BibTeX has been succeeded by BibLaTeX".

I'm not really recommending for or against BibTeX or BibLaTeX, but perhaps suggesting that we:

  1. Look at how other scientific workflow systems keep track of data and software citations (I will start looking into this)
  2. Consider which fields we definitely need, and which fields could be optional for TAF users
  3. Consider whether it's practical to have a similar or different citation format for data vs. software

The only field we need for the current test phase is URL. For example, the North Sea herring 2018 assessment used three R packages that are not standard CRAN packages. The URLs are:

  1. FLCore https://github.com/flr/R/raw/7b25d8f4/src/contrib/FLCore_2.6.6.tar.gz
  2. stockassessment https://codeload.github.com/fishfollower/SAM/legacy.tar.gz/25b3591
  3. FLSAM https://codeload.github.com/flr/FLSAM/legacy.tar.gz/7e078fa
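
A URL-only test phase could look roughly like the Python sketch below, which takes the three package URLs and derives a local file name for each without downloading anything. The `local_filename` helper and its naming rule (keep the file name if the URL ends in `.tar.gz`, otherwise combine the package name with the last path element, here the commit reference) are assumptions for illustration only.

```python
import posixpath
from urllib.parse import urlparse

PACKAGES = {
    "FLCore": "https://github.com/flr/R/raw/7b25d8f4/src/contrib/FLCore_2.6.6.tar.gz",
    "stockassessment": "https://codeload.github.com/fishfollower/SAM/legacy.tar.gz/25b3591",
    "FLSAM": "https://codeload.github.com/flr/FLSAM/legacy.tar.gz/7e078fa",
}

def local_filename(name, url):
    """Derive a file name for a package archive from its URL (assumed rule)."""
    last = posixpath.basename(urlparse(url).path)
    if last.endswith(".tar.gz"):
        return last                   # URL already ends in an archive name
    return f"{name}_{last}.tar.gz"    # otherwise use <package>_<reference>

for name, url in PACKAGES.items():
    print(name, "->", local_filename(name, url))
```

The actual download step is left out on purpose; in Python it could be done with `urllib.request.urlretrieve(url, path)`, but where the files should be stored is not settled in this thread.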

colinpmillar commented 6 years ago

Add a data access field to the citation?

http://vocab.ices.dk/?ref=1435

To help identify private data

colinpmillar commented 6 years ago

Closing as we are agreed on a format similar to bibtex.

Implementation: Citations file: write an R function to parse and download data #243

colinpmillar commented 5 years ago

Example of data publishing:

https://risweb.st-andrews.ac.uk/portal/en/datasets/finescale-harbour-seal-usage-for-informed-marine-spatial-planning-dataset(4f86d1c0-f999-4ca2-b6a8-6ea63a83400b).html