tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core

collectionCode - Darwin Core Hour Input Form 2/14/2017 11:48:24 #40

Open iDigBioBot opened 7 years ago

iDigBioBot commented 7 years ago

A user submitted this information via the Darwin Core Hour webform:

Timestamp: 2/14/2017 11:48:24
Please provide a topic of interest: How is the term "collectionCode" supposed to be used? Are there any existing standards recommendations?
Are you capable of and interested in participating: No
Who else would you recommend to participate in the presentation:
What resources can you point to:
Your name:
Your email:

tucotuco commented 7 years ago

Darwin Core provides several terms to help people distinguish data sets, namely institutionCode, collectionCode, datasetName, and their related identifiers institutionID, collectionID, and datasetID. The institutionCode is meant to hold the official acronym for an organization, such as "MVZ" for the institution "Museum of Vertebrate Zoology". This acronym, along with a catalog number, is commonly used to identify cataloged material in scientific publications.

Practices vary within and among institutions in terms of how cataloging is done and how specimens are identified. In one institution, the catalog number might include information that designates which collection the specimen belongs to, for example "Herp 2371", while in another the catalog number might not contain this information, for example "2371". The collectionCode is meant to let institutions that follow the latter practice distinguish specimens from different collections within the institution when sharing data with the rest of the world. Thus, institutionCode = "MVZ", collectionCode = "Herp", catalogNumber = "2371" is sufficient to pick out the specimen of interest from among the many at the Museum of Vertebrate Zoology that share catalog number "2371".
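As a rough illustration of the example above, here is a minimal Python sketch of how the three terms together single out one record. The first record uses the values from the example; the second record and its collectionCode "Bird" are made up for illustration. The helper names `triplet` and `find`, and the colon separator, are assumptions of this sketch and not something Darwin Core prescribes.

```python
# Two records that share institutionCode and catalogNumber.
records = [
    {"institutionCode": "MVZ", "collectionCode": "Herp", "catalogNumber": "2371"},
    # Hypothetical second record, invented to show the collision on "2371".
    {"institutionCode": "MVZ", "collectionCode": "Bird", "catalogNumber": "2371"},
]

TRIPLET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def triplet(record, sep=":"):
    """Join the three terms into one key; the separator is an arbitrary choice."""
    return sep.join(record[term] for term in TRIPLET_TERMS)


def find(records, institution_code, collection_code, catalog_number):
    """Return every record matching all three terms."""
    wanted = (institution_code, collection_code, catalog_number)
    return [r for r in records if tuple(r[t] for t in TRIPLET_TERMS) == wanted]


# catalogNumber alone is ambiguous here; the full triplet is not.
same_number = [r for r in records if r["catalogNumber"] == "2371"]
print(len(same_number))                              # 2
print(len(find(records, "MVZ", "Herp", "2371")))     # 1
print(triplet(records[0]))                           # MVZ:Herp:2371
```

Without collectionCode, both records would collapse onto the same institutionCode and catalogNumber pair, which is the situation the term is meant to prevent.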

The datasetName allows institutions to further separate subsets of data, or to name them explicitly. For example, the University of British Columbia Beaty Biodiversity Museum (institutionCode = "UBCBBM") has the Cowan Tetrapod Collection (collectionCode = "CTC"), within which are several distinct data sets, including one with datasetName = "Cowan Tetrapod Collection - Avian". As another example, the University of Kansas (institutionCode = "KU") has a herpetological collection (collectionCode = "KUH") as a single data set, the name of which is spelled out in datasetName = "University of Kansas Biodiversity Institute Herpetology Collection".

The corresponding identifier fields institutionID, collectionID, and datasetID are meant to contain globally unique and persistent identifiers for the three corresponding concepts. The first two of these terms, institutionID and collectionID, would best be populated with references to entries in a registry of institutions and collections, such as the Global Registry of Biodiversity Repositories (http://grbio.org). For example: institutionCode = "NHMO", institutionID = "http://grbio.org/cool/2knt-7f1r", collectionCode = "BI", collectionID = "http://grbio.org/cool/wes0-t2ie".

The datasetID is best populated with an identifier for a published data set in which the record can be found. As such, a publication reference such as a Digital Object Identifier (DOI) is a good candidate, for example datasetID = "https://doi.org/10.15468/aomfnb" for records in the 2015 eBird Observation Dataset (see http://www.gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e).
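For anyone who wants to sanity-check the identifier fields in their own data, the following Python sketch tests whether institutionID, collectionID, and datasetID at least look like absolute HTTP(S) URIs. It is a syntactic check only and says nothing about whether an identifier actually resolves or will persist. The function names are hypothetical, and the sample record simply reuses the NHMO values quoted above with no datasetID supplied.

```python
from urllib.parse import urlsplit

ID_TERMS = ("institutionID", "collectionID", "datasetID")


def looks_like_http_uri(value):
    """Syntactic check only: an absolute http(s) URI with a host part."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_id_terms(record):
    """Map each identifier term to True or False using the check above."""
    return {term: looks_like_http_uri(record.get(term)) for term in ID_TERMS}


# Values from the examples in this thread; the record is only illustrative.
record = {
    "institutionCode": "NHMO",
    "institutionID": "http://grbio.org/cool/2knt-7f1r",
    "collectionCode": "BI",
    "collectionID": "http://grbio.org/cool/wes0-t2ie",
}

print(check_id_terms(record))
# {'institutionID': True, 'collectionID': True, 'datasetID': False}
```

A DOI written as a URL, such as "https://doi.org/10.15468/aomfnb", passes the same check, while a bare code such as "NHMO" does not.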

debpaul commented 7 years ago

Documentation page added. See https://github.com/tdwg/dwc-qa/wiki/Institutions-and-Collections. @tucotuco this is assigned to you so I will leave it up to you when you would like to close.

garymotz commented 7 years ago

Is it generally best practice to use a shortened URL for institutionID = "http://grbio.org/cool/2knt-7f1r" or is institutionID = "http://grbio.org/institution/natural-history-museum-university-oslo" acceptable as well?

Is the major intent to ensure that the value is a resolvable URI, regardless of whether or not it is a shortened or more-or-less human readable URL?

dagendresen commented 7 years ago

I am not concerned with the human-readability of the identifier (dwc:institutionID or any dwc:nnn-ID term). I would choose the short cooluri form from GRBio rather than the longer URL form. I value a long-term persistent resolvable identifier much more than human-readability!

Using VIAF numbers as institution identifiers might perhaps also be useful: institutionID = http://viaf.org/viaf/113146937739813830943/

VIAF comes from the library community. VIAF numbers are permanent, but one institution (or person) might end up with more than one VIAF code. ISNI numbers are curated to be persistent and to ensure that one institution or person has only one ISNI code. ORCID iDs are a subset of the ISNI codes.

Might it be possible to aspire to assigning ISNI numbers for all institutions in GRBio that do not yet have such a number...? And later on to aspire to recommend "older" identifier systems used for biodiversity institutions and people to be linked (and possibly resolved) to the corresponding unique ISNI number...?

http://www.isni.org/ http://www.gbif.no/news/2016/bibsys-november-2016.html

tucotuco commented 7 years ago

I support the primacy of persistence and resolvability. I like the suggestion of promoting ISNI and I hope the people working on the NCD standards are also listening and can provide their perspectives.

In reply to Dag Endresen (Fri, Sep 1, 2017 at 3:41 AM).

godfoder commented 7 years ago

TDWG NCD Co-Convener (w/ @debpaul) here.

For NCD (Standards track), our work is likely to directly borrow the terms, definitions, and examples from Darwin Core where there are existing elements, so there should be no duplication of effort or conflicts here.

For NCD (Implementation track), I like the idea of promoting the use of ISNI style identifiers. I was hoping to promote the use of ORCIDs for identifying people, so having an equivalent identifier for the collection and institution seems like a natural fit.

At least for institutions, it seems like the libraries may well have already done our work for us and issued institution identifiers for many places. Issuance of collection identifiers might be more problematic, but it is possibly also something that could be done with less curatorial control (URIs, ARKs, Handles, UUIDs) where there is already a strong institution identifier in place to provide context.
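To illustrate the UUID option mentioned above, here is one possible Python sketch that derives a collection identifier from an existing institution identifier plus a collection code. Using name-based (version 5) UUIDs is an assumption of this sketch, not something proposed in the thread; the function name `collection_uuid` is hypothetical, and the sample values are the NHMO ones quoted earlier in the discussion.

```python
import uuid


def collection_uuid(institution_id, collection_code):
    """Derive a deterministic UUID URN for a collection.

    The institution identifier is turned into a namespace first, so the
    same collection code under two institutions yields different UUIDs.
    """
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, institution_id)
    return uuid.uuid5(namespace, collection_code).urn


institution_id = "http://grbio.org/cool/2knt-7f1r"  # from the example in this thread

first = collection_uuid(institution_id, "BI")
again = collection_uuid(institution_id, "BI")
other = collection_uuid(institution_id, "XX")  # "XX" is a made-up code

print(first == again)   # True: same inputs always give the same identifier
print(first == other)   # False: a different code gives a different identifier
```

Because the result depends on the inputs, a changed collection code would change the identifier, so for persistence it would probably be safer to generate the value once and store it rather than recompute it.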

tucotuco commented 7 years ago

Excellent

In reply to Alex Thompson (Fri, Sep 1, 2017 at 6:06 PM).