tdwg / esp

Earth Sciences and Paleobiology Interest Group

What is best practice for assigning globally unique IDs? #13

Open dennereed opened 7 years ago

dennereed commented 7 years ago

What recommendation should we give for assigning globally unique ID numbers to fossil specimens? SESAR?

DimEvil commented 7 years ago

As long as they are globally unique, it's OK. At INBO we use this setup: [occurrenceID] = N'INBO:VLINDERS:' + Right('000000000' + CONVERT(nvarchar(20),tMt.WRME_ID),8)

The fixed parts are INBO (our institute) and VLINDERS (short name for the dataset); tMt.WRME_ID is the unique ID for a record within the dataset.

--> INBO:VLINDERS:00989254

(From other databases we have IDs like INBO:NBN:BFN0017900009ZWX, where BFN0017900009ZWX is unique within the database.)
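A minimal Python sketch of the same prefixing pattern, for readers who do not work in SQL Server. The function name and the `width` parameter are illustrative assumptions, not part of the INBO setup. One difference from the SQL expression above: `Right(..., 8)` keeps only the last 8 characters, whereas this sketch pads short numbers but never truncates long ones.

```python
def make_occurrence_id(institute: str, dataset: str, record_id, width: int = 8) -> str:
    """Join institute, dataset and local record ID with colons.

    Integer IDs are zero-padded to `width` digits; string IDs are used as-is.
    (Hypothetical helper, named here for illustration only.)
    """
    if isinstance(record_id, int):
        local = f"{record_id:0{width}d}"
    else:
        local = str(record_id)
    return f"{institute}:{dataset}:{local}"


print(make_occurrence_id("INBO", "VLINDERS", 989254))
print(make_occurrence_id("INBO", "NBN", "BFN0017900009ZWX"))
```

The two calls reproduce the two identifiers quoted in this comment.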

It's also possible to just generate unique IDs: https://www.guidgenerator.com/online-guid-generator.aspx There are pros and cons to using generated unique IDs, mostly around human readability.
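Generated IDs do not require a web tool; most languages ship a UUID generator. As a rough sketch using Python's standard `uuid` module (the example.org name below is a made-up placeholder):

```python
import uuid

# Version 4 UUIDs are random: no registry or naming scheme is needed,
# but the result carries no human-readable meaning.
record_guid = uuid.uuid4()
print(record_guid)  # different on every run

# Version 5 UUIDs are derived from a namespace plus a name, so the same
# input always yields the same identifier. The name is hypothetical.
derived = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/specimen/989254")
print(derived)
```

Either way the value should be generated once and stored with the record, not regenerated at each export.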

falkogloeckler commented 7 years ago

What recommendation should we give for assigning globally unique ID numbers to fossil specimens.

A recommendation would be to adopt an agreed standard, as the CETAF members did: http://cetaf.org/cetaf-stable-identifiers See also our recent publication: https://doi.org/10.1093/database/bax003

dennereed commented 7 years ago

Dimitri's suggestion matches the best practice recommended in DwC for occurrenceID in the absence of a guaranteed GUID, which is to concatenate institutionCode + collectionCode + catalogNumber. The downside is that there is no guarantee the resulting ID is truly unique.
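To make the downside concrete, here is a small Python sketch that builds the triplet and checks a dataset for collisions before publishing. The records, the `XYZ` institution code and the colon separator are all invented for illustration.

```python
from collections import Counter


def dwc_triplet(record: dict) -> str:
    """Concatenate the three Darwin Core terms (colon separator is an assumption)."""
    return ":".join(
        [record["institutionCode"], record["collectionCode"], record["catalogNumber"]]
    )


# Hypothetical records: the first two share a catalog number, as can happen
# when several parts of one specimen are catalogued under a single number.
records = [
    {"institutionCode": "XYZ", "collectionCode": "Paleo", "catalogNumber": "1001"},
    {"institutionCode": "XYZ", "collectionCode": "Paleo", "catalogNumber": "1001"},
    {"institutionCode": "XYZ", "collectionCode": "Paleo", "catalogNumber": "1002"},
]

counts = Counter(dwc_triplet(r) for r in records)
duplicates = sorted(t for t, n in counts.items() if n > 1)
print(duplicates)
```

A check like this only catches collisions inside one dataset; two institutions that happen to use the same institutionCode would still clash once their data are aggregated.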

Falko's suggestion to follow CETAF means generating stable URIs for each specimen in accordance with W3C linked data best practice.

The question then becomes: what is the best recommendation to paleobiologists, from researchers to institutions, on how to establish reliable, and persistent URIs? Can anyone out there comment, or does anyone have experience generating stable URIs for collections?

hollyel commented 7 years ago

I think a guideline for a best practice is the best path to go down. It would be incredibly difficult to get everyone to use the same type of GUID. From my understanding, most institutions that are currently sharing data should already be generating GUIDs per record as they are required by some of the aggregators/portals.

At the NMNH we generate a GUID per specimen record and will soon be adding GUIDs to multimedia objects as well. We use the EZID resolving service with a UUID tail automatically generated in our collections management system (EMu). That string is then attached to the EZID shoulder that is specific to our museum name ID. The ID resolves through the EZID service, which bounces it back to an NMNH server.

Example: http://n2t.net/ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d
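The structure of that example can be sketched in Python as follows. The constants are read off the example URL itself: treating `65665` as the name assigning authority number and the single leading `3` as the shoulder is an assumption drawn from the string, not a statement of how EMu or EZID are configured. Both helper functions are hypothetical.

```python
import uuid
from urllib.parse import urlsplit

RESOLVER = "http://n2t.net"  # resolver host, as in the example
NAAN = "65665"               # assumed: the number after "ark:/" in the example
SHOULDER = "3"               # assumed: the character preceding the UUID


def mint_ark_url(record_uuid: uuid.UUID) -> str:
    """Attach a UUID tail to the shoulder and wrap it in a resolver URL."""
    return f"{RESOLVER}/ark:/{NAAN}/{SHOULDER}{record_uuid}"


def split_ark_url(url: str):
    """Return (naan, shoulder, uuid) for URLs shaped like the example."""
    path = urlsplit(url).path                      # "/ark:/65665/3f69..."
    _, naan, name = path.lstrip("/").split("/", 2)
    return naan, name[: len(SHOULDER)], uuid.UUID(name[len(SHOULDER):])


example = "http://n2t.net/ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d"
naan, shoulder, tail = split_ark_url(example)
print(naan, shoulder, tail)
print(mint_ark_url(tail) == example)
```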

DimEvil commented 7 years ago

It would indeed be great to make GUIDs always resolvable, but that is another issue. And we do not need everybody to use the same type of GUID (indeed practically impossible :) ); as long as the GUIDs are indeed unique, they can be used for simple data publishing.

So, I would recommend making the IDs globally unique either by using a series of prefixes followed by the unique record ID, or by using a GUID generator or GUID service. Either way, make sure these GUIDs are also stored in the database, or at least that you can make the connection between the published record and the record in the database.
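One way to keep that connection is to store the GUID in the same table as the local record ID. A minimal sketch with Python's built-in `sqlite3` module; the table and column names are invented for illustration:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (local_id INTEGER PRIMARY KEY, guid TEXT UNIQUE)")
conn.executemany("INSERT INTO specimen (local_id) VALUES (?)", [(1,), (2,), (3,)])

# Mint a GUID once for every record that lacks one and store it with the record.
rows = conn.execute("SELECT local_id FROM specimen WHERE guid IS NULL").fetchall()
for (local_id,) in rows:
    conn.execute(
        "UPDATE specimen SET guid = ? WHERE local_id = ?",
        (str(uuid.uuid4()), local_id),
    )
conn.commit()


def local_id_for(guid: str):
    """Trace a published GUID back to the database record (None if unknown)."""
    row = conn.execute(
        "SELECT local_id FROM specimen WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None


for local_id, guid in conn.execute("SELECT local_id, guid FROM specimen"):
    print(local_id, guid)
```

Because the update only touches rows where `guid IS NULL`, rerunning it never changes an identifier that has already been published.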

debpaul commented 7 years ago

Hi @dennereed I note you say (in your first part of this ticket)...

globally unique ID numbers to fossil specimens.

Then you bring in

best recommendation to paleobiologists, from researchers to institutions, on how to establish reliable, and persistent URIs.

So, I think you get there's a difference between generating a GUID and coming up with a URI, right? A URI is a string that is unique, hence it can act as a type of GUID. A URI may also "resolve" (be a URL), but doesn't have to. But GUIDs do not have to be URIs, they can be UUIDs for example.
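The distinction can be illustrated with a short Python sketch that reports which forms a given string satisfies. The `describe` function is a hypothetical helper, and its tests are purely syntactic: it cannot tell whether a URL actually resolves.

```python
import uuid
from urllib.parse import urlsplit


def describe(identifier: str) -> dict:
    """Report which identifier forms a string satisfies (syntax only)."""
    try:
        uuid.UUID(identifier)  # also accepts the "urn:uuid:" prefixed form
        is_uuid = True
    except ValueError:
        is_uuid = False
    parts = urlsplit(identifier)
    return {
        "uuid": is_uuid,
        "uri": bool(parts.scheme),
        "http_url": parts.scheme in ("http", "https") and bool(parts.netloc),
    }


bare = "f693ef93-8ecc-4a3b-a376-fd0520be555d"
print(describe(bare))                    # a UUID, but not a URI
print(describe(uuid.UUID(bare).urn))     # the same UUID written as a URN
print(describe("http://n2t.net/ark:/65665/3" + bare))  # a URI that is also a URL
```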

Some ideas to consider:

debpaul commented 7 years ago

Note the EZID system that @hollyel suggests is also nice because if for some reason (and there will usually be one) the domain name changes, you can update your information in the EZID system and the resolver service will then make sure the "old" URIs resolve (find) the associated data in the new place.
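The idea behind such a resolver can be shown with a toy Python sketch. This is not EZID's API; the class, the method names and the example.org targets are all made up. The point is only that the published identifier stays fixed while the target it points to can be updated.

```python
class Resolver:
    """Toy lookup table from a persistent identifier to its current location."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier: str, target_url: str) -> None:
        # Registering an existing identifier again simply updates its target.
        self._targets[identifier] = target_url

    def resolve(self, identifier: str) -> str:
        return self._targets[identifier]


resolver = Resolver()
ark = "ark:/65665/3f693ef93-8ecc-4a3b-a376-fd0520be555d"

resolver.register(ark, "https://old.example.org/specimen/1")  # hypothetical URL
before = resolver.resolve(ark)

# The institution moves to a new domain; only the resolver entry changes.
resolver.register(ark, "https://new.example.org/specimen/1")  # hypothetical URL
after = resolver.resolve(ark)

print(before)
print(after)
```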

debpaul commented 7 years ago

Oh @dennereed, what is SESAR?

dennereed commented 7 years ago

@debpaul SESAR is the System for Earth Sample Registration. It's a service for generating GUIDs for geological specimens.

debpaul commented 7 years ago

aha yes @dennereed , IGSN, this acronym I know :-)

debpaul commented 7 years ago

I think the IGSN is very robust and may meet your needs very well. Does IGSN suit you as a researcher? Would paleo collections adopt it?

dennereed commented 7 years ago

Thanks everyone for the commentary, suggestions and ideas. I'll take a crack at summarizing this and adding it to the use_case_1 document. This is clearly an important issue that deserves extensive documentation, suggestions, and examples.

dennereed commented 7 years ago

Hi all. Just came across the TDWG GUID Applicability Statement, which provides very good guidance on this topic.

debpaul commented 7 years ago

And from @idigbio https://www.idigbio.org/sites/default/files/internal-docs/idigbio-standards/iDigBioGuidGuide-2013-06-26.pdf

dennereed commented 7 years ago

Deb, thanks for this! Very helpful resource. FYI, some of the links listed under GUID Resources in that document go to empty wikis or 404s, e.g.

https://www.idigbio.org/wiki/index.php/Globally_Unique_IDs_(GUID)
https://www.idigbio.org/content/idigbio-guid-statement

On March 10, 2017 at 1:54:26 PM, Debbie Paul (notifications@github.com) wrote:

And from @idigbio https://github.com/idigbio https://www.idigbio.org/sites/default/files/iDigBioGuidGuideForProviders_v1.pdf

debpaul commented 7 years ago

Both of those links work for me (but I'm logged in at iDigBio). I wonder if it's because they are technically deprecated. The first 404 you mention shows me our page, with a note that says: "This Wiki is not current and the material is here for historical purposes. For info on GUIDs go here: https://www.idigbio.org/content/guid-guide-data-providers-0"

dennereed commented 6 years ago

Created an FAQ wiki page on this topic at https://github.com/tdwg/paleo/wiki/Unique-Identifiers