tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Other Deliverable - catalogNumber review #6

Closed Jegelewicz closed 1 year ago

Jegelewicz commented 3 years ago

Task Group will make a recommendation [...] as to which class in the Darwin Core standard these properties belong which may also include recommendations for terms being revised, added, disambiguated, or deprecated. Depends upon definitions provided [in primary deliverable].

Current Darwin Core Placement/Definition

http://rs.tdwg.org/dwc/terms/catalogNumber

this term is a property of Occurrence

Definition

An identifier (preferably unique) for the record within the data set or collection.

Examples

145732, 145732a, 2008.1334, R-4313

Comments

Jegelewicz commented 3 years ago

We also have

materialSampleID

http://rs.tdwg.org/dwc/terms/materialSampleID

this term is a property of MaterialSample

Definition

An identifier for the MaterialSample (as opposed to a particular digital record of the material sample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique.

Examples

06809dc5-f143-459a-be1a-6f03e63fc083

Comments

Recommended best practice is to use a persistent, globally unique identifier.

Which fairly closely resembles what I think most museum professionals would call a "catalog number" except that museums notoriously apply a single catalog number to multiple MaterialSamples.

Given that, I think that catalogNumber would be more appropriately associated with Record-level. While this might often be the same identifier as materialSampleID, more often it is an umbrella identifier for more than one MaterialSample (MSB:Mamm:5000 has 2 parts, skin and skull, which I would consider separate MaterialSamples that share a catalogNumber and, in this case, an occurrenceID).
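A minimal Python sketch of the umbrella case may help. The `:skin` / `:skull` suffix scheme and the `fallback_material_sample_id` helper are assumptions made for illustration only; neither Darwin Core nor Arctos defines them.

```python
from collections import defaultdict


def fallback_material_sample_id(catalog_number: str, part: str) -> str:
    """Hypothetical fallback: combine identifiers already present in the record."""
    return f"{catalog_number}:{part}"


# Two MaterialSamples (parts) catalogued under one umbrella catalogNumber.
parts = [
    {"catalogNumber": "MSB:Mamm:5000", "part": "skin"},
    {"catalogNumber": "MSB:Mamm:5000", "part": "skull"},
]
for p in parts:
    p["materialSampleID"] = fallback_material_sample_id(p["catalogNumber"], p["part"])

# Group materialSampleIDs by the catalogNumber they share.
by_catalog_number = defaultdict(list)
for p in parts:
    by_catalog_number[p["catalogNumber"]].append(p["materialSampleID"])
```

Here one catalogNumber maps to two distinct materialSampleIDs, which is the one-to-many relationship described above.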

Given this, I suggest the following

catalogNumber

this term is a property of Record-level

Definition

Identifier assigned to something in some collection to distinguish it from other things in that collection. It is not expected to be globally unique, nor an IRI, and it could refer to a MaterialSample, an Organism, an image, a video, a sound, a human or machine observation. It is a local identifier for both material and digital entities, the use of which has significance within a collection.

Examples

MSB:Mamm:5000, https://arctos.database.museum/guid/MSB:Mamm:5000

Comments

As MaterialSamples may be transferred among institutions and subsampled, a given MaterialSample may have multiple associated CatalogNumbers.
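As a rough sketch of that comment, assuming a simple in-memory model: the `MaterialSample` class, the collection names, and the pairing of catalog numbers with collections below are all hypothetical and only illustrate one sample accumulating several local catalog numbers while keeping a single materialSampleID.

```python
from dataclasses import dataclass, field


@dataclass
class MaterialSample:
    material_sample_id: str
    # Each entry is (collection, catalogNumber); a catalogNumber is only
    # meaningful within the collection that assigned it.
    catalog_numbers: list = field(default_factory=list)

    def assign_catalog_number(self, collection: str, catalog_number: str) -> None:
        self.catalog_numbers.append((collection, catalog_number))


sample = MaterialSample("06809dc5-f143-459a-be1a-6f03e63fc083")
sample.assign_catalog_number("Collection A", "145732")  # original holder (hypothetical)
sample.assign_catalog_number("Collection B", "R-4313")  # after a transfer (hypothetical)
```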

tucotuco commented 3 years ago

@Jegelewicz The first comment looks like it is about recordNumber, not catalogNumber, which is given as the last term in the second comment.

tucotuco commented 3 years ago

My two cents on clarifying what these terms are for...

I am not sure recordedBy was meant to come into this conversation, so I will leave it out for now.

A catalogNumber is meant to be a number assigned to something in some collection to distinguish it from other things in that collection. It is not expected to be globally unique, nor an IRI, and it could refer to a specimen, all the parts of an organism, an image, a video, a sound, a human or machine observation. It is a local identifier for a class that is a combination of material and digital entities, the use of which has significance within a collection.

A materialSampleID is meant to be, ideally, a resolvable (IRI that returns metadata when requested) global unique identifier for a physical object. A good example of this would be an IGSN. At a minimum it must be unique within a dataset (but then its utility is limited to connecting it to other records in the dataset). It cannot refer to an image, video, sound, human or machine observation. It could be used for a specimen, but it could also be used for MaterialSamples derived from specimens or other MaterialSamples. For example, a DNA extract could have a materialSampleID and be derived from a toe tissue sample (with a materialSampleID) of a tuco-tuco (with a materialSampleID). materialSampleID is very specifically an identifier for an instance of material entity for which there is a purpose in distinguishing it from all other material entities. Even though the MaterialSample that is identified can be related to those other material entities by containing them, being part of them, or being derived from them, it must not share a materialSampleID with them.
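The derivation example can be sketched in Python as follows. The identifiers `ms-1` to `ms-3`, the `derived_from` field, and both helper functions are placeholders invented for illustration; they are not Darwin Core terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MaterialSample:
    material_sample_id: str
    description: str
    derived_from: Optional[str] = None  # materialSampleID of the parent, if any


samples = [
    MaterialSample("ms-1", "tuco-tuco"),
    MaterialSample("ms-2", "toe tissue sample", derived_from="ms-1"),
    MaterialSample("ms-3", "DNA extract", derived_from="ms-2"),
]


def duplicated_ids(items):
    """Return any materialSampleID used by more than one MaterialSample."""
    seen, duplicates = set(), set()
    for item in items:
        if item.material_sample_id in seen:
            duplicates.add(item.material_sample_id)
        seen.add(item.material_sample_id)
    return duplicates


def lineage(sample_id, items):
    """Follow derived_from links from a sample back to its source."""
    by_id = {item.material_sample_id: item for item in items}
    chain = []
    current = by_id.get(sample_id)
    while current is not None:
        chain.append(current.material_sample_id)
        current = by_id.get(current.derived_from) if current.derived_from else None
    return chain
```

Each entity in the chain keeps its own identifier, so `duplicated_ids(samples)` is empty while `lineage("ms-3", samples)` still recovers the relationship between the extract, the tissue, and the animal.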

An occurrenceID is meant to be, ideally, a resolvable (IRI that returns metadata when requested) global unique identifier for an assertion that an Organism was present or absent, or that no Organism identifiable as a member of a Taxon was present at a particular place at a particular time following some protocol for detection. It does not identify a material entity or a digital entity - it is an identifier for instances of an abstract concept of an exemplar of a Taxon having been present or absent, possibly backed by evidence in the form of material or digital entities.

Based on these I agree that a catalogNumber can't be rigorously defined as a property of an Occurrence (nor is it, in Darwin Core; it is merely grouped under that Class for convenience). Whatever the destiny of catalogNumber for organizational purposes, otherCatalogNumbers should share that fate.

Jegelewicz commented 3 years ago

@tucotuco thanks for keeping me honest! I did indeed copy the recordNumber definition! I fixed that, and now my next comment probably doesn't make sense.....

But your definition is much more concise - I edited my comment to include your contribution.

afuchs1 commented 3 years ago

@tucotuco "Based on these I agree that a catalogNumber can't be rigorously defined as a property of an Occurrence (nor is it, in Darwin Core; it is merely grouped under that Class for convenience)."

I agree. Is there value in identifying the set of attributes such as catalogNumber, otherCatalogNumbers, preparation, and disposition, which can potentially relate to a number of categories (PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample), and putting them in their own 'dwc class/heading'? They would then be easily identifiable as the physical catalogue number for the record in question. This doesn't change the term definitions; it just provides clarity that these terms are not occurrence based.

tucotuco commented 3 years ago

@afuchs1 Yes, definitely. If we are lucky and we end up agreeing that MaterialSample is a catch-all (superclass) of PreservedSpecimen, FossilSpecimen, and LivingSpecimen, then we can organize terms that apply only to those under MaterialSample. We'll see if catalogNumber and otherCatalogNumbers fit into that category or if they can also be used for other things (HumanObservations, MachineObservations, MaterialCitations, or even Organisms).

smrgeoinfo commented 2 years ago

The key question is what the value supplied for the catalogNumber property identifies.
Does it identify 1. a thing in the world or 2. a digital object that represents a thing (material, information, or temporal) in the world? materialSampleID clearly identifies a thing in the world. If one takes the position that a catalog is a collection of descriptions of things (books in a library, specimens in a collection, data records in a dataset), then the second meaning seems applicable.

dagendresen commented 2 years ago

what does the value supplied for the catalogNumber property identify

The vast majority of specimens (estimated 90%) are not digitized and most are not even described in a "catalog" in the sense of a "book".

Jegelewicz commented 2 years ago

what does the value supplied for the catalogNumber property identify. Does it identify 1. a thing in the world or 2. a digital object that represents a thing (material, information, or temporal) in the world.

My hot take - we need both. catalogNumber traditionally identifies a thing in the world, and it seems to me that we are missing some sort of recordNumber (as we have in Occurrence) which identifies a digital object that represents a thing in the world. I am totally willing to concede I am wrong - just my gut reaction.

Jegelewicz commented 1 year ago

@stanblum Connects a thing to a ledger with information about that thing. Broadened for observations - a number assigned to an observation is an equivalent thing.

would be an appropriate property

@smrgeoinfo this is only one of any kind of identifier that could be assigned

@stanblum Do not move to MaterialEntity

TG agrees

Jegelewicz commented 1 year ago

But does that mean we are saying MaterialEntity CANNOT have a catalog number?

Everyone seems to think it belongs to the dwc:Record class, but then again maybe not.

I think "catalog" carries too much baggage, but that is beyond our scope.

Jegelewicz commented 1 year ago

But does that mean we are saying MaterialEntity CANNOT have a catalog number?

Could be used, but we are just going to ignore as out of scope for now.

dagendresen commented 1 year ago

Screenshot 2023-03-15 at 17 59 30

From the Google Cloud mirror of the GBIF index

tucotuco commented 1 year ago

Anything that can be cataloged can have a catalog number. That would make organization at the record level the most appropriate. MaterialEntities would be as welcome to have catalog numbers as anything else.

dagendresen commented 1 year ago

Agreed - catalogNumber is clearly widely used across all Darwin Core classes.

However, I tend to think that the use of catalogNumber for species occurrences should rather be redirected to recordNumber.

I have been thinking of classes having twins of identifiers (machine-friendly) and names (human-friendly) - and believe that many museum collection curators will not let go of their "catalogNumber"s as the "names" for their specimens.

taxonID -- scientificName
organismID -- organismName
eventID -- fieldNumber
locationID -- locality
occurrenceID -- recordNumber
PreservedSpecimen (materialSampleID) -- catalogNumber

materialEntityID --> catalogNumber ??? --> materialName ??

dagendresen commented 1 year ago

| basisOfRecord | Count (catalogNumber not null) | Count (all) | Percentage |
|---|---:|---:|---:|
| HUMAN_OBSERVATION | 1 594 760 354 | 1 962 066 853 | 81% |
| PRESERVED_SPECIMEN | 205 898 386 | 213 445 792 | 96% |
| OBSERVATION | 23 338 249 | 23 369 732 | 100% |
| OCCURRENCE | 17 254 310 | 20 307 099 | 85% |
| MATERIAL_SAMPLE | 15 563 708 | 51 217 283 | 30% |
| FOSSIL_SPECIMEN | 8 742 898 | 10 111 229 | 86% |
| MACHINE_OBSERVATION | 3 126 540 | 15 180 143 | 21% |
| MATERIAL_CITATION | 2 050 186 | 3 010 520 | 68% |
| LIVING_SPECIMEN | 1 277 307 | 1 996 625 | 64% |
| TOTAL | 1 872 011 938 | 2 300 705 276 | 81% |

From the Google Cloud mirror of the GBIF index
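For anyone who wants to re-check the figures, here is a small Python sketch that recomputes the percentages and totals from the counts listed above. The dictionary layout and the `percent` helper are illustrative assumptions; this is not the query that was run against the GBIF index.

```python
# (count with catalogNumber not null, count of all records) per basisOfRecord
counts = {
    "HUMAN_OBSERVATION": (1_594_760_354, 1_962_066_853),
    "PRESERVED_SPECIMEN": (205_898_386, 213_445_792),
    "OBSERVATION": (23_338_249, 23_369_732),
    "OCCURRENCE": (17_254_310, 20_307_099),
    "MATERIAL_SAMPLE": (15_563_708, 51_217_283),
    "FOSSIL_SPECIMEN": (8_742_898, 10_111_229),
    "MACHINE_OBSERVATION": (3_126_540, 15_180_143),
    "MATERIAL_CITATION": (2_050_186, 3_010_520),
    "LIVING_SPECIMEN": (1_277_307, 1_996_625),
}


def percent(with_catalog_number: int, all_records: int) -> int:
    """Share of records that have a catalogNumber, rounded to a whole percent."""
    return round(100 * with_catalog_number / all_records)


percentages = {basis: percent(*pair) for basis, pair in counts.items()}
total_with = sum(with_cn for with_cn, _ in counts.values())
total_all = sum(all_records for _, all_records in counts.values())
total_percentage = percent(total_with, total_all)
```

The recomputed totals match the TOTAL row, and the per-class values show MATERIAL_SAMPLE and MACHINE_OBSERVATION as the two classes where catalogNumber is populated least often.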