tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

`dwc:occurrenceID` in the context of `dwc:catalogNumber` review #21

Closed cboelling closed 2 years ago

cboelling commented 2 years ago

Originally posted by @tucotuco in https://github.com/tdwg/material-sample/issues/6#issuecomment-903191602

An occurrenceID is meant to be, ideally, a resolvable (an IRI that returns metadata when requested) globally unique identifier for an assertion that an Organism was present or absent, or that no Organism identifiable as a member of a Taxon was present, at a particular place at a particular time following some protocol for detection. It does not identify a material entity or a digital entity; it is an identifier for instances of an abstract concept of an exemplar of a Taxon having been present or absent, possibly backed by evidence in the form of material or digital entities.

From the current definitions of dwc:Occurrence and dwc:occurrenceID, I understand that a dwc:occurrenceID is intended to be an identifier for a dwc:Occurrence (the presence of one or more individuals of taxon X (somewhere) within geographical location L (at some time) during time interval T), which, while less tangible than a preserved specimen, is a real thing and exists independently of what biodiversity researchers think or do.

The concept described in the quote is different: it sits on the level of assertions (i.e. what a human agent thinks about an occurrence for a given X, L, T), including assertions of absence, i.e. that a given occurrence, specified by X, L, T, did not occur.

I would just like to make sure I understand the existing terms correctly, or find out whether there are new requirements for them. I created this as a new issue to keep the discussion in the original issue #6 close to its core topic.

Jegelewicz commented 2 years ago

One issue that I see here is that for most museums - the "catalog number" describes BOTH the "occurrence" AND the object(s) or organism in the collection related to it. So museums are going to need to up their game in creating unique IDs for occurrence, material sample, and so on....

cboelling commented 2 years ago

for most museums - the "catalog number" describes BOTH the "occurrence" AND the object(s) or organism in the collection related to it. So museums are going to need to up their game in creating unique IDs for occurrence, material sample, and so on....

I think that this is a key point for the issues tackled in this task group and in particular also for #11 on linking the type of artefact used to infer an occurrence to a pointer for that occurrence.

I am tempted to think that, given the current definitions of the relevant terms in DwC, a dataset that uses the "catalog number" to describe BOTH the "occurrence" AND the object(s) or organism in the collection related to it is not DwC compliant and cannot be treated as such by recipients of that data.

dagendresen commented 2 years ago

Recall, as is already covered in our discussions, that an important contributing reason why museums describe their specimens as occurrences is that dwc:occurrenceID is required when publishing specimens in GBIF ;-)

deepreef commented 2 years ago

One issue that I see here is that for most museums - the "catalog number" describes BOTH the "occurrence" AND the object(s) or organism in the collection related to it. So museums are going to need to up their game in creating unique IDs for occurrence, material sample, and so on....

THANK YOU!!! This is something I've been making noise about for years (and why I'm so excited about this Task Group!)

In summary: Catalog Numbers are assigned to physical things in collections (i.e., MaterialSample instances). When DwC expanded to accommodate unvouchered observations, the core record became an Occurrence (~=intersection of the organism represented by the physical thing and the place and time of its extraction from nature), and its (mandatory) identifier was occurrenceID. In the vast majority of cases, the cardinality of Catalog Numbers to (true) occurrenceID values is 1:1 (because most specimens in Museums were only collected once) -- so representing physical objects identified by a unique catalog number using the unique identifier representing its extraction from nature is not that big of a problem. But as soon as you get derivative MaterialSample instances (e.g., tissue samples extracted from whole specimens), we break the 1:1 cardinality. In many/most cases, the tissue sample gets its own unique catalog number, separate from the voucher specimen, but it shares the same circumstances of extraction from nature (Occurrence), so now we have two catalog numbers connected to the same occurrenceID. The problem, though, is that I bet in most cases when Museums have more than one MaterialSample derived from the same Occurrence (e.g., voucher and tissue sample), they end up being represented by two different occurrenceID values.

I think the recent activity surrounding MaterialSample (this Task Group and the DwC discussions that promoted it) shows that we've reached critical mass, where this problem (deviation from 1:1 cardinality between MaterialSample and Occurrence) now needs to be addressed at a community-wide level.

Back to your point, years ago we added materialSampleID to all our specimen records, in addition to the occurrenceID. To the uninitiated, this seems like a redundant identifier (why have two different unique identifiers for the same record?!?!). But the reason we do this is to accommodate cases where there is not a 1:1 correspondence between MaterialSample and Occurrence. For example, when a tissue sample is extracted from a voucher, the tissue sample gets its own catalog number, and its own materialSampleID, but it inherits the same occurrenceID as the voucher.
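To make that cardinality concrete, here is a minimal Python sketch of the voucher/tissue case, assuming a simple in-memory record type. `SpecimenRecord`, `derive_part`, `catalog_numbers_by_occurrence` and the catalog numbers are hypothetical illustrations, not Darwin Core terms or any CMS API, and minting identifiers as `urn:uuid:` strings is likewise an assumption.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecimenRecord:
    """One catalogued physical item (a MaterialSample) plus the Occurrence it documents."""
    catalog_number: str
    material_sample_id: str  # identifies the physical thing itself
    occurrence_id: str       # identifies the organism-at-event it was taken from


def new_urn_uuid() -> str:
    # Hypothetical minting scheme: a random UUID wrapped as a URN.
    return f"urn:uuid:{uuid.uuid4()}"


def derive_part(parent: SpecimenRecord, catalog_number: str) -> SpecimenRecord:
    """Derived part (e.g. a tissue sample): new catalog number and new
    materialSampleID, but the parent's occurrenceID is inherited unchanged."""
    return replace(parent, catalog_number=catalog_number,
                   material_sample_id=new_urn_uuid())


def catalog_numbers_by_occurrence(records):
    """Group catalog numbers under the occurrenceID they document."""
    groups = defaultdict(list)
    for record in records:
        groups[record.occurrence_id].append(record.catalog_number)
    return dict(groups)


voucher = SpecimenRecord("FISH-1001", new_urn_uuid(), new_urn_uuid())
tissue = derive_part(voucher, "TISSUE-2001")
print(catalog_numbers_by_occurrence([voucher, tissue]))
```

The printed mapping has a single occurrenceID key with both catalog numbers under it, which is the 1:many case that a triplet-derived occurrenceID cannot express.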

The logical consequence of this is that the DwC terms catalogNumber and otherCatalogNumbers should be organized in the MaterialSample class instead of the Occurrence class (where they are currently organized). I really think we need to get there, but this has huge implications for DwC data providers and consumers, because the so-called "Darwin Core triplet" (institutionCode+collectionCode+catalogNumber) has a long history of serving as a "natural key" for Occurrence instances; indeed, I think a non-trivial number of content providers concatenate these three values to form the value of occurrenceID.

In other words, getting the community on board with disentangling MaterialSample properties from what has traditionally been represented as Occurrence properties has non-trivial consequences.

RogerBurkhalter commented 2 years ago

@deepreef I know that many of our neontological collections at my museum use the so-called "Darwin Core triplet" (institutionCode+collectionCode+catalogNumber) as the occurrenceID, and it will be very difficult to get them to change to a machine-generated UUID or PID. The one collection I am collection manager (CM) for, Invertebrate Paleontology, has UUIDs where recommended; it's not that hard. Getting smaller collections on board, especially those with limited money or staff, will be a major task.

deepreef commented 2 years ago

@RogerBurkhalter: yeah, that's exactly what I did. For each specimen table that was the source of records for Occurrence instances (which already had an occurrenceID field with an auto-generated UUID), I simply added a second field to the same table for materialSampleID with an auto-generated UUID. As long as I know internally that the occurrenceID represents the "specimen at collecting event" (actually dwc:Organism at dwc:Event, but that's already discussed in issue https://github.com/tdwg/material-sample/issues/2 of this Task Group), and that the materialSampleID represents the physical specimen itself, it's easy to manage the links when cardinality differs from 1:1.
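A rough sketch of that retrofit, using Python's built-in sqlite3 as a stand-in for a real CMS database. The table name `specimen`, its columns and the helper `add_material_sample_ids` are assumptions for illustration; the point is only that the existing occurrenceID column is left untouched and a second, independently minted UUID column is added for materialSampleID.

```python
import sqlite3
import uuid


def add_material_sample_ids(conn: sqlite3.Connection, table: str = "specimen") -> int:
    """Add a materialSampleID column if it is missing, then give every row that
    lacks one a fresh UUID. Returns the number of rows updated.
    Assumes `table` is a trusted name (it is interpolated into SQL)."""
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "materialSampleID" not in columns:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN materialSampleID TEXT")
    rows = conn.execute(
        f"SELECT rowid FROM {table} WHERE materialSampleID IS NULL").fetchall()
    for (rowid,) in rows:
        # Independent of occurrenceID: derived parts can later share an
        # occurrenceID while each keeps its own materialSampleID.
        conn.execute(f"UPDATE {table} SET materialSampleID = ? WHERE rowid = ?",
                     (str(uuid.uuid4()), rowid))
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (catalogNumber TEXT, occurrenceID TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [("1001", str(uuid.uuid4())), ("1002", str(uuid.uuid4()))],  # hypothetical rows
)
updated = add_material_sample_ids(conn)
```

Running the helper a second time updates nothing, so it can safely be repeated after new specimens are added.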

Indeed, I imagine most collections are at the mercy of their respective CMS and how it manages data and translates/exports it to DwC. But if we can achieve some sort of clarity and stability on the definitions of these various DwC classes (especially MaterialSample, Organism and Occurrence) and better understand the relationships among them, perhaps the CMS developers will begin adjusting their underlying data models to accommodate the various IDs properly.

Jegelewicz commented 2 years ago

In the vast majority of cases, the cardinality of Catalog Numbers to (true) occurrenceID values is 1:1 (because most specimens in Museums were only collected once)

I think this may be less true than you think, especially as physical specimens have been subsampled and shared for molecular study. See https://github.com/ArctosDB/arctos/issues/4032#issuecomment-963586066

deepreef commented 2 years ago

I think this may be less true than you think, especially as physical specimens have been subsampled and shared for molecular study. See ArctosDB/arctos#4032 (comment)

Agreed -- "vast" was an overstatement; but I bet if we looked at GBIF or iDigBio data, we'd still get a 1:1 correspondence between Catalog Number and occurrenceID in the majority of cases. But my larger point was that we can't rely on that -- even if exceptions are a minority, we need to accommodate them more robustly than we have been historically using DwC.

dagendresen commented 2 years ago

At least ALL museum collections using the Darwin-Core-Triplet (see also Guralnick et al. 2014) approach to build their occurrenceIDs (as is STILL today recommended in the Darwin Core definition for occurrenceID!!!) would by design have a 1:1 cardinality of catalogNumber to occurrenceID!

occurrenceID (...) In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. (...) Examples (...) urn:catalog:UWBM:Bird:89776

When generating a sub-sample from a museum specimen (or any MaterialSample), the Darwin-Core-Triplet as occurrenceID would be less of a problem if the original occurrenceID string were kept unchanged as the occurrenceID of the sub-sample (rather than generated anew from the sub-sample's new catalogNumber).
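The difference between re-minting and inheriting can be sketched in a few lines of Python. The `urn:catalog:` pattern follows the occurrenceID example quoted above; the function names and the sub-sample catalog number `89776-T1` are hypothetical.

```python
def triplet_occurrence_id(institution_code: str, collection_code: str,
                          catalog_number: str) -> str:
    """Darwin Core triplet as a URN, e.g. urn:catalog:UWBM:Bird:89776."""
    return f"urn:catalog:{institution_code}:{collection_code}:{catalog_number}"


def make_subsample(parent: dict, catalog_number: str, remint: bool = False) -> dict:
    """Return a sub-sample record with its own catalogNumber.
    remint=True rebuilds occurrenceID from the new catalog number (which breaks
    the link to the parent); the default copies the parent's string unchanged."""
    if remint:
        occurrence_id = triplet_occurrence_id(
            parent["institutionCode"], parent["collectionCode"], catalog_number)
    else:
        occurrence_id = parent["occurrenceID"]
    return {**parent, "catalogNumber": catalog_number, "occurrenceID": occurrence_id}


parent = {"institutionCode": "UWBM", "collectionCode": "Bird", "catalogNumber": "89776"}
parent["occurrenceID"] = triplet_occurrence_id("UWBM", "Bird", "89776")

kept = make_subsample(parent, "89776-T1")                    # hypothetical sub-sample
reminted = make_subsample(parent, "89776-T1", remint=True)
```

Only the inherited variant lets an aggregator see that the parent and sub-sample document the same occurrence.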

Indeed, I imagine most collections are at the mercy of their respective CMS and how it manages data and translates/exports it to DwC (deepreef)

Because MaterialSample (approx 2013-03-28) and materialSampleID (approx 2013-05-25) are relatively recent additions to Darwin Core, most museum collections would likely not have any materialSampleIDs assigned to their specimens (yet)? I see an important mission of the MaterialSample task group to (finally) build the foundation for museum collections to start implementing MaterialSample and materialSampleID - and to demand such implementations from their collection management systems.

afuchs1 commented 2 years ago

As the data manager of a combined herbarium, living collection and seed bank collection, we have many use cases where an occurrence (when and where material was collected) has many catalogue numbers (how it is physically represented in the collection).

deepreef commented 2 years ago

I see an important mission of the MaterialSample task group to (finally) build the foundation for museum collections to start implementing MaterialSample and materialSampleID - and to demand such implementations from their collection management systems.

Indeed! I think that those of us connected with data management systems that do incorporate materialSampleID values should coordinate and compare notes, so we (this Task Group) can develop a series of recommendations for how to disentangle occurrenceID values from materialSampleID values, in future presentations of DwC content.

deepreef commented 2 years ago

as the data manager of a combined herbarium, living collection and seed bank collection we have many use cases where an occurrence (when and where material was collected) has many catalogue numbers (how it is physically represented in the collection).

Do you already assign materialSampleID values? If so (and even if not), how would you recommend collections like yours develop a process for associating catalog numbers with materialSampleID values, then aggregating sets of those values (i.e., MaterialSample instances) linked to a single occurrenceID value?

dagendresen commented 2 years ago

Here is a recent example with material from the same occurrence (collected in June 2021) deposited in the vascular plant herbarium (CMS = MUSIT) and in the DNA tissue bank (CMS = Corema) at the museum in Oslo. When publishing the DNA bank in GBIF a few years ago, we quickly became aware of the restrictive requirement for distinct occurrenceIDs within each dataset, which in practice blocked us from publishing derived tissue samples with the "correct" occurrenceID (because multiple tissue samples are often extracted from the same material sample/specimen). The tissue sample specimen is thus (unfortunately) published with the occurrenceID mapped to the assigned materialSampleID, and the (correct) occurrenceID is instead published as relatedResourceID. (An organismID was minted as well for linking, but the vascular plant herbarium CMS did not support this term.)

Herbarium Oslo: https://doi.org/10.15468/wtlymk
  occurrenceKey: https://www.gbif.org/occurrence/3393367309
  catalogueNumber: 1628289
  occurrenceID: urn:catalog:O:V:1628289

DNA tissue bank: https://doi.org/10.15468/nzszik
  occurrenceKey: https://www.gbif.org/occurrence/3357457309
  catalogueNumber: O-DP-81046/1-T
  organismID: urn:uuid:52036a5e-1943-5f16-9326-217c3c4a4fa1
  occurrenceID: urn:uuid:9516f60c-d4c0-4e39-9235-37c89fee38f2
  materialSampleID: urn:uuid:9516f60c-d4c0-4e39-9235-37c89fee38f2
  relatedResourceID: urn:catalog:O:V:1628289
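The clash and the workaround can be sketched in Python. `duplicate_occurrence_ids` is only a local check that mimics the one-occurrenceID-per-dataset constraint described above, not GBIF's actual validator. The first tissue record uses the identifiers listed above; the second tissue sample and its catalog number are hypothetical.

```python
import uuid
from collections import Counter


def duplicate_occurrence_ids(records):
    """occurrenceID values that appear more than once within one dataset."""
    counts = Counter(r["occurrenceID"] for r in records)
    return {oid for oid, n in counts.items() if n > 1}


def tissue_bank_workaround(record):
    """Publish materialSampleID as occurrenceID and keep the shared
    (correct) occurrenceID as relatedResourceID."""
    published = dict(record)
    published["relatedResourceID"] = record["occurrenceID"]
    published["occurrenceID"] = record["materialSampleID"]
    return published


herbarium_occurrence = "urn:catalog:O:V:1628289"
tissues = [
    {"catalogNumber": "O-DP-81046/1-T",
     "materialSampleID": "urn:uuid:9516f60c-d4c0-4e39-9235-37c89fee38f2",
     "occurrenceID": herbarium_occurrence},
    # Hypothetical second tissue sample cut from the same specimen.
    {"catalogNumber": "O-DP-81046/2-T",
     "materialSampleID": f"urn:uuid:{uuid.uuid4()}",
     "occurrenceID": herbarium_occurrence},
]

published = [tissue_bank_workaround(r) for r in tissues]
```

The workaround passes the uniqueness check, but only because occurrenceID no longer identifies the occurrence; the real link survives only in relatedResourceID.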
Jegelewicz commented 2 years ago

Here is an example of what happens with Arctos data and how we could (now) pass a MaterialSampleID.

https://arctos.database.museum/guid/DMNS:Mamm:12344 is from the same individual/collection event as https://arctos.database.museum/guid/MSB:Mamm:233616

If you look at DMNS:Mamm:12344, this is how it would work:

catalogNumber: DMNS:Mamm:12344
  Note: Ideally, we would pass the URL http://arctos.database.museum/guid/DMNS:Mamm:12344
occurrenceID: http://arctos.database.museum/guid/DMNS:Mamm:12344?seid=877493
  Note: This URL is built each month at the time the data are published; it is a concatenation of the catalog record URL, "?seid=", and the specimen event ID.
OrganismID: http://arctos.database.museum/guid/DMNS:Mamm:12344
  Note: Arctos does have a place to enter this, but if nothing is entered there, the default is the URL of the catalog record (which assumes there are no other samples of this organism).
Associated Occurrences: (same individual as) MSB:Mamm http://arctos.database.museum/guid/MSB:Mamm:233616
  Note: Here we place a concatenation of all "relationships". Because this record has the "same individual as" relationship, this is where one would find the "parts from a single organism".
MaterialSampleID: https://arctos.database.museum/guid/DMNS:Mamm:12344/PID21958887
  Note: Currently we do not pass anything here, but we have recently assigned on-the-fly numbers to individual parts in a catalog record, which you can see at the bottom of the page. These numbers can be stabilized by the collection and turned into PIDs, which means that the part can then never be deleted. We just added this feature recently and I am not aware of anyone making use of it yet.
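The identifier patterns in the notes above can be expressed as small Python helpers. This sketch covers only the string construction, assumes the guid base URL shown in the example, and is not Arctos code; the function names are hypothetical.

```python
from typing import Optional

ARCTOS_GUID_BASE = "http://arctos.database.museum/guid/"  # as in the example above


def catalog_record_url(guid: str) -> str:
    return ARCTOS_GUID_BASE + guid


def monthly_occurrence_id(guid: str, specimen_event_id: int) -> str:
    """Catalog record URL + "?seid=" + specimen event ID."""
    return f"{catalog_record_url(guid)}?seid={specimen_event_id}"


def organism_id(guid: str, entered: Optional[str] = None) -> str:
    """Use an explicitly entered organismID, else fall back to the catalog record URL."""
    return entered or catalog_record_url(guid)


occ = monthly_occurrence_id("DMNS:Mamm:12344", 877493)
org = organism_id("DMNS:Mamm:12344")
```

Because the occurrenceID embeds the specimen event ID, a catalog record with several specimen events yields several occurrenceIDs, while the fallback organismID stays one URL per catalog record.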

Each of these exercises makes me realize how differently we are all approaching this, and how I need to work with the Arctos community to get data into the appropriate places....

afuchs1 commented 2 years ago

as the data manager of a combined herbarium, living collection and seed bank collection we have many use cases where an occurrence (when and where material was collected) has many catalogue numbers (how it is physically represented in the collection).

Do you already assign materialSampleID values? If so (and even if not), how would you recommend collections like yours develop a process for associating catalog numbers with materialSampleID values, then aggregating sets of those values (i.e., MaterialSample instances) linked to a single occurrenceID value?

We don't have separate materialSampleIDs. Everything is treated as catalogued items within a collecting occurrence (currently we allocate an internal sequential number to each occurrence, as there are no real-world candidates, and deliver it to DwC by appending the institutionCode), and each item has a unique catalogNumber by virtue of adding a suffix for the different physical items across all 3 collections, e.g. CANB897925.1 herbarium sheet; CANB 897925.9 seed packet; CANB 897925.6 cutting (now dead). A DNA sample collected at the time of collection is also given a catalogNumber. The institutionCode and accNo are not necessarily unique within an occurrence, as we have combined separate institutional collections over time and used different accession numbering schemes, but they all link to the same occurrenceID. Currently we don't handle allocation of an ID to samples taken from items well (i.e. DNA taken from a preservedSpecimen or livingSpecimen), but in my mind whether these get a materialSampleID or a catalogNo is less important than their having an ID that can be resolved, so we know what the status of that 'thing' is. Does it still exist or was it transitory, what was/is the type of material, and can we create explicit relationships between these 'things' and hold data about them? If this is held in the data, then we can deliver it to any schema, e.g. an image taken of this sheet, DNA sampled from a leaf on this sheet, a cutting taken from a plant grown in the gardens that was originally collected as a cutting in the wild. Each of these is essentially an 'object' 'relationship' 'object' statement about which we can hold additional data. (I think I have gone off track.)
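The 'object' 'relationship' 'object' idea can be sketched as plain statements in Python. The relationship labels (`imageOf`, `sampledFrom`, `propagatedFrom`) and all identifiers except the CANB catalog numbers quoted above are hypothetical; in Darwin Core such statements could instead be mapped onto the ResourceRelationship terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str    # identifier of one 'thing'
    predicate: str  # hypothetical relationship label
    obj: str        # identifier of the related 'thing'


def related(statements, subject, predicate=None):
    """Identifiers that `subject` points to, optionally filtered by relationship."""
    return [s.obj for s in statements
            if s.subject == subject and (predicate is None or s.predicate == predicate)]


def lineage(statements, start, predicate):
    """Follow one relationship (e.g. propagatedFrom) back towards its origin.
    Takes the first match at each step; stops at a dead end or a cycle."""
    chain, seen, current = [start], {start}, start
    while True:
        targets = related(statements, current, predicate)
        if not targets or targets[0] in seen:
            return chain
        current = targets[0]
        seen.add(current)
        chain.append(current)


statements = [
    Statement("IMG-0001", "imageOf", "CANB897925.1"),             # image of the sheet
    Statement("DNA-0001", "sampledFrom", "CANB897925.1"),         # DNA from a leaf on the sheet
    Statement("CUTTING-0002", "propagatedFrom", "LIVING-0001"),   # cutting from a garden plant
    Statement("LIVING-0001", "propagatedFrom", "CANB 897925.6"),  # plant grown from the wild cutting
]
```

Additional data about each statement (who, when, method) could be held as extra fields on `Statement`, which is the point made above about holding data about the relationships themselves.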

Jegelewicz commented 2 years ago

@afuchs1 I don't think you went off track at all! I think that is the essence of what MaterialSample should be about - "this"!

deepreef commented 2 years ago

Agreed! I think this gets to the heart of what we're trying to address in this Task Group.

I'm still wrestling with the boundary between Organisms and MaterialSamples. In many cases, the distinction is clear -- but in some cases it gets murky. For example, consider this real-world use-case:

Diver encounters a rare fish on the reef, and gets in-situ video of it. The fish is collected and brought to the surface alive, and transported half-way around the world (still alive). It is then photographed again (alive) in an aquarium. Some years later it dies and becomes a specimen at a Museum, where it is photographed again before preservation. Several tissue samples are taken, and the remaining specimen is preserved in alcohol. Over time, the specimen is moved from one shelf to another, or put on display, or loaned, or whatever.

There's a lot to unpack there, and while it may seem like a bit of an edge case, it's not that sharp of an edge, and whatever we come up with ought to be able to accommodate this kind of use case.

I'm still (mostly) confident that an instance of dwc:Occurrence represents an intersection of an instance of dwc:Event and dwc:Organism, and that any associated dwc:MaterialSample instances (living or dead or extracted tissue) do not participate in any dwc:Occurrence instances directly (in the same way that dwc:Identification instances do not participate directly in dwc:Occurrence instances). Instead, dwc:Occurrence instances associated with dwc:MaterialSample instances are inherited through a dwc:Organism intermediary.
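The model in the previous paragraph, where a MaterialSample reaches its Occurrences only through its Organism, can be sketched with two small Python record types. All class and identifier names are hypothetical, and the sketch assumes (as the paragraph does, pending debate) that each Occurrence is exactly one Organism plus one Event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    occurrence_id: str
    organism_id: str  # the Organism ...
    event_id: str     # ... at this Event


@dataclass(frozen=True)
class MaterialSample:
    material_sample_id: str
    organism_id: str  # link to its Organism; deliberately no occurrence_id field


def occurrences_for_sample(sample, occurrences):
    """Occurrences inherited by a MaterialSample through its Organism."""
    return [o for o in occurrences if o.organism_id == sample.organism_id]


occurrences = [
    Occurrence("occ-1", "org-fish", "ev-reef-video"),
    Occurrence("occ-2", "org-fish", "ev-aquarium-photo"),
    Occurrence("occ-3", "org-other", "ev-reef-video"),
]
tissue = MaterialSample("ms-tissue-1", "org-fish")
```

The tissue sample is linked to both fish Occurrences without participating in either one directly.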

However, I also recognize that dwc:MaterialSample instances participate in what could be framed as dwc:Event instances directly (assuming a dwc:Event instance is an action that happens at a particular place and time). For example, when I photographed the live fish in an aquarium in the example above, is it the Organism instance that was photographed, or the MaterialSample? Same question for photographing the dead specimen at the Museum. And same for when the tissue sample was analyzed for DNA sequencing. In all those cases (live aquarium photo, dead specimen photo, DNA sequencing), information about where, when and by whom fit nicely into a dwc:Event instance, but at least in some cases it would be a MaterialSample that participated in the Event, not an Organism. But would we call that intersection of MaterialSample+Event an Occurrence (sensu DwC)? Or is the nature of that intersection somehow different from a "proper" dwc:Occurrence instance? Same applies to a tiger in a zoo, I guess.

Ultimately, we want to be able to attach media items to both Organisms and MaterialSamples, and somehow track the circumstances of the Events where those media captures took place. I'm just not clear in my head whether all, or some of those involve dwc:Occurrence instances.

OK, now I'm the one who has gone way off track! Sorry about that! I know the above is probably way too abstract and conceptual for what we're trying to accomplish with dwc:MaterialSample, but it seems to me that having this sort of stuff more or less understood and sorted out will only improve whatever standards recommendations emerge from this Task Group.

dagendresen commented 2 years ago

Following your line of thought: what is the thing/class taking part in an Event ... in an Occurrence?

How do we model environment or ecosystem or nature types or geology? (These are not appropriately modeled as Organism.) Would these only be properties of a Location? Or is there room for a new class for these things (in Darwin Core? from before they became MaterialSamples?). In my mind, we sample MaterialSamples (which also can become accessioned collection specimens) from such things, e.g. water samples, minerals, geological samples, etc., for purposes other than recording any living things. In my mind, we thus already have many MaterialSamples (accessioned specimens) at the museum in Oslo that are not derived from any Organism.

Apropos - Is an Occurrence with occurrenceStatus = absent then actually an Occurrence at all?

(However, I am jumping outside the topic of this thread here)

deepreef commented 2 years ago

These are the kinds of questions that keep me up at night. I guess one fundamental thing we ought to pin down is: is the "Sample" of MaterialSample a noun or a verb? Noun: "Some material thing that represents a sample of some abstract or material thing". Verb: "Some material thing that has been sampled from some abstract or material thing". The distinction is subtle (if it even exists), but I tend to lean toward noun, which doesn't require that there be a "sampling event". Sure, there may be a sampling event, and this may be true for the vast majority of MaterialSample instances, but treating it as a noun means that the sampling event is not necessarily intrinsic to the MaterialSample itself. This is probably way too esoteric (and maybe unnecessary), but I guess it boils down to whether a MaterialSample instance is always, or merely usually, the result of a sampling event.

We deal with non-organism stuff the same way we deal with organism stuff, in that we treat "Organism" as a subclass of "Individual" (other subclasses could be things like "vehicle", "sunset", "habitat", etc.) They're all fundamentally abstract, and many (but not all) of them have material manifestations. So instances of MaterialSample are not limited to being physical representations of living things. A non-fossil rock can be a MaterialSample in my mind (even if outside the scope of DwC).

I'm reluctant to apply any of these things directly to Location, because most of them are bounded by time ... which I think would make them technically Events (at least in my mind).

Apropos - Is an Occurrence with occurrenceStatus = absent then actually an Occurrence at all?

Yeah, that's another one that keeps me up at night.

I worry this may lead down one of those very distracting philosophical paths that would be appropriate in some context, but not this one. On the other hand, I think some of these fundamentals are important to allow us to nail down the scope & definition of dwc:MaterialSample.

dr-shorthair commented 2 years ago

The key property of a Sample - material- or otherwise - is the intention that it be representative of something larger. This is particularly obvious from the verb form 'to sample'. If you don't want to consider the act that created it, or the intention to represent something, then 'sample' is just a fancy name for 'thing'.

dagendresen commented 2 years ago

I worry this may lead down one of those very distracting philosophical paths that would be appropriate in some context, but not this one. On the other hand, I think some of these fundamentals are important to allow us to nail down the scope & definition of dwc:MaterialSample.

What is a MaterialSample?

(PreservedSpecimen + FossilSpecimen + LivingSpecimen + tissue samples & environment samples => MaterialSample)

Could a material non-organism thing in situ that is not yet sampled still be a kind of "MaterialSample"? Could such a thing in situ qualify for a dwc:catalogNumber if it is accessioned/catalogued (by a museum)? I guess such a non-organism thing would anyway not be part of an Occurrence? (and never be assigned a dwc:occurrenceID).

In my use case, I am thinking of a "nature type" (which could also be lifeless) that has been assessed and designated for active conservation under national nature protection legislation. (Would we want, or care, to enable Darwin Core to describe the monitoring and conservation of, and ecological research on, such things?)

(Sorry for staying outside of the thread main topic)

albenson-usgs commented 2 years ago

From my perspective the only time you have an occurrence is when you have an organism (or some part of an organism that can be identified, e.g. DNA) in its natural environment. Therefore the fish photographed in the aquarium is not an occurrence, nor when it dies and goes to a museum, or when tissue samples are taken. Those are all events, sure, but they are not occurrences.

I don't have trouble with an occurrence with occurrenceStatus = absent still being an occurrence. You went and looked for an organism using methods that would usually find it if it was there, and didn't see it. I think you all are getting too philosophical here. Researchers use these zeros in their analyses all the time. It's not like Darwin Core invented this out of thin air. I was just on a call yesterday about seagrass monitoring, where they want to make sure to record when a species of seagrass occurs in one plot at a field site but is absent from another plot, because that matters for the analyses they do.

dagendresen commented 2 years ago

an occurrence is when you have an organism (or some part of an organism that can be identified, e.g. DNA) in its natural environment

What is an Occurrence?

If the Occurrence is the intersection between Event and Organism, there would be an Occurrence (that CAN be described and identified by an occurrenceID) each and all times this intersection happens - not limited to the "natural environment" (sensu in situ in "wild" nature)?!

If a LivingSpecimen can be BOTH a "MaterialSample" and an "Organism" at the same time (??), then the places and times where it is CAN be described as "Occurrence"s, which would be broader than the original "collecting" event when it was sampled from "wild" nature (sensu in situ)?! Thus, even a tiger in a zoo is a valid Occurrence?

For a cultivated crop resulting from crop breeding and conserved as a LivingSpecimen there is no "wild natural environment at all" -- so would we then agree that the "natural environment" is in the agricultural field?

dagendresen commented 2 years ago

I think that occurrenceID in practice is used to identify the different "Evidence" of (mostly) "Occurrence"s and not as an identifier for the "Occurrence" itself! And that this is causing all kinds of problems!

baskaufs commented 2 years ago

Way behind on this conversation, but I can say with confidence that @tucotuco confirmed years ago that occurrences do not have to be restricted to natural occurrences. It's buried somewhere in the tdwg-content email archives.

tucotuco commented 2 years ago

I can confirm that there was never a restriction on Occurrences being "natural". The purpose of establishmentMeans is to distinguish between cases and now has a lovely recommended standard vocabulary ( https://dwc.tdwg.org/em/).

(Sent by email in reply to Steve Baskauf's comment of Fri, Nov 12, 2021, 12:28 PM.)

albenson-usgs commented 2 years ago

@tucotuco from Rich's example above (fish in the water -> aquarium -> museum -> tissue sample) can you tell me which of those are occurrences and therefore get an occurrenceID?

tucotuco commented 2 years ago

Occurrence: "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time."

A record of the time and place the video of the fish (Organism) was taken is a good candidate for a distinct Occurrence. A record of the time and place the fish was put in the aquarium is a good candidate for a distinct Occurrence. A record of the time and place the fish was photographed in the aquarium is a good candidate for a distinct Occurrence. A record of the time and place the fish died in the aquarium is a good candidate for a distinct Occurrence.

A record of the time and place the fish was accessioned in the museum could be an Occurrence, but not one that anyone in our community has expressed publicly as an interesting one from the perspective of science, rather, it is interesting from the perspective of collection management. Similarly with the specimen as it moves around.

Jegelewicz commented 2 years ago

Wouldn't they all be occurrences? However, I don't think that establishmentMeans does what is necessary here. None of the terms in the controlled vocabulary accurately describe any of the occurrences besides the fish in the water. The paleo people have discussed the use of in situ/ex situ as a method for getting to "natural" or not.

deepreef commented 2 years ago

I can confirm that there was never a restriction on Occurrences being "natural".

Yes, definitely, which is why the fish in the aquarium in my example is within scope of dwc:Occurrence (like a tiger in a zoo). That's because (in my mind, at least), the fish swimming in an aquarium is still very much an instance of dwc:Organism, and its presence in space & time within the aquarium can be represented as dwc:Event, so it should be treated as within scope of dwc:Occurrence. But the questions I'm wrestling with are:

1) At what point does it stop being an instance of dwc:Organism (if ever), and therefore no longer eligible to participate in new/future dwc:Events, and by extension, no longer eligible to participate in new/future dwc:Occurrence instances? (This assumes that dwc:Occurrence=[dwc:Organism+dwc:Event] -- which I guess is still open to debate).

2) At what point did it start being an instance of dwc:MaterialSample? My gut tells me this happened when a human took control of its disposition (in this case, when it was captured from the reef).

3) How do we characterize the participation of instances of dwc:MaterialSample directly in instances of dwc:Event (e.g., DNA sequencing of a tissue sample, loaning of a specimen, etc.) in the absence of an instance of dwc:Organism, if we agree that dwc:Occurrence=[dwc:Organism+dwc:Event] and if we agree that, at some point, the dwc:Organism ceased to be?
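A minimal Python sketch of question 3, assuming the working equation dwc:Occurrence=[dwc:Organism+dwc:Event]. The class and field names, including the hypothetical `MaterialSampleEvent` and `record`, are illustrative only; they are not Darwin Core terms or an agreed model. The sketch simply shows where the gap appears: once no Organism can be referenced, a MaterialSample taking part in an Event falls outside Occurrence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Organism:
    organism_id: str  # hypothetical stand-in for dwc:organismID


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical stand-in for dwc:eventID
    description: str


@dataclass(frozen=True)
class MaterialSample:
    material_sample_id: str  # hypothetical stand-in for dwc:materialSampleID
    derived_from: Optional[Organism] = None


@dataclass(frozen=True)
class Occurrence:
    # Working assumption under debate: Occurrence = Organism + Event.
    organism: Organism
    event: Event


@dataclass(frozen=True)
class MaterialSampleEvent:
    # Hypothetical class: a MaterialSample taking part in an Event (a loan,
    # a DNA extraction) with no Organism to reference. Under the assumption
    # above this is NOT an Occurrence; the open question is what it is.
    material_sample: MaterialSample
    event: Event


def record(organism: Optional[Organism], sample: Optional[MaterialSample], event: Event):
    """Return an Occurrence when an Organism can be referenced, else fall back."""
    if organism is not None:
        return Occurrence(organism, event)
    if sample is not None:
        return MaterialSampleEvent(sample, event)
    raise ValueError("an Event needs an Organism or a MaterialSample participant")


fish = Organism("org-1")
tissue = MaterialSample("ms-1", derived_from=fish)
capture = Event("ev-1", "captured on the reef")
extraction = Event("ev-2", "DNA extracted from tissue")

print(type(record(fish, None, capture)).__name__)       # Occurrence
print(type(record(None, tissue, extraction)).__name__)  # MaterialSampleEvent
```

The fallback branch is exactly the "right side of the timeline" case: the tissue still points back to the fish through `derived_from`, but the Event it takes part in has no Organism participant.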

I don't know how effective "ASCII art" is in a GitHub post, but here is a visual representation of what I'm getting at:

dwc:Event instances through Time -->
Sperm meets egg    Captured    Dies    Preserved    Subsampled    Analyzed    Disintegrated
<------------- dwc:Organism ---------?->
                                     <-?---------------------- dwc:MaterialSample-------------------?->

Where in this timeline can dwc:Occurrence instances exist? How do we represent intersections of dwc:MaterialSample and dwc:Event when we can't comfortably reference an instance of dwc:Organism (right side of the timeline)? Should those dwc:MaterialSample+dwc:Event instances be represented as dwc:Occurrence instances? If not, then what are they? Some other sort of properties/history of the dwc:MaterialSample instances (not necessarily framed as dwc:Event instances)?

My head is seriously spinning now, and we're definitely drifting away from the Issue subject (dwc:catalogNumber)... but I still feel that we need to find consensus on this stuff before we can hope to lock in a definition of MaterialSample that won't need to be fundamentally altered a year or two down the road.

RogerBurkhalter commented 2 years ago

In situ vs. ex situ: I think these are both "natural" occurrences. However, the rock the fossil was in is either in situ or not, i.e. "float" material, scree, boulders, or conglomerates displaced from the original point of deposition. And even then, the organism preserved in the rock was probably never alive in that spot. Only in rare circumstances are fossil occurrences in life position, i.e. dead where they lived. The vast majority of the time they were washed in or otherwise transported to the place of deposition. In MaterialSample I do record whether the collection object was in situ or not, but they all count as occurrences.

deepreef commented 2 years ago

@RogerBurkhalter - this is something else I also wrestle with. In my mind, there are two distinct events associated with the fossil that we want to track and represent through DwC:

1) The event when/where the organism died; and

2) The event where/when the fossil was taken into custody by a human.

I think 1 above is unambiguously an instance of dwc:Occurrence, even if the place and time of its associated dwc:Event needs to be inferred/estimated/guessed with appropriately scaled error/uncertainty.

It's less clear to me that 2 above is/should be represented as an instance of dwc:Occurrence. That ties into the question posed by @dagendresen :

Could a material non-organism thing in situ that is not yet sampled still be a kind of "MaterialSample".

And it also ties into my ASCII-art timeline, modified as:

dwc:Event instances through Time -->
Sperm meets egg    Dies                    Fossilized                    Collected
<--- dwc:Organism --->
                                        <----- non-organism thing----->
                                                                                              <- dwc:MaterialSample->

Where along this timeline should we represent instances of dwc:Occurrence?

RogerBurkhalter commented 2 years ago

First, for the collections I deal with, we can only infer #1 in the broadest way and I do not associate a dwc:Event to when the object/organism died. That only leaves #2, when the fossil was taken into custody by a human (i.e. collected), and to #2 I would attach a dwc:Event. If I implied otherwise in the previous post it was in error. As to @dagendresen 's comment, I would think you would need to sample something to have a MaterialSample, otherwise you have an observation (human or machine), otherwise, how do you know it exists and is in-situ? So, are observations material samples?

Jegelewicz commented 2 years ago

@deepreef I have also been contemplating this timeline because I have been thinking about OrganismID.

In my gut, I feel like "organism" implies "alive", and I think that your ASCII art about the fossil represents this view. There was actually some period of time when the organism was dead but not yet fossilized; however, from the point of death, isn't it already a non-organism thing?

Could we define "organism" as a living or once living taxonomically homogeneous entity that may produce any or all of LivingSpecimen, PreservedSpecimen, FossilSpecimen? Then we define MaterialSample as anything derived from (among other possibilities) LivingSpecimen, PreservedSpecimen, FossilSpecimen?

dagendresen commented 2 years ago

@RogerBurkhalter

As to @dagendresen 's comment, I would think you would need to sample something to have a MaterialSample, otherwise you have an observation (human or machine), otherwise, how do you know it exists and is in-situ? So, are observations material samples?

--> you would need to sample something to (...) know it exists and is in-situ ?? 😊

Sorry if my thoughts come across as very confused, but that is because I am confused and lost in this 🙃

When you make an observation, there is SOMETHING that is observed. If the thing you record as having observed is a non-Organism thing, is the temporal event that happened then never an Occurrence? If the Organism becomes a MaterialSample, will recorded observations of it then never be an Occurrence?

What is a LivingSpecimen?

(I was in part thinking of Steve @baskaufs accessioned tree at Vanderbilt).

And I was in particular thinking of in situ crop wild relative (CWR) populations that are conserved, monitored, and somewhat regularly visited in situ. I tend to think of these things as instances of LivingSpecimen. And if LivingSpecimen is a type of MaterialSample, would these things then be MaterialSamples?? Or maybe they would not be?? Tissue samples (a more typical MaterialSample) are taken during monitoring to check for genetic drift, etc., but what is the CWR population itself? Is it only an instance of Organism, even if in many ways it is treated exactly as an accessioned thing? Maybe I am only fooled by the word LivingSpecimen?

And I am also thinking of traditional crops, so-called landraces. When simply farmed, I tend to think of these as plain instances of Organism and the on-farm condition as the natural environment. But when landraces are designated for conservation on-farm, and treated exactly as if they were accessioned, I tend to think of them as LivingSpecimen. So would they then be instances of MaterialSample, or not?

I am completely fine with thinking of CWR populations and on-farm landraces as plain instances of Organism. But I do also think of them as a sort of "specimen" (or at least as an accessioned thing) ... whatever that means ...? My confusion is that I think these (very same things) would have both (a series of) occurrenceIDs and a catalogNumber. 🙃

@deepreef helped me a lot in a previous thread here to reassure me that these things can perhaps be both at the same time, but confusion is creeping back up on me. 🤕

(Maybe there has never really actually been such a real thing as a dwc:LivingSpecimen???)

deepreef commented 2 years ago

@RogerBurkhalter :

First, for the collections I deal with, we can only infer #1 in the broadest way and I do not associate a dwc:Event to when the object/organism died. That only leaves #2, when the fossil was taken into custody by a human (i.e. collected), and to #2 I would attach a dwc:Event. If I implied otherwise in the previous post it was in error.

Understood, and no I don't think you did imply otherwise. I was just framing it the way I always think about it. But "broadest way" still has value for the original occurrence. I mean, we need to have some basis for asserting that Tyrannosaurus rex lived in what is now western North America 66-68mya, on a patch of land that represented what we now refer to as Laramidia. There are no limits on the scope of "where" and "when" of a dwc:Event. So I think if you haven't captured Occurrence records of this sort (my '1'), it could be useful to do so.

I guess as long as a fossil specimen is still considered an instance of dwc:Organism, then there is no problem with treating 2 as a dwc:Occurrence instance as well -- but that calls into question the scope of the definition of Organism. Obviously it would mean that neither death nor disarticulation nor disintegration (mineralization) represents the termination of an instance of dwc:Organism, so maybe my timelines would look more like these:

dwc:Event instances through Time -->
Sperm meets egg    Captured    Dies    Preserved    Subsampled    Analyzed    Disintegrated
<------------------------------------------ dwc:Organism ---------------------------------------...
                                     <-?------------------------ dwc:MaterialSample------------------------...

dwc:Event instances through Time -->
Sperm meets egg    Dies                    Fossilized                    Collected
<------------------------------------ dwc:Organism ------------------------------------...
                                        <----- non-organism thing----->
                                                                                              <- dwc:MaterialSample-...

dagendresen commented 2 years ago

Maybe it is easier, after all, if the Organism ceases to be an Organism when it dies???

deepreef commented 2 years ago

Maybe it is easier, after all, if the Organism ceases to be an Organism when it dies???

Yeah, but then how do we represent the time & place where a fossil is collected? The definition of dwc:Occurrence explicitly refers to dwc:Organism, not dwc:MaterialSample (let alone notdwc:non-organism thing). I guess if a fossil instantiates a dwc:MaterialSample the moment a human takes control over it, then we'd need to model this as a MaterialSample+Event, which, strictly speaking, is not an Occurrence. Fundamentally, I guess it's no different from finding a feather in the woods. But also fundamentally, I think it's no different from sending a specimen out on loan, or rendering a DNA sequence from a tissue sample.

Clearly, I won't be getting much sleep tonight...

dagendresen commented 2 years ago

Describing the occurrences of things is a rather poor proxy for describing the things themselves. Up till now, it has felt like the only way to describe a museum specimen was indirectly, by describing where and when it occurred. (What happens if, instead, both Organisms and MaterialSamples can occur at an Event?)

At the top of this issue thread @Jegelewicz wrote:

One issue that I see here is that for most museums - the "catalog number" describes BOTH the "occurrence" AND the object(s) or organism in the collection related to it. So museums are going to need to up their game in creating unique IDs for occurrence, material sample, and so on....

I think we need a much less Occurrence-centric Darwin Core ;-)

Jegelewicz commented 2 years ago

If I had a blank slate, I would write the definitions as follows:

Organism

Any organic, living system that functions as an individual entity.

LivingSpecimen

An Organism that has been marked for testing, examination, or study.

PreservedSpecimen

A remnant, impression, or trace of an Organism of current geological age that has been protected from decomposition.

FossilSpecimen

A remnant, impression, or trace of an Organism of past geologic ages that has been preserved in the earth's crust.

MaterialSample

A representative part or a single physical item from a larger whole or group especially when presented for inspection or shown as evidence of quality.

Occurrence

Evidence of an Organism at an Event. Evidence currently includes:

Jegelewicz commented 2 years ago

I changed my definition of organism because of this:

Organism was present or absent, or a that no Organism identifiable as a member of a Taxon was present at a particular place at a particular time following some protocol for detection.

I don't think that any of that is clear in the current definition of Occurrence. There is no reference to absence, only presence. It seems to me that a lot of people have ideas about what the Darwin Core terms are meant to convey, and the super-vague definitions let everyone decide what they think this all means. If an occurrence is also meant to convey some sort of planned protocol, then most iNaturalist observations are not occurrences.

I really think we need to be critical of definitions that include the word being defined in the definition.

Current Organism definition is part of what causes us so much grief!

A particular organism or defined group of organisms considered to be taxonomically homogeneous.

That is simply NOT a definition.

albenson-usgs commented 2 years ago

I don't think that any of that is clear in the current definition of Occurrence.

@Jegelewicz I'm not sure why it needs to be. Seems to me it's taken care of by occurrenceStatus which is part of the Occurrence class. I don't think any of the class definitions include all of the pieces of information that are identified in the terms that belong in that class?

Jegelewicz commented 2 years ago

@albenson-usgs agree - it's waaay too early here!

Jegelewicz commented 2 years ago

BUT I do think this is still true

I really think we need to be critical of definitions that include the word being defined in the definition.

Current Organism definition is part of what causes us so much grief!

A particular organism or defined group of organisms considered to be taxonomically homogeneous.

That is simply NOT a definition.

dagendresen commented 2 years ago

@Jegelewicz - If I had a blank slate, I would write the definitions as follows: ... Occurrence = Evidence of an Organism at an Event.

+1

(The current definition is: An existence of an Organism (sensu Organism) at a particular place at a particular time)

Jegelewicz commented 2 years ago

@albenson-usgs you got me thinking. If "Occurrence" is a class that is meant to convey information about

An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time.

catalogNumber - seems like this would be better labeled as occurrenceNumber

georeferenceStatus - belongs with location?

preparations - belongs with PreservedSpecimen/FossilSpecimen/MaterialSample? Can you have a "preparation" of an occurrence?

disposition - belongs with PreservedSpecimen/FossilSpecimen/MaterialSample? Can you have a "disposition" of an occurrence?

otherCatalogNumbers - seems like this would be better labeled as other OccurrenceNumbers

I think another issue is having terms that don't seem to belong in the occurrence class placed there. What DO we need to describe an Occurrence, and what classes are we under-describing? Does "assigning" terms to a class limit their use? In other words, is the only way for me to transmit information about the sex or life stage of a PreservedSpecimen to record an occurrence of it in a collection?

I agree with @dagendresen

I think we need a much less Occurrence-centric Darwin Core ;-)

tucotuco commented 2 years ago

BUT I do think this is still true

I really think we need to be critical of definitions that include the word being defined in the definition.

This observation has arisen numerous times and disturbs me. I wish this were not a point of contention, as the purpose of the term name is convenience, so that PEOPLE can quickly recognize the term. I see people thinking of it as a dictionary entry, where one would be dismayed to see a word defined by itself. That is simply not the case here. Consider: we could just as easily have chosen (and were indeed strongly encouraged to choose) term names with no semantic baggage, such as http://rs.tdwg.org/dwc/terms/Class001. The machines wouldn't care. The people would, because they would lose the convenience of "understanding" the term without having to go look up the definition. This was exactly one of the reasons for the strong encouragement not to use words as term names. If the term were named Class001, would you have the same problem with the definition being circular? I posit that you would not. So what is the difference? It is the same definition either way. My point is that on one hand we have a name and a label for a concept (a shortcut or alias), and on the other we have a definition (canonically in English) to convey a better understanding of what it means. The definition has to stand on its own, and cannot rely on the label for further edification.

Current Organism definition is part of what causes us so much grief!

A particular organism or defined group of organisms considered to be taxonomically homogeneous.

That is simply NOT a definition.

It is if there is an understanding of what "organism" (not dwc:Organism) means. If there isn't, then that needs to be added to the definition or the usage notes to clarify which meaning of "organism" in English is being used. It's not circular; it is simply incomplete. So what is it about the word "organism" in English that leaves us unsatisfied? For one, the life part. A church (as an organization) is an organism as well, but we're in a community where that usage doesn't cause us much confusion. So where exactly is the problem?

Oxford Languages says:

"an individual animal, plant, or single-celled life form"

Webster's Unabridged Dictionary says: "an individual constituted to carry on the activities of life by means of parts or organs more or less separate in function but mutually dependent : a living being"

If we mean something different by organism in our definition than in these dictionary definitions, then it is our job to make that clear; if not, the definition for the dwc:Organism class seems just fine.

RogerBurkhalter commented 2 years ago

I think many of the terms used in Occurrence are legacy terms and definitions from museum collections. I do not agree with changing catalogNumber to occurrenceNumber; we have occurrenceID to handle that. A catalogNumber implies a physical object that has been cataloged in a museum or repository, whereas machine and human observations are not routinely cataloged or numbered by a collection (or have not been). Yes, when I run across an observation in a field notebook or measured section log of an occurrence that has no corresponding collected objects, I record it as a HumanObservation, and the CMS gives it an occurrenceID, but I use a UUID that does not resemble my institution's museum catalog numbers. Of the hundreds of thousands of images (analog photographs and digital images) we have, not all are of objects that were collected, nor are all of the objects in the images reposited at our institution. I have not even begun to work on these machine observations, and won't until we have a DAM (digital asset management system) and students/volunteers to scan negatives and prints.

+1 @tucotuco

deepreef commented 2 years ago

@dagendresen:

Describing the occurrences of things is a rather poor proxy for describing the things themselves.

The value of dwc:Occurrence is not about serving as a proxy for describing the things themselves. It is about understanding something about biodiversity in space and time. That is the primary value of aggregating data about specimens (in the context of their collecting events) and observations. But just because that's the primary value doesn't mean it's the only value. This group (dwc:MaterialSample) is focused on accommodating the information exchange needs of physical objects curated in collections, independently of whatever biodiversity research value can be derived from understanding the distribution of biodiversity in space and time.

@Jegelewicz:

[Occurrence =] Evidence of an Organism at an Event.

I'm not comfortable with this collapsing of "Occurrence" into "Evidence of an Occurrence". To me, the Occurrence instance has always been, and should always remain, the abstract fact of a particular organism existing at a particular place and time. There may be multiple pieces of Evidence supporting the truth of an Occurrence (a specimen, a photo, a record in a field notebook, a published reference, etc.). None of these individual pieces of Evidence should be treated as a separate instance of Occurrence. So I would re-frame your definition to something like:

Occurrence = Presence of an Organism at an Event, [or absence of an Organism at an Event] that is supported by some form of documented Evidence.

I included the "absence" clause in brackets because I'm not sure there is universal consensus that absences are in scope for dwc:Occurrence (I support it). But the good news is that we've recently affirmed that the scope of an instance of dwc:Organism can extend up to and including every single individual that is identifiable to a particular taxon. Thus, a single instance of dwc:Organism can be a direct proxy for every individual of a dwc:Taxon, and therefore can be used as the dwc:Organism instance participating in an "absence" dwc:Occurrence instance. Indeed, this approach gives us the flexibility (through dwc:Identification) to tie that dwc:Organism to a specific taxon concept (via anchoring to a TNU). But I digress...

The point is, a definition of this sort would accommodate recording both absences of individuals at an Event (e.g., "Wolf # 427 was not with the pack at this location today"), as well as absences of Taxa at an event (e.g., "we saw no individuals of this taxon at this place and time".) The only difference is the scope of the associated Organism instance.
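A minimal Python sketch of this re-framed definition, assuming absences are in scope: a single Occurrence structure carries a dwc:occurrenceStatus value ("present" or "absent", the term's recommended vocabulary), may cite several pieces of Evidence, and the wolf and taxon examples differ only in the scope of the Organism instance. The `OrganismScope` enum, the field names, the evidence strings, and the event identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class OccurrenceStatus(Enum):
    # dwc:occurrenceStatus recommends the values "present" and "absent".
    PRESENT = "present"
    ABSENT = "absent"


class OrganismScope(Enum):
    # Hypothetical labels for how broadly an Organism instance is drawn.
    INDIVIDUAL = "individual"  # e.g. one known wolf
    TAXON_WIDE = "taxon-wide"  # every individual identifiable to a taxon


@dataclass(frozen=True)
class Organism:
    label: str
    scope: OrganismScope


@dataclass(frozen=True)
class Occurrence:
    organism: Organism
    event_id: str
    status: OccurrenceStatus
    # One Occurrence, possibly supported by several pieces of Evidence.
    evidence: Tuple[str, ...] = ()


def describe(occ: Occurrence) -> str:
    state = "present" if occ.status is OccurrenceStatus.PRESENT else "absent"
    return f"{occ.organism.label}: {state} at {occ.event_id} ({len(occ.evidence)} evidence items)"


wolf_427 = Organism("Wolf #427", OrganismScope.INDIVIDUAL)
any_of_taxon = Organism("any individual of the taxon", OrganismScope.TAXON_WIDE)

missing_wolf = Occurrence(wolf_427, "pack-location-today", OccurrenceStatus.ABSENT,
                          ("field notebook record", "photo"))
no_sightings = Occurrence(any_of_taxon, "survey-place-and-time", OccurrenceStatus.ABSENT)

print(describe(missing_wolf))  # Wolf #427: absent at pack-location-today (2 evidence items)
print(describe(no_sightings))
```

Keeping the Evidence inside one Occurrence, rather than minting an Occurrence per piece of Evidence, is the design point being argued for here.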

catalogNumber - seems like this would be better labeled as occurrenceNumber

I agree with @RogerBurkhalter on this. We generate values of dwc:catalogNumber as a human-friendly convenience: a way for people to refer to a specific instance of a thing that other people can understand easily. This makes perfect sense in the context of MaterialSample because these are the things we humans curate and manage (by definition in my view). But it doesn't make as much sense to me to assign "occurrenceNumber" values to the more abstract "presence of an Organism at a place and time" instances, because we humans less often need to communicate with other humans about specific Occurrence instances. More often we aggregate Occurrence instances (e.g., same taxon at same location), and only tunnel down to individual instances in cases of doubt or other need for verification. Thus, I think computer-friendly GUIDs (captured with occurrenceID) are better for referencing instances of dwc:Occurrence. In contrast, we humans communicate with other humans about MaterialSample instances all the time (identifications, loans, storage locations, tissue extractions, etc.), so it makes sense to maintain a human-friendly catalogNumber value for them.

The way catalogNumber has been and continues to be used in our community, I think it makes MUCH more sense to organize this term with the MaterialSample class, rather than re-define it as something like "occurrenceNumber". In summary, I don't think we need a human-friendly proxy for occurrenceID, but I think there is value in having a human-friendly proxy for materialSampleID, and catalogNumber seems to fit that role perfectly.
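A minimal Python sketch of this division of labor, assuming UUIDs as the machine-friendly GUIDs: each MaterialSample carries both a materialSampleID and a human-friendly catalogNumber, while the Occurrence it supports is referenced only by its occurrenceID. The field names, the `index_by_catalog_number` helper, and the catalog numbers are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


def new_guid() -> str:
    """Machine-friendly global identifier (a UUID is one common choice)."""
    return str(uuid.uuid4())


@dataclass
class MaterialSample:
    catalog_number: str  # human-friendly label used in loans, labels, emails
    material_sample_id: str = field(default_factory=new_guid)


@dataclass
class Occurrence:
    # Referenced by a GUID only; no human-friendly "occurrenceNumber".
    evidence: List[MaterialSample] = field(default_factory=list)
    occurrence_id: str = field(default_factory=new_guid)


def index_by_catalog_number(samples: List[MaterialSample]) -> Dict[str, MaterialSample]:
    """Let a curator look a sample up by the number written on its label."""
    return {s.catalog_number: s for s in samples}


skin = MaterialSample("FISH 1234")       # hypothetical catalog numbers
tissue = MaterialSample("FISH 1234.T1")
occ = Occurrence(evidence=[skin, tissue])

lookup = index_by_catalog_number(occ.evidence)
print(lookup["FISH 1234.T1"] is tissue)  # True
```

People work with the catalog numbers; software joins on the GUIDs, so neither identifier has to serve both audiences.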

dagendresen commented 2 years ago

I think we might have an idealized idea of Occurrence as Organism at Event. (my intention is not to argue against this concept)

And then we have how Occurrence is predominantly used in practice: as the Evidence of an Organism at an Event. (This, I think, is for the most part the actual nature of the things that are identified by occurrenceID.)

I tend to think that our community might have painted itself into a corner, and that accepting that dwc:Occurrence has come to be used predominantly as the Evidence of an Organism at an Event might be the least bad way out. (... and MAYBE instead consider minting a new class "OrganismOccurrence")

In my mind, the semantics of an "occurrenceNumber" is already exactly covered by dwc:recordNumber.

recordNumber = An identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number.

I also think that we are in agreement about the value of "Occurrence", and that we agree (as is the motivation for this task group) that we need ANOTHER concept to describe objects in collections (PreservedSpecimen, MaterialSample, ...). It is the latter need I intended to express by "the occurrences of things is a rather poor proxy for describing the things themselves".