Change term

tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.

https://dwc.tdwg.org

Creative Commons Attribution 4.0 International

Change term - MaterialSample #314

Closed Jegelewicz closed 1 year ago

Jegelewicz commented 3 years ago

Submitter: @jegelewicz
Justification (why is this change necessary?): The definition of MaterialSample is essentially the same as that for PreservedSpecimen. Members of the Arctos Working Group feel that these two terms are currently interchangeable. See https://github.com/ArctosDB/arctos/issues/2432 for further discussion.

From https://dwc.tdwg.org/terms/#materialsample

MaterialSample info

Definition A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.

Examples A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.

From https://dwc.tdwg.org/terms/#preservedspecimen

PreservedSpecimen info

Definition A specimen that has been preserved.

Comments

Examples A plant on an herbarium sheet. A cataloged lot of fish in a jar.

Given the above, we propose that MaterialSample should be more specific to something less than what might be considered a "voucher" in order to delineate it from PreservedSpecimen.

Proponents (who needs this change): Arctos Working Group

Proposed new attributes of the term:

Term name (in lowerCamelCase): MaterialSample (no change)
Organized in Class (e.g. Location, Taxon):
Definition of the term: A physical result of a subsampling event. In biological collections, the material sample is typically collected as a subsample from a preserved or living organism, and either preserved or destructively processed. In geological and environmental collections the material sample is typically collected as a subsample of a larger geologic or environmental construct.
Usage comments (recommendations regarding content, etc.):
Examples: A part of an organism isolated for some purpose. A tissue sample. A soil sample. A marine microbial sample.
Refines (identifier of the broader term this term refines, if applicable): None
Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): http://rs.tdwg.org/dwc/terms/version/MaterialSample-2018-09-06 (added by @tucotuco)
ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG, if applicable): DataSets/DataSet/Units/Unit (added by @tucotuco)

Note: all of the above is my interpretation of the Arctos Working Group conversation.

deepreef commented 3 years ago

I do not agree with this proposal. I think a better approach is to embrace MaterialSample as currently defined, and instead alter the various "BasisOfRecord" terms that are represented as "pseudo-classes" (my term) in DwC.

I do agree that the definition of MaterialSample does need refinement and clarification (especially with respect to the boundary between an instance of Organism and an instance of MaterialSample [I have thoughts on this]), but I do not agree that the scope of MaterialSample should be fundamentally altered to imply "sub"sampled material.

I feel strongly that the DwC class MaterialSample should retain its original definition in the broader sense, to represent the entire spectrum of what we used to refer to as "CollectionObjects" -- that is, inclusive of single whole-organism specimens, derivatives of such (as proposed above), and also aggregates of such (lots, soil/water samples with multiple taxa, rocks with multiple embedded fossils and/or freshly collected and encrusted with [recently] living organisms, etc.)

In summary, I agree with the need and justification for a change in DwC to reconcile these terms, but I think the main change should be in the terms PreservedSpecimen, FossilSpecimen, LivingSpecimen. Instead of representing these as distinct classes that are mutually exclusive with respect to each other and to MaterialSample, I think it makes much more sense to regard these three terms as, in effect, subclasses of MaterialSample (mutually-exclusive alternatives of each other, but all within the scope of MaterialSample), and likewise either deprecate the terms HumanObservation and MachineObservation (as discussed at the recent TDWG, the distinction between them is fuzzy at best), or treat them as subclasses of a general Observation class, which itself is mutually exclusive with respect to MaterialSample.

dshorthouse commented 3 years ago

Agree with @deepreef here. The DINA consortium is in the midst of modelling this and have come to the realization that a catalogued object (= Physical Specimen, Physical Entity) is an instance of a MaterialSample. It may be derived from other instances of MaterialSample (destructively or non-destructively) and may equally produce one or more instances of yet other MaterialSamples as strictly expressed here by @Jegelewicz. And to take this further, Occurrence terms like catalogNumber, otherCatalogNumbers, associatedSequences, and preparations would be better placed under MaterialSample because these have nothing to do with an Occurrence.

deepreef commented 3 years ago

@dshorthouse : 100% agreement on all of this. We likewise came to the exact same conclusions (including the move of catalogNumber, otherCatalogNumbers, associatedSequences and preparations from Occurrence to MaterialSample).

Obviously it must be true what they say: Great minds think alike. (Or, perhaps, feeble minds think alike? Probably both, and the challenge is figuring out which this represents...)

Incidentally, I would add to the list disposition - as this seems to be more of a property of the physical specimen than the Occurrence instance at which it was extracted from nature.

Another conundrum is how to apply individualCount. As defined, this is clearly a property of Occurrence, but we need a similar property to track the number of "units" (for lack of a better term) comprising an instance of MaterialSample as well. The word "individual" harkens back to the old (now deprecated) individualID, which has been replaced by organismID in the Organism class. But we also have organismQuantity, which seems more specific to "The number of individuals represented present at the time of the Occurrence" ("A number or enumeration value for the quantity of organisms."). So... not sure if we can re-purpose individualCount to be something that applies to instances of MaterialSample in this context; or if we need some other way of tracking the "units" of a particular instance of MaterialSample. Perhaps this is best handled via MeasurementOrFact instances? There are two separate issues (here and here) about this going on right now...

So many questions....

dshorthouse commented 3 years ago

And yet another conundrum. What is going on with preparations, especially if it were moved into MaterialSample where it probably belongs? Is it a noun, a verb or a gerund? The examples provided could be interpreted as instances of MaterialSample, aggregations of instances, methods employed to produce them, or descriptions of their preservation media or vessel(s). However, the expectation is a singular materialSampleID, which means we should be obliged to make sense of all the relationships among instances of MaterialSample that share a common provenance by using ResourceRelationship. Some of those relationships will be between MaterialSamples and some of those relationships will be between MaterialSamples and Occurrences, the latter comparable in spirit to what we do with basionyms and their relationship(s) to downstream taxon concepts. And, it's from that particular link that we uncover the collecting event details.
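As a rough illustration of the ResourceRelationship idea above, the sketch below stores relationships as rows keyed by Darwin Core ResourceRelationship term names (resourceID, relationshipOfResource, relatedResourceID) and walks them from a MaterialSample to its Occurrence. The identifier values, the relationship labels, the "occ:" prefix convention and the find_occurrence helper are all assumptions made up for this example, not part of the standard.

```python
from typing import Dict, List, Optional

# Keys are Darwin Core ResourceRelationship term names.
# Identifier values and relationship labels are hypothetical.
RELATIONSHIPS: List[Dict[str, str]] = [
    {"resourceID": "ms:tissue-1",
     "relationshipOfResource": "derived from",
     "relatedResourceID": "ms:specimen-1"},
    {"resourceID": "ms:specimen-1",
     "relationshipOfResource": "evidence for",
     "relatedResourceID": "occ:1"},
]


def find_occurrence(material_sample_id: str,
                    rows: List[Dict[str, str]]) -> Optional[str]:
    """Follow relationship rows from a MaterialSample until an Occurrence is reached.

    Assumes (for this sketch only) that Occurrence identifiers start with "occ:"
    and that each resource has at most one outgoing relationship.
    """
    current = material_sample_id
    seen = set()
    while current not in seen:
        seen.add(current)
        related = next(
            (r["relatedResourceID"] for r in rows if r["resourceID"] == current),
            None,
        )
        if related is None:
            return None  # chain ends without reaching an Occurrence
        if related.startswith("occ:"):
            return related
        current = related
    return None  # a cycle was detected


print(find_occurrence("ms:tissue-1", RELATIONSHIPS))
```

The collecting event details would then be looked up from the Occurrence that the walk returns, which is the "particular link" described above.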

Jegelewicz commented 3 years ago

@dustymc

dshorthouse commented 3 years ago

If we extend these realizations to their logical conclusion, we have a problem in how we expect our specimen-based data to be interpreted in the context of an Occurrence. Most (all?) of our aggregators make heavy use of occurrenceID as the canonical anchor for our physical objects that we in the museums community implicitly model as MaterialSample. For us, we're forced to equate occurrenceID and materialSampleID when we share data whereas they are not the same thing. An Occurrence speaks more to an ephemeral, epistemological origin (i.e. basisOfRecord) from which may be derived evidence of past existence manifested as MaterialSample.

The exchange networks of duplicates distributed among herbaria is a concrete example of this. One plant clipped into five pieces, prepared and mounted, each sheet then shipped 'round the world to 5 herbaria. In reality, that's one Occurrence and five MaterialSamples although the participant herbaria have no functional mechanism to produce & share precisely that same progenitor occurrenceID. What they have are their own catalogNumber(s) and vague signals like recordNumber that there was once a unitary Occurrence: one organism at a particular place at a particular time. At present, each herbarium independently creates and attaches a transcribed collecting event (globally plural, locally unique) then shares their data anchored to occurrenceID (globally plural, locally unique) and we solve the problem through yet more abstraction by deploying AI and crafting some clusters with fuzzy edges. But... we're still left with globally plural and locally unique occurrenceIDs for a unitary Occurrence in this example.
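To make the herbarium example concrete, here is a minimal sketch of how sheets with different occurrenceIDs might be grouped back into one candidate Occurrence using the "vague signals" mentioned above. It is not the clustering any aggregator actually runs: it only does exact matching on normalized recordedBy, recordNumber and eventDate, and the sample records and the candidate_duplicates helper are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical records from three herbaria holding sheets of the same gathering,
# plus one unrelated record. Keys are Darwin Core term names.
RECORDS: List[Dict[str, str]] = [
    {"occurrenceID": "herbA:occ:1", "catalogNumber": "A-000123",
     "recordedBy": "J. Smith", "recordNumber": "4512", "eventDate": "1998-07-04"},
    {"occurrenceID": "herbB:occ:77", "catalogNumber": "B-9981",
     "recordedBy": "j. smith ", "recordNumber": "4512", "eventDate": "1998-07-04"},
    {"occurrenceID": "herbC:occ:5", "catalogNumber": "C-42",
     "recordedBy": "J. Smith", "recordNumber": "4512", "eventDate": "1998-07-04"},
    {"occurrenceID": "herbC:occ:6", "catalogNumber": "C-43",
     "recordedBy": "A. Jones", "recordNumber": "17", "eventDate": "2001-05-20"},
]


def candidate_duplicates(records: List[Dict[str, str]]) -> List[List[str]]:
    """Group occurrenceIDs that share collector, collector number and date."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["recordedBy"].strip().lower(),  # crude normalization only
            rec["recordNumber"].strip(),
            rec["eventDate"],
        )
        groups[key].append(rec["occurrenceID"])
    # Only groups with more than one member suggest a shared Occurrence.
    return [ids for ids in groups.values() if len(ids) > 1]


print(candidate_duplicates(RECORDS))
```

Even when the grouping succeeds, the result is still several locally minted occurrenceIDs pointing at one gathering, which is exactly the problem described above.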

Jegelewicz commented 3 years ago

the participant herbaria have no functional mechanism to produce & share precisely that same progenitor occurrenceID. What they have are their own catalogNumber(s) and vague signals like recordNumber that there was once an Occurrence.

This is a long-standing problem and not just for herbaria. Mammal occurrences end up at different institutions or collections when skins, skeletons and genetic material get separated over the years.

Jegelewicz commented 3 years ago

See https://github.com/ArctosDB/arctos/issues/1966 for another side of the story

dustymc commented 3 years ago

I believe Arctos has all of the "pigeonholing problems" mentioned in this thread.

https://arctos.database.museum/guid/UAM:ES:4588 seems to meet some definitions of "FossilSpecimen" and PreservedSpecimen, and is also cataloged as https://arctos.database.museum/guid/UAM:Mamm:53942.

Many things in herbaria are "LivingSpecimen" pending a little water and sunlight.

catalogNumber and otherCatalogNumbers seem closer to Occurrence than MaterialSample to me, but we could easily map through one more denormalization. (We do have "MaterialSample otherCatalogNumbers" but I don't think they're exposed via DWC.)

https://arctos.database.museum/guid/MVZ:Egg:10460 is more or less another example of "rocks with multiple embedded fossils."

Observation class, which itself is mutually exclusive with respect to MaterialSample.

We have "there was never a physical part" and "someone says there were physical parts, but they are permanently unavailable for various reasons." I do not see much functional distinction.

So many questions....

Yep!

dshorthouse commented 3 years ago

This is a long-standing problem and not just for herbaria. Mammal occurrences end up at different institutions or collections when skins, skeletons and genetic material get separated over the years.

Nit-picky, but by "occurrence" here, you mean MaterialSample or specimen. Occurrences don't go anywhere. There may have been a single Occurrence - a single organism collected in a single event. But, the parts - the MaterialSamples - are now scattered among many homes. They all have a relationship to that original Occurrence (perhaps through a parent MaterialSample that no longer exists, e.g., carved up in the basement of the Smithsonian from a previously documented MaterialSample) but there are barriers to knowing it, agreeing on it, using it, and then sharing it.

Jegelewicz commented 3 years ago

Nit-picky, but correct.

albenson-usgs commented 3 years ago

So eDNA are MaterialSamples and not Occurrences? Is it both? When is something not an occurrence? Because eDNA have associatedSequences and isn't all of this wrapped up in the occurrence core anyway? So what does it really mean practically for a term to be "placed under MaterialSample"?

Personally I think a larger community discussion needs to happen around basisOfRecord and what it's intended to convey. I field a lot of questions in the OBIS and GBIF US communities about this term because it's required and has a controlled vocabulary so data providers and managers have to apply it and it isn't really clear how a downstream user will interpret it.

For the Machine Observations TDWG group, especially for biologging data we are using basisOfRecord to distinguish between observations of an animal where the animal is in hand and having a tag placed on it (HumanObservation) versus the subsequent observations of that animal by a machine (MachineObservation).

Jegelewicz commented 3 years ago

Another issue we have grappled with - https://github.com/ArctosDB/arctos/issues/2075

or not finished grappling with....

dshorthouse commented 3 years ago

@albenson-usgs If we're strict about the definition of an Occurrence then yes, eDNA is an agglomerative MaterialSample. The event portion of the Occurrences (plural) to which that initially single sample is linked is immediately knowable but the organisms (plural) that were bulk sampled may not be.

As for the practicality of where terms are placed in the DwC classes, it has to do with the operational identifiers we attach to these items and what is their cardinality within our collection management systems. If catalogNumber is a property of an Occurrence then that assumes a 1:1 relationship between it and an occurrenceID - they are operationally the same. However, if several different specimens (or their derivatives) each with a different catalogNumber are derived from a single Occurrence with its single event then we may have a problem because under some conditions, we may need to break the cardinality. In other words, GBIF wants a unique occurrenceID but I've got 10 catalogued items that were derived from a single Occurrence so I cannot make them unique and still adhere to the definition of an Occurrence unless I only publish one of them. If I buck the definition and give all of them then I have to make artificial occurrenceIDs, which may mean loss of functional collaboration across collections or across institutions if there was intent to share & reuse those occurrenceIDs. And, GBIF's value is diminished. As many of you have noticed, GBIF now has a clustering algorithm at play for occurrence records. Is it not the intent here to collapse all those disparate, artificially unique occurrenceIDs into canonical Occurrences? If it isn't, then what's the point? Why force us to make these occurrenceIDs unique? Some of us have already done that clustering!
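The cardinality problem can be sketched in a few lines. The example below assumes ten catalogued items that all derive from one Occurrence, shows that a uniqueness check on occurrenceID fails, and then mints artificial occurrenceIDs while keeping the shared one in a side field. The sourceOccurrenceID key, the identifier values and both helper functions are hypothetical; they are not Darwin Core terms or any aggregator's API.

```python
from collections import Counter
from typing import Dict, List

# Ten catalogued items derived from a single Occurrence (hypothetical values).
CATALOGUED_ITEMS: List[Dict[str, str]] = [
    {"catalogNumber": f"CAT-{i}", "occurrenceID": "occ:shared-1"}
    for i in range(1, 11)
]


def duplicate_occurrence_ids(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Return occurrenceIDs that appear more than once, with their counts."""
    counts = Counter(r["occurrenceID"] for r in rows)
    return {oid: n for oid, n in counts.items() if n > 1}


def mint_artificial_ids(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Make occurrenceID unique per row, keeping the shared ID in a
    hypothetical 'sourceOccurrenceID' field so the link is not lost."""
    return [
        {
            **r,
            "occurrenceID": f'{r["occurrenceID"]}/{r["catalogNumber"]}',
            "sourceOccurrenceID": r["occurrenceID"],
        }
        for r in rows
    ]


print(duplicate_occurrence_ids(CATALOGUED_ITEMS))
published = mint_artificial_ids(CATALOGUED_ITEMS)
print(duplicate_occurrence_ids(published))
```

Without something like the side field, the only way to recover the original Occurrence from the published rows is to cluster them again after the fact.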

campmlc commented 3 years ago

Agree with @dshorthouse. This is highly relevant, as my institution is in the process of setting up an environmental sample/eDNA repository in Arctos, similar to an existing repository at the University of Alaska Museum of the North (https://arctos.database.museum/SpecimenSearch.cfm?guid_prefix=UAM%3AEnv). We are considering including all derived taxonomic IDs and genetic sequences under a single catalog number, as having all been derived from the same occurrence (water sample, soil sample). Alternately, we could catalog each unique taxonomic OTU separately, and link it back to the original source catalog item via URL relationships. The latter is entirely feasible but much more complex, especially if there are hundreds of OTUs that result from a single eDNA sample. What we really need is a way to designate the original source sample, e.g. the water or soil, with a unique source identifier similar to a dwc:organism ID.
Also, our collections have many different examples of catalog items that represent multiple occurrences. These catalog items usually include multiple material samples, e.g. multiple tubes of blood and serum collected from the same animal at different occurrence events. These situations are not hypothetical.

deepreef commented 3 years ago

Hokay... where to begin? (Note to @timrobertson100: Now is the time to go get that cup of tea...)

So, I first climbed into this rabbit hole several years ago, when I started minting materialSampleID identifiers for our specimens. Initially, at least, these had a 1:1 correspondence with occurrenceID values, as presented through DwC. At the time, we had no resources to conduct a major overhaul of our (homegrown) specimen data management systems, but it did trigger a conceptual odyssey that I've been wandering through ever since.

DarwinCore began as a way for the Museum community to share data about preserved specimens (fun fact: the term is credited to Allen Allison, who apparently blurted it out by mistake when he meant to say "Dublin Core" at a ZBIG meeting - or so he tells me). Thus, the original implied basisOfRecord is what we now refer to as PreservedSpecimen. Soon thereafter, it was assumed that the most valuable data extraction from our specimens was in terms of representing points on a map (i.e., distributions of taxa across geography). Non-vouchered observations also represent points on a map, so the implied basisOfRecord was expanded to accommodate what we now refer to as HumanObservation (and in a few cases at the time, what we now refer to as MachineObservation). Accordingly, the core class/term in DwC was changed to Occurrence, as a more general way of representing points on a map.

Somewhere along the way, what we used to think of as "specimens" now became "occurrences", as if they were congruent concepts. But of course, specimens are physical entities with all sorts of properties important to the people who care for them (such as preparations, disposition, etc.), whereas (as @dshorthouse already noted) occurrences are ephemeral things, capturing the abstract idea of an Organism being present in the context of an Event. When MaterialSample was first proposed, it was not (as I recall) an effort to reconcile this logical incongruity. Rather, it was proposed initially to accommodate multi-taxon "gatherings" (e.g., soil, water), which at the time were the basis for the growing notion of eDNA. After some hashing and thrashing on the email discussion forums, the Class was born and now bears the definition "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.", and the examples are: "A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.". In that context, it's kinda hard not to equate MaterialSample with "Specimen".

(I trust @tucotuco or @stanblum or someone else active in early DwC activities will correct any errors in this historical synopsis...)

I've continued to stare at my ceiling late at night (more often than I should probably admit) pondering the essence and meaning of MaterialSample in the context of other DwC classes, but it's gotten a bit more "real" for me recently. We suddenly have a lot more resources to support the digitization of our collections (and, of particular interest for me, integrate collections data and research data more effectively), and so what had been an entirely intellectual exercise to occupy time late at night, in the shower, stuck in traffic, etc., has now become a very specific practical issue for me. Over the next 4 months, I will be updating the core data model behind our collections data, and one of the specific issues that our CMs need to "fix" is the way we track physical objects in our collections -- i.e., as instances of MaterialSample. Indeed, I recently reached out to @tucotuco

A lot of the discussion above focuses on the boundary between Occurrence and MaterialSample. While I agree that is relevant to the extent that many content providers present "specimen" data as instances of Occurrence, and some have therefore (mistakenly, in my view) equated the two concepts, it's also the easy one to deal with. Instances of MaterialSample very clearly represent physical things preserved in collections, whereas instances of Occurrence represent abstract facts concerning the presence of an instance of Organism at an instance of Event. You don't have to go too deep into the conceptual weeds to grasp the fundamental difference between these two concepts.

Much more challenging (for me, at least), is defining the boundary between MaterialSample and Organism. The way I conceptualize an instance of Organism (which intersects with instances of Event via instances of Occurrence, and with instances of Taxon via Identification), is as a conceptual entity (with physical manifestation) that essentially begins when a sperm meets an egg (or when a single-cell organism divides, or whatever mechanism of reproduction is relevant), passes through all manner of metamorphoses over space and time, and then "ends" at some point. One of the key questions is: what marks the end of the existence of an Organism? The two most obvious candidate answers are: death, and disintegration.

This distinction (death vs. disintegration) comes into play when trying to understand the boundary between an instance of Organism, and an instance of MaterialSample. And this is where my intellectual meanderings keep bumping into a wall. In fact, I recently exchanged a series of emails with both @tucotuco and @baskaufs , primarily to ask the question (among others): Is the TDWG community ready to wrestle with this question? and On what forum should that wrestling take place? Both questions seemed to be answered in the preceding posts on this issue (i.e., "Yes", and "Here").

This post is already too long (even by my standards), so rather than regurgitate all my thinking on this, I'll close by providing a use case, and some follow-up questions.

Use case: A bird is flying across a field, and while traversing a road, gets hit by a car. The driver pulls over, recognizes the bird as something interesting, and contacts the local Museum. The dead bird is then brought to the Museum and given to the VZ CM, who photographs it, assigns a catalog number to it, writes out a label of the pertinent details, and sticks it in a freezer. Some time later, the bird is removed from the freezer, thawed, and prepared for long-term preservation. In keeping with standard protocol, the skin is removed and preserved following one set of protocols, some tissue samples are taken and preserved following another set of protocols, and the remaining tissues are separated from the skeleton and the bones preserved following yet another set of protocols. By traditional practice at the Museum, the same Catalog Number issued to the whole bird is applied to the three separate sets of objects (Skin, Tissue, Skeleton), and one "Specimen" record is created in the database to record all the pertinent information.

I think most would agree that the living bird flying across the field is an instance of an Organism, and that its unpleasant encounter with the car as it flew across the road constitutes an Event, and together this Organism+Event intersection represents an instance of Occurrence. I suspect that most people would also agree that the three preparations derived from that Organism instance represent three instances of MaterialSample.

That's the easy part. But here are the questions to consider:

1) When did the first MaterialSample instance come into being? The moment the bird encountered the car and it died? The moment the whole bird arrived at the Museum? When it was assigned a catalog number? When it was placed in the freezer (i.e., "preserved")? When the three preparations were created?

2) Related to this, was the whole bird in the freezer an instance of MaterialSample, serving as a "parent" of the three derived MaterialSample instances (Skin, Tissue, Skeleton)? (perhaps suggesting the need for a new term parentMaterialSampleID?)

3) Did the instance of Organism cease to exist when the bird's heart stopped beating? Does it continue to exist as a physical entity after the three preparations are created (and after the remaining tissue material disintegrates)? Does it continue to exist as a conceptual entity until all of the physical matter that comprised it fully decomposes?

4) Related to the above: what is the semantic relationship between an instance of MaterialSample and an instance of Organism? Something like "isDerivedFrom"?
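For question 2, one possible reading can be sketched as a simple parent map. This assumes the whole frozen bird is itself a MaterialSample and that each preparation points to it through the parentMaterialSampleID term floated above, which is only a suggestion in this thread and not an existing Darwin Core term. The identifiers and helper functions are invented for the example.

```python
from typing import Dict, List, Optional

# materialSampleID -> hypothetical parentMaterialSampleID (None for the root).
PARENTS: Dict[str, Optional[str]] = {
    "ms:whole-bird": None,
    "ms:skin": "ms:whole-bird",
    "ms:tissue": "ms:whole-bird",
    "ms:skeleton": "ms:whole-bird",
}


def lineage(sample_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from a sample up to its root parent."""
    chain = [sample_id]
    while parents.get(chain[-1]) is not None:
        parent = parents[chain[-1]]
        if parent in chain:
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
    return chain


def children(sample_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the samples directly derived from the given sample, sorted."""
    return sorted(child for child, parent in parents.items() if parent == sample_id)


print(lineage("ms:skin", PARENTS))
print(children("ms:whole-bird", PARENTS))
```

Under this reading the shared catalog number would attach to the root sample, and the three preparations would each get their own materialSampleID.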

I have my own thoughts on answers to these (and other) questions, but obviously this post is already WAY too long!

Note: several more posts came in as I was writing this, and I continue to agree 100% with the assertions of @dshorthouse.

campmlc commented 3 years ago

@deepreef Regardless of the persistence of the Organism, the "identifier" associated with this organism absolutely has to persist as a linking parent identifier with all subsequent derived parts and preservations, material sample or otherwise, including and especially, parasites and tissues and sequences and media that are deposited in other collections and institutions and repositories, to track these back to the source organism and occurrence. This is also true for source/parent material such as soil/water etc for eDNA, which technically is not an "organism" but which also has the same need to track parent/child relationships from a source collection object and occurrence.

deepreef commented 3 years ago

@campmlc :

Regardless of the persistence of the Organism, the "identifier" associated with this organism absolutely has to persist as a linking parent identifier with all subsequent derived parts and preservations, material sample or otherwise, including and especially, parasites and tissues and sequences and media that are deposited in other collections and institutions and repositories, to track these back to the source organism and occurrence.

I ABSOLUTELY agree! I was focused more on the conceptual entity of the Organism instance. We need to understand what the "thing" is before we can correctly represent the semantic/cardinality relationships between a digital record (and identifier) representing an Organism, and the other digital identifiers we mint for other classes of "things".

albenson-usgs commented 3 years ago

We are considering including all derived taxonomic IDs and genetic sequences under a single catalog number, as having all been derived from the same occurrence (water sample, soil sample).

I'm not following this. An occurrence is the observation of a taxon at a place and time. What you are talking about here (to me) is an event. The occurrences are the OTUs or taxa that you detected in the event. For me I can't understand how this is one occurrence. This is an event with many occurrences.

What we really need is a way to designate the original source sample, e.g. the water or soil, with a unique source identifier similar to a dwc:organism ID.

I don't understand why you wouldn't use an eventID for this. The event being a sample of water collected at a place and time.

albenson-usgs commented 3 years ago

If we're strict about the definition of an Occurrence then yes, eDNA is an agglomerative MaterialSample

Ok but the associatedSequences is basically the identification of the occurrences. Using @deepreef's logic above it's the intersection of Taxon via Identification that tells you there was an Occurrence which is part of an Event.

dshorthouse commented 3 years ago

This is an event with many occurrences.

Aha! What I think you mean here is an event with many "things". You can't have an occurrence without an event - they are inextricably linked. It is equally incongruous to imagine an occurrence with many events unless we invoke quantum entanglement. The nut @deepreef is getting us to crack is what are these "things"? We could call them MaterialSamples and there appears to be reason for doing so especially when split & scatter protocols are employed.

albenson-usgs commented 3 years ago

What I think you mean here is an event with many "things".

But the things are not MaterialSamples because the material sample is also the event (a sample of water, a sample of soil). The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

Jegelewicz commented 3 years ago

An occurrence is the observation of a taxon at a place and time.

Yep - https://dwc.tdwg.org/terms/#occurrence

This is an event with many occurrences.

I agree - the trouble is, we don't do a good job of this. Events don't have identifiers that are shared by everyone and it is VERY easy to end up with multiple interpretations of a single event.

deepreef commented 3 years ago

@albenson-usgs :

Ok but the associatedSequences is basically the identification of the occurrences. Using @deepreef's logic above it's the intersection of Taxon via Identification that tells you there was an Occurrence which is part of an Event.

Yes -- this is another one of the conundrums. Per DwC definition of associatedSequences:

A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of genetic sequence information associated with the Occurrence.

This is another example of a term currently nested within the Occurrence class, that doesn't belong there. The question is: where does it belong? An argument can be made that the sequences are really associated with the Organism that held the genome from which the sequences were derived. Another argument can be made that the sequences are associated with the MaterialSample, extracted from the Organism, from which the actual sequence was created. But I don't think a case can be made that associatedSequences are properties of an Occurrence. Unlike properties like sex, lifestage and others, which change over the course of an Organism's existence (and therefore need to be anchored to a moment in time, or captured in the form of an Event), the DNA sequences derived from an Organism are the same across its lifetime.

But that does not address your point, which brings in Taxon and Identification. The latter is the intersection between the former and an instance of Organism. As such, a DNA sequence can serve as "Evidence" (not yet a DwC class -- but perhaps there is a need for it) for a taxonomic Identification, but it is not, strictly speaking, the Taxon itself (nor the Identification itself). Going a step further, I wouldn't call the sequences themselves instances of MaterialSample; I think they're more analogous to an Image or other form of multimedia, essentially serving as some sort of "representation" of the Organism, derived directly from a MaterialSample (e.g., tissue sample, or water/soil sample).

But the things are not MaterialSamples because the material sample is also the event (a sample of water, a sample of soil). The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

I don't view samples of water or soil as "Events", any more than I view specimens as "Events". This diagram of Darwin-SW is very helpful, I think, in showing the semantic relationships among many of the core DwC classes. Unfortunately, it doesn't include a node for MaterialSample (the purpose of my previous epically-long post was an attempt to start figuring out exactly where MaterialSample would fit in this graph -- my current thinking is somehow embedded within "Token", aka "Evidence").

In summary, Events are independent of any Organism (or derivatives of organisms). They are essentially a moment in space-time (intersection of Location with a timestamp, plus some other properties). An Occurrence is the intersection of an Event and an Organism. The intersection of an Organism and a Taxon is an Identification. By my thinking, all the other stuff we traffic in (PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialCitation, etc.) all represent forms of "Evidence" that support either the truth of an Occurrence instance, or the veracity of an Identification instance; but also are intrinsic things that exist independently of these evidentiary roles.
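
A minimal sketch of the relationships summarized above, in Python. The class and field names are illustrative assumptions, not an official DwC data model, and the identifiers and "Taxon X" are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A moment in space-time: a location intersected with a timestamp."""
    location: str
    timestamp: datetime


@dataclass(frozen=True)
class Organism:
    organism_id: str


@dataclass(frozen=True)
class Taxon:
    name: str


@dataclass(frozen=True)
class Occurrence:
    """The intersection of an Event and an Organism."""
    event: Event
    organism: Organism


@dataclass(frozen=True)
class Identification:
    """The intersection of an Organism and a Taxon."""
    organism: Organism
    taxon: Taxon


@dataclass(frozen=True)
class Evidence:
    """A thing (specimen, observation, citation) that exists on its own and
    may support an Occurrence, an Identification, both, or neither."""
    kind: str
    supports_occurrence: Optional[Occurrence] = None
    supports_identification: Optional[Identification] = None


# Hypothetical bird/car example: one Organism, one Event, one Occurrence.
bird = Organism("bird-1")
roadkill = Event("highway", datetime(2021, 4, 20, 9, 30))
occurrence = Occurrence(roadkill, bird)
identification = Identification(bird, Taxon("Taxon X"))

# A preserved skin supports both assertions; an undocumented specimen
# supports neither, yet it still exists as a thing.
skin = Evidence("PreservedSpecimen", occurrence, identification)
orphan = Evidence("PreservedSpecimen")
```

Note that the Event carries no reference to any Organism; only the Occurrence ties the two together.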

I would definitely conceptualize PreservedSpecimen and FossilSpecimen as examples of MaterialSample; but it's less clear to me whether LivingSpecimen is best framed as an instance of MaterialSample, or Organism, or both.

Food for thought: take my use-case of bird, and imagine that prior to flying across the field and being hit by a car, it lived in a Zoo. Was it a MaterialSample when it was in the Zoo, before it escaped, flew across the field, then got hit by the car? If so, was it the same instance of MaterialSample before it ended up in the Museum freezer? And does it matter whether it was conceived and born in the Zoo? What if it was collected in the wild?

What I'm trying to get at is the "essence" of an instance of MaterialSample -- ultimately to define it, but even before that, I'd like to know it when I see it (with apologies to Justice Potter Stewart).

dshorthouse commented 3 years ago

The many "things" are many different sequences that tell us multiple taxa were present at that place and time (or nearby at least).

This is an interesting one & permit my adventurous thought experiment. What if your eDNA sample came from a river? And, after the data are worked-up, you get "cougar" as a hit among all the other microorganisms. What the heck? Turns out, through radio collar data, you discover that there's another Occurrence record captured that clearly shows a cougar on the river bank some time prior to you scooping your water sample. She cut her gums on a fish she was eating. You could argue that, through calculating the speed of water & rolling back the clock, the Occurrence records represented by your eDNA and that of the radio collar data are precisely the same. They are merely lines of evidence, derived from precisely the same event and precisely the same animal. It's just that the motions of the water added a bit of noise to your appreciation of time. Now what? Surely we need something to differentiate what we have. One record was derived from a radio collar & one was derived from eDNA but, crucially, we do REALLY want to make the joins between these things because there's a story. Is there one Occurrence here or two?

Jegelewicz commented 3 years ago

In summary, Events are independent of any Organism (or derivatives of organisms). They are essentially a moment in space-time (intersection of Location with a timestamp, plus some other properties). An Occurrence is the intersection of an Event and an Organism. The intersection of an Organism and a Taxon is an Identification. By my thinking, all the other stuff we traffic in (PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialCitation, etc.) all represent forms of "Evidence" that support either the truth of an Occurrence instance, or the veracity of an Identification instance; but also are intrinsic things that exist independently of these evidentiary roles.

This makes all kinds of sense to me.

Jegelewicz commented 3 years ago

Is there one Occurrence here or two?

An Occurrence is the intersection of an Event and an Organism.

It seems like there is ONE. The machine observation and material sample are evidence for it.

Although, technically, unless the bleeding coincides EXACTLY with the radio collar ping, maybe not?

dshorthouse commented 3 years ago

Although, technically, unless the bleeding coincides EXACTLY with the radio collar ping, maybe not?

Way to rain on my parade, @Jegelewicz.

albenson-usgs commented 3 years ago

Is there one Occurrence here or two?

Ok yes I see this as one occurrence. So you're saying that the associatedSequences for this Organism being in the MaterialSample Class will help us make the link between these two occurrences which are really one made by two different sampling methods?

For reference, in the OBIS world there is a real world problem like this coming from EurOBIS where you have an ARMS sampling as well as eDNA sampling happening at the same location and time and therefore you may have evidence of the same occurrence coming from different sampling methods.

debpaul commented 3 years ago

On 2021-04-22 4:54 PM, David Shorthouse wrote:

One record was derived from a radio collar & one was derived from eDNA but, crucially, we do REALLY want to make the joins between these things because there's a story. Is there one |Occurrence| here or two?

At first, I would think ONE, but aren't you giving us a story that may have a different explanation? What if there's another uncollared cougar? How can we be certain (unless the collared cougar has a known DNA sample to match to the eDNA) that it was the same cougar?

You could say the evidence suggests they are one and the same cougar, given two pieces of information. But you could also wonder if there's another cougar?

campmlc commented 3 years ago

Is there one Occurrence here or two?

My two cents are there could be just one, if we can confirm the exact same radio-collared cougar (e.g. she was darted and blood sample taken which matches the DNA of the water) but they could be interpreted as two depending on the difference between the timestamp of the machine and the time recorded for the DNA collecting event. Since these are likely going to end up in different physical and data repositories, they will undoubtedly be considered as two, and ideally we'd have some way of linking them.

I am going to ask about the concept of Material Sample being equivalent to Preserved Specimen. While in many cases this can be so, it depends on what we consider PreservedSpecimen to represent. In the museum world, this can be a bird skin in a drawer (a voucher specimen), perhaps in combination with associated skeletal material or even tissues etc. Or it could represent just tissues, or just a skeleton, or a single herbarium sheet, or a lot of 100 fish? This term seems to be used in all these contexts?

To my understanding, MaterialSample should be equivalent to the parts of an organism, whatever has actually been preserved. So from the bird/car example, the Organism = bird collected at Event = died on highway at x place y time creates an Occurrence of bird + place+time (?), and the resulting skin and skeleton and tissues are all MaterialSamples related back to that organism. However, perhaps only the tissues were saved, because the bird was mostly destroyed. Then the PreservedSpecimen would be tissues only and the MaterialSamples might be one or more vials of different tissue types.

In my understanding, MaterialSample is something that can be put into a discrete container and barcoded. It is something that can be loaned or subsampled for loan. DNA sequences are derived from them. There could be one or many of these associated with an organism in a given museum collection. They would typically, but not always, share a museum catalog number, because sometimes MaterialSamples from the same Organism and the same Event = the same Occurrence can be scattered across different collections in the same institution (e.g. tissue repositories vs bird collections), or between different institutions, and end up with different museum catalog numbers.

Museums currently don't do a good job of designating which part of the Event ->Organism ->Occurrence->MaterialSample tree museum catalog numbers represent. They could be any part of this tree.

And then, we can have multiple Organisms associated with the same Event = multiple taxa from a single eDNA sample, or multiple fossils preserved in the same rock, or parasite collected with their hosts. By my understanding of the previous discussion, these would be different occurrences+ associatedoccurrences? But depending on the museum collection or database, they may be cataloged separately or together. No wonder we are having trouble understanding how we each interpret these terms!

dshorthouse commented 3 years ago

You could say the evidence suggests they are one and the same cougar, given two pieces of information. But you could also wonder if there's another cougar?

Et tu, Brute? Just to be provocative then, the Organism is a specific cougar litter.

dshorthouse commented 3 years ago

Ok yes I see this as one occurrence. So you're saying that the associatedSequences for this Organism being in the MaterialSample Class will help us make the link between these two occurrences which are really one made by two different sampling methods?

I would hope so! Why else is GBIF investing in algorithms to cluster occurrences? Sure, it's to identify incipient threads between records, but why not also use it to encourage the use of shared identifiers when it's evident that occurrences are one & the same, whose MaterialSamples and observations are dispersed across projects, collections and institutions? That said, we're stuck with a nice tool but no means to implement it because occurrenceID is meant to be both locally & globally unique.

deepreef commented 3 years ago

Is there one Occurrence here or two?

OK, here's my take: The moment where the cougar was eating the fish and cut its gum, documented from the collar through MachineObservation evidence, was one Occurrence (two if you also count the Occurrence of the fish at the same event, but I know that's not what you meant). In this context, the cougar is an instance of Organism, and that organism is essentially a sack (skin) containing trillions of cells (the majority of which are actually non-mammalian; but that's another distraction we don't need here).

So now we have a subset of those trillions of cells escaping the sack, and entering the environment -- where they are picked up in a water sample. If you include the liberated blood cells as continuing to represent the same Organism (cougar), then you have several options: 1) Treat the event of it eating the fish and the event of the water collection as the same instance of Event, with sufficient scope of both place and time (e.g., sufficiently large value of coordinateUncertaintyInMeters), and the cougar intersected with that Event once, yielding one Occurrence. 2) Treat it as two separate Event instances, with more granular place/time properties, and acknowledge that the "cougar" was in both places at both times (and hence, two separate Occurrence instances). 3) Probably other options that I'm too lazy to think through and type out.

This is how I would manage it:

n-number of Occurrence instances for the cougar Organism, documented as individual time-stamped points on a map based on the MachineObservation evidence from the collar. "n" would be defined by the granularity of the recorded points where/when the collar (as proxy for the cougar) was. Presumably one of those points was at the riverbank, while it ate the fish.
One water sample, representing a MaterialSample instance, containing trace bits of many different Organisms
n-number of taxa identifiable from the eDNA extracted from the water sample, each of which is assigned to a "virtual" Organism instance
Each of these n-number of virtual Organism instances intersects with the water-collection Event, yielding the same "n" number of Occurrence instances (one for each virtual Organism, meaning one from each identified Taxon).
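
The bookkeeping in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the `virtual` flag, and the `occurrences_from_edna` helper are hypothetical, and the taxa and identifiers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str


@dataclass(frozen=True)
class MaterialSample:
    sample_id: str
    event: Event  # the Event at which the sample was collected


@dataclass(frozen=True)
class Organism:
    organism_id: str
    virtual: bool  # True when inferred from eDNA rather than observed


@dataclass(frozen=True)
class Occurrence:
    event: Event
    organism: Organism


def occurrences_from_edna(sample, detected_taxa):
    """Create one virtual Organism and one Occurrence per detected taxon."""
    result = {}
    for taxon in detected_taxa:
        organism = Organism(f"{sample.sample_id}:{taxon}", virtual=True)
        result[taxon] = Occurrence(sample.event, organism)
    return result


# One water sample, collected at one Event, yielding three detected taxa.
water_event = Event("water collection at river")
water = MaterialSample("water-1", water_event)
edna = occurrences_from_edna(water, ["cougar", "fish", "diatom"])

# Collar data: one Occurrence per recorded fix for the collared cougar.
collared = Organism("cougar-collar-1", virtual=False)
collar = [Occurrence(Event(f"collar fix {i}"), collared) for i in range(3)]
```

In this sketch the virtual cougar from the eDNA and the collared cougar stay separate Organism instances; merging them would be a later, deliberate decision rather than something the code does automatically.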

Now, here's the trick: assuming one of those "virtual" Organism instances from the eDNA analysis of the water sample maps to a taxon that is a cougar (based on DNA evidence), do we treat it as the same Organism that is represented by the collar data, or is it a different Organism instance? I don't think you could automate that answer. Sure, if you did whole genomes of both the cougar (when you put the collar on) and the sample from eDNA, you could probably confidently say "same beast!", and collapse the Organism to one. In that case, you would represent it as two Occurrence instances for the same cougar Organism. If you're in the camp that the blood cells "are" the cougar, then you'd probably want to expand the geographic scope of the Event where the water sample was taken (via coordinateUncertaintyInMeters) to include the whole footprint of where the "rest" of the cells from that Organism could have been at that moment. But if you're in that camp, then why limit to only the river moving the blood cells of the cougar away from the sack with the rest of the cougar cells? Suppose I drew a blood sample from the cougar when I collared it. Then I put that sample in the car. Now we have two Occurrences for the same cougar -- one out in the field where it's recovering from the tranquillizer, and one in the vial in the car as I'm driving home. Indeed, you can track that vial all over the place (on the airplane flight home, etc.), and call all of it Occurrence instances of the cougar.

From my perspective, that path leads only to madness.

This is why I want to define the boundary between Organism and MaterialSample. And why I want to clarify that instances of Organism participate in Occurrences, but instances of MaterialSample do not. We need to track how our MaterialSamples move around in place and time, but by some other semantics than Occurrence instances.

Welcome to what goes on inside my head late at night | in the shower | stuck in traffic...

deepreef commented 3 years ago

I am going to ask about the concept of Material Sample being equivalent to Preserved Specimen. While in many cases this can be so, it depends on what we consider PreservedSpecimen to represent. In the museum world, this can be a bird skin in a drawer (a voucher specimen), perhaps in combination with associated skeletal material or even tissues etc. Or it could represent just tissues, or just a skeleton, or a single herbarium sheet, or a lot of 100 fish? This term seems to be used in all these contexts?

Yes! I would regard all of these things as representing instances of MaterialSample. The beauty of this class is that there is a many-to-many relationship with Organism. That is, one Organism might yield many MaterialSample instances, and one MaterialSample instance might contain many Organism. This makes it complicated, but also powerful from an informatics perspective. This is why I envision MaterialSample as conceptually the same thing as "Collection Object" from the ASC model.
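
One way to picture the many-to-many relationship is a plain link table between sample and organism identifiers. This is a sketch only; the identifiers and the two lookup dictionaries are made up for illustration.

```python
from collections import defaultdict

# (material_sample_id, organism_id) pairs; all identifiers are hypothetical.
links = [
    ("skin-1", "bird-1"),     # one Organism, several MaterialSamples
    ("tissue-1", "bird-1"),
    ("water-1", "cougar-1"),  # one MaterialSample, several Organisms
    ("water-1", "fish-1"),
]

samples_by_organism = defaultdict(set)
organisms_by_sample = defaultdict(set)
for sample_id, organism_id in links:
    samples_by_organism[organism_id].add(sample_id)
    organisms_by_sample[sample_id].add(organism_id)
```

Neither side owns the other here, which is the point: the link is a record of its own.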

To my understanding, MaterialSample should be equivalent to the parts of an organism, whatever has actually been preserved.

Agreed! That's one of the examples given ("A part of an organism isolated for some purpose.") However, it's not limited to parts of an Organism. It can also be "A whole organism preserved in a collection", or "A soil sample. A marine microbial sample."

In my understanding, MaterialSample is something that can be put into a discrete container and barcoded. It is something that can be loaned or subsampled for loan. DNA sequences are derived from them. There could be one or many of these associated with an organism in a given museum collection.

Yes -- fully agree!

the same Occurrence can be scattered across different collections in the same institution

Here's where I would take a different path (or at least different wording). The Occurrence itself existed only at the Event (e.g., when the cougar was eating the fish). All these other "things" (MaterialSamples) scattered across different collections are not the Occurrence per se, but rather they all represent Evidence of the Occurrence.

Museums currently don't do a good job of designating which part of the Event ->Organism ->Occurrence->MaterialSample tree museum catalog numbers represent. They could be any part of this tree.

YES! Strongly agree here! And that's why I see catalog numbers as a secondary property, that might be attached at any level in a MaterialSample hierarchy. But for me the question is: do we attach catalog numbers (per se) to Organisms? Or only to MaterialSamples?

Suppose we have a tree in the woods that we revisit every year and take a sample (or several samples) each time we visit it. The tree remains an Organism the entire time, and each extraction from the tree that ends up preserved in a herbarium is a MaterialSample. It's less clear when we're talking about whole Organisms that are preserved. Did they stop being Organisms when they were preserved? When they died? Are they still Organisms in parallel to being MaterialSamples until the specimens disintegrate? Some organizations might assign a catalog number to the living tree in the woods (as a LivingSpecimen). Does the act of assigning the catalog number to the entire tree cause it to become a MaterialSample (to which the catalog number is attached)? Or is it better to say that the catalog number was directly attached to the Organism?

We define classes so that we can assign properties to like things. In my mind, the properties that apply to Organisms and MaterialSamples are different (and also different from properties of Occurrence and Event and Taxon, etc.) but the boundary that is least clear to me is the one between MaterialSample and Organism.

By the way, as tedious as it may seem to some, this exchange is EXTREMELY helpful to me!

Jegelewicz commented 3 years ago

Deep thoughts from @deepreef

I dream about this kind of stuff too and I think you are correct that we need good boundaries. They make for good neighbors and well defined terms lead to better science.

instances of Organism participate in Occurrences, but instances of MaterialSample do not.

I'm not quite sure about that and I don't think you are either:

By my thinking, all the other stuff we traffic in (PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, MachineObservation, MaterialCitation, etc.) all represent forms of "Evidence" that support either the truth of an Occurrence instance, or the veracity of an Identification instance

Isn't a MaterialSample (like the cougar blood) evidence of an occurrence (there was a cougar there)? or at least a potential occurrence? Do we need another term for this kind of thing?

dshorthouse commented 3 years ago

By the way, as tedious as it may seem to some, this exchange is EXTREMELY helpful to me!

Ditto for the development of DINA. We could throw our hands up in exasperation, chuck it all & revert to what nearly all the CMS' do: Catalogued Object attached directly to a Collecting Event and also hang off Determinations + near-complete disregard for a hierarchy of material samples. But...that administrative convenience may actually do damage. It buttresses the walls between field work, collections within institutions and across institutions, and all the laboratory-based derivations.

mjy commented 3 years ago

In TaxonWorks we have no concept of Occurrence sensu occurrenceID. I believe in a graph of life framework it likely shouldn't exist. The following is rapid, and likely needs refinement and more nuance, but here goes.

We (TaxonWorks) assert what our classes mean, and things that don't meet that definition are not to be added as instance of those things, DwC be damned. Some concepts (definitions off the top of my head, see models for canon):

CollectionObject - The physical object that was collected in the field, returned, accessioned into a collection, and enumerated. All conditions must be met. Several utility subclasses (e.g. CollectionObject::Specimen has total = 1, while CollectionObject::Lot has total > 1).
AssertedDistribution - The biological taxon from a GeographicalArea (gazetteer to 2nd level geopolitical subdivision) as recorded in some Source.
FieldOccurrence (coming) - The biological taxon (not CollectionObject) that was observed in a CollectingEvent.
AnatomicalPart - The physical object that has an origin relationship (see below) with another AnatomicalPart or CollectionObject.
Extract - The physical sample that has an origin relationship (see below) with another Extract or CollectionObject.
OriginRelationship - A relation between a new and old thing. If asserted, the new thing cannot exist without the (prior) presence of the old thing. We can use this to link the fact that some FieldOccurrences are now CollectionObjects, or that some Extracts come from other Extracts and others come from CollectionObjects.
CollectingEvent - The unique combination of space, time, collector, and method.
Identifier - The information that can be used to differentiate (and localize) instances. Multiple subclasses, can be assigned to most anything in our graph.
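
To illustrate the OriginRelationship idea only, here is a small Python sketch. It is not TaxonWorks code: the `Node` class, the `lineage` helper, and the identifiers are hypothetical, and it assumes each new thing has a single origin and that the relationships contain no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str        # e.g. "FieldOccurrence", "CollectionObject", "Extract"
    identifier: str


@dataclass(frozen=True)
class OriginRelationship:
    old: Node  # the thing that had to exist first
    new: Node  # the thing derived from it


def lineage(node, relationships):
    """Walk origin relationships back from a node to its oldest ancestor."""
    origin_of = {rel.new: rel.old for rel in relationships}
    chain = [node]
    while chain[-1] in origin_of:
        chain.append(origin_of[chain[-1]])
    return chain


field_occurrence = Node("FieldOccurrence", "fo-1")
collection_object = Node("CollectionObject", "co-1")
extract_1 = Node("Extract", "ex-1")
extract_2 = Node("Extract", "ex-2")

relationships = [
    OriginRelationship(field_occurrence, collection_object),
    OriginRelationship(collection_object, extract_1),
    OriginRelationship(extract_1, extract_2),
]

# Trace the second extract back to the original field occurrence.
history = lineage(extract_2, relationships)
```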

The problem with occurrence sensu occurrenceID is that it spans multiple instances of multiple classes, yet the occurrenceID doesn't exist in a graph like ours (and I would argue won't exist in emerging graph-of-lifes). As you can see from the discussion prior it isn't a class of things, it's a utility idea for sharing data. With more precision and better APIs we should be able to reference the ids of our instances directly, without this pseudo-aggregator.

dshorthouse commented 3 years ago

I find this particularly interesting in light of what is meant by Occurrence sensu DwC occurrenceID:

FieldOccurrence (coming) - The biological taxon (not CollectionObject) that was observed in a CollectingEvent.

Doesn't a CollectingEvent here presume a CollectionObject? Or, are these incidental biological taxa observed in the field that do not then become or participate as CollectionObjects?

deepreef commented 3 years ago

I'm not quite sure about that and I don't think you are either:

Yes! I'm not sure about it... which is why I lose sleep.

Isn't a MaterialSample (like the cougar blood) evidence of an occurrence (there was a cougar there)? or at least a potential occurrence? Do we need another term for this kind of thing?

Exactly! Buried somewhere among my endless ramblings above is the idea that MaterialSample is an example of what I call "Evidence". @baskaufs and I and others have ruminated on this idea ever since the Organism class was introduced to DwC. One of those rare/cool moments of happy convergence was when Rob Whitton and I fleshed out our semantic model of biodiversity-space completely independently from @baskaufs and Cam Webb (not sure of his GitHub tag) when they hashed out Darwin-SW, and we mutually discovered that we had converged on the same model (@baskaufs claims he was influenced by some sort of ASCII-art I posted, so it's not completely independent; but I think we really did converge on the same basic understanding). What they called "Token", Rob and I called "Evidence". In my original thinking, we would have the following hierarchy:

Evidence (aka "Token")

MaterialSample
- PreservedSpecimen
- FossilSpecimen
- LivingSpecimen
Unvouchered Report
- HumanObservation
- MachineObservation
Multimedia
- StillImage
- Sound
- MovingImage
- etc.
- MaterialCitation

And probably more I'm not thinking of right now. Some of these exist as defined DwC terms; some do not. The idea of "Evidence"/"Token" is that it can represent either "Evidence of Occurrence", or "Evidence of [taxonomic] Identification", or both; but importantly, also neither. That's important as a reminder that these various "Evidence" entities exist as "things" regardless of whether they represent "Evidence" of Occurrence or Identification.

In a recent email exchange with @baskaufs and @tucotuco, I started re-thinking this. Because these "things" exist independently of whether or not they play an evidentiary role, they really shouldn't be framed as subclasses of an Evidence superclass. Rather, these "things" represent "things" of their own (e.g., a specimen is still a specimen, even if you have no idea where or when it was collected -- and don't play the "Earth circa second millennium" card...)

So, going back to my purported ASCII-Art roots, I proposed the following:

[Assertion]---<[Evidence]>---[Token]

In this sense, "Evidence" and "Token" are actually not the "same" thing. The Tokens are the hierarchical lists of things listed above (replace "Evidence" at the top with "Token"), and the "Evidence" is the "relationship" between these Tokens and some sort of "Assertion". An "Assertion", in this context, refers to either an Occurrence (asserted presence of Organism at Event) or an Identification (asserted taxonomic identity of an Organism), which is supported by zero, one or many "Tokens". By contrast, each Token could potentially serve as Evidence of zero, one or many Assertions. Each Token could represent evidence of an Occurrence, or evidence of an Identification, or both, or neither.

So my newer thinking is that "Evidence" (sensu me & Rob Whitton) and "Token" (sensu @baskaufs and Cam Webb) are not really synonyms, but rather the "Tokens" are the "things", and the "Evidence" instances represent the roles those things play as Evidence for "Assertions" (about organism occurrence or taxonomic identity).
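
A rough Python sketch of the [Assertion]---<[Evidence]>---[Token] arrangement, with Evidence modelled as the link rather than as a superclass of the things. All class names, helper functions, and identifiers here are illustrative assumptions, not defined DwC or Darwin-SW terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A thing that exists in its own right (specimen, image, report)."""
    kind: str
    identifier: str


@dataclass(frozen=True)
class Assertion:
    """Either an Occurrence or an Identification."""
    kind: str
    identifier: str


@dataclass(frozen=True)
class Evidence:
    """The relationship: this Token supports this Assertion."""
    token: Token
    assertion: Assertion


def tokens_for(assertion, evidence):
    """All Tokens that support the given Assertion (zero, one or many)."""
    return [e.token for e in evidence if e.assertion == assertion]


def assertions_for(token, evidence):
    """All Assertions that the given Token supports (zero, one or many)."""
    return [e.assertion for e in evidence if e.token == token]


skin = Token("PreservedSpecimen", "skin-1")
undocumented = Token("PreservedSpecimen", "skin-2")  # no known place or time
occurrence = Assertion("Occurrence", "occ-1")
identification = Assertion("Identification", "id-1")
unsupported = Assertion("Occurrence", "occ-2")

evidence = [Evidence(skin, occurrence), Evidence(skin, identification)]
```

Because the Tokens do not inherit from Evidence, `undocumented` is still a perfectly valid specimen even though it supports no Assertion at all.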

But...that administrative convenience may actually do damage. It buttresses the walls between field work, collections within institutions and across institutions, and all the laboratory-based derivations.

Amen, brother!

mjy commented 3 years ago

Doesn't a CollectingEvent here presume a CollectionObject?

TaxonWorks' CollectingEvent instances don't assume anything; DwC instances might. TaxonWorks deals in assertions. If you make an instance of a TW CollectingEvent then you can assert some things, and only infer things based on those assertions. Our CE only asserts facts about collectors, time, space, and method. You can make a new concept for CE and people instantiating instances of it will be asserting other things if you want. Labels != concepts.

Or, are these incidental biological taxa observed in the field that do not then become or participate as CollectionObjects?

Exactly. CollectingEvents in TaxonWorks are linked to CollectionObjects, FieldOccurrences, and anything else we come up with that would benefit from the assertion of the intersection of time, space, collector, and method.

deepreef commented 3 years ago

FieldOccurrence (coming) - The biological taxon (not CollectionObject) that was observed in a CollectingEvent.

I don't believe biological taxa are ever observed. I would maintain that taxa cannot be observed; they can only be defined or asserted. I've never seen a taxon in the field. I've only seen organisms in the field. I only secondarily assert a taxonomic identity to the organism I observed.

The problem with occurrence sensu occurrenceID is that it spans multiple instances of multiple classes, yet the occurrenceID doesn't exist in a graph like ours (and I would argue won't exist in emerging graph-of-lifes).

What are your thoughts on the representation of the "dwc:Occurrence" part of the Darwin-SW graph?

I guess I don't understand what you mean by "sensu occurrenceID". Do you mean in the sense of "An identifier for the Occurrence (as opposed to a particular digital record of the occurrence)", as defined in DwC? I agree with you that it's not a physical thing; but if you really dig deep, none of our classes represent physical things (not even PreservedSpecimen). Even without digging deep, taxa and events are abstract ideas; not things.

As you can see from the discussion prior it isn't a class of things, it's a utility idea for sharing data.

I've always thought of instances to which we assign occurrenceID values to be abstract ideas (i.e., the intersection of an Event and an Organism), and I find that to be an incredibly powerful part of modelling biodiversity information. I guess the problem is, many/most DwC content providers equate the occurrenceID with the material objects stored in their collections. I think it's much better to think of those material objects as evidence supporting the assertion of an Occurrence, rather than Occurrences themselves.

mjy commented 3 years ago

I don't believe biological taxa are ever observed. ... they can only be defined or asserted

We only deal with assertions, I think this is the big difference b/w how you and I model the practical application of our ideas, maybe. You allow for the assertion of Taxa, so why not allow for the assertion of Taxa in the field? It's useful, and you can do meaningful work with it. In the absence of any other information (the physical specimen) you need something to assert with, or you have a big pile of anonymous nodes. So this is semantics IMO. Of course it's all abstract. You don't have to accept the assertions, they are just that. If you choose to, then you can do some logical work with them, if not, do science with something else.

mjy commented 3 years ago

What are your thoughts on the representation of the "dwc:Occurrence" part of the Darwin-SW graph?

Basically our FieldOccurrence. I.e. given a TW FieldOccurrence I would infer the existence of at least one CollectionObject, but I would record the evidence as CollectingEvent + OTU (since this is how the data would be used in the absence of the object ever being collected). FieldOccurrence data is never collected without the assertion of some OTU (i.e. we never go to the field and say we saw something, we say we saw some Taxon). If we later collect (instantiate) the CollectionObject we would use OriginRelationship to relate the two instances (FieldOccurrence + CollectionObject).

deepreef commented 3 years ago

You allow for the assertion of Taxa, so why not allow for the assertion of Taxa in the field?

I am 100% with you on the "everything is an assertion" part. It's not that I don't allow for the assertion of Taxa in the field; I just think that's not granular enough to accommodate the kind of information I want to track. Whenever I encounter an assertion that "Taxon X occurs in Hawaii", I capture that as a series of assertions (at least two), rather than a single assertion. For example:

A particular organism existed in Hawaii
This particular organism was identified by someone as Taxon X

Maybe you do the same, but just shorthanded it in your description of FieldOccurrence?

I've found that parsing these two assertions out allows more flexible tracking of related assertions. For example, someone might later assert that the organism should be identified as Taxon Y instead of Taxon X. Or, someone might later assert that the organism wasn't actually in Hawaii at the time of the event, but somewhere else. The former is more common, but we have examples of the latter as well.
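
A small sketch of why keeping the two assertions separate helps. The `OrganismRecord` class and its fields are hypothetical; "Taxon X", "Taxon Y", and "Hawaii" come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class OrganismRecord:
    organism_id: str
    # Each list keeps every assertion ever made, oldest first.
    locations: list = field(default_factory=list)
    identifications: list = field(default_factory=list)

    def current_location(self):
        return self.locations[-1] if self.locations else None

    def current_taxon(self):
        return self.identifications[-1] if self.identifications else None


record = OrganismRecord("organism-1")
record.locations.append("Hawaii")           # assertion 1: it existed in Hawaii
record.identifications.append("Taxon X")    # assertion 2: identified as Taxon X

# A later re-identification changes only the taxonomic assertion.
record.identifications.append("Taxon Y")
```

The location assertion is untouched by the re-identification, and the earlier identification is kept rather than overwritten.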

In the absence of any other information (the physical specimen) you need something to assert with,

Agreed! And I call that something "Evidence". Even if it's just a reported observation, the evidence is still captured as "Person A said so".

FieldOccurrence data is never collected without the assertion of some OTU (i.e. we never go to the field and say we saw something, we say we saw some Taxon).

OK, maybe we're not so far apart on this. We have very few, if any, examples where an instance of Organism, anchored to an Event through an Occurrence, is not also anchored to a Taxon (including unnamed OTUs) via an Identification. So they are (almost) always "born" together for us. But I prefer to keep them as separate assertions, so they can evolve independently of each other. Doing so requires that an Organism instance is created before any connection between a Taxon and an Event is asserted.

Having said that, there is one area where I am very-much tempted to construct a direct Location-Taxon relationship (or maybe better: Event-Taxon relationship), which is for asserting values of things like establishmentMeans, degreeOfEstablishment, pathway, and occurrenceStatus. These are other examples of terms that (in my opinion) don't really belong in the Occurrence class, unless you extend the allowable scope of Organism to include things like "population" (there was much discussion of this, and for this reason, back when Organism was being debated). But that's a whole 'nother can of worms...

dagendresen commented 3 years ago

I believe that one important source of problems comes from the requirement to shoehorn all these things (MaterialSample, PreservedSpecimen, HumanObservation, etc) into an Occurrence to enable publication in GBIF.

albenson-usgs commented 3 years ago

I think it's much better to think of those material objects as evidence supporting the assertion of an Occurrence, rather than Occurrences themselves.

This makes sense to me!

nielsklazenga commented 3 years ago

I can find no indication, either in the definition of MaterialSample or its placement within Darwin Core, that it was meant to be a superclass of the Specimen classes; in fact, everything indicates that it is meant to be disjunct. Also, for most, if not all, use cases for MaterialSample, a class that is disjunct with the Specimen classes is more useful than one that includes them.

I think the current definition is fine. I think it would be helpful to have different terms for the kinds of Material Sample that have a many-to-one relationship with Occurrences ~that can be published using the Material Sample Core~ and with which the basisOfRecord for the Occurrences would be PreservedSpecimen or LivingSpecimen etc., on the one hand, and the kinds of Material Sample that have a one-to-many relationship with Occurrence and can be used as evidence for Occurrences on the other, but that is a different issue.

mjy commented 3 years ago

Side note, pondering this exchange. I think the DwC community would benefit immensely from one of the philosophies you learn early on in exposure to the OBO Foundry world, another major player in biological standards. There, classes don't change, they deprecate (which does not mean go away). If the meaning of your class changes, you mint a new PURL/class for it and mark the old one as deprecated (the PURL redirects, it is not destroyed). You can add some annotations to link the two, but they typically remain logically disjoint, as old != new. Given this approach, any proposal such as this one that says "change" could be thought of as "add". I don't have enough experience with the DwC process, and I suspect this is how things actually operate, but it's important, I think, to start to get the broader community familiar with this concept as well.
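A rough sketch of that "deprecate, don't change" pattern, under stated assumptions: the `TermRegistry` class, its method names, and the `example.org` IRIs are all hypothetical, and this is not how OBO or DwC tooling is actually implemented. It only shows that a "change" becomes a new term plus a deprecation flag and a link on the old one.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Term:
    iri: str
    definition: str
    deprecated: bool = False
    replaced_by: Optional[str] = None  # annotation linking old to new


class TermRegistry:
    """Hypothetical registry: terms are never edited or deleted."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def mint(self, iri: str, definition: str) -> Term:
        if iri in self.terms:
            raise ValueError(f"{iri} already exists; mint a new IRI instead")
        self.terms[iri] = Term(iri, definition)
        return self.terms[iri]

    def revise(self, old_iri: str, new_iri: str, new_definition: str) -> Term:
        # A change of meaning is an "add": mint the new term, deprecate the old.
        new_term = self.mint(new_iri, new_definition)
        old_term = self.terms[old_iri]
        old_term.deprecated = True
        old_term.replaced_by = new_iri
        return new_term

    def resolve(self, iri: str) -> Term:
        # Follow replacement links, much like a redirecting PURL would.
        term = self.terms[iri]
        while term.deprecated and term.replaced_by is not None:
            term = self.terms[term.replaced_by]
        return term


# Placeholder IRIs and definitions, for illustration only.
registry = TermRegistry()
OLD = "http://example.org/terms/ExampleClass_1"
NEW = "http://example.org/terms/ExampleClass_2"
registry.mint(OLD, "original definition")
registry.revise(OLD, NEW, "revised definition")
```

After `revise`, the old IRI is still in the registry with its original definition intact, so existing data that cites it keeps its meaning.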

campmlc commented 3 years ago

Strongly agree with @dagendresen that the most "important source of problems comes from the requirement to shoehorn all these things (MaterialSample, PreservedSpecimen, HumanObservation, etc) into an Occurrence to enable publication in GBIF."