Change term

tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.

https://dwc.tdwg.org

Creative Commons Attribution 4.0 International

Change term - MaterialSample #314

Closed: Jegelewicz closed this issue 1 year ago

Jegelewicz commented 3 years ago

Submitter: @jegelewicz
Justification (why is this change necessary?): The definition of MaterialSample is essentially the same as that for PreservedSpecimen. Members of the Arctos Working Group feel that these two terms are currently interchangeable. See https://github.com/ArctosDB/arctos/issues/2432 for further discussion.

From https://dwc.tdwg.org/terms/#materialsample

MaterialSample

Definition A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.

Examples A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample.

From https://dwc.tdwg.org/terms/#livingspecimen

PreservedSpecimen

Definition A specimen that has been preserved.

Comments

Examples A plant on an herbarium sheet. A cataloged lot of fish in a jar.

Given the above, we propose that MaterialSample should be more specific to something less than what might be considered a "voucher" in order to delineate it from PreservedSpecimen.

Proponents (who needs this change): Arctos Working Group

Proposed new attributes of the term:

Term name (in lowerCamelCase): MaterialSample (no change)
Organized in Class (e.g. Location, Taxon):
Definition of the term: A physical result of a subsampling event. In biological collections, the material sample is typically collected as a subsample from a preserved or living organism, and either preserved or destructively processed. In geological and environmental collections the material sample is typically collected as a subsample of a larger geologic or environmental construct.
Usage comments (recommendations regarding content, etc.):
Examples: A part of an organism isolated for some purpose. A tissue sample. A soil sample. A marine microbial sample.
Refines (identifier of the broader term this term refines, if applicable): None
Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable): http://rs.tdwg.org/dwc/terms/version/MaterialSample-2018-09-06 (added by @tucotuco)
ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG, if applicable): DataSets/DataSet/Units/Unit (added by @tucotuco)

Note: all of the above is my interpretation of the Arctos Working Group conversation.

campmlc commented 3 years ago

This makes sense to me: we mint a new organismID for each taxonomic entity identified within a water sample, and then have that Organism instance participate in the Occurrence associated with the water sample collecting Event (even if the presence of the Organism at the Event was only some DNA material in the water)?

On Thu, May 6, 2021, 11:32 AM Teresa Mayfield-Meyer wrote:

do MaterialSample instances participate in Occurrences only via a representation of an Organism instance?

I was just going to add that ALL of the "things" we have in collections are MaterialSample(s) of Organisms - we NEVER have the whole thing because Organisms have a life over time and capturing the entirety of that is not possible for mere humans.

Jegelewicz commented 3 years ago

@campmlc yea, mostly I think that works, although in reality the evidence for a taxon in a water sample might eventually end up being from more than one organism so the initial organism might end up being split into two or more. Make sense?

deepreef commented 3 years ago

yea, mostly I think that works, although in reality the evidence for a taxon in a water sample might eventually end up being from more than one organism so the initial organism might end up being split into two or more. Make sense?

Agreed! However, there is one logistical issue, which comes back (again) to my struggles with Organism vs. MaterialSample. Consider this real-world sequence of events:

1) Water sample is collected from the ocean (MaterialSample).
2) Data regarding the Event where the water sample was collected are recorded.
3) Water sample is curated (sucked through filters, which are then stored for later eDNA analysis -- creating another, derivative MaterialSample?)
4) Extracted sample is analyzed for eDNA, and a number of identified and unidentified Taxa are inferred to have been represented in the sample.
5) One new Organism instance is created for each identified Taxon resulting from the eDNA analysis (each with its own organismID).

The question is, how does one represent the relationship between the MaterialSample records (at #1, possibly a second at #3) and the Event data, during steps 2-4? In other words, how does one capture the link between a MaterialSample instance and an Event instance directly, before there are any identifiable Organism instances to bridge the gap (via an Occurrence instance)?

My only (unsatisfactory) solution is to generate one Organism record (with its own organismID, but no associated Identifications), with the assumption that at least one organism from one taxon is represented in the sample. Then, when step 5 is complete, the first taxon can be applied to this "place-holder" Organism instance, and any additional taxa can have their own Organism instances minted (one per taxon). And in the case where no taxa can be identified from the water sample, then you could still keep that Organism instance, maybe add an identification to any taxon ("Biota"?), and then represent the Occurrence instance with occurrenceStatus = "absent".
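A minimal sketch of the placeholder approach just described, assuming a hypothetical in-memory Python model. The classes `Event`, `Organism`, `Occurrence` and the helpers `placeholder_occurrence` and `apply_taxa` are illustrative names only, not part of Darwin Core or any published library:

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

# Hypothetical in-memory model; names loosely echo Darwin Core terms
# (eventID, organismID, occurrenceStatus) but are illustrative only.
_counter = count(1)


def new_id(prefix: str) -> str:
    """Mint a simple local identifier such as 'org-3'."""
    return f"{prefix}-{next(_counter)}"


@dataclass
class Event:
    event_id: str
    description: str


@dataclass
class Organism:
    organism_id: str
    taxon: Optional[str] = None  # None = no Identification yet


@dataclass
class Occurrence:
    occurrence_id: str
    organism: Organism
    event: Event
    occurrence_status: str = "present"


def placeholder_occurrence(event: Event) -> Occurrence:
    """Steps 1-4: tie the water sample's Event to a taxon-less Organism."""
    return Occurrence(new_id("occ"), Organism(new_id("org")), event)


def apply_taxa(occ: Occurrence, taxa: List[str]) -> List[Occurrence]:
    """Step 5: reuse the placeholder for the first taxon, mint Organisms for the rest."""
    if not taxa:
        # Nothing identified: keep the placeholder as a "Biota" absence record.
        occ.organism.taxon = "Biota"
        occ.occurrence_status = "absent"
        return [occ]
    occ.organism.taxon = taxa[0]
    extra = [
        Occurrence(new_id("occ"), Organism(new_id("org"), t), occ.event)
        for t in taxa[1:]
    ]
    return [occ] + extra


# Usage: one water sample, two taxa inferred from eDNA.
event = Event(new_id("evt"), "ocean water sample")
records = apply_taxa(placeholder_occurrence(event), ["Thunnus", "Carcharhinus"])
```

The placeholder is what lets the Event link exist during steps 2-4; the cost is an Organism record that, until step 5, asserts nothing about taxonomy.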

Jegelewicz commented 3 years ago

Water sample is curated (sucked through filters, which are then stored for later eDNA analysis -- creating another, derivative MaterialSample?)

I would say no - just that the sample has been prepared with some method. This is equivalent to a lizard being placed in formalin.

My only (unsatisfactory) solution is to generate one Organism record (with its own organismID, but no associated Identifications), with the assumption that at least one organism from one taxon is represented in the sample.

I don't believe this is unsatisfactory IF, instead of generating an organismID, you generate a lotID or metaMaterialSampleID. This ID could also be associated with any Organisms that eventually result from the MaterialSample, creating a link back to the original collecting event.
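A sketch of the lotID alternative, assuming the lot is its own record linked to the collecting event. `Lot`, `lot_id`, and the helper functions are hypothetical names (neither lotID nor metaMaterialSampleID is an existing Darwin Core term). The point it shows is that no Organism has to exist before the eDNA results come back:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical: "lot_id" stands in for the lotID / metaMaterialSampleID idea;
# neither is an existing Darwin Core term.


@dataclass
class Lot:
    lot_id: str
    event_id: str  # the original collecting event


@dataclass
class Organism:
    organism_id: str
    taxon: str
    lot_id: str  # back-link to the sample the Organism was inferred from


def organisms_from_lot(lot: Lot, taxa: List[str]) -> List[Organism]:
    """Mint Organisms only once taxa are known; each points back to the lot."""
    return [
        Organism(f"{lot.lot_id}-org-{i}", taxon, lot.lot_id)
        for i, taxon in enumerate(taxa, start=1)
    ]


def collecting_event(org: Organism, lots: Dict[str, Lot]) -> str:
    """Follow the lot link to recover the collecting event."""
    return lots[org.lot_id].event_id


# The lot exists (and is linked to its event) before any Organism does.
lot = Lot("lot-1", "evt-1")
lots = {lot.lot_id: lot}
orgs = organisms_from_lot(lot, ["Salmo trutta", "Esox lucius"])
```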

smrgeoinfo commented 3 years ago

Hmm, long discussion here... I haven't digested it all, but going back to the beginning, to the proposed definition:

A physical result of a sampling (or subsampling) event.

The problem I see is that 'sampling event' does not seem to be defined (I don't find it in https://dwc.tdwg.org/terms/), so the definition raises the question: what does 'sampling' mean? Is 'physical result' necessarily a material object? I'm not a biologist, but have found the definition in W3C SSN to be useful: a sample is a "Feature which is intended to be representative of a FeatureOfInterest on which Observations may be made" . (Feature basically means 'thing'). In this view, a MaterialSample (which I hope means a physical object) is an object (a thing composed of matter that has some defined boundaries) that was collected with the intention of representing something the collector is interested in (the feature of interest). To collect would mean to separate a thing from the feature of interest in a way that it can be transported as an individual.

At some point, this object gets assigned an identifier; from then on the identifier represents the object as it was when the identifier was assigned. Things done before the identifier is assigned are part of the sample collection (origin) process. Things that happen to the object afterwards should be represented as a 'history' associated with the sample-- including splitting, taking parts, chemical treatments, annihilation etc. Many problems arise when analytical preparations are made (e.g. thin sections, SEM stubs, XRF pellets, DNA concentrates, biome cultures...) to produce derivative samples that are not assigned separate identifiers. Such items could be represented as 'non-identified' derivatives in the sample history. It would be better (and more work...) if they are separately identified and linked back to the source sample through a process relationship that describes how they are produced.
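One way to sketch this history-plus-derivatives idea, assuming hypothetical `Sample` and `HistoryEntry` records (not an existing schema): processes that produce nothing with its own identifier are logged in the history, while separately identified derivatives carry a link back to their source and the process that produced them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoryEntry:
    process: str                         # e.g. "thin sectioning", "XRF pellet"
    derivative_id: Optional[str] = None  # None = non-identified derivative


@dataclass
class Sample:
    identifier: str
    description: str
    derived_from: Optional[str] = None        # parent sample identifier
    derivation_process: Optional[str] = None  # how this sample was produced
    history: List[HistoryEntry] = field(default_factory=list)


def record_process(sample: Sample, process: str) -> None:
    """Log something done to the sample without minting a new identifier."""
    sample.history.append(HistoryEntry(process))


def derive(parent: Sample, identifier: str, description: str, process: str) -> Sample:
    """Create a separately identified derivative linked back to its source."""
    child = Sample(identifier, description, parent.identifier, process)
    parent.history.append(HistoryEntry(process, identifier))
    return child


rock = Sample("S-1", "rock sample")
record_process(rock, "XRF pellet")  # non-identified derivative
section = derive(rock, "S-1a", "thin section", "thin sectioning")
```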

I don't get the distinction between 'specimen' and 'MaterialSample'.

dshorthouse commented 3 years ago

@smrgeoinfo The word "event" is a bit misleading in this definition of MaterialSample, imho. Likewise, use of the subjective words "preparation" and "specimen" are equally misleading.

A MaterialSample can be a lot, a catalogued specimen, an uncatalogued specimen, a preparation, a derivative, a jar of water, a bag of soil, a bucket of bugs, etc. At least, that's what we're aiming for. As you say, what these "things" have in common is that they are physical objects and some process was invoked to gather them (i.e. "collected" through an Event) or produce them from other MaterialSample(s). What's important in a chain of them is that there is logically a single collecting Event & subsequent processes - event with a little 'e' - that may result in a change of state that spawns new child MaterialSample(s).
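The "single collecting Event per chain" idea can be illustrated with a hypothetical parent-link registry (all identifiers and function names are made up for the sketch): a DNA concentrate resolves to the same collecting Event as the leg and the catalogued insect, rather than carrying an independent Event of its own.

```python
from typing import Dict, Optional

# Hypothetical registry: each MaterialSample id maps to its parent id
# (None for the sample gathered at the collecting Event).
parents: Dict[str, Optional[str]] = {
    "insect-001": None,        # catalogued specimen
    "leg-001": "insect-001",   # part removed from the specimen
    "dna-001": "leg-001",      # DNA concentrate from the leg
}
# Only the root of each chain is anchored to a collecting Event.
collecting_events: Dict[str, str] = {"insect-001": "evt-42"}


def root_sample(sample_id: str) -> str:
    """Walk parent links up to the sample that was collected in the field."""
    seen = set()
    while parents[sample_id] is not None:
        if sample_id in seen:
            raise ValueError(f"cycle in derivation chain at {sample_id}")
        seen.add(sample_id)
        sample_id = parents[sample_id]
    return sample_id


def collecting_event_for(sample_id: str) -> str:
    """Every derivative inherits the single collecting Event of its root."""
    return collecting_events[root_sample(sample_id)]
```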

In effect, what this discussion is trying to do is collapse the overloaded language of "specimen", "preparation", "sample", "thin sections", etc. into a more expansive meta concept, shift the focus to their relationships, define more precisely what is anchored to that special collecting Event, and in so doing eliminate the explosion of spurious Occurrence records in our aggregators. We presently have a number of Darwin Core terms like catalogNumber and preparations illogically nested under the Class umbrella Occurrence, which by definition requires the tripartite Organism, place, and time. That totally muddles the nice relationships we could represent between a chain of MaterialSample(s) forcing us to share what looks like independently derived expressions of Organism, place and time – a DNA concentrate with a collecting Event and an Organism as though that's the same "thing" as either the leg or the catalogued insect specimen from which it was derived.

The consequence of this re-articulation would be disruptive and far-reaching & it may require significantly tighter coordination among parties that have taken charge of some parts of the pipeline. But the outcome could be cleaner sets of links, unpolluted by extraneous duties. For instance, DNA barcodes could be shared with aggregators from some organizations as MaterialSample(s) without constraint of ownership over the simultaneous sharing of collecting Event or the specimen records from which tissues were obtained, merely to produce an Occurrence record that some aggregators require. However, doing such a thing will require significant redesign of software used to publish data.

smrgeoinfo commented 3 years ago

@dshorthouse -- sounds like we're in agreement! (specimen, preparation and sample are confusing to me!). I don't see any critique of the basic definition approach proposed (perhaps too obliquely?) in my comment above.

Condensed version: A MaterialSample is a physical object, intended to be representative of some physical thing in the world of interest to someone, on which observations can be made. In this view, a physical object is a thing composed of matter that has some defined boundaries.

The W3C SSN interpretation of 'sample' does not focus on the sampling event, rather on the ontological dependence between the 'sample' and the 'featureOfInterest' it is intended to represent. The difference between a 'rock' and a 'rock sample' is that the rock sample is intended to represent something, e.g. some rock body (formation) in the Earth. It doesn't matter when or how it was collected, it's still a sample because of that relationship. Sample is a Role. Bear in mind that in the SSN context they are also thinking of 'sample' in the social science context, in which the sample is a subset of some population intended to be representative of that population. Obviously a broader concept than 'physical sample' or 'material sample' (which I think are synonyms?)

deepreef commented 3 years ago

I've already rambled on endlessly on this topic above, so I don't want to add more noise. But for me, the key questions for adequately defining MaterialSample are:

1) What is the boundary between Organism and MaterialSample? Specifically, what properties belong to each of these concepts, and are there any properties that (potentially) apply to both?

2) Related: Can/do instances of MaterialSample participate directly in Occurrence instances (e.g., MaterialSample-at-Event)? Or do they "pass through" instances of Organism? Same question applies with respect to direct relationships between MaterialSample and Identification. [Note: for instances of MaterialSample that are unrelated to living things, the same basic questions apply for whatever the superclass of Organism is (we use the term "Individual" for this superclass).]

My third question was about whether MaterialSample instances are restricted to biological "things", or can they also represent non-organismal things. But I sense there is general consensus/agreement that at least the concept behind MaterialSample can apply to non-living things, even if the term might not be the best choice for other domains outside of biodiversity.

Jegelewicz commented 3 years ago

What is the boundary between Organism and MaterialSample? Specifically, what properties belong to each of these concepts, and are there any properties that (potentially) apply to both?

I feel like @smrgeoinfo gave a good answer to that:

A MaterialSample is a physical object, intended to be representative of some physical thing in the world of interest to someone, on which observations can be made. In this view, a physical object is a thing composed of matter that has some defined boundaries.

I think that we should treat all the "things" we have as MaterialSample: even if you have the dead body of a squirrel, you still don't have ALL material evidence ever created by the Organism that is/was the squirrel.

Can/do instances of MaterialSample participate directly in Occurrence instances (e.g., MaterialSample-at-Event)? Or do they "pass through" instances of Organism?

If we agree on the above, MaterialSamples participate directly in Occurrences. I wouldn't say that MaterialSamples pass through Organism, rather Organism creates MaterialSamples that are collected at Occurrences.

Same question applies with respect to direct relationships between MaterialSample and Identification. [Note: for instances of MaterialSample that are unrelated to living things, the same basic questions apply for whatever the superclass of Organism is (we use the term "Individual" for this superclass).]

It seems to me that MaterialSamples are evidence for Identification as well as Organism; some MaterialSamples are better evidence than others. There is more than one Identification that can be inferred from any given MaterialSample. Think about fossil material which might have a biological identification as well as a geological one. Maybe we should look at Organism the same way? Sure we see the squirrel as an Organism or "individual", but maybe we also see the population of squirrels in a certain place (or a herd of zebras, etc.) as an Organism.

Not sure I contributed anything, just thinking out loud.

deepreef commented 3 years ago

If we agree on the above, MaterialSamples participate directly in Occurrences. I wouldn't say that MaterialSamples pass through Organism, rather Organism creates MaterialSamples that are collected at Occurrences.

This is what I'm very uneasy about. I'm still of the mind that only Organisms participate in Occurrences, and MaterialSamples do not. This, I think, was the idea behind "curated" being a pivot-point representing the birth of a MaterialSample. In other words, the "Sample" part of MaterialSample. Organisms do all kinds of things in nature, and we document these things with recordings (images, video, sound, documented observations) as instances of Occurrence without ever needing to mint a single identifier for a MaterialSample. It's only when matter (physical stuff) is sampled (extracted from its manifestation as an Organism or part of an Organism) that it becomes MaterialSample. We definitely need to trace the fact that the MaterialSample was obtained from an Organism at an Event (aka Occurrence), but to me that doesn't mean that the MaterialSample itself participated in the Occurrence. Rather, it was derived from it. But that begs the question: If a derivative MaterialSample is created (e.g., the extraction of a tissue sample from a whole-organism specimen), then does that creation/extraction instance represent an Occurrence itself?

This is when my head starts to hurt.

It seems to me that MaterialSamples are evidence for Identification as well as Organism; some MaterialSamples are better evidence than others.

Yes, absolutely! But this is where the as-yet-non-standard notion of "Evidence" (="Token" sensu Darwin-SW) comes into play. Put simply, we don't say:

MaterialSample is identified as Taxon

Rather, we say:

MaterialSample serves as "Evidence" that Organism is identified as Taxon

This distinction matters when we represent a knowledge graph of this stuff. So, again, I maintain that only instances of Organism participate directly with instances of Identification.
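To illustrate the distinction, here is a toy graph of triples. The predicate names `isEvidenceFor`, `identifiesOrganism` and `hasTaxon` are illustrative assumptions; they are not meant to reproduce the Darwin-SW or Darwin Core vocabularies. The MaterialSample never links to a taxon directly; the taxon is reached only through an Identification of an Organism.

```python
from typing import Set, Tuple

# Toy knowledge graph of (subject, predicate, object) triples.
# Predicate names are hypothetical, chosen only for this sketch.
triples: Set[Tuple[str, str, str]] = {
    ("ms:1", "isEvidenceFor", "ident:1"),        # MaterialSample -> Identification
    ("ident:1", "identifiesOrganism", "org:1"),  # Identification -> Organism
    ("ident:1", "hasTaxon", "Puma concolor"),    # Identification -> Taxon
}


def objects(subject: str, predicate: str) -> Set[str]:
    """All objects linked from subject by predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def taxa_supported_by(sample: str) -> Set[str]:
    """Taxa a sample supports, reached only via Identifications of Organisms."""
    result: Set[str] = set()
    for ident in objects(sample, "isEvidenceFor"):
        if objects(ident, "identifiesOrganism"):
            result |= objects(ident, "hasTaxon")
    return result
```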

Think about fossil material which might have a biological identification as well as a geological one.

That's fine, but geological identifications fall outside the scope of dwc:Identification.

Maybe we should look at Organism the same way? Sure we see the squirrel as an Organism or "individual", but maybe we also see the population of squirrels in a certain place (or a herd of zebras, etc.) as an Organism.

Yes -- I think we (mostly?) all agree on that, and it's enshrined in the definition of dwc:Organism.

Not sure I contributed anything, just thinking out loud.

That characterizes every post I've made to any of these tdwg GitHub threads...

Jegelewicz commented 3 years ago

I'm still of the mind that only Organisms participate in Occurrences, and MaterialSamples do not.

To me this seems strange. Organisms move, change, evolve, are born, die, eat, poop, and so on. They "occur" for as long as we view them as an organism and in reality one can NEVER have a whole organism (even though we assert that in parts in Arctos - which makes me think that is a terrible assertion @dustymc ). If we pull a feather off a living bird, the Organism still wasn't involved in the occurrence, the set of material samples that make up the Organism in that moment was. Whether we keep the feather or the whole bird, we still only have a MaterialSample that represents an Organism in space and time.

Organisms do all kinds of things in nature, and we document these things with recordings (images, video, sound, documented observations) as instances of Occurrence without ever needing to mint a single identifier for a MaterialSample

I would mint that identifier and apply it to the image, video, sound, or document in which the "documented observation" is written. Those are all "things" that I would consider samples of some sort. How else will you locate that evidence later? I can see drawing a distinction between physical evidence and digital or written evidence, but either way they should have an identifier or else they are just hearsay?

It's only when matter (physical stuff) is sampled (extracted from its manifestation as an Organism or part of an Organism) that it becomes MaterialSample. We definitely need to trace the fact that the MaterialSample was obtained from an Organism at an Event (aka Occurrence), but to me that doesn't mean that the MaterialSample itself participated in the Occurrence.

MaterialSample was obtained from an Organism at an Event (aka Occurrence)

Was it though? If I find a feather, I cannot record an "Organism at an Event" because I don't know WHEN the Organism left it there or even if the Organism left it there as opposed to it blew in from 100 miles away. I can only say that MaterialSample (feather) was in that place at that time and it represents some Organism and some Taxon. I think this even holds true for the body of a bird that I catch in a mist net (which may have left feathers elsewhere over the course of its life that I might find and catalog separately, not knowing they came from that body). It seems to me that all the "things" we collect are MaterialSample that are evidence for one or more Organisms and are also evidence for one or more taxa depending upon how much they are examined.

Again, not sure I am contributing - but for me, Organism is a very flimsy thing with no real boundaries, just the ones we give it at the moment in time we are observing "it". How can you pin down an Organism at a place and time? It feels like Heisenberg's uncertainty. If we can pinpoint the location and time, we will not have all of the MaterialSamples that make up an Organism over the course of its existence, but if we pinpoint all of the MaterialSamples that make up an Organism we get zillions of locations and times. This is why I think the way I do about Organism and MaterialSample.

deepreef commented 3 years ago

To me this seems strange. Organisms move, change, evolve, are born, die, eat, poop, and so on. They "occur" for as long as we view them as an organism and in reality one can NEVER have a whole organism (even though we assert that in parts in Arctos - which makes me think that is a terrible assertion @dustymc ). If we pull a feather off a living bird, the Organism still wasn't involved in the occurrence, the set of material samples that make up the Organism in that moment was. Whether we keep the feather or the whole bird, we still only have a MaterialSample that represents an Organism in space and time.

Hmmm... That's not how I see it. To me, an Organism is an abstract entity that begins (more or less) when an egg is fertilized, and ends when the molecules constituting its physical manifestation at the time of its death have mostly disintegrated. During the countless instances where/when that Organism existed in space and time during its physical existence, the Organism participates in similarly countless Occurrence instances. A tiny, tiny, tiny, tiny fraction of these Occurrence instances of a tiny, tiny, tiny fraction of Organisms find their way into our databases. In some cases, we have images or videos or sound recordings to serve as evidence that the Organism occurred at an Event. In other cases, we only have the word of the human observer that the Organism occurred at the Event. Sometimes, a human extracts all/most/part of the physical manifestation of the Organism at the Event, and then curates that sampled material in some way; and the resulting MaterialSample then serves as "evidence" that the Organism was present for the Event (and, thus, an Occurrence is an intersection of an Organism and an Event).

At least... that's how I've always imagined it.
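A minimal sketch of "an Occurrence is an intersection of an Organism and an Event", assuming hypothetical `Occurrence` and `Evidence` classes: several kinds of evidence (an image, an observer's statement, a MaterialSample) attach to the same Organism-at-Event pair, and the pair exists whether or not any sample was taken.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Occurrence:
    """An Organism at an Event; frozen so equal pairs hash together."""
    organism_id: str
    event_id: str


@dataclass
class Evidence:
    kind: str       # e.g. "image", "sound", "observer", "MaterialSample"
    reference: str  # identifier of the evidence item


evidence: Dict[Occurrence, List[Evidence]] = {}


def add_evidence(organism_id: str, event_id: str, item: Evidence) -> Occurrence:
    """Attach one piece of evidence to the Organism-at-Event intersection."""
    occ = Occurrence(organism_id, event_id)
    evidence.setdefault(occ, []).append(item)
    return occ


a = add_evidence("org-7", "evt-1", Evidence("image", "img-12"))
b = add_evidence("org-7", "evt-1", Evidence("MaterialSample", "ms-3"))
c = add_evidence("org-7", "evt-2", Evidence("observer", "field notes"))
```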

I would mint that identifier and apply it to the image, video, sound, or document in which the "documented observation" is written.

I would too, but I would mint that identifier as representing an instance of "Evidence" (aka, "Token"); not as an instance of MaterialSample. This diagram captures my understanding of these concepts and their relationships.

Was it though? If I find a feather, I cannot record an "Organism at an Event" because I don't know WHEN the Organism left it there or even if the Organism left it there as opposed to it blew in from 100 miles away.

Yes, this is the tricky use case, similar or identical to the one presented earlier by @dshorthouse about the cougar and its DNA sample collected downstream. We can infer an Occurrence of both the cougar (and the bird from which the feather came) based on the MaterialSample evidence. That part is consistent with what I've been thinking. But where it's tricky is that metadata about the extraction of the MaterialSample may not coincide well with the precise information of where/when the Organism existed. The easy solution is to establish an Occurrence instance representing the Organism from which the feather/DNA came with sufficiently broad coordinateUncertaintyInMeters and date/time values. But we still want to document the precise location and time where the MaterialSample derived from that Organism was extracted from nature to begin its journey of curation.

This sort of use case is one of the main reasons why I strongly support the formation of a Task Group to explore options to deal with these sorts of examples. I suppose they are particularly important for fossil material, where the time/place of the collection of the fossil may be wildly different (millions of years, hundreds of miles?) from where the Organism met its demise. And it also throws a bit of a wrench/spanner into my notion that an Organism ends when "the molecules constituting its physical manifestation at the time of its death have mostly disintegrated" (as that would have been long prior to the extraction of the fossil from nature).

I think this even holds true for the body of a bird that I catch in a mist net (which may have left feathers elsewhere over the course of its life that I might find and catalog separately, not knowing they came from that body).

Yes, and each one of those feathers constitutes Evidence for a few more of those tiny, tiny, tiny fractions of Events at which an Organism occurred through its lifetime. But the taxonomic identity is still a property of the organism as a whole (from which the feather came). And as noted above, it requires appropriate scoping of error for both space and time parameters of the Event at which the Organism participated (separate from the space and time attributes where a MaterialSample was extracted from nature).

I guess my (still very-much evolving) perspective on this is that there needs to be a direct relationship between MaterialSample and Event, but I wouldn't regard this as an instance of dwc:Occurrence. Perhaps another class is needed for something like "Gathering", which represents the relationship between a MaterialSample and the Event at which it was extracted from nature. I suppose in the vast majority of cases, pairs of:

Organism-at-Event (=Occurrence)
MaterialSample-at-Event (="Gathering"?)

can be computed to recognize when both Event instances are the same, and the MaterialSample of the latter was derived from the Organism of the former.
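The pair-matching idea can be sketched as follows, assuming a hypothetical `Gathering` class for MaterialSample-at-Event and a `derived_from` mapping from each sample to its source Organism (none of these are existing Darwin Core terms). A found feather whose Gathering Event has no matching Occurrence simply produces no pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Occurrence:
    """Organism-at-Event."""
    organism_id: str
    event_id: str


@dataclass(frozen=True)
class Gathering:
    """Hypothetical class: MaterialSample-at-Event (extraction from nature)."""
    material_sample_id: str
    event_id: str


def matching_pairs(
    occurrences: List[Occurrence],
    gatherings: List[Gathering],
    derived_from: Dict[str, str],  # material_sample_id -> source organism_id
) -> List[Tuple[Occurrence, Gathering]]:
    """Pairs sharing an Event where the sample came from the Occurrence's Organism."""
    pairs = []
    for g in gatherings:
        for o in occurrences:
            if (g.event_id == o.event_id
                    and derived_from.get(g.material_sample_id) == o.organism_id):
                pairs.append((o, g))
    return pairs


occs = [Occurrence("bird-1", "evt-1"), Occurrence("bird-1", "evt-9")]
gaths = [Gathering("feather-1", "evt-1"), Gathering("feather-2", "evt-5")]
derived = {"feather-1": "bird-1", "feather-2": "bird-1"}
pairs = matching_pairs(occs, gaths, derived)  # feather-2 has no same-Event match
```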

Now my head is really starting to hurt...

It seems to me that all the "things" we collect are MaterialSample that are evidence for one or more Organisms and are also evidence for one or more taxa depending upon how much they are examined.

Yes! On this part we are in 100% agreement!

I'm also in agreement that Organism is a somewhat flimsy concept. The definition intentionally constricts it to taxonomic homogeneity (although at any rank); but is open to scoping in terms of numbers of individuals. But then again I would say that "Occurrence" is equally flimsy; yet we find it useful as an abstract concept (just as I find Organism to be a useful abstract concept).

Jegelewicz commented 3 years ago

"Evidence" (aka, "Token"); not as an instance of MaterialSample

So would you also say that MaterialSample is just a kind of evidence? That's how it seems to me.

throws a bit of a wrench/spanner into my notion that an Organism ends when "the molecules constituting its physical manifestation at the time of its death have mostly disintegrated" (as that would have been long prior to the extraction of the fossil from nature).

How about "totally disintegrated"? As for the fossils - they are (often) not the molecules of the Organism, but a "cast" made of new molecules, at least in the case of permineralized stuff. I think I could still be convinced that fossils are part of the Organism though - and why not?

there needs to be a direct relationship between MaterialSample and Event, but I wouldn't regard this as an instance of dwc:Occurrence. Perhaps another class is needed for something like "Gathering", which represents the relationship between a MaterialSample and the Event at which it was extracted from nature.

Yep. In Arctos, our event types include:

collection - Event at which an object was collected through removal from functional cultural, biological, ecological, or archeological context.

Which is what is applied to the majority of the "Occurrences" that Arctos collections send to GBIF. The reality is that we are saying "stuff was collected at this time and place", NOT we found this organism or taxon at this time and place (even though both may be true). So a lot of what are currently "Occurrence" may not be the exact kind of occurrence one would think they are....

deepreef commented 3 years ago

So would you also say that MaterialSample is just a kind of evidence? That's how it seems to me.

I wouldn't say "just" a kind of Evidence -- rather I would say "can serve as Evidence" -- similar to MaterialCitation and various media items (images, sound recordings, videos, etc.) But I think MaterialSample instances are MUCH MORE than "just" Evidence. Indeed, in the context of Museum collections, I think they are the core object around which everything else revolves. In other words, Museums are not caretakers of Occurrence instances. They are caretakers of MaterialSamples (which also happen to serve as great evidence for Occurrences and Identifications and such).

How about "totally disintegrated"? As for the fossils - they are (often) not the molecules of the Organism, but a "cast" made of new molecules, at least in the case of permineralized stuff.

Exactly -- that's why I pointed out the wrench/spanner on the "disintegrated" thing (especially in the context of fossils). I have other thoughts on fossils that effectively would place them in Audubon-Core space, but that's an entirely different discussion.

collection - Event at which an object was collected through removal from functional cultural, biological, ecological, or archeological context.

Yes -- exactly. It has different implications and assumptions than dwc:Occurrence.

So a lot of what are currently "Occurrence" may not be the exact kind of occurrence one would think they are....

Yes, that's one of several examples of why I think too many square data records have been squeezed into the round Occurrence hole. One of the key steps to untangle that is embracing MaterialSample as its own "core" entity, rather than as some sort of pseudo-Occurrence. It was absolutely necessary to work with those square data/round data classes to get us to where we are now in TDWG-land, but I get the strong sense that a critical mass of data providers/consumers are ready to move things to the next level.

campmlc commented 3 years ago

@deepreef very much agree on all these points

tucotuco commented 3 years ago

With all this agreement it might be a good time to start thinking about the scope of a Task Group. ;-)

(Email reply to Mariel Campbell, Wed, Jun 9, 2021 at 9:54 PM.)

RogerBurkhalter commented 3 years ago

@deepreef very much in agreement as well.

deepreef commented 3 years ago

With all this agreement it might be a good time to start thinking about the scope of a Task Group. ;-)

Indeed! And I guess that would start with someone stepping up to lead it (...he writes, as he quickly crouches down behind the desk and starts crawling towards the back exit... ;-) )

Seriously, though -- I would be delighted to get up at 3am (or whatever time it is in Hawaii) every single week to actively participate in such a Task Group (yes, seriously!), but I am absolutely not the right person to lead it (unless the intention is to guarantee that it languishes).

tucotuco commented 3 years ago

Given that I (pretty much necessarily must) participate in every Darwin Core-related Task Group that gets chartered, I do not have the bandwidth to lead it either.

(Email reply to Richard L. Pyle, Wed, Jun 9, 2021 at 11:03 PM.)

Jegelewicz commented 3 years ago

Sigh, says the person who started this whole thing.....

I have never led a TDWG task group, so first I have to figure out the rules/process....

smrgeoinfo commented 3 years ago

A MaterialSample is a physical object, intended to be representative of some physical thing in the world of interest to someone, on which observations can be made. In this view, a physical object is a thing composed of matter that has some defined boundaries.

dr-shorthair commented 3 years ago

Arriving very late in this discussion ...

I wonder if it would be helpful to be more clear about the use of, and distinction between, the terms sample and specimen.

'sample' relates to the relationship of the thing to hand with a broader context. A 'sample' is usually a small or manageable thing which stands for (is designed to be representative of) a larger or less accessible thing. A sample is collected for the purpose of making observations. Samples are often ephemeral.
a 'specimen' (in the GLAM context at least) is a preserved and curated thing. It has an ongoing utility, often but not necessarily to act as a sample of something larger.

Some material samples are preserved and curated, so they are also specimens. Some samples are not.

@smrgeoinfo has alluded to these issues a couple of times above.

smrgeoinfo commented 3 years ago

@dr-shorthair -- that distinction between sample and specimen is new to me, but I think it works (sorry... what is GLAM context?). If I understand, the idea is there can be samples that are not specimens, and specimens that are not samples. Some specific examples would help, particularly for a 'specimen' whose utility is not dependent on its relationship to some context (feature of interest) in the world.

dr-shorthair commented 3 years ago

GLAM - Galleries, Libraries, Archives and Museums.

This clarification of 'specimen' was given to me by Dimitris Koureas.

deepreef commented 3 years ago

In my mind, many of these words do not have clear definitions, and so I lump them (including "specimen" and "sample") as 1:1 synonyms with MaterialSample. I think there may be value in establishing specific subclasses of MaterialSample, but there are various ways they can be framed. The distinction between 'specimen' and 'sample' described by @dr-shorthair would parse instances of MaterialSample according to their completeness as a "thing", such that something like 'whole specimen' would be branded as a 'specimen', and something like 'tissue sample' (a small sampling from a whole specimen) would be branded as a 'sample'. I think that's a step in the right direction, but there are other considerations as well.

First of all, we'd need to add a third subclass along the lines of 'aggregate', to represent those instances of MaterialSample consisting of multiple 'things' (e.g., a lot of multiple whole specimens). However, other instances of MaterialSample might represent aggregations of 'samples' (e.g., eDNA samples). Still others might represent a mixture of 'things' and 'parts of things' (e.g., a water sample that might contain whole-organism plankton and larvae, as well as fragments of tissue and DNA).

Second, there are other axes around which instances of MaterialSample might be parsed. (e.g. dwc:preparations). It's not clear to me whether these might represent another set of subclasses for MaterialSample, or would be better represented in some other way using different properties to represent the different characteristics along these axes of each MaterialSample instance.

Third, the word 'specimen' has different meanings that run counter to the distinction of 'specimen' vs. 'sample'. For example, most botanical specimens are called "specimens", but they fall squarely into the definition of 'sample' as stated above. Or... maybe I misunderstand the scope of what things are larger things, vs. extracted bits of larger things?

dr-shorthair commented 3 years ago

I suppose pretty much every 'specimen' in a museum or gallery is representative of an artist's body of work, or a school of art, or a civilization, or similar, and is likely to be catalogued in this way.

In SSN/SOSA the predicate is sosa:isSampleOf and this is pretty much the only required property of a sosa:Sample. Of course in RDF under the OWA a property instance may be missing from a specific dataset or graph, even though the Ontology says there must be one. That might be because the thing that it is a sample of is not yet clear.

I'm not at all saying that you should abandon your current terminology - I know that sample/material-sample/specimen are often used synonymously, and I used the word 'Specimen' as a synonym for 'material sample' in ISO 19156 (O&M v2) (I would change that if I had a do-over). I'm just suggesting that it is worth recognising that sampling and curation are separate concerns, and that thinking about these concerns separately might help with some of the definitional challenges.

deepreef commented 3 years ago

I'm just suggesting that it is worth recognising that sampling and curation are separate concerns, and that thinking about these concerns separately might help with some of the definitional challenges.

I wholeheartedly agree with this, and I think this is in many ways the crux of the discussion that needs to follow with respect to the Task Group. I had written a summary of my take on the key issues related to the Task Group scope and agenda (#358), but I apparently either posted it in the wrong place, or perhaps failed to post it entirely.

smrgeoinfo commented 3 years ago

From a usage point of view, the questions might be: A. What is the thing that is getting bound to the identifier (I hope the intention of 'MaterialSample')? It might be:

1. An original object as it was collected in the field ('sample') (object + collection event + sampledFeature) (see SpecimenType for proposed iSamples Categories);
2. A physical object that has been curated (preserved, mounted, given an identifier...) and accessioned to a repository for long-term preservation ('specimen') (object + 'preparation' (curation, preservation) event(s), with or without collection event + sampledFeature);
3. A data object that is a representation of a physical object that might be 1 or 2 from above (what MIDS is working on).

Is there something else (besides sampling procedures, responsible parties, preparation procedures, locations) that we need to identify in this domain?

There is a different (but overlapping) information model for each of these 'resources' (kinds of things). B. What are the required information elements to document these resources (#1, 2, 3 above)?

elywallis commented 3 years ago

Having (eventually) read (most) of the contributions to this discussion, and agreeing with many of them, I'd just like to come back to users and what will result at the front end of aggregators where these data wind up. Starting with @deepreef's reminder that:

DarwinCore began as a way for the Museum community to share data about preserved specimens

I'd like to bring us back to some real questions we get from users of aggregators or staff at our institutions. Curator A: "How do I just see the specimens?" Curator B: "I don't want any observation records, and no environmentalDNA records either." Curator C: "Dr X (researcher) rang me up and wants to know what tissue samples we have for [insert taxon name here] so that they can request some for DNA analysis. How does Dr X do that?"

Discussions like these (whilst interesting) can easily drift into such complexity that we lose sight of the fairly straightforward ways in which users may want to interact with the data. The "I just want to ... " usecase.

Regardless of how this discussion ends up, can I put in a plea for the least-skilled user here? Or even for the user who is, perhaps, an ecologist or a population modeller or a government bureaucrat who "just wants some data". They might not know to look for "material sample" when they just want to get data about some specimens in a particular collection; or might not be too concerned about whether there's a boundary between an organism and a taxon because they just want to find that feather, and they know the record is there somewhere. Or for the collection manager who isn't going to sit and read 128 comments to find out how to map the things they call "specimens" out of their collection management system and into DarwinCore?

We might not be there yet, but could we please eventually circle back to how will real-life users navigate these concepts in GBIF, or ALA, or wherever, so that users can intuitively find what they're looking for?

deepreef commented 3 years ago

Very well said, @elywallis !!! I completely agree. Indeed, a large part of my interest in this topic (MaterialSample) is to address exactly the sorts of real questions you pose. I think the answer to all three questions (and many others) rests with some sort of logical alignment of properties like preparations, disposition (both of which I have argued should be organized as properties of the MaterialSample class), materialSampleType, and parentMaterialSampleID (among others).
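As a rough illustration of how far flat records get with two of those questions today, here is a minimal Python sketch over records keyed by Darwin Core term names. It assumes records are plain dicts, that "specimen" means a basisOfRecord of PreservedSpecimen, FossilSpecimen, or LivingSpecimen, and that tissue holdings can be spotted by the substring "tissue" in the free-text dwc:preparations field; the function names and example values are hypothetical. Curator B's eDNA filter is left out, since this sketch does not assume any term that reliably flags eDNA records.

```python
# Hypothetical helpers over flat Darwin Core-style records (plain dicts keyed by term name).

# Assumption: which basisOfRecord values count as "specimens" for curator A.
SPECIMEN_BASES = {"PreservedSpecimen", "FossilSpecimen", "LivingSpecimen"}


def just_specimens(records):
    """Curator A: keep only records whose basisOfRecord is in SPECIMEN_BASES."""
    return [r for r in records if r.get("basisOfRecord") in SPECIMEN_BASES]


def tissue_samples(records, scientific_name):
    """Curator C: records for one name whose free-text preparations mention tissue.

    Heuristic only: dwc:preparations is free text, so a substring match
    will miss tissues described in other words.
    """
    return [
        r
        for r in records
        if r.get("scientificName") == scientific_name
        and "tissue" in r.get("preparations", "").lower()
    ]


records = [
    {"basisOfRecord": "PreservedSpecimen", "scientificName": "Taxon A",
     "preparations": "skin | skull | tissue (frozen)"},
    {"basisOfRecord": "HumanObservation", "scientificName": "Taxon A"},
    {"basisOfRecord": "PreservedSpecimen", "scientificName": "Taxon B",
     "preparations": "whole organism (ethanol)"},
]

print(len(just_specimens(records)))             # 2
print(len(tissue_samples(records, "Taxon A")))  # 1
```

The fragility of that substring match is one concrete reason to want properties like preparations and disposition organized more formally on MaterialSample.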

While I have probably been more guilty than anyone else in terms of endlessly waxing philosophical/conceptual, my real motivation here is driven by very practical needs. We at Bishop Museum are in the early stages of an informatics renaissance, including the harmonization of digitization and data management among several major natural history collections as well as cultural and library/archives collections.

A great deal of what I hope the Task Group will accomplish is to help me get to a place where our data management system can easily answer exactly these kinds of real-world questions.

One final note in defense of the philosophical/conceptual perspective, though: I have learned over the years that constructing solutions optimized for solving real-world problems that are right in front of us sometimes (often?) accomplishes short-term gain in exchange for long-term pain. Having endured a great deal of that pain, I can see how thinking this stuff through carefully can lead to the development of systems that not only easily answer the questions that are right in front of us, but also the many countless questions we haven't even thought to ask (...yet). [That is...short-term pain in exchange for long-term gain...]

Jegelewicz commented 3 years ago

I woke up in the middle of the night thinking about how DarwinCore is NOT structured for collection management or sample discovery. The two comments above make me feel that even more.

dshorthouse commented 3 years ago

I woke up in the middle of the night thinking about how DarwinCore is NOT structured for collection management or sample discovery. The two comments above make me feel that even more.

Very true! And the irony is, when creating a collections management system from scratch, one of the (many) considerations designers try to accommodate is, "How does this translate to Darwin Core?" Rolling-up relational concepts into a philosophical Occurrence is not always an easy thing to do, especially when there are layers of derivative objects not all of which participated directly in a collecting event. The other tension is the very common situation where derivative objects have determinations that differ from that of the parent (eg barcode BIN applied to the DNA derived from a severed leg vs identification by a person applied to the once whole beetle using morphological characters, both of which may be independently shared by different organizations as components of what looks like two Occurrence records).

tucotuco commented 3 years ago

Darwin Core is not structured. When we give courses on Darwin Core we try to hammer this in. It is a bag of terms that we hope to define well enough so that it can be reused in lots of contexts (including to define fields in databases). The confusion arises because the most popular (but not only) way to share data "in Darwin Core" is through Darwin Core Archives and the very few structures supported through "cores" and "extensions" defined by XML files on rs.gbif.org.

Jegelewicz commented 3 years ago

When we give courses on Darwin Core

I think I need one of these....

tucotuco commented 3 years ago

I think I need one of these....

I would begin at Chapter 0 on https://github.com/tdwg/dwc-qa/wiki/Webinars.

deepreef commented 3 years ago

With the caveat from @tucotuco that DwC is not intended to be structured (not entirely true, or it wouldn't have defined classes)...

Rolling-up relational concepts into a philosophical Occurrence is not always an easy thing to do, especially when there are layers of derivative objects not all of which participated directly in a collecting event.

So... even though DwC was born in the context of sharing specimen data, I think the community at the time felt that the most valuable output from sharing/aggregating specimen data was the ability to document the occurrence of organisms in space and time (specifically, via collecting event data associated with those specimens). In that context, the large body of unvouchered observational records (especially birds) represented an opportunity to add additional "meat" to these patterns of organisms occurring in space and time. Thus, the notion of a "specimen" as the basis of record became Occurrence.

I think that was a step in the right direction. However, the part that never sat well with me was the notion of Specimen=Occurrence, which seemed to pervade TDWG-land for a number of years, and underpins the point raised by @dshorthouse quoted above.

Later, with the near-simultaneous introduction of MaterialSample and Organism classes, I think DwC took another step in the right direction towards disentangling assertions about the presence of organisms in space and time from the "evidence" used to support those assertions. I've rambled on enough about that earlier in this thread, so no need to repeat here.

Unfortunately (but predictably), there was a period of several years when most TDWG-folk weren't quite sure how best to implement the concepts represented by these two new DwC classes/concepts (especially given that one of them -- MaterialSample -- was actually intended for a slightly different function than what it ended up being defined as).

This very unusual year (for a few reasons) in DwC is, I think, in part a result of some sort of "awakening" among a critical mass of data content providers in how the "power" of these new classes (MaterialSample and Organism) can be leveraged through an evolution in how data exchange mechanisms built on DwC can function. Basically, this comes back to that goal of disentangling Occurrence instances from the evidence that supports them.

The other tension is the very common situation where derivative objects have determinations that differ from that of the parent (eg barcode BIN applied to the DNA derived from a severed leg vs identification by a person applied to the once whole beetle using morphological characters, both of which may be independently shared by different organizations as components of what looks like two Occurrence records).

Indeed, Identification does play a critical role in this -- because historically TDWGers have used Taxon as a proxy for Organism; when in fact, Taxon is a non-objective property of an Organism. My contention is that Organisms participate in Occurrences, and these Organisms are asserted to represent members of a particular Taxon through one or more Identification instances. And, as I've expounded previously in this thread, "evidence" underpins both instances of Occurrence and instances of Identification.

So, in the example from @dshorthouse above, a single Organism might have more than one derived MaterialSample -- e.g., the "parent" whole specimen (linked to a collecting event, and thus serving as evidence of an Occurrence), and the "child" tissue sample. The "parent" MaterialSample might serve as evidence to apply an Identification of one Taxon to the Organism from which the whole specimen was derived, and the "child" tissue sample might result in a DNA sequence that serves as evidence in support of an Identification to a different Taxon. If the link between these two MaterialSamples (either the parent/child relationship, or the fact that both are derived from the same Organism) is not clearly maintained, then as @dshorthouse notes these will likely end up as two distinct Occurrences, rather than (correctly) a single Occurrence based on an Organism with more than one asserted Taxon Identification....
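The beetle example can be sketched in Python to show why that link matters. This is a hypothetical model, not a Darwin Core schema: the class and field names (organism_id, parent_id, event_id, evidence_id) are assumptions for illustration. Each Identification points at the MaterialSample used as evidence; a child sample with no collecting event of its own inherits one by walking up its parent links, and identifications are then grouped by (Organism, collecting event), so the whole beetle and its leg tissue land in one Occurrence with two asserted taxa.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MaterialSample:
    sample_id: str
    organism_id: str                  # Organism the sample was derived from (hypothetical field)
    parent_id: Optional[str] = None   # parent MaterialSample, if this is a subsample
    event_id: Optional[str] = None    # collecting event, only if the sample took part in one


@dataclass
class Identification:
    taxon: str
    evidence_id: str                  # MaterialSample used as evidence


def collecting_event(sample, by_id):
    """Walk up parent links until a sample with a collecting event is found."""
    while sample.event_id is None and sample.parent_id is not None:
        sample = by_id[sample.parent_id]
    return sample.event_id


def occurrences(samples, identifications):
    """Group asserted taxa by (organism, collecting event): one key per Occurrence."""
    by_id = {s.sample_id: s for s in samples}
    result: Dict[Tuple[str, Optional[str]], List[str]] = {}
    for ident in identifications:
        sample = by_id[ident.evidence_id]
        key = (sample.organism_id, collecting_event(sample, by_id))
        result.setdefault(key, []).append(ident.taxon)
    return result


# Whole beetle (collected at event E1) and a leg-tissue subsample of it.
samples = [
    MaterialSample("MS1", organism_id="O1", event_id="E1"),
    MaterialSample("MS2", organism_id="O1", parent_id="MS1"),
]
identifications = [
    Identification("Taxon A", evidence_id="MS1"),  # morphological identification
    Identification("Taxon B", evidence_id="MS2"),  # e.g., a barcode BIN from the leg DNA
]

print(occurrences(samples, identifications))
# {('O1', 'E1'): ['Taxon A', 'Taxon B']}
```

Dropping MS2's parent_id in this sketch puts its identification under a separate ('O1', None) key, which is the two-Occurrence outcome described above.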

phew... OK, I obviously allowed my philosophical/conceptual waxer out of his cage for longer than I probably should have, but there you go.

deepreef commented 3 years ago

The previous post was way too long, so I decided to break this out as a follow-up.

I wanted to call out a different "tension" that I think underpins a lot of the conflict/confusion on this issue: The tension between the needs of collection object managers, and the needs of biodiversity researchers. Obviously, many of us are primarily focused on managing physical objects in a collection (MaterialSample-centric perspectives), and many of us are primarily focused on the biodiversity research aspects of DwC data (Occurrence/Identification-centric perspectives). A few of us have one leg firmly planted in each camp. Two main obstacles must be overcome to achieve success for both camps using the same data exchange standard: 1) We need to understand how the data should be modelled in ways that support both sets of needs; and 2) We need information management software that leverages such a robustly functional data model.

My sense is that these and other recent discussions are getting us close to (1), which is what all the philosophical/conceptual mumbo jumbo is about. After that reaches some sort of stable(ish) asymptote, then the real challenge of evolving our information management software (2) begins. My sincere hope is that we can make real progress on (1) without breaking existing software and protocols -- but I've often been accused of being overly optimistic.

Jegelewicz commented 3 years ago

@deepreef this is exactly what I have been thinking - see https://github.com/ArctosDB/arctos/issues/3630#issuecomment-868693446

afuchs1 commented 3 years ago

@deepreef

I wanted to call out a different "tension" that I think underpins a lot of the conflict/confusion on this issue: The tension between the needs of collection object managers, and the needs of biodiversity researchers. Obviously, many of us are primarily focused on managing physical objects in a collection (MaterialSample-centric perspectives), and many of us are primarily focused on the biodiversity research aspects of DwC data (Occurrence/Identification-centric perspectives). A few of us have one leg firmly planted in each camp.

A great summary of the issue :)

tucotuco commented 1 year ago

This issue has been superseded by https://github.com/tdwg/dwc/issues/451