tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Primary Deliverable - PreservedSpecimen definition #3

Closed Jegelewicz closed 1 year ago

Jegelewicz commented 2 years ago

Current Definition

http://rs.tdwg.org/dwc/terms/PreservedSpecimen

A specimen that has been preserved.

Please suggest changes/improvements in this issue.

See also https://github.com/tdwg/material-sample/blob/main/primary_deliverable/PreservedSpecimen.markdown

Jegelewicz commented 2 years ago

The current definition indicates that this is a subclass of specimen, a term NOT included in Darwin Core.

The BCO indicates that this is a subclass of both specimen and its equivalent term Material Sample which is associated with Darwin Core MaterialSample.

Given this, it seems to me that PreservedSpecimen is a subclass of MaterialSample and a more appropriate definition would be:

A MaterialSample that has been preserved.

But are subclasses really necessary or should we apply preparation?

PreservedSpecimen = MaterialSample + preparation

dagendresen commented 2 years ago

Might the emergence of the DigitalSpecimen (and ExtendedSpecimen) concept also deserve consideration here? How would a relationship between a MaterialSample+preservation and a DigitalSpecimen be modelled? (And might that be easier if we keep a PreservedSpecimen concept?)

Jegelewicz commented 2 years ago

First I think we need a definition of DigitalSpecimen.

dagendresen commented 2 years ago

My question was whether developing a definition for DigitalSpecimen (which seems to be happening?) might be easier with a PreservedSpecimen concept declared. I'm not claiming that it is so, only asking.

tucotuco commented 2 years ago

I want to steer clear of misconceptions as much as possible in these issues, so I expect I am going to end up sounding pedantic or repetitive, or both. I don't mean to be, but since this whole exercise is for clarity, I am just trying to be rigorously correct insofar as possible.

There are no subclasses in the Darwin Core namespace. The only terms that are subclasses are terms adopted from Dublin Core. The word "specimen" in the current definition does not refer to a class defined in any vocabulary - it is just a word in the English language. The same is true for 'preserved'.

We might actually need a definition for 'Specimen' if that part of the labels for the terms 'PreservedSpecimen', 'LivingSpecimen', and 'FossilSpecimen' is supposed to mean something. It would be great if we could re-use a good existing definition. @Jegelewicz mentioned 'specimen' from the Biological Collections Ontology (BCO), in which interpretations have already been made about what the relationships are between OBI:specimen, dwc:MaterialSample, dwc:PreservedSpecimen, dwc:FossilSpecimen, and dwc:LivingSpecimen.

The 'specimen' referred to in BCO is from the Ontology for Biomedical Investigations (OBI). The term's definition (in English) is, "A material entity that has the specimen role."

The 'material entity' referred to in the formal definition of 'specimen' is from the Basic Formal Ontology (BFO). Its definition (in English) is, "An independent continuant that is spatially extended whose identity is independent of that of other entities and can be maintained through time."

The 'specimen role' referred to in the formal definition of 'specimen' is defined (in English) as, "A role borne by a material entity that is gained during a specimen collection process and that can be realized by use of the specimen in an investigation."

The 'specimen collection process' referred to in the formal definition of 'specimen role' is defined (in English) as, "A planned process with the objective of collecting a specimen."

The 'specimen collection objective' referred to in the formal definition of 'specimen collection process' is defined (in English) as, "A [sic] objective specification to obtain a material entity for potential use as an input during an investigation."

The 'investigation' referred to in the formal definitions of 'specimen role' and 'specimen collection objective' is defined (in English) as, "A planned process that consists of parts: planning, study design execution, documentation and which produce conclusion(s)."

With some effort we finally get to the conclusion that a specimen is something (definitely) material that was (definitely) collected with the objective that it have the potential to be used in an investigation. Though the objective at the time of collection may not have been for the material entity to potentially be used in an investigation, if it ends up in a data store used for research that specimen objective is realized anyway, so for our purposes we can probably avoid worrying about the tricky part of original intention. If we can agree on this key point then we can avoid problematic (for Darwin Core) arguments about the panda in the zoo (a Darwin Core LivingSpecimen) not being an OBI:specimen because the collection objective was to make money, not to do any investigation. The fact that the panda CAN be used in an investigation is more important than the fact that that wasn't why it was collected.

A dwc:MaterialSample is defined as "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed." Here we don't have any of the ontologically-defined terms to get at the serious semantics, but it sure sounds like an OBI:specimen except for the part "typically collected" as opposed to definitely collected. If we decide to align dwc:MaterialSample with OBI:specimen, we'll have to fix that.

More problematic are Organisms that were never collected. According to BCO, they definitely can not be 'specimens'. So a tree in the wild that is visited yearly for the purpose of taking measurements is not a 'specimen'. A mouse trapped to take blood samples and then released is not a 'specimen', but the blood samples are.

Back to the central topic of this issue, PreservedSpecimen. The rest of the problem is, "What is 'preserved'?" Given all the things we are calling PreservedSpecimen in practice, we apparently don't actually have to do anything special to preserve a PreservedSpecimen except to isolate it from its original context and put it in a collection (collect it). Happily, OBI:specimen doesn't concern itself with preservation either. Again, in practice, we use PreservedSpecimen to distinguish them from those collected things that are alive (LivingSpecimen) and those that are looong-dead (FossilSpecimen). Well, and also from those things that aren't physical (HumanObservation, MachineObservation, MaterialCitation).

So it looks like, with a little effort, we could align MaterialSample with OBI:specimen, make PreservedSpecimen and LivingSpecimen subclasses of MaterialSample, and make FossilSpecimen a subclass of PreservedSpecimen, all with corresponding modifications to their definitions.

A DigitalSpecimen presents multiple problems under an OBI:specimen alignment scenario. First, it can't be an OBI:specimen because it is not a material entity. Second, it is not a concept in Darwin Core. Is it really relevant to the scope of this task group?

dr-shorthair commented 2 years ago

Thanks for chasing that down @tucotuco - a few circularities in the chain, but I think you got there in the end. So, in science, a 'specimen' is a material thing that is managed in order to be the subject of observations. I'm comfortable with that.

(It is a bit different to the definition I was given by Dimitris Koureas for 'specimen' in the context of (GLAM) collections, which focussed on its curation status, but so be it.)

As you know, I'm also interested in the term 'sample', whose definition is often assumed to overlap, but I suggest has a distinct semantics. The verb 'to sample' implies the selection of a part which is in some sense intended to be representative of a larger whole. While common use of the noun 'sample' does not necessarily always preserve this aspect, I think it is useful. It also helps us align with the use of the term in the social sciences, where a 'sample' is explicitly designed to be representative of a population.

So I wonder if we could be a bit more careful in keeping clear the distinct, both useful, semantics of 'specimen' - something managed to allow observations to be made - and 'sample' - something designed to be representative of a larger thing. I see them somewhat conflated above.

baskaufs commented 2 years ago

I am going to express an opinion here that may be considered heretical, unpopular, or perhaps both. I do not believe that we need to be committed to anchoring our definitions to BCO or other child ontologies of the BFO (Basic Formal Ontology). I am not opposed in principle to anchoring Darwin Core or TDWG terms to terms in other ontologies when that is convenient (and I've done it before), but I only think it should be done when the external term has a clear definition with the meaning that we intend.

There are two issues that I have with deriving the definition of "specimen" through the chain of logic that @tucotuco used above. The first issue is that the definitions that were used in that chain involve definitions that I consider questionable. I don't believe that a specimen must be collected (more on this in a separate comment) and I also don't believe that if it is collected that it must be used as an input during an investigation (too narrow of a restriction in my view). The second issue I have is the unnecessary complexity of having to go through a whole chain of defining roles and processes before you can define a class. Yes that's the way it's done in BFO, but I fail to see the value in it.

I have a much more practical view of how we should define classes. I take the position that the real utility of assigning resources to classes is because they share a common set of properties, not because they fit into some conceptual idea of what we think the class "means". This fits into my notion that what we come up with here needs to be use-case driven, not driven by a philosophical discussion. We care that instances of the same class share properties because we like to have tables for classes, and the properties appropriate for that class essentially define what we put in the columns of that table. That isn't to say that every instance of a class must have a value for every property that we imagine for that class, but if the way we define a class results in most of the instances having values for different sets of properties, then we need to re-think the way we are categorizing the instances into classes.

So I would like to see us focusing on what we need these classes for, how we would divide up properties among them, and use the simplest definitions that would allow an uninitiated user to have a relatively clear idea of what types of resources would fit in those classes (i.e. what kinds of things would logically fit together in a table represented by that class).
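As a rough illustration of the shared-properties test, one could score a candidate class by how much its instances' populated properties overlap. The metric here (mean pairwise Jaccard similarity) is an assumption, not something proposed in this thread; a low score would merely hint that the instances might belong in different tables. The keys are Darwin Core terms used only as example column names, and the values are made up.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of properties two instances have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_property_overlap(instances: List[Dict[str, str]]) -> float:
    """Average pairwise Jaccard similarity of the properties each instance populates."""
    prop_sets = [{key for key, value in inst.items() if value} for inst in instances]
    pairs = list(combinations(prop_sets, 2))
    if not pairs:
        return 1.0  # zero or one instance: nothing to disagree about
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Instances that would sit comfortably in one table (same columns filled).
specimens = [
    {"catalogNumber": "A1", "preparations": "pinned", "recordedBy": "X"},
    {"catalogNumber": "A2", "preparations": "skin", "recordedBy": "Y"},
]

# Instances lumped into one class but populating disjoint columns.
mixed = [
    {"catalogNumber": "A1", "preparations": "pinned"},
    {"decimalLatitude": "36.1", "decimalLongitude": "-86.8"},
]

print(f"{mean_property_overlap(specimens):.2f}")
print(f"{mean_property_overlap(mixed):.2f}")
```

Any threshold for "too low" would be a judgment call for the use cases at hand; the score only makes the "different sets of properties" symptom visible.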

baskaufs commented 2 years ago

In my last comment, I asserted that I did not believe that being collected is a requirement for something to be a specimen. I'm going to paste in the result of a thought-experiment that I first did and wrote about in an email thread years ago. I articulated it in an email thread last April (quoted below) when the topic of specimens, organisms, and material samples came up again. The TLDR for what follows is:

I think the thing that has been most helpful to me in thinking about this kind of thing was trying to understand the distinction between an organism and a living specimen. The Bicentennial Oak is an organism that lives on Vanderbilt's campus: http://bioimages.vanderbilt.edu/vanderbilt/7-314 It is also a living specimen and has the catalog number 2-691 in the Vanderbilt Arboretum's database. Here is a pin oak tree that is clearly an organism: http://bioimages.vanderbilt.edu/baskauf/15823 However, it is not a specimen of any sort. What makes the Bicentennial Oak a specimen? Not because it was collected -- it predates the university by probably 100 years. Not because it was planted or prepared in any sort of way -- no one knows how it got there (it may have just been a tree in the native forest that didn't get cut down when the farmland was cleared). These characteristics (not planted nor collected by humans) are the same as the pin oak tree in the woods in Illinois. It isn't the act of assigning the tree an identifier either, because I've assigned IRI identifiers to both of them. I believe that what fundamentally makes something a specimen is the act of accession: claiming that the specimen is part of a collection that someone takes responsibility for documenting and tracking. If you ask me about the current state of the Bicentennial oak, I could call up Facilities Information Services (who manages the tree database at Vanderbilt), ask them if the Bicentennial oak was still there, and get some kind of answer. On the other hand, if you asked me about the current state of that pin oak in Illinois, I would have no idea whether it was still there or not, or what kind of condition it was in. No one could, because it was not accessioned into a collection.

So I feel like I have a pretty clear idea what a dwc:LivingSpecimen and dwc:PreservedSpecimen are. They are both specimens as defined above, with living specimens being alive and preserved specimens being dead and somehow preserved. In my view (but not included formally in the definition) another necessary condition for something to be a Darwin Core specimen is that it must have been derived in some way from an organism. It might be the whole organism (dead or alive) or part of the organism (as in a twig with leaves on an herbarium sheet). I think that it is also implied that the specimen somehow physically includes part of the organism (or a transformed part of the organism in the case of fossils), although that would be blurred a bit by fossil specimens like coprolites, fossilized burrows, footprints, etc. In Darwin-SW, there is an assumed "derivedFrom" relationship between a specimen and an organism. In Darwin-SW, a specimen can be evidence for either occurrences of the organism or identifications of the organism.

A specimen record may be maintained in a collection even if the item is no longer extant. I can still tell you things about http://bioimages.vanderbilt.edu/vanderbilt/6-173, like it was once a state champion tree, even though it was cut down in 2008. The situation is similar with herbaria that were flooded or burned down. Records may exist of specimens even if the specimens no longer exist. Presumably the managers of those collections could tell you that they were destroyed.

OK, so what is a MaterialSample then? I am much more fuzzy about this. It seems that the two necessary conditions are being a material thing (e.g. images don't qualify), and being sampled from something. There is no assumption that it is derived from an organism as air or water samples free of organisms could be material samples. I guess that it has something similar to the accession component that I used to define specimens, although I'm not sure about that. If the material is not destructively sampled, the DwC definition implies that it should be preserved, although I'm unsure that is the case for every material sample, e.g. ones that may be thrown out after measurements or documentation is complete. There are also samples that were obviously sampled for the purpose of being destroyed - in my mind that is a difference from specimens since I don't think specimens are generally intended to be destroyed intentionally. So a material sample can be derived from an organism, but doesn't have to be. A material sample can be a specimen, but doesn't have to be. A specimen does not have to be a material sample -- clearly the Bicentennial Oak was never the result of a sampling event. A material sample might be preserved but doesn't have to be. Honestly, the definition of MaterialSample is so fluid that it is hard for me to see why it is useful to assert that something is an instance of it.

dr-shorthair commented 2 years ago

I suggest that we move the second half of @baskaufs response here into a separate thread. It is about methodology, which deserves its own space for discussion. Yes, it was triggered by the PreservedSpecimen issue, but there is a risk of swamping the core of this issue.

dr-shorthair commented 2 years ago

the key characteristic of a sample is that it was collected.

As I argued above there is an additional layer to the key characteristic of a sample, that it is representative of the thing from which it was collected.

ghwhitbread commented 2 years ago

I think this is also a key characteristic of most herbarium specimens, especially good specimens, which are selective samples of an individual or population (size dependent) that are representative of the thing from which they are collected. We could say that a specimen is always the result of a sampling event, even Steve’s oak. Maybe the distinction is more one of purpose or fate: synthesis of specimens; analysis of samples. One is persistent, the other ephemeral (unless it’s turned into a specimen).

Jegelewicz commented 2 years ago

@baskaufs you had me at accession. This is something that a lot of biological collections are woefully bad at understanding/doing. I have spent the last six years attempting to get biological collections all over the map to properly accession their objects with minimal success. Although I have issues with the word "specimen" I do think your argument for defining a specimen (whether living, preserved, fossilized or created - think nests) makes it very clear what is being discussed.

I still vote that we lose the "subclasses" and stick with one thing - specimen with context (preserved, living, fossil) provided elsewhere.

If this were true - then catalogNumber would be applied to specimens, which is exactly what museums do right now IMO.

jmacklin commented 2 years ago

Sorry, a little behind now in the thread. I started earlier and just finishing now...

I want to add my support for what Steve has nicely outlined above. Our group has been doing a lot of thinking about this recently in the context of data modeling for the DINA collection management system we are currently developing. The definitions of these terms and the relationships they have to each other are fundamental. We have taken a "samplistic" approach to our model, in which the materialSample, not the specimen, is the core object. It is also important to state that our model deals with both living and preserved samples, as well as a great variety of "mixed" environmental samples. For us, the sample is what is collected, and a subsequent process may then define it as a specimen (living or preserved) or otherwise. Samples can be subsampled ad infinitum and ultimately be destroyed. Certain samples are typed, and we are still considering how far to extend this. We are trying to abide by the KISS mantra as much as possible ;-) An interesting use case is a common practice in botany that illustrates a few of the challenges; Steve highlighted this with his tree example. We collect one to many samples of a tree in the field, sometimes even resampling through time. The tree "parent" is never collected but pictures of it may be taken as evidence of its existence (an observation). Once processed back in the lab, the samples may be accessioned into the herbarium and become specimens. These samples, now specimens, have relationships that are maintained between the "parent" and "children". The philosophical part here, though, is the connection back to the asserted parent based on evidence provided by the image(s) or, in many legacy cases, with no evidence at all. We could also imagine taking a cutting or fruit and growing it up in an Arboretum as a living specimen; this sample would also relate back to the observed tree.

There are many more use cases that challenge the definitions, some edgier than others, but I think it is important to collect these, as has been suggested, to help discuss the terminology and definitions.

RogerBurkhalter commented 2 years ago

I very much agree with @baskaufs and the elegant way it was presented. In paleontology we do much the same. When collecting, I gather samples that may be split into accessioned specimens. Portions of the original sample may be sent for geochemical analysis for dating or environmental proxies (becoming observations/interpretations), and portions may go to micropaleontology or palynology (becoming additional specimens in different collections). We do not collect the entire rock outcrop, much as botanists do not collect the entire tree; the differences are only a matter of scale.

dshorthouse commented 2 years ago

The only subtlety I'd add to @baskaufs's note above about Classes or instances, also mentioned by @jmacklin, is the importance of relationships in defining how to use and interpret a MaterialSample. In other words, it depends on who the stakeholder asking the questions is, because this dictates how to disentangle the bits or roll them together so as not to produce spurious responses.

Some MaterialSamples participate directly in a collecting event, whereas the collecting event for derivative MaterialSamples must be deduced by walking up the provenance chain of their child:parent relationships. Think of a honey bee on a pin vs. a leg plucked from the bee vs. DNA extracted from the blood from the leg vs. pollen lifted off the pollen basket on the leg. All four are nodes that can be called PreservedSpecimen (or MaterialSample sensu lato), and the directed edges between them are verb-based processes that led to their existence. They could all have values for common properties because they are instances of a Class MaterialSample (e.g. materialSampleID, parentMaterialSampleID, perhaps catalogNumber). However, this is all rather pedestrian. The whole point of doing this is to declare which MaterialSample bears the responsibility for the joins to other concepts as required by a stakeholder. That responsibility may span organizations and, as such, presupposes much more coordination between ourselves than presently exists. A classic example is Occurrence records generated by organizations whose sole responsibility is to generate DNA barcodes from tissue samples extracted from specimens owned and accessioned by other organizations. An Occurrence record generated directly from a DNA barcode MaterialSample is nonsensical and does harm to science whose methods include species distribution models, because it is the once-whole honey bee (now or later shared) that more logically participates in an Occurrence record. The once-whole bee in this scenario has (or more likely had) a direct association with a collecting event. The DNA barcode has none. This is also true for the pollen on the leg, though we can quibble about how we deduce recordedBy for the pollen, given that it is the parent MaterialSample that bears the association with the collecting event: is it the bee or the collector of the bee?
In entirely different scenarios, tissues can have direct relationships with a collecting event (eg bat wing punch).
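The provenance walk described above can be sketched in Python. This is a minimal illustration under stated assumptions, not an agreed model: each record carries `materialSampleID` and the `parentMaterialSampleID` mentioned above, plus an `eventID` that we assume is filled in only for samples that took part directly in a collecting event (the bee, or a bat wing punch). All identifiers in the example data are invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MaterialSample:
    materialSampleID: str
    parentMaterialSampleID: Optional[str] = None
    # Assumption: set only when this sample took part directly in a collecting event.
    eventID: Optional[str] = None


def find_collecting_event(sample_id: str, samples: Dict[str, MaterialSample]) -> Optional[str]:
    """Walk child -> parent links until a sample with a direct collecting event is found.

    Returns None if the chain ends without one; raises ValueError on a cycle.
    """
    seen = set()
    current: Optional[str] = sample_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in provenance chain at {current}")
        seen.add(current)
        sample = samples[current]
        if sample.eventID is not None:
            return sample.eventID
        current = sample.parentMaterialSampleID
    return None


# Hypothetical records for the honey bee example.
samples = {
    s.materialSampleID: s
    for s in [
        MaterialSample("bee-1", eventID="event-42"),
        MaterialSample("leg-1", parentMaterialSampleID="bee-1"),
        MaterialSample("dna-1", parentMaterialSampleID="leg-1"),
        MaterialSample("pollen-1", parentMaterialSampleID="leg-1"),
    ]
}

print(find_collecting_event("dna-1", samples))  # event-42
```

Guarding against cycles is a design choice: a parent link that loops back on itself is a data error, and the walk reports it rather than looping forever.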

So...

I believe part of what we need to do is not limit ourselves to properties but also challenge ourselves to include the edges, or relationships, in how we come to define PreservedSpecimen and others of that ilk we've traditionally dumped in basisOfRecord. We want to avoid spurious responses to questions at all costs. Alternatively, is it useful to consider subclasses such as CollectedMaterialSample and DerivedMaterialSample in an attempt to help denormalize the relationships and communicate to stakeholders what they can expect to see attached to a MaterialSample (in this case, a collecting event or not)?

dr-shorthair commented 2 years ago

I still see sample and specimen being used interchangeably in this discussion. Can we please clarify that these words relate to different roles and intentions:

A sample is not necessarily a material thing; social science samples often are not. A sample might not be accessioned (particularly if it will be destroyed as part of some analytical process).

I think specimens are always material things.

dshorthouse commented 2 years ago

@dr-shorthair I do not know where on the spectrum between sample and specimen one of the 6 legs removed from a bee to get the blood to get the DNA barcode lies. The bee is destructively sampled and the leg is destructively sampled. "Sampled" here is the typical vernacular for such an action. There are moments in time throughout a workflow like this when intentions may not be clear, especially when scale comes into play. It is more economical to completely and destructively grind down a chalcid wasp to get the blood to get the DNA barcode than it is to completely and destructively grind down a giant stick insect to get the blood to get its DNA barcode. Nonetheless, many will say the chalcid was sampled, as was the stick insect: the former in its entirety, the latter most of the tarsi on one leg. If the whole leg were removed from the giant stick insect in pursuit of blood, that seems wrong, so it would be best to make it a new specimen stored in close proximity to the once-whole specimen; yet rarely would anyone call this part of the leg a specimen. So, what do you call it?

If we look outside our community to seek clarity between sample and specimen (eg medical community), I seriously doubt we'll reach any sort of epiphany. Might it be better to think of these as states and processes than to affix subjective, fluid intentions that vary with technologies and questions?

dr-shorthair commented 2 years ago

There is no spectrum. These are orthogonal concerns. An individual can be both sample and specimen, or either, or neither. It depends on the roles and intentions.

dshorthouse commented 2 years ago

Do we really need to invoke the circular, self-imploding definitions as outlined by @tucotuco above? What if, instead of roles and intentions, for our use cases we invoke relations to executed actions or processes that altered the state of the MaterialSample, so that we remain steadfastly neutral in our approach? Intentions are nebulous, residing between the ears of one or many people, whereas an executed action is an explanatory statement, preferably a verb expressed in the past tense, that is underpinned by evidence. What did we do to a MaterialSample that permits us to call it a PreservedSpecimen and that differentiates it from all other generic MaterialSamples? At the risk of stepping in front of the bus, why do we need PreservedSpecimen at all? Is there functional harm in using PreservedMaterialSample in its place, or some other denormalizing prefix that alludes to an executed action, comparable to the CollectedMaterialSample I used above? Alternatively, a MaterialSample may have many deductive tags describing the incoming and/or outgoing edges, whose presence or absence defines what we can do with the MaterialSample or what we're likely to find joined to it: wasDerived, wasCollected or some such.
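One way to read the "deductive tags" idea is that tags like wasCollected and wasDerived need not be asserted by hand; they can be computed from the edges a MaterialSample participates in. The sketch below assumes a hypothetical list of (subject, predicate, object) edges using the invented predicates `collectedDuring` and `derivedFrom`, plus an invented outgoing-edge tag `hasDerivatives`; none of these are Darwin Core terms.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def deduce_tags(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Derive tags for each MaterialSample from the edges it participates in.

    Hypothetical predicates:
      collectedDuring: sample -> collecting event   => subject gets wasCollected
      derivedFrom:     child sample -> parent sample => subject gets wasDerived,
                                                        object gets hasDerivatives
    """
    tags: Dict[str, Set[str]] = {}
    for subject, predicate, obj in edges:
        if predicate == "collectedDuring":
            tags.setdefault(subject, set()).add("wasCollected")
        elif predicate == "derivedFrom":
            tags.setdefault(subject, set()).add("wasDerived")
            tags.setdefault(obj, set()).add("hasDerivatives")
    return tags


edges = [
    ("bee-1", "collectedDuring", "event-42"),
    ("leg-1", "derivedFrom", "bee-1"),
    ("dna-1", "derivedFrom", "leg-1"),
    ("punch-1", "collectedDuring", "event-7"),  # bat wing punch: tissue collected directly
]

for sample, sample_tags in sorted(deduce_tags(edges).items()):
    print(sample, sorted(sample_tags))
```

Because the tags are recomputed from the edges, they cannot drift out of step with the recorded actions, which is one argument for deriving them rather than storing them.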

deepreef commented 2 years ago

@tucotuco:

A dwc:MaterialSample is defined as "A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed." Here we don't have any of the ontologically-defined terms to get at the serious semantics, but it sure sounds like an OBI:specimen except for the part "typically collected" as opposed to definitely collected. If we decide to align dwc:MaterialSample with OBI:specimen, we'll have to fix that.

Trust me, I love it when you (or anyone else, for that matter) talk pedantic. One of the joys (for me) of being an ICZN Commissioner, I suppose...

With regard to the quote above, I don't think there should be much resistance in DwC-land for altering the definition of MaterialSample to eliminate the word "typically" from just before "collected". I've always thought of the "Material" part of MaterialSample to represent an object that is (definitely) physical in nature (not digital or conceptual), and the "Sample" part to imply "taken from" (as in, "collected"). Whether it's a whole (or multiple) organism "taken from" nature, or a piece of an organism "taken from" a whole organism, or a mixture of organisms, pieces of organisms, and biological and non-biological substances "taken from" the environment and placed in a jar -- I'm struggling to think of an example of a dwc:MaterialSample instance that was not arguably "collected" at some point in its pathway to become a MaterialSample. Perhaps someone else can provide an example?

So a tree in the wild that is visited yearly for the purpose of taking measurements is not a 'specimen'. A mouse trapped to take blood samples and then released is not a 'specimen', but the blood samples are.

Those are good examples, but in both cases I'm happy to regard the tree and the mouse as instances of dwc:Organism, and only regard the physical portion(s) of the tree/mouse that were/are "collected" to represent instances of dwc:MaterialSample. In fact, this helps clarify in my mind where that boundary is between dwc:Organism and dwc:MaterialSample (one of my long-standing open questions on this issue).

Having said that, I fully agree with @baskaufs on:

I do not believe that we need to be committed to anchoring our definitions to BCO or other child ontologies of the BFO. And also: the real utility of assigning resources to classes is because they share a common set of properties, not because they fit into some conceptual idea of what we think the class "means".

But I should point out that in order to understand what properties apply to what "things", there needs to be a robust conceptual underpinning to what the "thing" is. A recent example (still open to debate) is whether an instance of dwc:Taxon can participate directly in an instance of dwc:Occurrence, or whether only instances of dwc:Organism participate in instances of dwc:Occurrence, and dwc:Taxon instances are only secondarily linked to the dwc:Organism through instances of dwc:Identification. Without robust "conceptual idea[s] of what we think [these classes] 'mean'"... we'll never have a clear sense of what "things" share which properties.

As for the (very good) commentary about "accession" being the definitive criterion for a "specimen", vs. "collected" being the basis of "sample", I have to say I'm pretty squeamish about this perspective. We don't have a definition for "accession", and in our Museum we're re-thinking what that word means. Traditionally I have always thought of it as representing some sort of claim of ownership. In that sense, I feel it would create more ambiguities in DwC-space than it would resolve. It's also a very Museum-centric perspective.

Personally, I think the term "specimen" and all of its qualified versions create more problems for using DwC than they solve. It seems to me that it's a legacy term carried forward from the early days of DwC (pre-Occurrence). Going forward, I'd be happy to deprecate all xxxSpecimen terms, and instead focus on the distinction between Organism and MaterialSample. I need to think more about @dr-shorthair's comments regarding sample as a representative of a larger thing, but I'm also mindful of the need for allowing MaterialSample to accommodate non-biological materials as well (hence unrelated to Organism). In our implementation, we use the term Individual to represent a superclass of both Organism (biological individuals) and non-living physical units, so it's easy to represent instances of MaterialSample as both biotic and abiotic (and mixtures of both, such as cultural objects made from parts of organisms).

OK, I'm rambling. Sorry.

Back to the comment from @baskaufs: I'm perfectly content in modelling the Bicentennial Oak as an instance of dwc:Organism, and any collected samples derived therefrom as instances of dwc:MaterialSample, but I'm much less comfortable treating the unsampled organism in nature as itself an instance of dwc:MaterialSample, even if it happens to be claimed by Vanderbilt through its own particular accessioning system. How is this different from, say, a shark that is tagged in nature and tracked over time? Should that also be treated as a MaterialSample? Or would it depend on whether our Museum Registrar decided to claim ownership of it through a record in the Accessions database?

OK, so what is a MaterialSample then?

In my mind, it's pretty close to what @tucotuco outlined for the OBI "specimen".

@jmacklin :

We collect one to many samples of a tree in the field, sometimes even resampling through time. The tree "parent" is never collected but pictures of it may be taken as evidence of its existence (an observation).

This is super easy to deal with, once you have a system that manages instances of Organism (or, more generally, Individual) separately from MaterialSample. I wouldn't force the tree still living in nature to be represented as an instance of MaterialSample (or "specimen") simply to accommodate a parent-child relationship in a data model. That may help from the perspective of KISS, but feels more like a shoe-horn solution that ultimately disempowers the data.

That said, I'm still not entirely clear in my mind on when an Organism ends and a MaterialSample begins. I'm also unsure about whether the same collection of matter can be both at the same time (most obviously in the cases of what we think of now as instances of LivingSpecimen). I could go along with that as long as the living organism is itself maintained in a curated context, which may or may not be the case for the Bicentennial Oak that Steve refers to.

OK, I'm rambling again. Let me sum up this way:

Organisms, in my mind, are pretty-well defined (though not perfectly). They are conceptual in scope, but are manifested by physical material. That physical material often begins when a sperm fertilizes an egg (or two cells divide, in asexual organisms), and persists until the organism is no longer living, or physically disintegrates. This can extend to sets of individual organisms belonging to the same taxon. A broader definition is needed to accommodate abiotic instances of "individual" (like I said, not perfect), but I can see a solution here. Properties of an Organism apply to that organism throughout its entire lifespan.

Instances of MaterialSample are aggregates of physical material that are extracted ("collected") from the natural environment, and held in the custody of humans. Following the suggestion of @baskaufs that instances of a class should be defined by shared properties, these are physical items that may be preserved or destroyed, curated or accessioned, borrowed and loaned, subsampled or aggregated to yield new instances of MaterialSample, and otherwise cared for and/or maintained in some way by humans.

In this context, things like PreservedSpecimen, FossilSpecimen and LivingSpecimen are not helpful as distinct classes, but represent various property values in further classifying/documenting instances of MaterialSample. That's why I think we should retire these three "pseudo-classes" in DwC, and reframe them in the context of properties of instances of MaterialSample.

At least that's how I see it.

deepreef commented 2 years ago

Sorry... I see more posts after the ones I read when formulating my previous post. I'll try to keep this one shorter and less arm-wave-y.

@dshorthouse :

Some MaterialSamples participate directly in a collecting event whereas the collecting event for derivative MaterialSamples must be deduced by walking up the provenance chain of its child:parent relationships.

That depends on how you define "collecting event". We've been tracking actions like "extracting a tissue sample from a collected specimen" as dwc:Event instances, so every MaterialSample is associated with an explicit collecting event. Of course, this doesn't negate your point, which is the very practical question of "where was the organism from which this tissue sample was extracted living at the time it was encountered in nature?" -- which may have absolutely nothing to do with the place and time when the tissue sample was extracted from the parent MaterialSample instance. So you do have to walk up the provenance chain. I've learned to embrace recursive hierarchical relationships, but I may be an anomaly here.
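A minimal Python sketch of the "walk up the provenance chain" idea, assuming each sample record carries its own event (field collection, tissue extraction, etc.) plus an optional link to the parent sample it was derived from. The Event and MaterialSample classes and their fields here are hypothetical simplifications, not Darwin Core terms or anyone's actual implementation:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in for a dwc:Event (only the fields used here)."""
    eventID: str
    eventType: str                  # illustrative values, e.g. "field collection"
    locality: Optional[str] = None


@dataclass
class MaterialSample:
    """Hypothetical sample with its own event and an optional parent sample."""
    materialSampleID: str
    event: Event
    parent: Optional[MaterialSample] = None


def field_collecting_event(sample: MaterialSample) -> Event:
    """Follow child:parent links to the root sample and return its event.

    Assumes the root (the sample with no parent) is the one taken directly
    from nature; every derived sample still has its own event.
    """
    seen: set[str] = set()
    current = sample
    while current.parent is not None:
        if current.materialSampleID in seen:
            raise ValueError("cycle in provenance chain")
        seen.add(current.materialSampleID)
        current = current.parent
    return current.event


# Usage with made-up identifiers:
field_event = Event("ev-field", "field collection", locality="hypothetical locality")
lab_event = Event("ev-lab", "tissue extraction")
whole = MaterialSample("ms-1", field_event)
tissue = MaterialSample("ms-1-t1", lab_event, parent=whole)
print(field_collecting_event(tissue).locality)  # hypothetical locality
```

The recursion is written as a loop with a cycle check, since a bad parent link in real data would otherwise loop forever.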

But more fundamentally, it's forced me to consider that MaterialSample instances intersect directly with Event instances, but not the same way that instances of Organism intersect with Event instances (aka Occurrence). This is where my brain starts to (re)turn to mush.

But most importantly:

At the risk of stepping in front of the bus, why do we need PreservedSpecimen at all?

Indeed! I've made my answer to that question pretty clear.

Jegelewicz commented 2 years ago

why do we need PreservedSpecimen at all?

I don't think we do. See my very first comment.

baskaufs commented 2 years ago

A response to @deepreef's comment "there needs to be a robust conceptual underpinning to what the 'thing' is.": I think this depends on what "robust conceptual underpinning" means. In the example you gave, you essentially laid out the position of "Taxon" in a graph: can it be connected directly to Occurrence or must it be connected to an Identification, then Organism, then Occurrence? That's different than having a philosophical argument about Taxa (a popular sport in TDWG). Rather, it is a statement about how one would create a Linked Data graph or a relational database ER diagram in order to satisfy some use case we have for organizing our data.

With respect to ambiguity of what "accessioning" means, I'd take a practical approach. If we apply a catalog number to a thing, we have accessioned it and it's a specimen. That has an implication that if we create a table of specimens, it will have a column for catalog number. The pin oak tree I photographed in the woods doesn't have a catalog number, therefore it's not a specimen. It probably wouldn't fit in a table of living specimens because it wouldn't share other columns that I might have for living specimens like when they were last fed/fertilized, what section of the zoo/arboretum they are located in, etc. Do we need to differentiate between living specimens and preserved specimens? Well, do they have the same column headers? Some may be the same (catalog number) but others may not. Preserved specimens won't have the columns I just mentioned -- they will have columns like "preparations", "lot", etc.

The point of what I'm getting at here goes back to what I've said about making this use-case driven. Rather than deciding up front precisely how to define the classes and what properties should be associated with them, set up some scenarios: if we defined the class this way, we'd assign these properties and there would be these relationships (parent/child, superclass/subclass) with other classes.

Jegelewicz commented 2 years ago

Well, do they have the same column headers?

Does that really matter? We are discussing an exchange standard, not database columns. We need to keep in mind that the terms we are creating/amending are there to be used by ALL biological collections and that they should remain interchangeable no matter the "type" of collection. My understanding is that these terms should be defined sufficiently so that both the zoo and the herbarium understand what is expected in catalog number. If they have nothing that fits there, they put nothing.

baskaufs commented 2 years ago

One other point that I think is important to recognize here is that the task we have undertaken here, essentially figuring out what the appropriate controlled values are that should be used for dwc:basisOfRecord, is more difficult and complicated than what is typically the case for developing controlled vocabularies. In a "normal" controlled vocabulary, such as the pathway CV, the terms in the controlled vocabulary are skos:Concepts and their purpose is primarily to help human users decide on what concept is the best for categorizing the described resource so that it can be screened or searched for. dwc:basisOfRecord is nearly unique among terms requiring a controlled vocabulary in that its values are classes. In this way, it's similar to dcterms:type or rdf:type.

Because of this unusual situation, we have the additional burden, when creating the controlled vocabulary, of creating terms that are not only useful for screening and searching for the described resource, but that can also be used to assert the type (i.e. class membership) of the described resource. That is why this conversation is devolving into a much more complicated one than we had when creating other controlled vocabularies, such as those for dwc:establishmentMeans, ac:variant, etc.

The reason I mention this is that it may seem that we are going beyond the scope set for the task group, i.e. to simply come up with definitions for terms in a "bag of terms". The problem here is that the implications go beyond what's normal for a controlled vocabulary since they may influence stuff like how we design and inter-relate database tables, etc.

Jegelewicz commented 2 years ago

I'll add to the above that there should not be ambiguously similar terms because then the zoo will record its gorilla as an "Organism" and the botanical garden will record its tree as a "LivingSpecimen".

baskaufs commented 2 years ago

@Jegelewicz I'm not sure if it matters. We have been looking towards the eventual task of organizing DwC terms within classes where they best fit. That's effectively telling people something about organization of tables and columns.

I think the question of whether we just need "specimen" or if we need to subclass it into living, preserved, and fossil specimens depends on the degree of overlap of the properties we'd imagine for instances of those classes. Imagine I'm an aggregator and I want to aggregate records of all three kinds of specimens. If most of the properties overlap, then I'd just use one "specimen" table and just leave columns blank if they don't apply to a particular kind of specimen. However, if it turns out that almost none of the columns overlap, then I'd be better off having three separate tables.
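The aggregator heuristic can be made concrete with a small Python sketch that measures column overlap (the Jaccard index, shared columns divided by all columns) between kinds of specimen. The column names are loosely based on the examples earlier in this thread and are not Darwin Core terms, and the 0.5 cut-off is an arbitrary assumption:

```python
from __future__ import annotations

from itertools import combinations

# Illustrative column sets only; NOT Darwin Core term lists.
COLUMNS = {
    "LivingSpecimen": {"catalogNumber", "lastFedOrFertilized", "enclosureOrSection"},
    "PreservedSpecimen": {"catalogNumber", "preparations", "lot"},
    "FossilSpecimen": {"catalogNumber", "preparations", "lot", "geologicalContext"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of columns two kinds share: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def suggest_layout(columns: dict[str, set[str]], threshold: float = 0.5) -> str:
    """Suggest one shared table if every pair of kinds overlaps >= threshold.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    pairs = combinations(columns, 2)
    if all(jaccard(columns[x], columns[y]) >= threshold for x, y in pairs):
        return "one table, blank cells where a column does not apply"
    return "separate tables per kind"


print(suggest_layout(COLUMNS))  # separate tables per kind
subset = {k: COLUMNS[k] for k in ("PreservedSpecimen", "FossilSpecimen")}
print(suggest_layout(subset))  # one table, blank cells where a column does not apply
```

With these made-up columns, living and preserved specimens share only catalogNumber, while preserved and fossil specimens overlap heavily; the outcome depends entirely on which properties one assumes, which is the point under discussion.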

baskaufs commented 2 years ago

Why would the zoo record its gorilla as an organism and not a LivingSpecimen? You lost me there.

Jegelewicz commented 2 years ago

How are they different?

Organism = A particular organism or defined group of organisms considered to be taxonomically homogeneous.

LivingSpecimen = A specimen that is alive.

Having dealt a lot with attempting to "de-specimen" terms in Arctos, I know there are collections that see the term "specimen" as inappropriate. Referring to Huerfanita as a "specimen" would not sit well with a lot of people and I could see a zoo just refraining from the use of "LivingSpecimen" and even "Material Sample".

deepreef commented 2 years ago

A response to @deepreef's comment "there needs to be a robust conceptual underpinning to what the 'thing' is.": I think this depends on what "robust conceptual underpinning" means. In the example you gave, you essentially laid out the position of "Taxon" in a graph: can it be connected directly to Occurrence or must it be connected to an Identification, then Organism, then Occurrence?

Yes, that was one example, following the lead of @dshorthouse in characterizing things by their relationships with other things. I should have also provided a properties-based example, such as whether dwc:sex is a property of an Occurrence or an Organism. Because this property is not fixed within the scope of a single instance of Organism, it should not be organized in the Organism class; whereas organizing it within the Occurrence class helps clarify that instances of the latter represent an instance of an Organism in 4-dimensional space (i.e., in the context of an Event). At least, that's how I think of having definitions of classes emerge from the set of shared properties (and vice-versa?)

With respect to ambiguity of what "accessioning" means, I'd take a practical approach. If we apply a catalog number to a thing, we have accessioned it and it's a specimen.

That's not how it works in our Museum. Accessioning (and associated Accession numbers) are issued & assigned by the Registrar, whereas Catalog Numbers are issued & assigned by collection managers. In theory, we shouldn't assign any catalog numbers to material that is not accessioned, and all accessioned things should be assigned a catalog number -- but of course in practice, this is most definitely not the case. Also, in most cases the relationship of Catalog Numbers to Accession numbers is many-to-one (i.e., one Accession number encompasses multiple cataloged units). Thus, in our Museum at least, Accession is a very poor representation of "Specimen".

The point of what I'm getting at here goes back to what I've said about making this use-case driven. Rather than deciding up front precisely how to define the classes and what properties should be associated with them, set up some scenarios: if we defined the class this way, we'd assign these properties and there would be these relationships (parent/child, superclass/subclass) with other classes.

I agree with this in principle, but it's a little bit circular. For example, a preserved specimen would have some set of columns for preparation properties. If those properties are restricted to preservation methods, then they might not be relevant to a LivingSpecimen. On the other hand, if the living specimen has a microchip inserted into it, or its claws are trimmed, or a venom sac of a snake has been surgically removed, then maybe those columns would (potentially) be relevant to a LivingSpecimen.

More generally, not all instances of a class of thing will have values for all properties. Some instances of Occurrence will have legitimate values for reproductiveCondition or behavior; others not. Does that mean we should parse them out into different (sub)classes? A strict properties-based delineation of classes and subclasses is not necessarily practical for most use cases.

On the basisOfRecord issue, I think the problem is that the "pseudo-classes" (my term) of LivingSpecimen, PreservedSpecimen, and FossilSpecimen feel like attempts to overload information into the basisOfRecord values. I see all of these as slight variants of MaterialSample, and should be distinguished from each other through other properties besides basisOfRecord. Put another way, seeing values of "Occurrence" vs. "Taxon" vs. "Location" vs. "GeologicalContext" (etc.) in basisOfRecord clearly indicates what "kind" of record we're talking about, where each "kind" has no (or almost no) overlapping properties. These are fundamentally different kinds of things. Declaring basisOfRecord as "PreservedSpecimen" vs. "LivingSpecimen" seems to me to be attempting to delineate "MaterialSample representing an organism that is now dead and preserved" vs. "MaterialSample representing an organism that is still alive but in captivity". Those aren't fundamentally different kinds of things -- most of the properties will be shared across each of them, with only a few properties not shared.
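To make the "property value rather than class" reading concrete, here is a hedged Python sketch that recasts the legacy xxxSpecimen values of basisOfRecord as basisOfRecord = "MaterialSample" plus a sample-type property. The property name materialSampleType and its values ("preserved", "living", "fossil") are assumptions for illustration, not ratified terms:

```python
from __future__ import annotations

# Hypothetical mapping from legacy basisOfRecord values to a sample-type value.
# "materialSampleType" and these values are illustrative assumptions.
LEGACY_TO_SAMPLE_TYPE = {
    "PreservedSpecimen": "preserved",
    "LivingSpecimen": "living",
    "FossilSpecimen": "fossil",
}


def normalize(record: dict[str, str]) -> dict[str, str]:
    """Return a copy of `record` with an xxxSpecimen basisOfRecord recast as
    basisOfRecord="MaterialSample" plus a materialSampleType property.

    Records with any other basisOfRecord value are returned unchanged (copied).
    """
    out = dict(record)
    legacy = out.get("basisOfRecord")
    if legacy in LEGACY_TO_SAMPLE_TYPE:
        out["basisOfRecord"] = "MaterialSample"
        out["materialSampleType"] = LEGACY_TO_SAMPLE_TYPE[legacy]
    return out


rec = {"basisOfRecord": "PreservedSpecimen", "catalogNumber": "X-1"}
print(normalize(rec))
# {'basisOfRecord': 'MaterialSample', 'catalogNumber': 'X-1', 'materialSampleType': 'preserved'}
```

Under this framing, all three former pseudo-classes land in one class and share its properties, and the distinction survives only as a value that can be filtered on.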

In an analogous way, HumanObservation and MachineObservation don't strike me as fundamentally different classes of thing (e.g., Taxon vs. Location), but rather as "Observation primarily recorded by a human" vs. "Observation recorded by a non-living mechanical or electronic device". The distinction between these two things is not a class-level distinction in my mind -- it's just a difference in the nature of value included in the recordedBy property of an Occurrence. These should not be framed in DwC as classes, or represent legitimate values for basisOfRecord. Rather, they should both be instances of Occurrence, with differing values of recordedBy. [Yes, I know it's not quite that simple, but I hope my basic point is clear.]

smrgeoinfo commented 2 years ago

Interesting discussion. Here's an analysis of what I've gleaned from the discussion here. I agree that the important question is what the necessary properties of the sample/specimen types are that would be useful for TDWG.

Properties

Jegelewicz commented 2 years ago

The Task Group has decided to deprecate this term and perhaps integrate it into controlled vocabulary for materialSampleType. This is a draft of the dwc issue to submit.

Term change

Current Term definition: https://dwc.tdwg.org/list/#dwc_PreservedSpecimen

Proposed attributes of the new term version (Please put actual changes to be implemented in bold and ~strikethrough~):

Jegelewicz commented 1 year ago

Based upon the requested change to the MaterialSample definition, and taking the opportunity to make this definition more than a restatement of the term name, I will add a revised definition for this term to the review package.

Current Definition

PreservedSpecimen - A specimen that has been preserved.

Proposed Definition

PreservedSpecimen - A material entity that represents an entity of interest in whole or in part that is the preserved remains, impression, or trace of any once-living thing from the current geological age.

smrgeoinfo commented 1 year ago

that sounds like the definition of fossilSpecimen...

Jegelewicz commented 1 year ago

that sounds like the definition of fossilSpecimen...

PreservedSpecimen - A material entity that represents an entity of interest in whole or in part that is the preserved remains, impression, or trace of any once-living thing from the current geological age.

RogerBurkhalter commented 1 year ago

@Jegelewicz, I get the comments and see your point about being absolutely consistent. I do, however, think "current geological age" could be misinterpreted, because it points to the formally defined current geologic age, which may soon change based on discussions at the ICS (International Commission on Stratigraphy, https://stratigraphy.org/), which is scheduled to vote this year on the addition of the Anthropocene Series/Epoch (dating to perhaps the mid-20th century). Note this is the same rank as the Holocene and will be placed after the Holocene. If agreed upon, the term would be formalized and placed on the geologic timescale. This could cause confusion in collections, as pre-mid-20th-century biological collection entities would be considered fossils under a strict interpretation of the proposed definition, i.e. something collected in the 1930s would now be considered a "fossil". Geologic time is precisely defined into global units that are internationally recognized, so pinning something to a geological age may cause problems in the long term. Something more generic, such as "recent past" or "modern age", or something less precisely defined, would be safer.

smrgeoinfo commented 1 year ago

Without reviewing this long thread, I am still thinking that 'preservedSpecimen' should be a specimen that has had some preservation activity applied by humans, irrespective of how old it is. If the definition of fossil specimen is like https://w3id.org/isample/vocabulary/specimentype/0.9/fossil "Specimen is the remains of one or more organisms preserved in rock; includes whole body, body parts (usually bone or shell), and trace fossils. An organism or organism part becomes a fossil when it has undergone some fossilization process that generally entails physical and chemical changes akin to diagenesis in a sedimentary rock. Trace fossils are manifestations of biologic activity preserved in a rock body (typically sedimentary), without included preserved body parts"

Possibilities:
biological specimen -- no human preservation activity, no natural chemical or physical changes that preserve the specimen
preserved specimen -- a specimen that has undergone some human-engineered treatment to preserve it in its current (collected) state. The specimen might be a fossil...
fossil specimen -- (see above)

baskaufs commented 1 year ago

I am going to resist writing a long comment here since I've already done that in an email to Teresa and others. So I will be brief and just say that I think the current definition for this (as well as for the other specimen types) is inadequate, because being an "entity of interest" does not make something a specimen. I have "entities of interest" (fossils and plant specimens) sitting in boxes in my garage. At least one of them has been photographed and is recorded at GBIF as an occurrence. But that doesn't make them specimens, because they haven't been accessioned into an institutional collection.

The definition needs to include "in a collection" or something similar in order for it to actually define a specimen.

smrgeoinfo commented 1 year ago

sounds like we're in agreement with DISSCo "In this context, a specimen is 'an object that is designated by an owning/holding institution to be part of a collection, that has or will be assigned an id that is unique in that collection'" (interpret this condition as equivalent to 'isCurated'), and with https://github.com/tdwg/material-sample/issues/3#issuecomment-908726472

All of the specimen types should inherit this condition.
Specimen and materialSample overlap but are not equivalent. 'entity of interest' (or sampledFeature) is not a condition for 'specimen', it is for materialSample. Accession is not a condition for materialSample, it is for specimen. By the DISSCo definition a 'specimen' is not necessarily a materialEntity. Is that different from TDWG/DWC intention?

Does that work?

baskaufs commented 1 year ago

@smrgeoinfo yes, this is very much in line with my concern. The one thing I'd add is that assigning an identifier is not a key piece in the distinction because even destructively sampled entities might be assigned a globally unique identifier. The collection part is what's critical.

smrgeoinfo commented 1 year ago

wouldn't a destructively sampled entity be a sample (maybe a materialSample), not a specimen? Or can something that no longer exists be part of a collection (more like the DISSCo usage), in which case it would have to be identified in some fashion.

baskaufs commented 1 year ago

wouldn't a destructively sampled entity be a sample (maybe a materialSample), not a specimen? Or can something that no longer exists be part of a collection (more like the DISSCo usage), in which case it would have to be identified in some fashion.

Yes, that's my point. It has an identifier but isn't a specimen. So it could be classified as either a MaterialSample or MaterialEntity, but not as any specimen type.

smrgeoinfo commented 1 year ago

I see-- 'part of a collection (and materialEntity?)' is a sufficient condition for specimen. Is 'has or will be assigned an identifier' a necessary condition for specimen?

maybe 'has or could be assigned an identifier' is the necessary condition-- implication being that there is some criteria that can establish the identity of a specimen.
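The necessary and sufficient conditions being debated here can be written down as a small predicate. This is only one possible reading of the thread (collection membership of a material entity decides; whether an identifier is also required is left as a switch), and the Entity fields and identifiers below are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical record of a thing; field names are illustrative only."""
    identifier: Optional[str]        # may exist even for destructively sampled material
    in_collection: bool              # designated part of an institutional collection
    is_material_entity: bool = True  # open question in the thread (DiSSCo usage differs)


def is_specimen(e: Entity, require_identifier: bool = False) -> bool:
    """Collection membership of a material entity makes it a specimen; an
    identifier is only demanded when require_identifier is True."""
    if not (e.in_collection and e.is_material_entity):
        return False
    return e.identifier is not None or not require_identifier


garage_fossil = Entity(identifier=None, in_collection=False)
destroyed = Entity(identifier="urn:example:sample-1", in_collection=False)  # made-up ID
herbarium_sheet = Entity(identifier="CAT-1", in_collection=True)            # made-up ID
print(is_specimen(garage_fossil), is_specimen(destroyed), is_specimen(herbarium_sheet))
# False False True
```

Here the destructively sampled entity has an identifier but is still not a specimen, matching the point that the collection, not the identifier, is what's critical.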

Jegelewicz commented 1 year ago

The Task Group participants today decided that changing this definition will raise issues that are beyond the scope of the Task Group. Ideally, the next step will be a Task Group to more fully develop Material terms with an extension.

Suggest removing this from the review package.