tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Primary Deliverable - MaterialSample definition #2

Closed Jegelewicz closed 4 months ago

Jegelewicz commented 2 years ago

Current Definition

http://rs.tdwg.org/dwc/terms/MaterialSample

A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.

Please suggest changes/improvements in this issue.

See also https://github.com/tdwg/material-sample/blob/main/primary_deliverable/MaterialSample.markdown

See also MaterialSample terms Google Sheet

tucotuco commented 2 years ago

Additional related commentary in https://github.com/tdwg/material-sample/issues/3#issuecomment-903305155.

Jegelewicz commented 2 years ago

From https://github.com/tdwg/material-sample/issues/3#issuecomment-904297225

OK, so what is a MaterialSample then? I am much more fuzzy about this. It seems that the two necessary conditions are being a material thing (e.g. images don't qualify), and being sampled from something. There is no assumption that it is derived from an organism as air or water samples free of organisms could be material samples. I guess that it has something similar to the accession component that I used to define specimens, although I'm not sure about that. If the material is not destructively sampled, the DwC definition implies that it should be preserved, although I'm unsure that is the case for every material sample, e.g. ones that may be thrown out after measurements or documentation is complete. There are also samples that were obviously sampled for the purpose of being destroyed - in my mind that is a difference from specimens since I don't think specimens are generally intended to be destroyed intentionally. So a material sample can be derived from an organism, but doesn't have to be. A material sample can be a specimen, but doesn't have to be. A specimen does not have to be a material sample -- clearly the Bicentennial Oak was never the result of a sampling event. A material sample might be preserved but doesn't have to be. Honestly, the definition of MaterialSample is so fluid that it is hard for me to see why it is useful to assert that something is an instance of it.

dr-shorthair commented 2 years ago

Also see https://github.com/tdwg/material-sample/issues/3#issuecomment-905030133:

A sample is not necessarily a material thing, social science samples are often not. A sample might not be accessioned (particularly if it will be destroyed as part of some analytical process).

I think specimens are always material things.

Jegelewicz commented 2 years ago

https://github.com/tdwg/material-sample/issues/3#issuecomment-905108910

Instances of MaterialSample are aggregates of physical material that are extracted ("collected") from the natural environment, and held in the custody of humans. Following the suggestion of @baskaufs that instances of a class should be defined by shared properties, these are physical items that may be preserved or destroyed, curated or accessioned, borrowed and loaned, subsampled or aggregated to yield new instances of MaterialSample, and otherwise cared for and/or maintained in some way by humans.

deepreef commented 2 years ago

Thanks for bringing that here! (And sorry for not thinking to do so myself). Just to be clear, though, I was not intending to propose a formal definition; but rather I tried to capture my own thinking of what a MaterialSample is, in "plain English"(ish).

Jegelewicz commented 2 years ago

But it is good!

smrgeoinfo commented 2 years ago

A MaterialSample is an object separated from the material world, intended to be representative of some sampled feature.

Samples are typically collected with the intention of making measurements/observations on the sample that will characterize the sampled feature. A sample might undergo some curation process and become a specimen (as well as a sample). A MaterialSample might be an aggregation of material unified by containment in some container, e.g. rock powder in a bag, water in a bottle, blood in a syringe. A MaterialSample might be a self-connected object like a leaf from a tree, or a piece of rock from an outcrop. MaterialSamples can be derived from other material samples, e.g. the legs from a grasshopper, or zircon crystals from a rock sample. The sampled feature can be hierarchical; e.g. a material sample might be a leaf from a particular oak tree (organism) considered the sampled feature, or the sampled feature might be considered the taxon to which that individual oak tree belongs.
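The derivation chain described above (zircon crystals from a rock sample, legs from a grasshopper) can be sketched as a small data structure. This is only an illustration: the class and field names (`sample_id`, `sampled_feature`, `parent`) are hypothetical and are not Darwin Core terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MaterialSample:
    """Hypothetical record; field names are illustrative, not Darwin Core terms."""
    sample_id: str
    description: str
    sampled_feature: str                       # what the sample is meant to represent
    parent: Optional["MaterialSample"] = None  # sample this one was derived from, if any

    def lineage(self) -> List[str]:
        """Return sample IDs from this sample back to the original one."""
        ids = []
        node = self
        while node is not None:
            ids.append(node.sample_id)
            node = node.parent
        return ids


rock = MaterialSample("S1", "piece of rock from an outcrop", sampled_feature="outcrop")
zircons = MaterialSample("S2", "zircon crystals", sampled_feature="rock sample S1", parent=rock)

print(zircons.lineage())  # ['S2', 'S1']
```

The `sampled_feature` field is free text here; a hierarchical sampled feature (individual oak tree versus its taxon) would need a richer model than this sketch provides.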

dr-shorthair commented 2 years ago

A sample might undergo some curation process and become a specimen (as well as a sample).

Maybe: "A sample might undergo some curation and accession process and become a specimen (as well as a sample)."

RogerBurkhalter commented 2 years ago

I recall a previous comment about accession and asked our Registrar about the legal meaning, which is, in part, legal ownership. Some samples and specimens we can never own, e.g. fossils and archaeological remains from US Federal lands (other countries have similar laws), but we do reposit them. I suggest "accession or reposit".

Jegelewicz commented 2 years ago

Related requests for new terms that we should not lose sight of:

  • New Term - materialSampleType
  • New Term - parentMaterialSampleID
  • New term - environmentalMaterial
  • New term - organismPart
  • New term - preservationMethod

deepreef commented 2 years ago

Wouldn't the controlled vocabulary examples listed for environmentalMaterial also be among the controlled vocabulary examples for materialSampleType?

Or have I misunderstood the purpose & function of environmentalMaterial?

Jegelewicz commented 2 years ago

Wouldn't the controlled vocabulary examples listed for environmentalMaterial also be among the controlled vocabulary examples for materialSampleType?

I would think so.

Jegelewicz commented 2 years ago

If we are going to really flesh out a "Material" class in Darwin Core, the first step should be defining the class. We have MaterialSample to begin with, but I think we have agreed that the definition is not working for everyone. While some seemed opposed to it, I think the broadest possible definition for a Darwin Core "Material" class would be the Dublin Core PhysicalObject:

Term Name:  PhysicalObject

Label description
URI: http://purl.org/dc/dcmitype/PhysicalObject
Label: Physical Object
Definition: An inanimate, three-dimensional object or substance.
Comment: Note that digital representations of, or surrogates for, these objects should use Image, Text or one of the other types.
Type of Term: Class
Member Of: http://purl.org/dc/terms/DCMIType
Version: http://dublincore.org/usage/terms/history/#PhysicalObject-003

For me, this also removes the problems of human and machine observations (images, etc) from our discourse. The next question for me is are we only thinking about "curated" objects in Darwin Core? If that is true, then perhaps the best definition for MaterialSample might be:

All or portions of physical objects (as defined in Dublin Core) that are extracted ("collected") from the natural environment, and held in the custody of humans. Modified from https://github.com/tdwg/material-sample/issues/2#issuecomment-905117921

The problem I see in this definition has to do with LivingSpecimen, which may not really be "extracted" from the natural environment. So how about

All or portions of physical objects (as defined in Dublin Core) that may or may not be extracted ("collected") from the natural environment but are managed or curated by humans.

dr-shorthair commented 2 years ago

Is 'inanimate' a problem?

Jegelewicz commented 2 years ago

Is 'inanimate' a problem?

I would think so - a gorilla in the zoo, a tree in the botanic garden. Thanks for pointing that out! So now what.....I need to have a weekend!

deepreef commented 2 years ago

We already have a "Material" class in DwC (MaterialSample), so I assume you're not proposing we change the term itself, but just provide a better definition - correct? The term itself seems fine to me: "Material" refers to matter (physical). "Sample" implies that it's the subset of all physical things that humans capture or care for in some way.

I think it's beyond the scope of DwC to be defining terms that apply to literally everything that is a physical object (atoms? galaxies?). I think what we're interested in is the subset of physical objects that we humans handle or maintain or process in some way. I think dc:PhysicalObject could be indicated as the superclass of dwc:MaterialSample, and that classification could be part of the definition for the latter.

As for wording, I would favor something like:

Any physical object (as defined in Dublin Core), or discrete portion of a physical object, or aggregate set of physical objects, that is/are collected, processed, analyzed, managed, or curated by humans.

This encompasses objects, their derivatives, and aggregates, and also avoids potential ambiguities about "natural environment" (which might get a bit squirrelly if we want to accommodate other kinds of objects, like geological samples or cultural artefacts). We can probably remove some of the verbs (e.g., eliminate "analyzed", as it may be implied by "processed"?)

dr-shorthair commented 2 years ago

"Sample" implies that it's the subset of all physical things that humans capture or care for in some way.

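I don't find that very helpful. It is also not very consistent with the various bits of discussion above.

  • things that are captured and cared-for are specimens
  • things that are subsets, or are in some other way representative, of some other identifiable discrete thing are samples.

The sentence quoted conflates these concerns in a rather confusing way.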

I believe the concern here is to recognize that, if 'sample' and 'specimen' are both roles, and are somewhat independent of each other, then we need to identify the parent class of 'material things', some of which are also samples, some of which are specimens, and some of which are both.

http://purl.org/dc/dcmitype/PhysicalObject would be fine, except for the 'inanimate' qualifier :-(

@tombaker do you know why dctype:PhysicalObject must be 'inanimate' ?

dr-shorthair commented 2 years ago

I've raised an issue about 'inanimate' over on the DCMI issue tracker.

deepreef commented 2 years ago

I don't find that very helpful. It is also not very consistent with the various bits of discussion above.

  • things that are captured and cared-for are specimens
  • things that are subsets, or are in some other way representative, of some other identifiable discrete thing are samples.

The sentence quoted conflates these concerns in a rather confusing way.

Fair enough -- that sentence was written hastily -- which is why I was a bit more careful in the wording of the definition text:

Any physical object (as defined in Dublin Core), or discrete portion of a physical object, or aggregate set of physical objects, that is/are collected, processed, analyzed, managed, or curated by humans.

So... replace "humans capture or care for in some way" with "collected, processed, analyzed, managed, or curated by humans". Not sure if that is any better, though.

I believe the concern here is to recognize that, if 'sample' and 'specimen' are both roles, and are somewhat independent of each other, then we need to identify the parent class of 'material things', some of which are also samples, some of which are specimens, and some of which are both.

I think we get way too hung up on the semantics of "specimen" (as a noun?) and "sample" (as a verb? noun?). Both of these terms have different meanings to different people, and different definitions in different contexts. Of the two (specimen and sample), my sense is that "sample" probably carries less misinterpretation-potential baggage. But maybe that's just me?

In any case, the good news is that we don't need to define "specimen", and we don't need to define "sample", because neither of those terms, by themselves, is a DwC term. What we do need to do is define MaterialSample as a term. If either "Material" or "Sample" as part of that term are so misleading and problematic that they create excessive confusion, then perhaps we need to come up with a new term. Personally, I think the costs of establishing a new term are greater than the costs of potential misinterpretation of pre-conceived notions of what "Material" or "Sample" somehow implies, so my preference, still, is to keep the term "MaterialSample".

So... I agree... the phrase "humans capture or care for in some way" was unhelpful. But I'm curious: what do folks think of the actual wording I proposed for the definition of MaterialSample (above)?

http://purl.org/dc/dcmitype/PhysicalObject would be fine, except for the 'inanimate' qualifier :-(

I agree that "inanimate" is problematic, but I think a bigger problem is the scope. I do not think that dwc:MaterialSample should adopt a definition that defines the scope as broadly as dc:PhysicalObject. I do see instances of dwc:MaterialSample as representing a subset (subclass) of dc:PhysicalObject, but I don't see the two concepts as congruent. Why? because not all instances of dc:PhysicalObject are "collected, processed, analyzed, managed, or curated by humans"; and my sense is that we would like to confine dwc:MaterialSample to that more limited scope of physical things.

dr-shorthair commented 2 years ago

Actually I think the verb 'to sample' is pretty clear, and helpful. My concern is exactly that your definition slides immediately over into the curation and handling aspect, which I understood to be associated with specimens, but not with all samples. That is confusing.

If the 'inanimate' qualifier could be removed from the Dublin Core class, then

dwc:MaterialSample rdfs:subClassOf dctype:PhysicalObject . 

We could also perhaps see an additional class

dwc:AccessionedThing rdfs:subClassOf dctype:PhysicalObject . 

to support the collections folk more explicitly. And then some individuals might be both -

my:Individual987 a dwc:MaterialSample , dwc:AccessionedThing . 

and implicitly also a dctype:PhysicalObject of course.
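As a rough illustration of why the individual is "implicitly also" a dctype:PhysicalObject, here is a minimal sketch that follows `rdfs:subClassOf` links by hand. It assumes the 'inanimate' qualifier has been resolved so that the subclass statements hold, writes the Turtle keyword `a` as `rdf:type`, and uses plain Python tuples rather than a real RDF library or reasoner.

```python
from typing import Set

# Triples from the comment above, written as (subject, predicate, object) tuples.
triples = {
    ("dwc:MaterialSample", "rdfs:subClassOf", "dctype:PhysicalObject"),
    ("dwc:AccessionedThing", "rdfs:subClassOf", "dctype:PhysicalObject"),
    ("my:Individual987", "rdf:type", "dwc:MaterialSample"),
    ("my:Individual987", "rdf:type", "dwc:AccessionedThing"),
}


def superclasses(cls: str) -> Set[str]:
    """All classes reachable from cls by following rdfs:subClassOf."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(individual: str) -> Set[str]:
    """Asserted types plus every superclass of those types."""
    asserted = {o for s, p, o in triples if s == individual and p == "rdf:type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls)
    return inferred


print(sorted(inferred_types("my:Individual987")))
# ['dctype:PhysicalObject', 'dwc:AccessionedThing', 'dwc:MaterialSample']
```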

deepreef commented 2 years ago

Ok, yes -- that sounds right to me. What are some examples of AccessionedThing that are not also instances of MaterialSample? If there are none, then wouldn't this additional class be:

dwc:AccessionedThing rdfs:subClassOf dwc:MaterialSample

?

dr-shorthair commented 2 years ago

Sorry - I already edited my previous comment since I saw that our agreement was less complete than I'd originally thought. But I think your response is undamaged by that.

The key property of a Sample is its relationship to the identifiable, discrete thing which is sampled - the isSampleOf relation. Until you know what that is then it could be argued that an object in a collection - an AccessionedThing - is not (yet) a Sample, but merely a curiosity.

Furthermore, some artefacts can be simultaneously samples of more than one thing - taxa of more than one rank; an ecosystem; the set of specimens in a single collection, etc. But there would often be a scientific purpose for collecting a sample, which would point to its 'primary' source.

Note that by being a bit careful about the definition of 'Sample' in general, we can see MaterialSample as also having siblings in the social science world:

dwc:MaterialSample rdfs:subClassOf science:Sample .

socsci:PopulationSample rdfs:subClassOf science:Sample .

socsci:PopulationSample rdfs:subClassOf science:Population .

Of course the PopulationSample would never be accessioned by a museum, and arguably is not a PhysicalObject either.

stanblum commented 2 years ago

I've done a little remedial reading this weekend -- a couple introductions to making ontologies. I found both of these helpful.

Best Practices of Ontology Development. Rudnicki, Smith, Malyuta, and Mandrick, 2016-10-25.

Ontology Development 101: A Guide to Creating Your First Ontology. Noy & McGuinness, 2001-03.

In particular, the first one strongly recommends anchoring domain ontologies to a high-level ontology (i.e., the basic formal ontology, BFO). "Material-sample" entered DwC discourse from the BCO paper, where it was defined as a subclass of "Material Entity" produced by a sampling process. I think both words, "material" and "sample", were used as commonly understood; i.e., as given in a dictionary. Wikipedia has the term "Sample (material)", and the pertinent text includes:

"[...] a limited quantity of something which is intended to be similar to and represent a larger amount of that thing(s)." "The things could be countable objects [...] or an uncountable material. Even though the word "sample" implies a smaller quantity taken from a larger amount, sometimes full biological or mineralogical specimens are called samples if they are taken for analysis, testing, or investigation like other samples. They are also considered samples in the sense that even whole specimens are "samples" of the full population of many individual organisms."

I think we can distill something from that -- separating definition from explanation.

I am not bothered by the fact that museum specimens are collected and accessioned (made part of a permanent collection) to serve many purposes and without knowing any particular purpose beyond systematics. They characterize organisms, populations, species, higher taxa, and the environments they have lived in. Some are also used for education and display (engagement). Fitness for use is assessed later and changes with time and technology.

The Noy and McGuinness whitepaper talks about subclasses; logical conditions that must hold, how many to create, etc. They also mention multiple inheritance; when a subclass has properties of two higher-level classes. Fossil-specimens might fit that situation, being representative of both biological and geological processes.

dr-shorthair commented 2 years ago

'Multiple-inheritance' is a bit clunky - I find set-intersection a better way of thinking of it. [diagram image]

Samples are things that are representative of some other (larger, or more abstract) thing, which are intended to support observations.

Material-thing ~= PhysicalObject

Intersection of Material Thing with Sample = Material Sample?
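The set-intersection reading can be sketched with ordinary Python sets. The member items are hypothetical examples loosely drawn from this thread, and which set each belongs to is an assumption made for illustration only.

```python
# Hypothetical example items; the memberships are illustrative assumptions.
material_things = {"rock powder in a bag", "leaf from an oak", "Bicentennial Oak"}
samples = {"rock powder in a bag", "leaf from an oak", "survey panel of households"}

# Material Sample = things that are both material and a sample of something.
material_samples = material_things & samples

print(sorted(material_samples))  # ['leaf from an oak', 'rock powder in a bag']
```

Under these assumptions the Bicentennial Oak is material but not a sample, and the survey panel is a sample but not material, so neither falls in the intersection.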

dagendresen commented 2 years ago

I only want to add that specimens (collection items) have a very special role in our community as also demonstrated with (current) representation in Darwin Core by no less than three classes (PreservedSpecimen, FossilSpecimen, and LivingSpecimen). If we want to suggest deprecating all these three to be merged into MaterialSample, we might want to make sure to include the word specimen somewhere in the definition or at least in the usage comments. To reduce any confusion museum collection curators might get into. (I personally struggle to see any major harm in maybe keeping a PreservedSpecimen class, or similar, because this type of thing is so central in our community.)

dr-shorthair commented 2 years ago

In my diagram and comments above I used the term 'AccessionedThing'. I think that is your 'Specimen', but this distinction between specimen and sample is the issue that I've been trying to bottom out in this thread. I might have it wrong, so trying to be very clear and consistent in my explanations, pseudo-code and diagrams.

I am happy with the label 'Specimen' for accessioned, curated or reposited thing (and sub-classes for Preserved, Fossil and Living ones). But uncomfortable if 'Specimen' is just used as a synonym for 'Sample'.

dr-shorthair commented 2 years ago

@Jegelewicz I reached out to @tombaker from DCMI again, and the suggestion was to look at dcterms:PhysicalResource rather than dctype:PhysicalObject. This is partly because of the unfortunate (and probably unintentional) 'inanimate' qualifier, but mostly because the DCMI Type vocabulary is a bit of a historical dead-end, compared with the classes in the main DC-Terms vocabulary.

Jegelewicz commented 2 years ago

My concern is exactly that your definition slides immediately over into the curation and handling aspect, which I understood to be associated with specimens, but not with all samples.

@dr-shorthair There are TONS of "curated and handled" samples - I know of over half a million vials of tissue at the Museum of Southwestern Biology alone.

I think we get way too hung up on the semantics of "specimen" (as a noun?) and "sample" (as a verb? noun?). Both of these terms have different meanings to different people, and different definitions in different contexts. Of the two (specimen and sample), my sense is that "sample" probably carries less misinterpretation-potential baggage. But maybe that's just me?

@deepreef The statements above lead me to think that we need to eliminate both specimen and sample from the term. While creating a new term might be difficult, continuing along with a term that will mislead some segment of the user population seems worse.

What are some examples of AccessionedThing that are not also instances of MaterialSample?

Caution? - not all "samples" or even what some people refer to as "specimens" are "accessioned". My experience is that Natural History Museums/collections are not very good at the process of accessioning (some do not even know what that is), but they ARE curating and handling things. Accessioned seems like another buzz word that is perhaps left out of the mix?

In particular, the first one strongly recommends anchoring domain ontologies to a high-level ontology (i.e., the basic formal ontology, BFO). "Material-sample" entered DwC discourse from the BCO paper, where it was defined as a subclass of "Material Entity" produced by a sampling process. I think both words, "material" and "sample", were used as commonly understood; i.e., as given in a dictionary. Wikipedia has the term "Sample (material)", and the pertinent text includes:

"[...] a limited quantity of something which is intended to be similar to and represent a larger amount of that thing(s)." "The things could be countable objects [...] or an uncountable material. Even though the word "sample" implies a smaller quantity taken from a larger amount, sometimes full biological or mineralogical specimens are called samples if they are taken for analysis, testing, or investigation like other samples. They are also considered samples in the sense that even whole specimens are "samples" of the full population of many individual organisms."

@stanblum this could make me take back my reluctance to keep "sample" in the term...

Samples are things that are representative of some other (larger, or more abstract) thing, which are intended to support observations.

Material-thing ~= PhysicalObject

Intersection of Material Thing with Sample = Material Sample?

@dr-shorthair this makes all kinds of sense to me, which means I am probably missing something....

I only want to add that specimens (collection items) have a very special role in our community as also demonstrated with (current) representation in Darwin Core by no less than three classes (PreservedSpecimen, FossilSpecimen, and LivingSpecimen). If we want to suggest deprecating all these three to be merged into MaterialSample, we might want to make sure to include the word specimen somewhere in the definition or at least in the usage comments. To reduce any confusion museum collection curators might get into. (I personally struggle to see any major harm in maybe keeping a PreservedSpecimen class, or similar, because this type of thing is so central in our community.)

I am going to push back here. In Arctos, we manage biological, geological, palaeontological, and cultural collections. We have decided to drop "specimen" from all of our interfaces and documentation because it is offensive to cultural collections. We can all learn to use new terminology, in fact I think we must if we are going to remain relevant to as many as possible.

look at dcterms:PhysicalResource rather than dctype:PhysicalObject.

@dr-shorthair Huzzah! This seems to fit the bill perfectly as the root of our ontology tree?

Jegelewicz commented 2 years ago

So,

Material-thing ~= PhysicalResource

Intersection of Material Thing with Sample = Material Sample?

deepreef commented 2 years ago

Samples are things that are representative of some other (larger, or more abstract) thing, which are intended to support observations.

I'm a little queasy about this. I'm not sure what physical (or conceptual) thing is not a representative of some other (larger, more abstract) thing. As far as we know, everything that exists is a representative of the known Universe; and even the known Universe is (at least, conceptually) a representative of the Multiverse. In other words, what's an example of something that is not a "Sample" in this sense? I guess you can focus on the "which are intended to support observations", but I'm not sure we want to limit the scope of instances of MaterialSample to only those things conforming to this intent.

If we want to suggest deprecating all these three to be merged into MaterialSample, we might want to make sure to include the word specimen somewhere in the definition or at least in the usage comments.

I would avoid using the word "specimen" in the definition of MaterialSample. As illustrated by this and other related discussions, this word tends to have different meanings to different people. I think it's certainly appropriate to use this word in the examples (to help ground curators used to thinking of their objects as specimens). If it is included in the definition, then I would include it only as an example, alongside other words that characterize things that fall within the scope of MaterialSample.

The statements above lead me to think that we need to eliminate both specimen and sample from the term. While creating a new term might be difficult, continuing along with a term that will mislead some segment of the user population seems worse.

Yeah, maybe you're right. I'm much less bothered by the term itself than I am by the current definition. It was originally proposed to accommodate aggregate samples more explicitly (e.g., water or soil samples), but I think it came to be adopted by many (including me) as a class of instances we used to call "CollectionObjects", plus other things that aren't necessarily parts of "collections" but are collected, processed, analyzed, managed, or curated by humans in some way.

Caution? - not all "samples" or even what some people refer to as "specimens" are "accessioned".

Agreed! But I wasn't asking whether there are any MaterialSamples that are not accessioned (PLENTY of those). I was asking whether there were any accessioned items (in our context) that were not also instances of MaterialSamples. From my perspective, we don't need to worry about things that are accessioned in this discussion -- that's just another set of properties and implications that apply to some particular subset of MaterialSamples, but don't really factor into the discussion about defining dwc:MaterialSample (or its replacement term).

@dr-shorthair Huzzah! This seems to fit the bill perfectly as the root of our ontology tree?

+1!

I definitely can get on board with: dwc:MaterialSample rdfs:subClassOf dcterms:PhysicalResource

tombaker commented 2 years ago

@dr-shorthair Thank you for raising this!

the DCMI Type Vocabulary is a bit of a historical dead-end, compared with the classes in the main DC-Terms vocabulary

The DCMI Type vocabulary was first drafted in 1999 by Rebecca Guenther, a MARC cataloging expert at the Library of Congress. The first-draft definition was actually "a non-human object or substance" (see also https://github.com/dcmi/usage/issues/101#issuecomment-945451133). Note that this first-draft vocabulary was a controlled list of strings. The RDF notion of Class was still being sorted out in the RDF Working Group.

To admit the obvious, DCMI classes were coined more out of pragmatism than principle. Like @dr-shorthair I think in terms of set intersection, which can be awkward or messy.

I do not recall anyone ever shining such a bright light on dcterms:PhysicalResource and dctype:PhysicalObject, so if you see issues that lie within our remedy (i.e., within the bounds of the DCMI Namespace Policy), please do let us know.

Jegelewicz commented 2 years ago

HMMMMM. Now I am going to muddy the water! This makes me think that perhaps we are lumping things together which shouldn't be? Maybe LivingSpecimens are never also MaterialSamples? Perhaps, they should be Agents http://purl.org/dc/terms/Agent just as discussed here?

deepreef commented 2 years ago

HMMMMM. Now I am going to muddy the water! This makes me think that perhaps we are lumping things together which shouldn't be? Maybe LivingSpecimens are never also MaterialSamples? Perhaps, they should be Agents http://purl.org/dc/terms/Agent just as discussed here?

This gets back to a point I've made repeatedly in the past, and what I still see as the most important subtlety this task group needs to sort out, which is the boundary between dwc:Organism and dwc:MaterialSample. I've written about this extensively in one of the earlier GitHub discussions (can't remember which issue, but can look it up if needed).

In my mind, instances of dwc:Organism are not inherently physical objects, although they do have an evolving physical manifestation (starting when a sperm fused with an egg, or a single-cell organism cleaved; ending either when the organism ceased to be alive, or when its material composition disintegrates). We think of them as physical objects, because at any given moment in time we perceive them as such.

So... I'm still a little fuzzy on where an Organism ends and a MaterialSample begins (in the case of MaterialSample instances that are biological in nature); but the Venn diagram would show overlap (i.e., there are Organisms that never result in instances of MaterialSamples, and there are MaterialSamples that are not derived from Organisms, but there are also MaterialSamples that are derived from Organisms). As I've suggested elsewhere when this comes up, I see no reason why a LivingSpecimen cannot both be an Organism and a MaterialSample at the same time.

But the important thing is that Organism != MaterialSample, as each has a different essence/definition, and different properties and relationships (e.g., dwc:Identification instances apply to dwc:Organism instances, not dwc:MaterialSample instances). So there are some properties and relationships of an example of LivingSpecimen that would be represented through an instance of MaterialSample, and other properties & relationships represented through an instance of Organism.

Put another way, whether or not the Organism associated with an instance of MaterialSample is currently alive or dead/preserved or mineralized in stone (i.e., LivingSpecimen vs. PreservedSpecimen vs. FossilSpecimen) doesn't really affect whether it should be framed as one or the other (Organism or MaterialSample). It can be both at the same time.

Also, as I have asserted in other discussions, I'm a big fan of thinking of Agents as either synonymous with Organism, or as a subclass of Organism -- but in either case, certainly not limited to instances of Organism that are identified as Homo sapiens.

dr-shorthair commented 2 years ago

@deepreef when you say Organism != MaterialSample do you mean

(a) the classes are disjoint; the set-intersection is empty, there are no Organisms that are also MaterialSamples

(b) the classes are not identical; there may be Organisms that are not MaterialSamples, and there may be MaterialSamples that are not Organisms, and there may be things that are both

?
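The difference between the two readings can be made concrete with sets. This is only a sketch: the members are hypothetical, and the point is that "not identical" holds in both cases, so `!=` alone does not say which reading is meant.

```python
from typing import Dict, Set


def describe(organisms: Set[str], material_samples: Set[str]) -> Dict[str, bool]:
    """Report which readings of 'Organism != MaterialSample' hold for two sets."""
    return {
        "disjoint": organisms.isdisjoint(material_samples),  # reading (a)
        "not_identical": organisms != material_samples,      # reading (b)
    }


# Hypothetical memberships, for illustration only.
reading_a = describe({"wild oak"}, {"vial of tissue"})
reading_b = describe({"wild oak", "zoo gorilla"}, {"vial of tissue", "zoo gorilla"})

print(reading_a)  # {'disjoint': True, 'not_identical': True}
print(reading_b)  # {'disjoint': False, 'not_identical': True}
```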

deepreef commented 2 years ago

Sorry -- I was hasty in my wording.

Basically, I see them as different (non-overlapping) classes, with different properties, semantic relationships with other classes, etc. -- which I guess is your option (a). My Venn-diagram comments were probably unhelpful, but I was trying to suggest that an organism (lower-case "o") that is both alive and curated/captive/cared-for (e.g., LivingSpecimen) can be described with terms belonging to both classes (dwc:Organism and dwc:MaterialSample).

But from a semantic sense, I think they are disjoint classes. It's confusing -- and this is the root of my desire to define the boundary between Organism and MaterialSample -- because we conceptualize a living organism as a physical object -- which we associate with MaterialSample. But while there is certainly a well-defined relationship between MaterialSample and Organism (though I'm not really sure exactly how to characterize this relationship), I think they are as distinct from each other as, say, dwc:Identification and dwc:Taxon.

I hope that helps (but fearful that it may not...)

dr-shorthair commented 2 years ago

I fear we are getting tangled up on the labels, which is obscuring the full clarification of the descriptions and roles. I think we have been talking about the following classifications (not exhaustive). I've attempted to break them down into fairly primitive concerns. These are NOT all mutually disjoint (though some pairs are):

(i) immaterial things
(ii) material things
(iii) dead things
(iv) living things
(v) things with agency (i.e. that are active participants in processes and activities)
(vi) things that are intended to be representative of some larger or less accessible thing
(vii) things that are managed as part of an ongoing collection or archive
(viii) preserved things
(xi) events

We each use specific names to denote specific classes. The named classes are often intersections of two or more of these concerns, and usually assume some specific context or frame. The same class (or class intersection) may be denoted by different names in different communities, even within the TDWG orbit.

I'll give my names in a separate comment, so as not to bias other responses.

dr-shorthair commented 2 years ago

Simon's provisional names:

Sample = (vi), subset of (the union of (i) and (ii))
Material Sample = intersection of (ii) and (vi)
Specimen = (vii), subset of (ii) (are there non-material specimens?)
People during their lifetime = intersection of (iv) and (v)

Drug testers:

Specimen = intersection of (vi) and (ii)

Social Scientists:

Sample = intersection of (vi) and (i)
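The naming schemes above can be sketched with Python sets, treating each primitive concern as a set of things and each named class as an intersection. Every member below is invented for illustration, and the variable names are assumptions rather than proposed terms.

```python
# Each primitive concern is a set of things; all members are hypothetical.
immaterial = {"survey responses", "photo of fish"}                     # (i)
material = {"rock core", "fish in jar", "zoo tiger", "urine vial"}     # (ii)
representative = {"rock core", "fish in jar", "urine vial",
                  "survey responses"}                                  # (vi)
curated = {"fish in jar", "zoo tiger"}                                 # (vii)

# Simon's provisional names
sample = representative & (immaterial | material)
material_sample = material & representative
specimen = curated  # (vii), which in this toy data is a subset of (ii)

# Other communities' names for intersections of the same concerns
drug_tester_specimen = representative & material
social_science_sample = representative & immaterial

# The same intersection carries two different names.
print(drug_tester_specimen == material_sample)
print(sorted(social_science_sample))
```

The point of the sketch is only that "Specimen" for a drug tester and "Material Sample" in Simon's list pick out the same intersection, so arguing over the label says nothing about the class.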

The breakdown of concerns, and the names given to the classes and class intersections, are not absolute. They are framed by your application. Universal classification systems have been attempted since the Greeks, and attempts continue today, particularly in so-called 'upper' or 'foundational' ontologies such as DOLCE and BFO. Personally I find some of them useful, some of the time, some not so much, none perfect. But I'm a utilitarian, not an idealist.

deepreef commented 2 years ago

I fear we are getting tangled up on the labels, which is obscuring the full clarification of the descriptions and roles.

I agree!

I'll need to think a bit more before commenting on your second post, but I wanted to map the items in your list to existing DwC classes, as I see them:

I confess I'm not 100% sure yet how best to represent MaterialCitation in the list above, as I haven't spent much time thinking it through yet.

With a somewhat strictly defined interpretation of "intended" in (vi), I would go with all the DwC classes that already do (or should) include a "Parent" property (Taxon, Event, MaterialSample, etc.), and maybe a few that don't have a parent, but may be modelled hierarchically (like Location).

stanblum commented 2 years ago

@dagendresen wrote: "I only want to add that specimens (collection items) have a very special role in our community [...]. If we want to suggest deprecating all these three to be merged into MaterialSample [...]."

Our specimens, and tissue samples and DNA extracts, etc. are kinds of material-sample. I do NOT support deprecating or subsuming those subclasses into material-sample. I think every kind of biodiversity specimen is a kind of material-sample.

I don't think I agree with Rich's earlier assertion that no material sample is an organism; that they are disjoint sets. A living organism can be collected for use in scientific study and thus meet the critical criterion of material-sample. Does being dead make something NOT an organism? If a fish is a kind of organism, and I tell you this thing is a dead fish, isn't it (still) a kind of dead organism?

deepreef commented 2 years ago

Our specimens, and tissue samples and DNA extracts, etc. are kinds of material-sample. I do NOT support deprecating or subsuming those subclasses into material-sample. I think every kind of biodiversity specimen is a kind of material-sample.

So I guess the way I see it is that things like LivingSpecimen, PreservedSpecimen, FossilSpecimen, "EnvironmentalSample", "TissueSample", etc. are better framed as entries in a controlled vocabulary, as values for something like a materialSampleType property, rather than subclasses with their own specific/unique properties and relationships. I don't know enough about LOD/Semantics to understand the implications of treating them as values in a controlled vocabulary for a property as opposed to subclasses of MaterialSample, so I may be wrong about this. But just to be clear, I didn't mean that the terms had no value; I just meant that they should be represented as values in a controlled vocabulary, instead of distinct classes in DwC.
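As a rough illustration of the two shapes being contrasted, here is a minimal Python sketch. The class, field, and enum names are assumptions made up for the example (only the vocabulary values come from the discussion), and it says nothing about the LOD/RDF implications.

```python
from dataclasses import dataclass
from enum import Enum


# Option 1: one class, with the kind recorded as a controlled-vocabulary value.
class MaterialSampleType(Enum):
    LIVING_SPECIMEN = "LivingSpecimen"
    PRESERVED_SPECIMEN = "PreservedSpecimen"
    FOSSIL_SPECIMEN = "FossilSpecimen"
    ENVIRONMENTAL_SAMPLE = "EnvironmentalSample"
    TISSUE_SAMPLE = "TissueSample"


@dataclass
class MaterialSample:
    material_sample_id: str                    # hypothetical identifier field
    material_sample_type: MaterialSampleType   # value from the vocabulary


# Option 2: each kind is its own subclass, free to add its own properties.
class MaterialSampleBase:
    def __init__(self, material_sample_id):
        self.material_sample_id = material_sample_id


class LivingSpecimen(MaterialSampleBase):
    pass


tiger_as_value = MaterialSample("MS-1", MaterialSampleType.LIVING_SPECIMEN)
tiger_as_subclass = LivingSpecimen("MS-1")

print(tiger_as_value.material_sample_type.value)
print(isinstance(tiger_as_subclass, MaterialSampleBase))
```

With option 1, adding a new kind means adding a vocabulary entry; with option 2 it means minting a new class, which only pays off if that kind really has properties the others lack.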

I don't think I agree with Rich's earlier assertion that no material sample is an organism; that they are disjoint sets.

I think that depends on the meaning of "is an" in the quoted text above -- and it also underscores my long-standing uncertainty about the boundary between MaterialSample and Organism. And to be clear, I think an individual (e.g., a living tiger in a zoo) can simultaneously have MaterialSample properties and Organism properties -- so in that sense, the Tiger is both a MaterialSample and an Organism. But my point is that the properties of an Organism and the properties of a MaterialSample are non-overlapping. Whether or not that means the two classes are "disjoint sets", or something else, is a question that exceeds my understanding of the terminology of this space.

The way I understand it, the properties that apply to the Tiger as an instance of Organism are properties that are true at all moments of the existence of the Organism -- from conception until death or disintegration. These are not related to the physical being of the tiger, because the physical being changes dramatically over the course of its life. So things like taxonomic identity and gene sequences and blood type and other things like that are properties of the Organism instance. Stuff that applies to the physical manifestation of the Tiger, like condition reports, or its participation in transactions with other zoos, etc. seem, to me, to represent properties of the Tiger as an instance of MaterialSample.

A living organism can be collected for use in scientific study and thus meet the critical criterion of material-sample.

Absolutely! Which is why I think LivingSpecimen should be included among the controlled vocabulary values for materialSampleType.

Does being dead make something NOT an organism?

That's a key part of the question I've been asking for a long time now (spoiler alert: I don't have a good answer). I would say that the Organism does not exist until either a sperm fertilizes an egg, or an asexual organism splits into two, or whatever reproduction mode applies. But does that mean that, once created, the Organism continues to exist into all eternity from that moment forward? I don't think so. After the last molecule that had comprised the physical being of the Organism at the time of its death has completely dissociated, I don't think we would continue to think of that set of dissociated molecules as still being the "organism". So eventually it ceases to be. But I'm not sure when that cessation of being an Organism happens. I would say certainly not before it dies, and certainly not after it completely disintegrates -- so I would say that an Organism stops being an Organism at some point between those two points in time.

If a fish is a kind of organism, and I tell you this thing is a dead fish, isn't it (still) a kind of dead organism?

Sure (maybe?) But if that same fish is eaten by a shark, and some of its molecules are absorbed into the shark's body through digestion, and other molecules are excreted over time -- would you still call that dissociated set of molecules scattered over miles of reef and ocean water to collectively still be a fish? I'm guessing not. So... somewhere between the point at which it stopped living, and the point at which its molecules are dissociated and dispersed, I would say it stopped being an Organism.

I could wax on about this for hours, but I think that wouldn't be helpful for the task at hand. The core task is to come up with a definition for MaterialSample (or its replacement term) that works for the needs of the TDWG community (and beyond). Part of that definition should help define the boundary between instances of the MaterialSample class and the Organism class.

I think @baskaufs has suggested (and I agree), that a more practical way to arrive at these definitions and distinctions is by figuring out which properties go with which class, and from those respective sets of properties the boundaries of the classes should emerge. I have a pretty clear idea which properties I would assign to each of these two classes, but I've already consumed too much bandwidth on this discussion, and I need to get some sleep before TDWG starts again (1am Hawaii time... ouch). So I'll end it here for now.

stanblum commented 2 years ago

Thanks for those clarifications, Rich. I think we agree. Not all organisms become material-samples, and not all (biodiversity) material-samples are (whole) organisms, so the one-to-one correspondence that can exist in some cases is not a class-subclass relationship. I also want to argue that organism and (biodiversity) material-samples should be recognized as distinct things because our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa.

deepreef commented 2 years ago

Thanks, @stanblum - yes, we definitely agree! I apologize that my endless ramblings don't always capture my points clearly.

But I would like to focus on this a bit more:

Not all organisms become material-samples, and not all (biodiversity) material-samples are (whole) organisms, so the one-to-one correspondence that can exist in some cases is not a class-subclass relationship. I also want to argue that organism and (biodiversity) material-samples should be recognized as distinct things because our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa.

... because this gets to the heart of not only the definition of MaterialSample and the boundary between that class and Organism, but also helps clarify the nature of the relationship between instances of these two classes.

First of all, I should explain that in our implementation "Organism" is itself a subclass of something we call "Individual". The latter is broader in scope and includes all manner of non-biological things. So for us, the relationship between MaterialSample and "Individual" is maintained for both biological (biodiversity) and non-biological stuff (part of my preference for keeping the definition of MaterialSample broad enough to allow non-biological things).

But even if we focus only on the biological/biodiversity subsets of these two classes [Organism and MaterialSample(biodiversity)], the issue is the same: what is the semantic nature of this many-to-many relationship between Organism and MaterialSample?

Again, I don't have a clear answer, but I think we should explore this as a way to refine the definition of dwc:MaterialSample.

At the heart of this is your point that "...our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa."

I think there is some consensus that instances of MaterialSample have hierarchical relationships with other MaterialSample instances (hence the proposed new term, parentMaterialSampleID). For example, starting with a "whole organism" (e.g., a dead fish) that is curated in a museum collection, we have one MaterialSample instance representing the physical entity of that whole fish, which is preserved in some particular way. Then we may have one or more tissue samples removed from the fish, which is/are preserved in some other particular way. Then we may combine that fish with several others identified to the same taxon and collected through the same Occurrence/Event into a single "lot". This yields something like this:

| MSID | parentMaterialSampleID | materialSampleType | Comment |
| --- | --- | --- | --- |
| 1 | - | lot | Aggregate set of three fish specimens sharing the same taxon and collecting event occurrence instance |
| 2 | 1 | whole organism | First of the three fish in the lot |
| 3 | 1 | whole organism | Second of the three fish in the lot |
| 4 | 1 | whole organism | Third of the three fish in the lot |
| 5 | 2 | tissue sample | A tissue sample extracted from the "First" of the three fish in the lot |
| 6 | 2 | tissue sample | Another tissue sample extracted from the "First" of the three fish in the lot |

[Side note: I'm imagining that the example values above for materialSampleType are subtypes of PreservedSpecimen.]

Separately, we'd track each of the Organisms comprising the lot of specimens:

| OID | Comment |
| --- | --- |
| 7 | Organism instance of the "First" fish |
| 8 | Organism instance of the "Second" fish |
| 9 | Organism instance of the "Third" fish |

There are three examples of one-to-one correspondence between Organism and MaterialSample that could be represented like this:

| OID | MSID |
| --- | --- |
| 7 | 2 |
| 8 | 3 |
| 9 | 4 |

Perhaps that's all we need in this example, because we can infer/derive the relationships between instances of MaterialSample and Organism for MSID 1, 5 & 6 through their respective parentMaterialSampleID relationships. But this only works if you actually have the WholeOrganism instances, which may not always be the case. So there may need to be one-to-many Organism-to-MaterialSample relationships:

| OID | MSID |
| --- | --- |
| 7 | 2 |
| 7 | 5 |
| 7 | 6 |

Similarly, there may need to be many-to-one Organism-to-MaterialSample relationships:

| OID | MSID |
| --- | --- |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
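The inference mentioned above (deriving the Organism links for MSID 1, 5 and 6 from the three one-to-one links plus parentMaterialSampleID) can be sketched roughly as follows. The two rules used here are assumptions for illustration only: a sample without its own link takes the Organism of its nearest linked ancestor, and an aggregate collects the Organisms of everything beneath it. The function and variable names are hypothetical.

```python
# parentMaterialSampleID for each MSID (None means no parent), from the first table.
parent = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}

# The three explicit one-to-one links, as MSID -> OID.
direct_organism = {2: 7, 3: 8, 4: 9}


def children(msid):
    """MSIDs whose parentMaterialSampleID is msid."""
    return [m for m, p in parent.items() if p == msid]


def organisms_for(msid):
    """Return the OIDs linked to msid directly or by inference."""
    found = set()

    # Walk up: stop at the nearest sample (including msid) with an explicit link.
    node = msid
    while node is not None:
        if node in direct_organism:
            found.add(direct_organism[node])
            break
        node = parent[node]

    # Walk down: an aggregate collects the Organisms of its descendants.
    stack = children(msid)
    while stack:
        node = stack.pop()
        if node in direct_organism:
            found.add(direct_organism[node])
        stack.extend(children(node))

    return found


for msid in sorted(parent):
    print(msid, sorted(organisms_for(msid)))
```

Run against the tables above, this reproduces the one-to-many rows (Organism 7 for MSID 2, 5 and 6) and the many-to-one rows (Organisms 7, 8 and 9 for MSID 1) without storing them. It breaks down exactly where noted: if the whole-organism rows are missing, there is nothing to walk to, and the links would have to be recorded explicitly.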

I'm not trying to divine an implementation data model; rather I'm trying to get at the nature of the relationships both among instances of MaterialSample (via parentMaterialSampleID) and between instances of MaterialSample and instances of Organism. In other words, what are the predicates? And how many do we need (both within MaterialSample via parentMaterialSampleID, and between MaterialSample and Organism)? If we can get a handle on this, I think it will help clarify the boundaries between the two classes, and by extension, the definitions of both terms.

Jegelewicz commented 2 years ago

Side note: I'm imagining that the example values above for materialSampleType are subtypes of PreservedSpecimen.

Calling "whole organism" a subtype of PreservedSpecimen seems pretty darn confusing!

stanblum commented 2 years ago

Back on Oct 17, 2021 I mentioned that I think "Material Sample" entered the DwC discourse through the BioCollections Ontology (BCO). A change in BCO I wasn't aware of until yesterday is that BCO has now deprecated the "Material Sample" class (made it an obsolete class), and has instead adopted a term/class from a larger ontology, the Ontology for Biomedical Investigations (OBI)(!):

obi:specimen: A material entity that has the specimen role.

This combines several of the notions we've been discussing: a material entity that is the result of a material sampling process and has been taken (collected and understood) to represent some larger entity (thing, population, community) in further study or analysis.

Also deprecated in BCO were the subclasses of Material Sample, including: preserved-, living-, and fossil-specimen.

I thought it was noteworthy that having taken materialSample from BCO to create a superclass for all the different kinds of things we manage in the biocollections community, the DwC is now (still) using "material sample," while the BCO is now using "obi:specimen." Should we follow? Would it be appropriate for us to 1) incorporate the obi:specimen term in DwC, or 2) mint our own specimen term, dwc:specimen, and paraphrase their definition while including a "crosslink", like:

dwc:specimen : is "same-as" or "comparable-to" : obi:specimen

The argument for the second option being that DwC is currently a bag of terms and doesn't support reasoning, which OBI (an OWL ontology) does. In other words, they aren't the same kinds of standards, so incorporating an OBI term in DwC isn't the right thing to do. The better practice might be just to reference obi:specimen in some appropriate way. I'll defer to others with more experience.

Or, given that we also want to include environmental samples in DwC (for metagenomic analysis), should we just retain the term "material sample", because most people wouldn't think of an environmental sample as a "specimen"?

smrgeoinfo commented 2 years ago

It looks like OBI still has 'material sample', defined as 'A material entity that has the material sample role', which is a subclass of specimen ('A material entity that has the specimen role'). I don't see anything about deprecation (last uploaded: January 10, 2022). You'd be hard pressed to distinguish specimen from material sample given their definitions, so I can see why they'd get rid of one of them.

deepreef commented 2 years ago

Thanks, @stanblum!

Does OBI define the scope of “specimen role”? And what other kinds of material entities (in the sense of OBI) are outside that scope?

I’m not in favor of changing dwc:MaterialSample to dwc:Specimen if they have essentially the same definition (for reasons articulated by @baskaufs at an earlier zoom meeting).

baskaufs commented 2 years ago

I want to add a bit of historical perspective on the relationship between dwc:MaterialSample and OBI. Adding MaterialSample was sort of a "test case" for aligning Darwin Core terms with terms outside of TDWG, particularly terms in formal ontologies. Discussion of the proposal was extensive -- for those interested, it is archived in the tdwg-content listserv archive between 2013 April (term proposed) and 2013 October (term ratified) and most particularly in 2013 May.

In the end, the adopted class was defined to be a subclass of http://purl.obolibrary.org/obo/OBI_0100051, which I believe at the time had the label "material sample", but whose label has now been changed to "specimen". Declaring a TDWG term by its relationship to a non-TDWG term other than those in Dublin Core was a new thing to TDWG. Eventually, the decision was made and codified in Section 4.4.2.2 of the Vocabulary Maintenance Specification (SDS) that assertions that generate machine-computable entailments should not be included in the core metadata about a term, but rather in an "extension term list" layered on top of the basic "bag of terms" layer.

As a result, the subclass declaration for dwc:MaterialSample was stripped out of the defining metadata for the term and there was no effort to assert it in any official extension term list. Because of the SDS guidelines, the subclass property was dropped from the metadata history table, so one actually can't discover that it was ever there unless one reads the old tdwg-content emails (principally this one). But this history should inform our understanding of what has happened in the past to lead us to our current circumstance.

There are two important issues that are raised by Stan's comment. The first is the importance of differentiating between term labels and the terms themselves. There is no such thing as obi:specimen; OBI uses opaque numeric identifiers. As I noted, I'm pretty sure that the label of obi:0100051 has changed from "material sample" to "specimen" since 2013 (one would need to dig through the OBI history to find out for sure, and the demise of Google Code doesn't help in investigating the email thread). That does not mean the term itself has changed. To know that, we would need to compare the definition in the past with the definition today. That is the danger of conflating labels with "terms" or their immutable IRI identifiers. Changing a label does not change a term.
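A small sketch of the label-versus-term distinction, in Python. The two snapshots follow the account above (the 2013 label is recalled, not verified against the OBI history), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSnapshot:
    iri: str    # immutable identifier: this is what makes it "the same term"
    label: str  # human-readable name, free to change over time


IRI = "http://purl.obolibrary.org/obo/OBI_0100051"

# Labels as recalled in the comment above, not checked against OBI's history.
in_2013 = TermSnapshot(IRI, "material sample")
today = TermSnapshot(IRI, "specimen")


def same_term(a, b):
    """Term identity depends only on the IRI."""
    return a.iri == b.iri


def relabelled(a, b):
    """True when the same term carries a different label in the two snapshots."""
    return same_term(a, b) and a.label != b.label


print(same_term(in_2013, today))
print(relabelled(in_2013, today))
```

Whether the meaning drifted is a separate check that would compare the definition text of the two snapshots, which is deliberately left out here because the 2013 definition is not quoted in this thread.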

The second issue, which is currently very relevant, is the mechanism by which we make connections between TDWG terms and terms defined outside of TDWG. This has been a topic of discussion for years, without resolution. Some suggestions, such as using SKOS relationship terms like skos:exactMatch, are problematic because they come with undesirable entailments. The Audubon Core Maintenance Group has thrown down the gauntlet and suggested a solution to this problem, which you can read about in this proposal, which is now under public review. In the proposal, we were very transparent about the fact that this is a precedent-setting proposal. I've talked about it with John Wieczorek, and if the proposal goes through, the Darwin Core Maintenance Group will probably follow the precedent and use sawsdlrdf:modelReference in circumstances where we want to create a machine-followable link to a term outside of TDWG without generating machine-computable entailments. That would be the case in several proposals where OBO ontology terms have been suggested as values for controlled vocabularies. If you don't like this solution, then you'd better comment on the proposal in the next three and a half weeks or it's going to be a fait accompli. If you don't like it, explain why you don't like it and propose a better solution.

Jegelewicz commented 2 years ago

Term change

Current Term definition: https://dwc.tdwg.org/list/#dwc_MaterialSample

Proposed attributes of the new term version (Please put actual changes to be implemented in bold and ~strikethrough~):

deepreef commented 2 years ago

Sorry I missed the last session on this. One question and one comment:

Question: In the proposed new definition, is there a difference between "physical object" and "physical entity"?

Comment: The proposed examples seem a little animal-centric -- maybe a plant and a bacterial example would be good to add? Also, maybe a better/more intuitive example of an "undetached" instance of MS would be a fossil aggregate represented by a single physical rock with multiple embedded organisms.

Also... I hadn't considered the "undetached" potential within a single organism in MaterialSample examples. I had certainly considered examples of multiple organisms represented as a single collective object (aforementioned fossil; hermit crab+shell+anemone; etc.). But I'd not considered the possibility of branding undetached subcomponents of the same individual Organism as distinct MS instances. I guess that means that any given MS instance of a single organism could have near-infinite potential child instances, without any disarticulation action happening to the whole. I don't have a problem with this, but the non-normative documentation should probably explain this a bit more, with an explanation that MS instances are minted when there is an informatic need to do so, and also including examples where there is an informatic need to track undetached subcomponents of an object (e.g., the undetached leg of a dog).

cboelling commented 2 years ago

A physical object that represents a physical entity of interest in whole or in part.

I understand that the notion of "representing" in the proposed definition is meant to convey that an object that is the subject of collecting or observing (e.g., a goose, a swarm of geese, a fossil-bearing rock small enough to be lifted, a small twig from a tree) is often used, subsequent to its collecting or observing, to drive inferences about a larger whole that it is part of or relates to in a specific way (the Swedish population of geese, the mountain range the rock originates from, the entirety of shrubs belonging to the same species).

I think the use of two different terms ("physical object" - "physical entity") can be defended to reflect that a subject of collection or observation will as a matter of necessity be spatially more confined than physical entities in general (the latter including, for example, all geese in the world, the taiga, Earth's atmosphere).

It would be good though, to clarify this in the documentation around the definition.

Apart from this I agree with @deepreef's conclusions about instances of dwc:MaterialSample containing other instances of dwc:MaterialSample as proper parts and the need to explain that and add other examples. Note that composition can also go the other way: each of a dinosaur skeleton's bones in a collection can be an instance of dwc:MaterialSample as can be the set of bones, even if they aren't physically associated in the collection. If that set of bones is incomplete, also the set of bones that potentially make up the whole skeleton could, IMO, be minted as another instance of dwc:MaterialSample satisfying certain informatic needs.