tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

`dwc:occurrenceID` in the context of `dwc:catalogNumber` review #21

Closed cboelling closed 2 years ago

cboelling commented 2 years ago

Originally posted by @tucotuco in https://github.com/tdwg/material-sample/issues/6#issuecomment-903191602

An occurrenceID is meant to be, ideally, a resolvable (IRI that returns metadata when requested) globally unique identifier for an assertion that an Organism was present or absent, or that no Organism identifiable as a member of a Taxon was present at a particular place at a particular time following some protocol for detection. It does not identify a material entity or a digital entity - it is an identifier for instances of an abstract concept of an exemplar of a Taxon having been present or absent, possibly backed by evidence in the form of material or digital entities.

From the current definitions of dwc:Occurrence and dwc:occurrenceID I understand that a dwc:occurrenceID is intended as an identifier for a dwc:Occurrence (the being present of one or more individuals of taxon X (somewhere) within geographical location L (at some time) during time interval T), which is, while less tangible than a preserved specimen, a real thing and exists independently of what biodiversity researchers think or do.

The concept described in the quote is different and on the level of assertions (i.e. what a human agent thinks about an occurrence for a given X, L, T) including assertions of absence, i.e. that a given occurrence, specified by X, L, T, did not occur.

I would just like to make sure I understand the existing terms correctly, or find out whether there are new requirements for them. I created this as a new issue to keep the discussion in the original issue #6 close to its core topic.

deepreef commented 2 years ago

I tend to think that our community might have painted itself into a corner and that maybe accepting that dwc:Occurrence has become predominantly used as the Evidence of might maybe be a possible least bad way out.

I think the real corner we painted ourselves in, years ago, was the (mis)interpretation that specimen=Occurrence. This group and the class MaterialSample offer us a pathway out of that corner by separating the physical things we deal with (preservation methods and subsampling and loans and such) from the research value of those things (traditionally focused on information about the where and when of its extraction from nature). Most of it, I think, is pretty straightforward. The main fuzzy part (in terms of both idealized conceptualization and practical implementation) is this business of the boundary between dwc:Organism and dwc:MaterialSample, and the respective lifespans of each. Depending on how we lock in those boundaries and lifespans, and whether and to what extent they overlap in space and time, we may (or may not) have another question of how to manage instances of "An existence of a MaterialSample (sensu however we end up defining it) at a particular place at a particular time."

In my mind, the semantics of an "occurrenceNumber" is already exactly covered by dwc:recordNumber.

Agreed!

I also think that we are in agreement of the value of "Occurrence" - and that we agree (as is the motivation for this task group) that we need ANOTHER concept to describe objects in collections (PreservedSpecimen, MaterialSample, ...). It is the latter need I intended to express by "the occurrences of things is a rather poor proxy for describing the things themselves".

Ah! OK, understood. In that case, we agree -- and I think the convergence on MaterialSample (and its definition, scope, and properties) -- which is what this Task Group is focused on -- is the path to sorting that stuff out (i.e., once and for all dispelling the "specimen=Occurrence" issue). But I don't think we need to mess around with dwc:Occurrence, except to steal a few of the properties that are organized in that class, so they are instead organized in the MaterialSample class.

dagendresen commented 2 years ago

Is not "the (mis)interpretation that specimen=Occurrence" just a subset of the larger (in scope) misconception that Occurrence = any evidence of an organism-occurrence? (as in effect treating specimens only as such evidence)

RogerBurkhalter commented 2 years ago

I see Occurrence data as one of the main paths forward in paleontology, especially human observations in the form of measured section notes as a primary information source for new finds and new studies. Often, a researcher has a bias towards collecting a particular taxon type, for example, Devonian gastropods. While documenting the section, they happen upon an occurrence of ostracods. The gastropod researcher may have no interest in those ostracods, but notes the occurrence. Later, when another researcher is seeking Devonian ostracods for research, having that occurrence documented and findable is a major plus. There are literally tens of thousands of measured sections documented in detail, published and unpublished, in museum collections and other repositories (like the USGS) that have hundreds of thousands of human observations of similar type occurrences. These are very important and under-documented resources that could certainly influence the future of study.

deepreef commented 2 years ago

Is not "the (mis)interpretation that specimen=Occurrence" just a subset of the larger (in scope) misconception that Occurrence = any evidence of an organism-occurrence? (as in effect treating specimens only as such evidence)

I guess you could look at it that way, but the history more or less boils down to:

So, yeah, "specimens"/MaterialSample were certainly part of the "Evidence" conversation, but the conflation of Specimen=Occurrence predates that by quite a bit.

However, I guess it is fair to say that "Specimen=Occurrence" is something of a subset of Occurrence=Evidence -- and this is also supported by cases where the same Occurrence instance is represented separately for the specimen and for the image of the specimen. But that's often another set of issues because in many cases, the image wasn't taken at the same place and time where the specimen was collected; rather some time later at a different location (e.g., in a lab). So in those cases, the image doesn't even represent evidence of the same occurrence that the specimen represents.

In any case... my original point is that we should not re-define "Occurrence" as being the Evidence (as here). In other words, the specimen and the image and the field notebook are not the Occurrence -- the Occurrence was the presence of the organism at the place and time.
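To make the distinction above concrete, here is a minimal Python sketch, not a proposal for DwC itself. The class names `Occurrence` and `Evidence`, their fields, the helper `captured_in_situ`, and all values are hypothetical and invented for illustration. It only shows that a specimen and a later lab image of that specimen can carry different place/time context, so neither should be treated as being the Occurrence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """The presence of an organism at a place and time (not the evidence for it)."""
    taxon: str
    place: str
    date: str


@dataclass(frozen=True)
class Evidence:
    """Something that may support an Occurrence; it has its own place and time."""
    kind: str   # e.g. "specimen", "image", "field notes"
    place: str  # where the evidence was collected or created
    date: str   # when the evidence was collected or created


def captured_in_situ(evidence: Evidence, occurrence: Occurrence) -> bool:
    """True only if the evidence was created at the occurrence's place and time."""
    return evidence.place == occurrence.place and evidence.date == occurrence.date


# Invented example values.
occ = Occurrence(taxon="Genus species", place="Reef site 1", date="2021-03-04")
specimen = Evidence(kind="specimen", place="Reef site 1", date="2021-03-04")
lab_image = Evidence(kind="image", place="Museum lab", date="2021-04-20")

print(captured_in_situ(specimen, occ))
print(captured_in_situ(lab_image, occ))
```

In this sketch the lab image still depicts the specimen, but its own place and date differ from those of the Occurrence, which is the situation described above.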

Jegelewicz commented 2 years ago

On one hand we have a name and a label for a concept - a shortcut or alias, and on the other hand we have a definition (canonically in English) to convey a better understanding of what it means. The definition has to stand on its own, and cannot rely on the label for further edification.

@tucotuco that may be, but clearly we don't have a common idea of what "organism" means.

It is if there is an understanding of what "organism" (not dwc:Organism) means. If there isn't, then that needs to be added to the definition or the usage notes to clarify what meaning of "organism" in English is being used.

So I think a clarification is needed, because until we can disentangle dwc:Organism from dwc:MaterialSample, I don't think we can move on.

The main fuzzy part (in terms of both idealized conceptualization and practical implementation) is this business of the boundary between dwc:Organism and dwc:MaterialSample, and the respective lifespans of each. Depending on how we lock in those boundaries and lifespans, and whether and to what extent they overlap in space and time, we may (or may not) have another question of how to manage instances of "An existence of a MaterialSample (sensu however we end up defining it) at a particular place at a particular time."

Jegelewicz commented 2 years ago

Depending upon our clarification for "organism", the second wrench in the works that I think we need to address is how are dwc:Organism and dwc:LivingSpecimen different?

dagendresen commented 2 years ago

Would something along the lines of ... be useful:

stanblum commented 2 years ago

On Sun, Nov 14, 2021 at 6:54 AM Teresa Mayfield-Meyer wrote:

Depending upon our clarification for "organism", the second wrench in the works that I think we need to address is how are dwc:Organism and dwc:LivingSpecimen different?

A living specimen is, of course, an organism. I think the key distinction between the two concepts is that LivingSpecimen is a kind of MaterialSample, whereas the DwC Organism class is intended to represent an organism that is inferred to exist or to have existed (past tense). The critical role for the Organism class is that the concept, and in particular the property dwc:organismID, ties together multiple occurrence records that derive from the same organism. Other properties derive from the Organism class (most importantly what taxon the organism represents), but in our "shorthand" practice they are commonly recorded as properties of something that has a 1:1 relationship with organism, i.e., the Occurrence or the whole-animal MaterialSample.
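A minimal Python sketch of that linking role follows. The records, identifier values, and the helper name `group_by_organism` are made up for illustration; only the term names `occurrenceID`, `organismID`, and `eventDate` come from Darwin Core.

```python
from collections import defaultdict

# Hypothetical flat occurrence records; all identifiers and dates are invented.
records = [
    {"occurrenceID": "occ-1", "organismID": "org-A", "eventDate": "2019-05-01"},
    {"occurrenceID": "occ-2", "organismID": "org-A", "eventDate": "2020-06-12"},
    {"occurrenceID": "occ-3", "organismID": "org-B", "eventDate": "2020-06-12"},
]


def group_by_organism(rows):
    """Map each organismID to the occurrenceIDs recorded for that organism."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["organismID"]].append(row["occurrenceID"])
    return dict(grouped)


by_organism = group_by_organism(records)
print(by_organism)
```

Here the two records sharing `org-A` end up under one key, which is all that "ties together" means in this sketch; how a real publisher assigns organismID values is outside its scope.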

deepreef commented 2 years ago

A living specimen is, of course, an organism. I think the key distinction between the two concepts is that LivingSpecimen is a kind of MaterialSample, whereas the DwC Organism class is intended to represent an organism that is inferred to exist or to have existed (past tense).

I agree, and was going to make similar points in the as-yet-unwritten "Chapter 6" of my unsolicited dissertation.

To me, the two core properties of a MaterialSample are: 1) It consists, in essence, of physical matter; and 2) It is under the direct control and care of humans.

These need to be fleshed out more (as I had intended within my concluding "Chapter 7"), and I'm still getting my head around whether I agree that the "sample" necessarily requires it to be some subset of a larger thing and/or whether the verb part of "sample" is definitive.

The critical role for the Organism class is that the concept, and in particular the property dwc:organismID, ties together multiple occurrence records that derive from the same organism.

I would consider that "a" critical role; not "the" critical role. Certainly it was the original critical role (sensu the old dwc:individualID within the Occurrence class). But I think the other roles you alluded to:

Other properties derive from the Organism class (most importantly what taxon the organism represents), but in our "shorthand" practice they are commonly recorded as properties of something that has a 1:1 relationship with organism, i.e., the Occurrence or the whole-animal MaterialSample.

... which I would summarize as: 1) the bridge between a dwc:MaterialSample and a dwc:Identification; and 2) the bridge between a dwc:MaterialSample and a dwc:Occurrence.

... are actually more directly relevant to this Task Group, and in our modern thinking of representing DwC as more than just a bag of terms loosely organized into Classes.
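One way to picture those two bridges is the following Python sketch. It is only an illustration under stated assumptions: the dataclass names mirror the DwC class names, but the fields, the helper functions `identifications_for` and `occurrences_for`, and all values are hypothetical and not taken from any DwC specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identification:
    scientific_name: str


@dataclass
class Occurrence:
    occurrence_id: str
    locality: str
    event_date: str


@dataclass
class Organism:
    organism_id: str
    identifications: List[Identification] = field(default_factory=list)
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class MaterialSample:
    material_sample_id: str
    organism: Organism  # the bridge: the sample reaches other classes through this


def identifications_for(sample: MaterialSample) -> List[str]:
    """Bridge 1: MaterialSample -> Organism -> Identification."""
    return [i.scientific_name for i in sample.organism.identifications]


def occurrences_for(sample: MaterialSample) -> List[str]:
    """Bridge 2: MaterialSample -> Organism -> Occurrence."""
    return [o.occurrence_id for o in sample.organism.occurrences]


# Invented example values.
organism = Organism(
    organism_id="org-A",
    identifications=[Identification("Genus species")],
    occurrences=[Occurrence("occ-1", "Reef site 1", "2021-03-04")],
)
sample = MaterialSample(material_sample_id="ms-1", organism=organism)

print(identifications_for(sample))
print(occurrences_for(sample))
```

In this arrangement the MaterialSample carries no taxon or locality of its own; both are reached through the Organism, in contrast to the "shorthand" practice of recording them directly on the sample.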

dr-shorthair commented 2 years ago

To me, the two core properties of a MaterialSample are:

  1. It consists, in essence, of physical matter; and
  2. It is under the direct control and care of humans.
  3. It is a sample of something

stanblum commented 2 years ago

@deepreef wrote:

I agree, and was going to make similar points in the as-yet-unwritten "Chapter 6" of my unsolicited dissertation.

Maybe I've started your Chapter 6? In any case, I've been working on one myself, which I posted on the Wiki home page.

I think we can start working towards definitions, and analyzing scenarios (not really full use cases) to come up with recommendations about how the resulting records should be published and interpreted. I've been struggling a little with formatting and how to represent the critical concepts and data structures. So please make edits or add new representations if what I've done is unclear.

dr-shorthair commented 2 years ago

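By which I mean
    • it was obtained by an act of sampling
    • there is an intention that it be representative of something bigger, which should be identifiable now or later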

deepreef commented 2 years ago

  3. It is a sample of something

By which I mean
    • it was obtained by an act of sampling
    • there is an intention that it be representative of something bigger, which should be identifiable now or later

Yeah, I get that. But here's why I'm still wrestling with it:

So if I collect a specimen of a bird and put it in a collection, what bigger thing is it a representative of? A flock? A population? A species? A vector of a disease? Ok, let's say one of those works, and it doesn't matter which. What, then, is an example of something physical that is not a representative of something bigger? I mean, if it's made of matter, then isn't it ultimately a representative of the universe?

I guess my question is: what are some examples of physical things that would not fulfill this third criterion? If it doesn't help us understand what is not in scope, then what purpose does it serve in the definition?

EDIT

OK, maybe when you say "is a sample of something", you mean the same thing that I mean when I say "It is under the direct control and care of humans"? That is, "it was obtained by an act of sampling" means the same thing as "it was taken into custody by humans". If those two cancel each other out, then that leaves the criterion that it must be a subset of something larger. To which I refer back to the rest of this post above.

deepreef commented 2 years ago

@stanblum: Thanks for the link to the Wiki page! Maybe I should have captured my "Dissertation" in that sort of template, rather than a series of Issue posts? I can reformat accordingly.

stanblum commented 2 years ago

I think the wiki formatting tools are too limited and too hard to edit. I think we should switch over to GoogleDocs. Do we have a folder already?

dr-shorthair commented 2 years ago

Q. Why do you collect and manage a sample?
A. So that you can make observations on it.

Q. Why are the observations interesting?
A. In a science context: because they tell us something about the taxon/population/ecosystem ... In a non-science context: because we want to describe the artefact in its own right.

I think we are doing science, right? It is certainly true that the artefact may represent more than one thing, in context of different observations. It is also true that for some samples we don't know what they represent at the time they are collected and catalogued. But if we are doing science, then the path is from the particular to the general, and we should keep the general in view from the beginning.

smrgeoinfo commented 2 years ago

@deepreef -- examples of 'physical things that would not fulfill this third criterion' (not samples):

    • The rocks I've picked up in the desert to bring home to use for landscaping the yard
    • The wine glasses in my kitchen cabinet
    • The boxes of laundry detergent on the shelves at the grocery store

They are just things-- yes they can be categorized, but there is no intention of using them to learn anything about the world, they are just attractive or useful.

I assume the bird that is collected and preserved to put in a museum is not just a decoration-- there is some intention to learn something about the world from it...

deepreef commented 2 years ago

I think we are doing science, right?

Probably, but I don't think our definitions should hinge on intent. I think these things should focus on capturing facts, regardless of whether we want to do science with the information, or just look at pretty dead bugs. I mean, pretty dead bugs in a non-scientist's personal/private collection still function as evidence of occurrence -- assuming the data are accurate.

The rocks I've picked up in the desert to bring home to use for landscaping the yard

If you recorded the kind of rocks they were, and where they came from, then wouldn't that still be potentially useful information? How is it different for rocks in your yard vs. scientific specimens that are lost or destroyed after they are collected? In both cases, the Occurrence data are still valuable, and during the period of time when the samples (noun) were in the possession/control of a human, I would still consider them to be candidates for instances of MaterialSample.

As for the wine glasses and laundry detergent, these are out of the TDWG scope (non-biological), but I wouldn't automatically rule them out of scope for non-biological data nerds. Imagine I was a collector of rare wine glasses and found a dead insect in one of them. From my perspective, the insect would be worthless, but to an Entomologist, it might represent a new geographic record.

I realize I'm stretching things here, but I guess my point is that motivation/intent should probably not be among the criteria for defining the scope of Occurrences and MaterialSamples. What should matter is that someone took the time to record and document the information, and to share the information -- whatever their motivation was.

Jegelewicz commented 2 years ago

motivation/intent should probably not be among the criteria for defining the scope of Occurrences and MaterialSamples.

Agree! I think there are a lot of things currently recorded in museum catalogs that were collected because they were pretty or unique without any intent to study them. That doesn't make them less valuable or unable to be used as MaterialSamples now especially if there is some data to go with them (but for some forms of study, data isn't even that important).

dr-shorthair commented 2 years ago

On the contrary - I suggest that motivation/intent is central here. We do science. We deliberately design a sampling and observational program, in order to describe the world in a systematic way. This is not random.

stanblum commented 2 years ago

Natural existence versus human intention: maybe the compromise is to acknowledge that nearly infinite organism-space-time intersections have existed in nature, from the origin of life to now, but we can't/don't document them all. They enter our world of "stuff we care about and document as data" when we "sample" them or observe them. They cross the threshold into our information space. Acknowledging the similarity between biodiversity specimens and other material samples lets us "play nice" with the rest of the Open Biological and Biomedical Ontologies (OBO) world. I don't think accepting that subclassing scheme imposes a cost or an impediment. While I don't know what the logically implied benefits might be (thinking ontologies and reasoning), it seems worth it.

stanblum commented 2 years ago

It's also probably uncontroversial that our specimens/samples enable us to discover and document the characteristics of the biological systems they were drawn from. The systems represented don't have to be declared at the time of collection. The systems represented can be determined later from the documentation of context.

deepreef commented 2 years ago

@stanblum : Agreed!

@dr-shorthair :

On the contrary - I suggest that motivation/intent is central here. We do science. We deliberately design a sampling and observational program, in order to describe the world in a systematic way. This is not random.

I see where you're coming from, and it reminds me of a debate I had a while back with an esteemed anthropologist. His point was that you need to design science projects (and data models for capturing results) around hypotheses, so you need to know in advance why you're gathering the data, so that your sampling design (and data model) allows you to properly test your hypothesis.

I agreed, but countered that the mark of a good data model is that it allows you to answer questions you never even thought to ask when you were gathering the data.

I think both of these are in play here. I suspect that the vast majority of specimens in Museums (fodder for MaterialSample) were captured/killed/preserved with scientific intent. But when I record an observation of a fish on a reef using my video camera, I may have no idea at the time that it represents a depth record or a geographic range extension. So my intent in recording the video doesn't change the scientific value of the Occurrence record that it documents. This is true even if I am taking video of another diver, and the fish just happens to swim into frame. This is why I think intent (at the time of documenting an Occurrence record) isn't a prerequisite to capturing useful information. Obviously, if I killed the fish and put it in a Museum as an instance of MaterialSample, it's more likely that scientific intent was in play. But what if the fish was regurgitated from the stomach of a larger fish that I caught for dinner? I still think that counts as useful data destined for MaterialSample records to be shared with GBIF, even if there was zero scientific intent when the MaterialSample was obtained.

Jegelewicz commented 2 years ago

closing for focus on MaterialSample and properties