tdwg / material-sample

A Task Group of the Observations and Specimen Records (OSR) Interest Group

Primary Deliverable - MaterialSample definition #2

Closed Jegelewicz closed 4 months ago

Jegelewicz commented 2 years ago

Current Definition

http://rs.tdwg.org/dwc/terms/MaterialSample

A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.

Please suggest changes/improvements in this issue.

See also https://github.com/tdwg/material-sample/blob/main/primary_deliverable/MaterialSample.markdown

See also MaterialSample terms Google Sheet

deepreef commented 2 years ago

I think the use of two different terms ("physical object" - "physical entity") can be defended to reflect that a subject of collection or observation will as a matter of necessity be spatially more confined than physical entities in general (the latter including, for example, all geese in the world, the taiga, Earth's atmosphere).

OK, thanks! That makes sense to me. But it wasn't immediately obvious (to me, at least) from the wording of the definition. I think the wording of the definition can stay as it is, as long as the non-normative explanatory comments help folks understand the implications of the distinction (physical object vs. physical entity) -- as you suggest.

Note that composition can also go the other way: each of a dinosaur skeleton's bones in a collection can be an instance of dwc:MaterialSample as can be the set of bones, even if they aren't physically associated in the collection.

Yes! Definitely. I think it's clear that aggregates of multiple disconnected MS items can be collectively bundled into a single umbrella/parent MS instance. Whether or not those individual component items came from the same instance of Organism, or multiple Organism instances, shouldn't make any difference.

If that set of bones is incomplete, also the set of bones that potentially make up the whole skeleton could, IMO, be minted as another instance of dwc:MaterialSample satisfying certain informatic needs.

I guess so... but this almost sounds like advocacy for accepting hypothetical/inferred physical objects within scope. I don't immediately have anything against that, but I worry that it might be flirting with the edges of the MS scope a bit. I'm thinking of the type specimen for Nessiteras rhombopteryx. But in that case, the physical object is not hypothetical -- it's just that it might be an Organism, and it might (probably) be a rock.

RogerBurkhalter commented 2 years ago

In October 2020, we had a meeting of the Paleo "Happy Hour" on the topic of clusters/fossils on a slab and otherwise instances of what I refer to as "loanable objects" (cannot loan one without loaning all objects in or on a "container"). The list we came up with may be useful. We sub-divided the list into Natural Accumulations and Artificial or Anthropogenic Accumulations:

Natural Accumulations: Fossil bearing rock rich in abundance

Artificial or Anthropogenic Accumulations: Palynomorph slides (strew samples); Diatom slides (strew samples); SEM stubs - Single taxon

The antithesis of these examples for anthropogenic modification is serial thin sections or peels of an individual fossil, which would fall under the current definition of MaterialSample. What we discussed briefly are examples of display fossils that are composites of several individuals, usually vertebrate fossils, sometimes invertebrate or plant fossils. The point is that a wide variety of natural and anthropogenic objects are possible. This list was assembled from the Invertebrate Paleontology and Paleobotany collections at one museum.

RogerBurkhalter commented 2 years ago

I left these comments last night after two long weeks of interviewing Curator candidates for the collection I am CM for. I left these as examples of paleontology collection objects and the difficulties of fitting our collections into existing CMS and DwC models. Paleontology objects are rarely "as found"; they routinely require some effort at initial preparation to further expose fossilized biological objects, and then some means to preserve not only the object but also the relationship between it and other associated objects found within the original sample (i.e. dwc:associatedOrganisms). Consider, for example, a palynology sample: a quantity of rock is collected from a locality and transported to a lab. The rock is broken into smaller pieces and placed in a jar for reserve, and a subset is separated, further reduced to a coffee-ground consistency, and stored. A subset of the ground sample is then placed in an acid-resistant beaker, processed with HF for a day or more, washed, and centrifuged. The processed "residue" is then stored in a vial, and a subset of that residue is further cleaned and pipetted onto a microscope slide cover slip, dried, and flipped onto a microscope slide. When the slide is examined, you can finally see the objects collected, along with hundreds to thousands of other co-occurring objects (dwc:associatedOrganisms). What is the MaterialSample, the original collected rock, the ground residue, the acid prepared residue, the microscope slide itself, or the pollen grain on the slide (that has a Linnean name and coordinates from an England finder)?

The original collected sample may have other fossil forms embedded within that would be destroyed by HF processing. Subsets may be processed by other acids (Formic, dilute HCl, Acetic, etc.) or reduced and hand-picked under a binocular microscope to produce Conodonts, Foraminifera, Calcareous Nannofossils, or megafossils, etc. Derivative samples may also be processed for non-biological data such as isotopes of strontium, carbon, oxygen, boron, etc. and/or radiometric dating. As such, I would see the original collected sample as the parentMaterialSample to maintain at least some relationship to all of the derivative biological and non-biological entities, with the processed samples as (what?) and the (identified and named) biological object as the MaterialSample. It seems a lot of processed derivative samples either carry the same designation as the original parentMaterialSample or are absorbed into dwc:preparations (where they do not really fit, as they are not a preservation method), making the link between the parent and child somewhat muddy. This is also very true of commercial CMS, and it is why I use a database I created to keep these relationships and results discoverable. Mapping to DwC has always been a challenge (need more coffee).

Sorry for the long comments.

tucotuco commented 2 years ago

@RogerBurkhalter This kind of detailed use case is immensely useful. It highlights both the value of the concept of a parentMaterialSample (see https://github.com/tdwg/dwc/issues/344) and its limitations. By limitations, I mean, "What does it mean to be the parent?" I suspect we need a much richer way to relate materials, with something at the level of a ResourceRelationship where the nature of the relationship can be specified. In the Diversifying the GBIF Data Model work, the model anticipates the relationships "part of" and "derived from" as well as a separate mechanism to establish membership in a material group that was developed for the OBIS Community Measurement use case, but that would also work for other purposes.
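To make the "richer way to relate materials" concrete, here is a minimal Python sketch of typed relationships between materials, using the palynology workflow described above. It is only an illustration under stated assumptions: the `MaterialRelationship` class, the identifiers, and the choice of "part of" versus "derived from" for each step are all hypothetical and are not taken from the GBIF model or from DwC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialRelationship:
    """One typed link between two materials (hypothetical structure)."""
    resource_id: str          # the material being described
    relationship: str         # e.g. "part of" or "derived from"
    related_resource_id: str  # the material it relates to


# Hypothetical identifiers for the palynology example; which relationship
# type applies to each step is a modelling assumption, not a rule.
relationships = [
    MaterialRelationship("crushed-reserve", "part of", "rock-sample"),
    MaterialRelationship("ground-subset", "derived from", "rock-sample"),
    MaterialRelationship("hf-residue", "derived from", "ground-subset"),
    MaterialRelationship("slide", "derived from", "hf-residue"),
    MaterialRelationship("pollen-grain", "part of", "slide"),
]


def related(resource_id, relationship, rels):
    """Materials directly linked from resource_id by one relationship type."""
    return [
        r.related_resource_id
        for r in rels
        if r.resource_id == resource_id and r.relationship == relationship
    ]


def sources(resource_id, rels):
    """All materials reachable upward from resource_id, nearest first."""
    found = []
    stack = [resource_id]
    while stack:
        current = stack.pop()
        for r in rels:
            if r.resource_id == current and r.related_resource_id not in found:
                found.append(r.related_resource_id)
                stack.append(r.related_resource_id)
    return found


print(sources("pollen-grain", relationships))
```

Unlike a single parent pointer, each link here says what kind of relationship it is, so "part of" and "derived from" can be queried separately or followed together.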

deepreef commented 2 years ago

@RogerBurkhalter :

What is the MaterialSample, the original collected rock, the ground residue, the acid prepared residue, the microscope slide itself, or the pollen grain on the slide (that has a Linnean name and coordinates from an England finder)?

My answer: The MaterialSamples (plural) are: whatever units of physical material(s) warrant identification and associated metadata from an informatics perspective.

In other words, the decision of whether to mint a new materialSampleID value (=establish a MaterialSample instance) should be driven by a specific need to track information related to a particular unit of physical material.

Some use cases in my (non-fossil) world: Whole fish is collected, fin-clip is removed and preserved separately (for DNA analysis), scale falls off body and is lost: 1) Whole specimen (voucher) 2) Tissue sample (parent=1)

In this case, I would not bother assigning a separate MS instance to the fish before its fin clip was removed (or scale lost); and I would not bother assigning a separate MS instance to the lost scale, because I have no informatic need to track either of those separately from the two MS instances I do mint.

Whole bird is collected, put in freezer, and accessioned/catalogued. Later, the skin is removed and prepared dry, the internal organs are preserved in alcohol, the skeleton is processed with the aid of dermestid beetles. Later still a subsection of tissue is removed from the preserved organs for DNA analysis. 1) Whole organism as frozen/accessioned/catalogued 2) Skin (parent = 1) 3) Internal organs in alcohol (parent = 1) 4) Skeleton (parent = 1) 5) Tissue for DNA (parent = 3)

In this case, I have an informatic need to track the whole organism prior to dissociation of parts (object that is accessioned and catalogued), so I do assign an MS instance to this. I do not bother assigning MS instances to the blood and other tissue that ended up in the waste basket, nor the tissue consumed & digested by the dermestid beetles, because I don't have an informatic need to track them. Because the tissue sample was subsequently removed from the alcohol-preserved organs, I treat it as a child of that MS (3), rather than a direct child of the whole (1). That way, the curation history of the tissue sample is more precisely/completely represented in the chain of preservation processes (e.g., in case the internal organs were first fixed in formalin, so I would then know that the derived tissue sample is not fit for purpose for DNA sequencing).
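The bird example can be sketched as a small parent-linked structure. This is a rough Python illustration only: the `MaterialSample` dataclass, its field names, the `MS-n` identifiers, and the preparation labels are assumptions made for the sketch, not DwC terms or values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaterialSample:
    """Hypothetical record; field names are illustrative, not DwC terms."""
    material_sample_id: str
    description: str
    preparation: str
    parent_id: Optional[str] = None


# The five MS instances from the bird example (identifiers are made up).
samples = {
    s.material_sample_id: s
    for s in [
        MaterialSample("MS-1", "whole bird", "frozen"),
        MaterialSample("MS-2", "skin", "dry", parent_id="MS-1"),
        MaterialSample("MS-3", "internal organs", "alcohol", parent_id="MS-1"),
        MaterialSample("MS-4", "skeleton", "dermestid-cleaned", parent_id="MS-1"),
        MaterialSample("MS-5", "tissue for DNA", "alcohol", parent_id="MS-3"),
    ]
}


def lineage(sample_id, samples):
    """Return the chain from a sample up to its root, child first."""
    chain = []
    current = samples.get(sample_id)
    while current is not None:
        chain.append(current)
        current = samples.get(current.parent_id) if current.parent_id else None
    return chain


def preparation_history(sample_id, samples):
    """Preparations applied along the chain, oldest first."""
    return [s.preparation for s in reversed(lineage(sample_id, samples))]


def exposed_to(sample_id, preparation, samples):
    """True if the sample or any ancestor carries the given preparation."""
    return preparation in preparation_history(sample_id, samples)


print(preparation_history("MS-5", samples))
```

Because the tissue (MS-5) points at the organs (MS-3) rather than at the whole bird, a check such as `exposed_to("MS-5", "formalin", samples)` would pick up a formalin step recorded on the organs, which a direct link to MS-1 would miss.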

These are pretty straightforward examples in my mind. Another straightforward example is if a feather is plucked from the skin and used for some purpose/preparation/whatever, in which case I would mint: 6) Feather (parent = 2).

But here's where it gets interesting, relative to the earlier discussion on "undetached" MS children. Suppose I photograph just the wing of the mounted skin. Would I have a need/desire to mint a new MS instance for the wing, even though it is still physically part of the whole skin? That way, I could make the subject of the image the wing alone, rather than the whole skin preparation. But is that really good practice? I honestly dunno.

In any case, I think the same basic logic ("Do I have an informatic need to track properties or relationships of a particular aggregate/unit of physical material?") would apply in the example you gave for which bits get distinct instances of MS.

deepreef commented 2 years ago

@tucotuco :

I suspect we need a much richer way to relate materials, with something at the level of a ResourceRelationship where the nature of the relationship can be specified.

I've come around to applying that logic to all relationships within DwC. In other words, whenever there is an xxxID term/property within a DwC Class, I'm leaning towards representing those values not as direct properties of the root instance, but as instances of ResourceRelationship.

I think of this as a "semi-serialized" approach. That is, literal values are treated as direct properties of DwC class instances (e.g., property "fields" to the class "tables"), but all "foreign key" property values are captured as an "octuple store" (eight terms organized in dwc:ResourceRelationship).

I have no idea whether this quasi-hybrid relational model/serialized model is practical or sensical, but it feels like a potentially practical middle-ground between the two different ways of representing data (i.e., tables & fields vs. triple-store).
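As a rough sketch of what this "semi-serialized" split might look like, the Python below keeps literal values on the record and moves every other xxxID value into a relationship row. The row keys borrow three of the dwc:ResourceRelationship term names; the `semi_serialize` function, the sample record, and the rule of deriving the relationship label from the term name are assumptions made for illustration.

```python
def semi_serialize(record, id_field="materialSampleID"):
    """Split a flat record into literal properties and relationship rows.

    Any key ending in "ID", other than the record's own identifier, is
    treated as a foreign key and emitted as a relationship row.
    """
    subject = record[id_field]
    literals = {id_field: subject}
    relationships = []
    for term, value in record.items():
        if term == id_field:
            continue
        if term.endswith("ID"):
            relationships.append({
                "resourceID": subject,
                # Assumption: label the relationship with the term name
                # minus its "ID" suffix.
                "relationshipOfResource": term[:-2],
                "relatedResourceID": value,
            })
        else:
            literals[term] = value
    return literals, relationships


# Hypothetical flat record for the tissue sample in the bird example.
record = {
    "materialSampleID": "MS-5",
    "preparations": "tissue in alcohol",
    "parentMaterialSampleID": "MS-3",
    "eventID": "EV-1",
}

literals, relationships = semi_serialize(record)
print(literals)
print(relationships)
```

The literal "fields" stay with the class "table", while the foreign keys end up in one uniform relationship list that could be extended with the remaining ResourceRelationship terms.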

Jegelewicz commented 1 year ago

Attendance at the 2022 working session included a lot of people who are not members of the Task Group and their primary concern was with the baggage that might be associated with "sample".

Mathias Dillen: This definition includes a physical photograph and a physical drawing of an organism or fossil, right?

Carlos Martínez: I wouldn't consider those as part of a material sample, e.g., I don't sample a photograph, I take a soil sample in a bag

Carlos Martínez: photographs and illustrations are "works" in the sense of the ZooCode

Carlos Martínez: Samples come from sampling, e.g., collecting material things during a field trip / sampling event. A digital photograph is not a material sample. I think that we are confusing "collection objects" with material samples in the sense used by field biologists. I am a biologist and calling a photograph a material sample is counterintuitive (read wrong) to me.

Deborah Paul: A herbarium sheet, sometimes only has an image on it.

Carlos Martínez: A herbarium sheet is not a material sample, the plant on it is. If there is no plant and there is just a picture, there is no material sample on the sheet and the sheet is just a collection object.

Carlos Martínez: Things that are included in a material sample: the three main materials upon which scientific names of animals are based: 1) specimens, 2) fossils that are substitutions (replacements, impressions, moulds and casts) for the actual remains of animals, and 3) the fossilized work of animals (ichnofossils).

It was clear to me that people were looking for something that could encompass any physical material whether it was a "sample" or not if we hope to allow collections to use DarwinCore to share their objects. There was also discussion about the use of the term sample when associated with human remains. As I have first-hand experience attempting to remove "specimen" from everything in a CMS, I completely understand the concern. Notes from the session include this: “Sample” is problematic, consider “catalogueRecord”, “object”, “entity”, “unit”

Mariel Campbell: If we change the term to Material Object or Entity, and define it as a physical object, then an herbarium sheet with a physical photograph in it is indeed a Material Object. It can be barcoded and loaned.

John Wieczorek: “Entity” might not be so easy to understand. But then maybe it will force people to read the definition. ;-)

Mariel Campbell: It also avoids calling a part or representation of a human or named animal as an "object" which is an issue

It was also discussed that a "material" class should start with a "High-level distinction between material and information artefact" as this would mesh with the LatimerCore baseTypeOfCollection

So - should we really be starting with the class MaterialEntity? Would this be equivalent to the Dublin Core PhysicalResource?

Term PhysicalResource
URI http://purl.org/dc/terms/PhysicalResource
Label Physical Resource
Definition A material thing.
Type of Term Class

I know this feels like a step backward.

Steve Baskauf: I feel like we are plowing ground that was plowed a year ago in this group...

BUT as LatimerCore is currently in expert review and they cover a lot of things that cross over into material, I think we need to think deeply about this.

Jegelewicz commented 1 year ago

Changes as suggested in #37 added to review package - https://github.com/tdwg/material-sample/blob/main/review%20package/MaterialSample.md

Jegelewicz commented 1 year ago

Term Change submitted - https://github.com/tdwg/dwc/issues/451

Jegelewicz commented 4 months ago

change complete - https://github.com/tdwg/dwc/issues/451