tdwg / mids

MaterialType - definition and controlled vocabulary needed. #5

Open hardistyar opened 3 years ago

hardistyar commented 3 years ago

Object type versus preparation/preservation method are not sufficiently well differentiated in present practical usage.

An opportunity exists to introduce a controlled vocabulary for a new MaterialType information element to tidy this up.

This spreadsheet contains examples of how different kinds of specimens have been mapped to various existing terms/fields.

Source: CETAF Digitization Working Group, 7th December 2020.

only1chunts commented 3 years ago

The Ontology for Biobanking (OBIB) has the start of an ontology for material types, but it does not extend to the non-biological specimens that we would need.

RBGE-Herbarium commented 3 years ago

Links to #14

hardistyar commented 3 years ago

> The Ontology for Biobanking (OBIB) has the start of an ontology for material types, but it does not extend to the non-biological specimens that we would need.

@only1chunts These seem mainly to relate to tissue preparations of internal body organs for biobanking purposes. I'm not sure how relevant that is for natural science collections.

Having said that, the Smithsonian NMNH Biorepository is an example of where we might want some new material types such as 'silica dried' and 'liquid nitrogen preserved'. I picked these up from https://doi.org/10.3897/BDJ.5.e11625. There might be more. According to those published guidelines, these terms are likely to be used in conjunction with some other material descriptor such as 'leaf' or 'flower'. Perhaps materialType needs to be a two-part information element in which we say what is preserved and how it is preserved.

For reference, dwc:preparations is just a list in which separate values are delimited by a vertical bar, e.g., 'fossil', 'cast', 'photograph', 'DNA extract', 'skin | skull | skeleton', 'whole animal (ETOH) | tissue (EDTA)'.
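
A minimal sketch of how such a bar-delimited value could be split into the two-part form suggested above (what is preserved, and how). The function name `parse_preparations`, the `Preparation` type, and the convention that a trailing parenthesised token names the preservation medium are assumptions for illustration; none of them is defined by Darwin Core.

```python
import re
from typing import NamedTuple, Optional


class Preparation(NamedTuple):
    what: str            # the thing that is preserved, e.g. "tissue"
    how: Optional[str]   # preservation medium if given in parentheses, e.g. "EDTA"


# Assumed convention: "<what> (<how>)"; the parenthesised part is optional.
_PART = re.compile(r"^(?P<what>[^()]+?)\s*(?:\((?P<how>[^()]*)\))?$")


def parse_preparations(value: str) -> list[Preparation]:
    """Split a dwc:preparations string on vertical bars into (what, how) pairs."""
    result = []
    for raw in value.split("|"):
        part = raw.strip()
        if not part:
            continue  # skip empty segments such as a trailing bar
        match = _PART.match(part)
        if match:
            how = match.group("how")
            result.append(
                Preparation(match.group("what").strip(), how.strip() if how else None)
            )
        else:
            result.append(Preparation(part, None))  # keep unparseable text as-is
    return result


print(parse_preparations("whole animal (ETOH) | tissue (EDTA)"))
print(parse_preparations("skin | skull | skeleton"))
```

Values without a parenthesised medium simply get `how=None`, which is one way to see how much "how it is preserved" information is missing from free-text preparations.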

jmacklin commented 3 years ago

There may be useful concepts in the GGBN standard. They have terms for preparation and/or preservation and some vocabs that may be relevant although with a molecular bent. There is a current mapping activity going on between GGBN and DwC...

https://wiki.ggbn.org/ggbn/GGBN_Data_Standard_v1

smrgeoinfo commented 3 years ago

The approach that's been used in the originally geoscience-focused IGSN system is:

MaterialType -- what is the specimen itself composed of? (distinct from any preservation media). Here's a decision tree for our current Material type draft for iSamples. You'll see it's pretty high level, for faceting search results across a registry of samples from geoscience, archaeology, biology in current work.

ObjectType -- what kind of thing is the specimen? This has been renamed SpecimenType in our current iSamples work; here's the decision tree for Specimen Type for the current vocabulary draft.

After reviewing sample description datasets from various sources in the iSamples work, we're thinking a third facet of basic sample categorization would be useful. It is currently labeled 'sampledFeature type' and is intended to capture, in broad terms, the kind of thing the sample represents. Here's the decision tree for the Sampled Feature Type current draft.

For a cross domain sample registration system, some convergence on these basic, common facets for categorizing samples is critical for interoperability. The vocabularies for these facets should be relatively small, in the range of ~20 terms to keep it manageable. User interfaces for this kind of system will need to present users with terminology they are familiar with, probably using categorization schemes that are more granular; in the back end, these domain-specific schemes will need to map to the high level categories for interoperability, while maintaining the granular domain terminology for local usage.
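
As a rough sketch of the back-end mapping described above: granular, domain-specific terms are kept for local use and mapped to a small high-level vocabulary for interoperability. All terms, the field name `materialTypeHighLevel`, and the function names below are hypothetical placeholders; they are not taken from the iSamples drafts.

```python
# Hypothetical high-level facet values; NOT the actual iSamples vocabulary.
HIGH_LEVEL_MATERIAL_TYPES = {"organic material", "rock", "soil", "water"}

# Hypothetical domain-specific terms mapped to the high-level categories.
DOMAIN_TO_HIGH_LEVEL = {
    "leaf tissue": "organic material",
    "wood": "organic material",
    "granite": "rock",
    "loam": "soil",
}


def to_high_level(domain_term: str) -> str:
    """Return the high-level category for a granular domain term."""
    key = domain_term.strip().lower()
    if key in HIGH_LEVEL_MATERIAL_TYPES:
        return key  # already a high-level term
    try:
        return DOMAIN_TO_HIGH_LEVEL[key]
    except KeyError:
        raise ValueError(f"no mapping for domain term: {domain_term!r}") from None


def index_record(record: dict) -> dict:
    """Keep the local term and add the interoperable high-level facet."""
    return {**record, "materialTypeHighLevel": to_high_level(record["materialType"])}


print(index_record({"id": "example-1", "materialType": "Granite"}))
```

Raising on unmapped terms, rather than guessing, makes gaps in the mapping visible early.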

hardistyar commented 3 years ago

@wouteraddink made a proposal at the Sixth TG meeting, 6th May, based on six questions:

  1. What kind of object is it?
  2. Under which discipline was it described?
  3. What is it made of?
  4. What does it look like? (in the sense of how it is prepared)
  5. How is it fixed/preserved?
  6. Is it a whole, a part or a lot?

My rough spreadsheet compares this proposal from the perspectives of TDWG CD, iSamples, DwC and ABCD/EFG, and then makes a MIDS proposal, summarized in the table below and derived from the proposal by @wouteraddink.

For MIDS there are several issues:

  1. Which elements should be included at MIDS level 1 and at MIDS level 2? At level 1 this should be enough to support discovery of relevant specimens and to aid further digitization. The table proposes three elements are needed at level 1 and that other more specific information should appear at level 2.
  2. What should be the name of the information elements corresponding to each of the six questions above? To what extent should MIDS (and openDS) maintain alignment with pre-existing term names/labels in other standards, especially when there are several options? What implications might this have elsewhere in MIDS and other new standards? Generality towards/across a range of disciplines (biology/zoology/botany, geology, archaeology) might be becoming more important than maintaining alignment to, for example Darwin Core. The table contains current suggestions.
  3. What should be the controlled vocabulary associated with each of the six information elements? Here we have a tension between existing, familiar lists of words used in a loose and fragmented manner and new, more controlled lists of words being proposed by, for example, iSamples. This is to be discussed further (t.b.d.). One suggestion might be to adopt short controlled vocabularies (<20 words) for each of the two coarsest levels (Q1, Q3), e.g., as has been suggested by iSamples, and to prepare longer controlled or partially controlled lists of preparationTypes (Q4) and preservationMethods (Q5) lower down. A tree structure can help.

| Question | MIDS level | Element name | Vocabulary |
|---|---|---|---|
| 1. What kind of object is it? | 1 | materialSampleType or objectType (latter is preferred) | t.b.d. |
| 2. Under which discipline was it described? | 2 | discipline | t.b.d. |
| 3. What is it made of? | 1 | materialType | t.b.d. |
| 4. What does it look like? (in the sense of how it is prepared) | 1 | preparationType | t.b.d. |
| 5. How is it fixed/preserved? | 2 | preservationMethod | t.b.d. |
| 6. Is it a whole, a part or a lot? | 2 | ?? Is it needed? | t.b.d. |

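
To make the level assignments concrete, here is a small sketch that records the proposed elements and reports which ones a record still lacks at a given level. It assumes that levels are cumulative (level 2 includes everything required at level 1); the `Element` type, the `missing_elements` function, and the sample record values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    question: str
    mids_level: int
    name: str


# Transcribed from the proposal table; vocabularies are still t.b.d.
# Question 6 is omitted because no element name has been proposed for it.
PROPOSED_ELEMENTS = [
    Element("What kind of object is it?", 1, "objectType"),
    Element("Under which discipline was it described?", 2, "discipline"),
    Element("What is it made of?", 1, "materialType"),
    Element("What does it look like?", 1, "preparationType"),
    Element("How is it fixed/preserved?", 2, "preservationMethod"),
]


def missing_elements(record: dict, level: int) -> list[str]:
    """Names of proposed elements up to `level` that are absent or empty."""
    return [
        e.name
        for e in PROPOSED_ELEMENTS
        if e.mids_level <= level and not record.get(e.name)
    ]


# Hypothetical record with illustrative values only.
record = {"objectType": "whole organism", "preparationType": "herbarium sheet"}
print(missing_elements(record, level=1))
```
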
Rindiser commented 3 years ago

I think Question 6 is needed, as certain parts of the collection will be digitized as a lot. Element name: MaterialSample?

smrgeoinfo commented 3 years ago

iSamples specimenType vocabulary includes classes that cover the 'whole', 'part', 'lot' distinction I think. 'lot' is named 'aggregation' in the current draft. I think 6 and 1 can be combined.

smrgeoinfo commented 3 years ago

I think the list is missing an important facet: what kind of thing does the sample represent (what kind of sampledFeature)?

only1chunts commented 3 years ago

For MIDS level 1, I agree with Alex's table above, i.e. objectType, materialType and preparationType. I think concentrating on getting those 3 defined with appropriate controlled vocabularies should be the focus. Then when we come to look at level 2 terms we can address the others.

ramonawalls commented 3 years ago

I also suggest combining 1 and 6, as material sample type covers whether it is whole, part, or aggregate.

Strongly agree with Alex's comment about a tree structure. For iSamples, we are only defining a top level, and we expect domain or discipline specific details to fall below them in one or more trees.

wouteraddink commented 3 years ago

Thinking about digitisation, I think 6 is an additional level of detail to 1: in a digitisation street the object type may be scored for all objects in a batch at once, but whether it is a whole, a part or a lot needs to be scored for each object individually. So I would not combine them, and would keep 6 in MIDS level 2.

wouteraddink commented 3 years ago

@smrgeoinfo: important data, but the feature that is sampled does not seem relevant in the digitisation of a specimen. I see it more as metadata that is part of the sampling event, not as part of the specimen metadata (it will be the same for all specimens collected in the sampling event).

wouteraddink commented 3 years ago

I would put discipline in MIDS 1, as it is easy to score in batch during digitisation, and I think it makes sense to use discipline in, e.g., monitoring progress in digitisation, assigning digitisation priorities and perhaps also in aligning digitisation policies. Monitoring progress by objectType seems to make less sense: looking at the iSamples values, that would lead to a dashboard where you compare progress in digitisation of, e.g., organism parts vs organism products and biome aggregations, rather than progress in botany collections vs zoology invertebrate collections.

cboelling commented 3 years ago

Looking at the current version of the spreadsheet comparison I think that what the proposed terms are expected to represent needs to be clarified more (together with an update on the proposed definitions). There are a number of cases where an element turns up as example for different attributes in the different schemes compared (e.g., fossil). While orthogonality between the attributes is perhaps not necessary, it would be nice to have a more clearly defined complementarity especially among objectType, materialType and preparationType.

If (1) and (3) are to operate as a hierarchical tandem I think it will be necessary to determine the desired level of granularity in each case (and reflect it in corresponding definitions) even if the actual categories / the controlled vocabulary will be determined later.

Regarding (6), I think that there are many examples where a collection object can legitimately be considered a whole and a part at the same time (e.g., a bone of a skeleton, a mineral on a larger piece of rock). I would favor representing these relations between physical specimens (and the downstream relations between their digital representations) as attributes connecting object IDs.
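
A minimal sketch of what "attributes connecting object IDs" could look like: part-of relations are stored as ID pairs, and whether an object is a whole, a part, or both is derived from them rather than stored as a single value. The identifiers and function names are hypothetical.

```python
from collections import defaultdict

# Each relation is (part_id, whole_id); identifiers are hypothetical.
PART_OF = [
    ("bone-001", "skeleton-001"),
    ("mineral-007", "rock-003"),
    ("skeleton-001", "lot-042"),
]


def build_index(relations):
    """Index relations in both directions for lookup by object ID."""
    parts_of = defaultdict(set)   # whole_id -> direct parts
    wholes_of = defaultdict(set)  # part_id  -> direct wholes
    for part_id, whole_id in relations:
        parts_of[whole_id].add(part_id)
        wholes_of[part_id].add(whole_id)
    return parts_of, wholes_of


def roles(object_id, parts_of, wholes_of):
    """An object can be a whole and a part at the same time."""
    found = set()
    if parts_of.get(object_id):
        found.add("whole")
    if wholes_of.get(object_id):
        found.add("part")
    return found


parts_of, wholes_of = build_index(PART_OF)
print(roles("skeleton-001", parts_of, wholes_of))
```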

matdillen commented 3 years ago

I wonder if we could trim this down to:

  1. What does the object represent? objectType (MIDS1)
  2. What does the object look like? preparationType (MIDS1)
  3. How is the object preserved? preservationMethod (MIDS2)

(+ preservationMode for fossils) (MIDS2)

With these three, we keep track of what the specimen was initially (1), what it is now (2) and what happened to it along the way (3). This does imply that the current vocabularies need to be modified, so the distinction between objectType and preparationType is more clear. A lot of fishes in a jar does not represent a lot of fishes in a jar, but a lot of (whole) fish. The lot was preserved with a certain fluid and is now kept in a jar.

discipline seems much more suitable to be defined as a taxonomic term, at least for the biological specimens. If we're looking at this from a more general, curatorial perspective, we should probably be looking at collectionCode (currently MIDS2). Connecting a specimen to a scientific discipline is much less obvious. Is this the intent of the collector? How do we determine that? What if there are multiple scientific disciplines that may apply?

Maybe we want an explicitly taxonomic term in addition to the name (dc:title analogue) in MIDS1? But I'm not sure how that works for nonbiological specimens.

I'm not sure what the added value is of materialSample (what is it made of), when compared to the information present in the three concepts listed above. In what scenario is the content of this field useful? Can it not always be deduced from the other ones? Are we not always going to say that a soil sample is made out of soil, an animal out of organic material and a rock out of rock?

As others have suggested, I would include the whole, lot, part distinction in an ontology for objectType.

ramonawalls commented 3 years ago

> @smrgeoinfo: important data, but the feature that is sampled does not seem relevant in the digitisation of a specimen. I see it more as metadata that is part of the sampling event, not as part of the specimen metadata (it will be the same for all specimens collected in the sampling event).

I disagree @wouteraddink. Since a material sample is always a sample of something, it is very important to know what it is a sample of. Yes, this is also linked to the sampling event, but since we are not building an ontology here that fully describes the sampling event, I think it is very important to include the sampled feature as part of the metadata for the sample. Now if you want to build an ontology... :)

ramonawalls commented 3 years ago

> I wonder if we could trim this down to:
>
>   1. What does the object represent? objectType (MIDS1)
>   2. What does the object look like? preparationType (MIDS1)
>   3. How is the object preserved? preservationMethod (MIDS2)
>
> (+ preservationMode for fossils) (MIDS2)
>
> With these three, we keep track of what the specimen was initially (1), what it is now (2) and what happened to it along the way (3). This does imply that the current vocabularies need to be modified, so the distinction between objectType and preparationType is more clear. A lot of fishes in a jar does not represent a lot of fishes in a jar, but a lot of (whole) fish. The lot was preserved with a certain fluid and is now kept in a jar.
>
> discipline seems much more suitable to be defined as a taxonomic term, at least for the biological specimens. If we're looking at this from a more general, curatorial perspective, we should probably be looking at collectionCode (currently MIDS2). Connecting a specimen to a scientific discipline is much less obvious. Is this the intent of the collector? How do we determine that? What if there are multiple scientific disciplines that may apply?
>
> Maybe we want an explicitly taxonomic term in addition to the name (dc:title analogue) in MIDS1? But I'm not sure how that works for nonbiological specimens.
>
> I'm not sure what the added value is of materialSample (what is it made of), when compared to the information present in the three concepts listed above. In what scenario is the content of this field useful? Can it not always be deduced from the other ones? Are we not always going to say that a soil sample is made out of soil, an animal out of organic material and a rock out of rock?
>
> As others have suggested, I would include the whole, lot, part distinction in an ontology for objectType.

Again, if you were building an ontology, you could probably deduce what a sample was made of by its sample type, but you are not building an ontology. Therefore, you need to capture all of the information that could be used to build one.

emhaston commented 3 years ago

At RBGE, we have looked at the RBGE Herbarium collection data to determine how the 6 questions proposed by Wouter could be implemented.

In doing this, we considered the kinds of objects that we have in the collections, the use cases for the categorisation and the purpose and outcome of categorising the objects. I think that considering the use cases for categorising could be helpful for discussions.

The high level use cases we identified are:

  1. Curation
  2. Digitisation
  3. Research
  4. Exhibitions

Within each of these there are additional use cases for categorising objects including:

  1. Location (findability)
  2. Digitisation method (equipment and pipelines)
  3. Research discipline, expertise, techniques & equipment
  4. Storage requirements (environmental conditions, space etc)
  5. Access (physical, virtual, including loanability)

For each of these use cases, the following aspects were considered to be important:

  1. Size
  2. Shape
  3. Container
  4. Physical and chemical structure
  5. Preservation method

This was a start, and I'm sure we will be missing things. Just to reiterate that we were focussing very much on Herbarium collections.

As part of this exercise, we started with the following list of objects:

  1. Herbarium sheet
  2. Carpological specimen
  3. Spirit
  4. Microscope slide
  5. Silica-dried
  6. TLC
  7. Extracted DNA
  8. Destructive sample
  9. Wood samples
  10. Seed sample
  11. Spore prints
  12. Photographic slides
  13. Photographs
  14. Photographs of specimen
  15. Illustrations
  16. Soil sample
  17. Herbarium packet
  18. SEM stubs
  19. Air/silica dried material
  20. SEM images
  21. Soil/water sample/ cultures

We then started pulling this together into a framework, working through each item, moving them into either object type or preservation method or structure, or part of organism, etc. We refined this process during several iterations. We have now gone through the RBGE list of objects and mapped them to the following categories:

  1. Object Type
  2. Preparation Type
  3. Preservation method
  4. Structure
  5. Format
  6. Whole organism / what part of organism

When we looked at these in terms of the Element Issue format and mapping we then came up with the following examples:

(four images showing the example mappings)

This is just to contribute to the discussion.

smrgeoinfo commented 3 years ago

Analysis of object types from https://github.com/tdwg/mids/issues/5#issuecomment-853762856

Herbarium sheet

The object is a mounting sheet.

Carpological specimen

The object is ??? what???

Spirit

The object is what???

Microscope slide

Object is a glass mounting sheet

Silica-dried

This is a preservation process??? What was dried??

TLC

????

Extracted DNA

Object is...? probably some kind of container with the DNA in it. DNA is a material, not an object

Destructive sample

?? I guess this means the material has been consumed in some analytical process? What was the object?

Wood samples

Object is (probably) a piece of wood, or a bag of pieces of wood (hopefully from the same plant)

Seed sample

Object is (probably) a seed, or a bag of seeds (hopefully from the same plant)

Spore prints

Object is a print (??? piece of paper, or other imaging material???). Analogous to photograph of specimen?

Photographic slides

Object is a piece of film

Photographs

Object is a piece of paper

Photographs of specimen

The photo is a related resource about the specimen, not the specimen

Illustrations

The illustration is a related resource about the specimen, not the specimen

Soil sample

Object is (likely) a bag of granular material

Herbarium packet

Object is a packet (container) containing plant fragments (from the same plant??, or from individuals of same species?)

SEM stubs

Object is a stub (some kind of mounting object, analogous to herbarium sheet or microscope slide)

Air/silica dried material

?? object is what??? granular aggregate of plant parts, individual plant part? whole plant?

SEM images

The image is a related resource about the specimen, not the specimen

Soil/water sample/ cultures

Object is some kind of culture container?
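
The distinction drawn in this analysis can be sketched as a lookup that separates physical objects from related resources and flags the entries that remain unclear. Only entries with a reasonably definite answer above are classified; the `Kind` enum and the function name are hypothetical, and the classification is a reading of the comments above rather than an agreed vocabulary.

```python
from enum import Enum


class Kind(Enum):
    PHYSICAL_OBJECT = "physical object"
    RELATED_RESOURCE = "related resource"
    UNRESOLVED = "unresolved"


# A reading of the analysis above; anything not listed counts as unresolved.
ANALYSIS = {
    "Herbarium sheet": Kind.PHYSICAL_OBJECT,       # mounting sheet
    "Microscope slide": Kind.PHYSICAL_OBJECT,      # glass mounting sheet
    "SEM stubs": Kind.PHYSICAL_OBJECT,             # mounting object
    "Photographic slides": Kind.PHYSICAL_OBJECT,   # piece of film
    "Photographs": Kind.PHYSICAL_OBJECT,           # piece of paper
    "Photographs of specimen": Kind.RELATED_RESOURCE,
    "Illustrations": Kind.RELATED_RESOURCE,
    "SEM images": Kind.RELATED_RESOURCE,
}


def classify(label: str) -> Kind:
    """Look up a label, treating unknown or unclear entries as unresolved."""
    return ANALYSIS.get(label, Kind.UNRESOLVED)


def physical_objects(labels):
    """Keep only the labels that name a physical object."""
    return [label for label in labels if classify(label) is Kind.PHYSICAL_OBJECT]


print(physical_objects(["Herbarium sheet", "SEM images", "TLC", "SEM stubs"]))
```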

smrgeoinfo commented 3 years ago

> I wonder if we could trim this down to:
>
>   1. What does the object represent? objectType (MIDS1)
>   2. What does the object look like? preparationType (MIDS1)
>   3. How is the object preserved? preservationMethod (MIDS2)
>
> (+ preservationMode for fossils) (MIDS2)

I would modify somewhat:

  1. what kind of object is it (objectType)

  2. what is it composed of (materialType)

  3. what does it represent (sampled feature)

  4. how is it preserved (preservationMethod)? This is not applicable to the kinds of non-biological specimens that don't need preservation.

  5. preservation mode (taphonomy) is very specific to fossil specimens, and should be an additional property required in a fossil specimen profile.

wouteraddink commented 3 years ago

I can live with inclusion of sampled feature. What I get from the discussions though is that already at the very minimal level of MIDS1 we seem to have different needs for different classes of objects: preservation mode only for fossil specimens, material type for non-biological specimens (earth samples), preservation method only for preserved specimens. I think we therefore need different metadata profiles for these different classes, which we should perhaps treat as different digital specimen (sub-)types: preserved biological specimen, fossil biological specimen, living biological specimen, earth sample specimen, recorded specimen (e.g. sound recordings, drawings, photos). These should be extendible with classes for non-natural history specimens in the future if there is a need for it.

smrgeoinfo commented 3 years ago

I think that profiles are a good idea, and would suggest that the specimenType property could be the basis for determining the profile. Likely there would be a hierarchy of profiles, e.g. 'Whole organism' profile might have child profiles for 'preserved', and 'living'. In our iSamples thinking, drawings, or photos of physical specimens are related resources, not physical specimens. Sound recordings (e.g. bird sounds) are an interesting case; my off the cuff reaction is that the recording is a kind of dataset that is linked to a physical thing (the bird) in the world.
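
A rough sketch of such a hierarchy of profiles, in which a child profile inherits the required elements of its parent and adds its own. The profile names and element assignments loosely follow the discussion above (preservation method for preserved specimens, preservation mode for fossils, material type for earth samples); the `Profile` class and the exact tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    name: str
    required: set = field(default_factory=set)
    parent: Optional["Profile"] = None

    def all_required(self) -> set:
        """Required elements of this profile plus everything inherited."""
        inherited = self.parent.all_required() if self.parent else set()
        return inherited | self.required


# Hypothetical hierarchy; a real one would be keyed on the specimenType value.
base = Profile("specimen", {"objectType"})
whole_organism = Profile("whole organism", parent=base)
preserved = Profile("preserved", {"preservationMethod"}, parent=whole_organism)
living = Profile("living", parent=whole_organism)
fossil = Profile("fossil", {"preservationMode"}, parent=base)
earth_sample = Profile("earth sample", {"materialType"}, parent=base)

print(sorted(preserved.all_required()))
print(sorted(living.all_required()))
```

With this shape, an element such as preservationMethod is only required where it applies, which is the concern raised about a single minimal profile for all object classes.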