tdwg / mids


MIDS Element - MaterialType #14

Open RBGE-Herbarium opened 3 years ago

RBGE-Herbarium commented 3 years ago
MIDS information element: MaterialType
Definition: The material the object is composed of.
DwC term (latest, 2014-11-08):
ABCD term name (3.0):
Applicable standard(s)/recommendation(s):
Element identifier:
Required: Yes
Repeatable: No
Constraints: Controlled vocabulary
Examples: To be added
Element specification status: agreed; accepted in specification
Notes: Definition of controlled vocabulary is needed.
RBGE-Herbarium commented 3 years ago

CETAF DWG Discussion:

Preparation is not searchable in GBIF. The element is required to find the specimen within the institution, and it is also used to determine digitisation cost and pipeline.

RBGE-Herbarium commented 3 years ago

GBIF data in gbif_export_20200611_v2

SELECT count(*) FROM `elspeth-mids-gbif-280011.gbif_export_20200611_v2.occurrence` as oc where oc.v_preparations is not null

Result: 66,648,857 records

SELECT oc.v_preparations, oc.institutionCode, count(oc.v_preparations)
FROM `elspeth-mids-gbif-280011.gbif_export_20200611_v2.occurrence` as oc
where oc.v_preparations is not null and oc.institutionCode in ("W","MNHN","NHMD","TAMZ","TU(M)","MZH","MfN","BGBM","snmb","STU","ZFMK","National Museum of the Czech Republic","HNHM","MNHNL","Naturalis Biodiversity Center","NHMO","MIZPAN","MNCN","MA","GNM","GB","GBG","G","MHNG","NHMUK","National Museums Scotland","E","K","SAV")
group by oc.institutionCode, oc.v_preparations

bq-results-20210120-122837-ydhq0a99j5dl.xlsx
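What the second query computes can be sketched in Python for anyone checking the grouping outside BigQuery. This is a minimal sketch, assuming the records are already available as (v_preparations, institutionCode) pairs; the function name and the sample rows are hypothetical and are not taken from the GBIF export.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

# One occurrence record reduced to (v_preparations, institutionCode); preparations may be missing.
Row = Tuple[Optional[str], str]


def count_preparations(rows: Iterable[Row], institutions: Set[str]) -> Counter:
    """Count records per (institutionCode, v_preparations) pair.

    Mirrors the query's WHERE v_preparations IS NOT NULL AND institutionCode IN (...)
    filter, followed by GROUP BY institutionCode, v_preparations.
    """
    counts: Counter = Counter()
    for preparations, institution in rows:
        if preparations is None or institution not in institutions:
            continue  # the same rows the WHERE clause would drop
        counts[(institution, preparations)] += 1
    return counts


# Hypothetical rows for illustration only.
sample_rows = [
    ("herbarium sheet", "E"),
    ("herbarium sheet", "E"),
    ("DNA sample", "E"),
    (None, "E"),        # null preparations: excluded
    ("seed", "XYZ"),    # institution not in the list: excluded
]

for (institution, preparations), n in sorted(count_preparations(sample_rows, {"E", "K"}).items()):
    print(institution, preparations, n)
```

Counting in the loop rather than with `Counter(rows)` keeps the filter step explicit, so it lines up clause by clause with the SQL.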

RBGE-Herbarium commented 3 years ago

Recommendations for controlled vocabulary and standards

We could consider creating a definition of this element that aligns as closely as possible with the work of DiSSCo, iDigBio and GBIF, and provide recommendations for standards and controlled vocabularies such as the one below. This links with the work of the TDWG CD Group (https://github.com/tdwg/cd).

Join the Dots and the Collections Digitisation Dashboard use the list below as a draft. There is no consensus on definitions, e.g. object type vs preservation method.

preservationMethod list at the moment from CDD:

RBGE-Herbarium commented 3 years ago

RBGE currently submit the following to GBIF:

v_preparations | institutionCode | count
wood sample | E | 60
other | E | 85
photograph of unspecified type (including photocopy) | E | 4586
DNA sample | E | 430
liquid-preserved material | E | 6855
herbarium specimen of unspecified type | E | 887309
seed | E | 36
fruit collection/cone collection (unmounted) (carpological collection) | E | 3310
chromosome / cytological specimen | E | 3
medicine/drug (prepared or semi-prepared sample used medicinally) | E | 21
bark | E | 52
herbarium sheet | E | 6

I think that the comment above about the confusion between object type and preservation method is significant. This needs to be discussed by the CD groups.

In terms of use cases, the following are relevant for this element:

Finding the specimen: specimens are filed by taxonomic group, object type and preservation method. Thus a conifer specimen may be filed as:

Determining the suitability of the specimen for research (e.g. DNA extraction)

However, I think that RBGE could look at using a system that would either follow the list above or be easily mapped to it.
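One way to be "easily mapped" is a lookup table from the free-text v_preparations values RBGE already sends to GBIF onto two separate fields, which also addresses the object type vs preservation method confusion. This is a minimal sketch under stated assumptions: the target terms and the names `Classification`, `CROSSWALK` and `map_preparation` are hypothetical placeholders, not an agreed CD or CDD vocabulary.

```python
from typing import NamedTuple, Optional


class Classification(NamedTuple):
    object_type: str                    # hypothetical object-type term
    preservation_method: Optional[str]  # hypothetical preservation term; None if not applicable or unknown


# Hypothetical crosswalk for some of the values in the table above.
CROSSWALK = {
    "herbarium sheet": Classification("herbarium specimen", "dried, mounted"),
    "herbarium specimen of unspecified type": Classification("herbarium specimen", None),
    "liquid-preserved material": Classification("plant material", "fluid-preserved"),
    "wood sample": Classification("wood", None),
    "DNA sample": Classification("DNA", None),
    "photograph of unspecified type (including photocopy)": Classification("image", None),
}

# Normalise keys once so lookups ignore case and surrounding whitespace.
_LOOKUP = {key.strip().casefold(): value for key, value in CROSSWALK.items()}


def map_preparation(raw: str) -> Optional[Classification]:
    """Return the split classification, or None if the value still needs manual mapping."""
    return _LOOKUP.get(raw.strip().casefold())


for value in ["Herbarium sheet", "DNA sample", "other"]:
    print(value, "->", map_preparation(value))
```

Returning None for unmapped values such as "other" makes the gaps visible for curation instead of silently forcing them into a category.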

emhaston commented 3 years ago

GeoCASe Specimen Type


falkogloeckler commented 3 years ago

SpecimenTypes in GeoCASe

Please be aware that these terms are not yet based on a controlled vocabulary. The list is just a facet of the indexed raw terms as provided by the different institutions.

only1chunts commented 3 years ago

Could be split into 3 terms:

MIDS level 1: collection type, e.g. collectionType from NCD: Archival | Art | Audio | Cell Cultures | Electronic | Facsimiles | Fossils | Genetic | Geological | Herbarium | Living | Manuscripts | Mineralogical | Observations | Preserved | Products | Specimens | Texts | Tissue | Visual

MIDS level 1: preservation type, e.g. the way in which the material has been preserved. There must be some controlled vocabularies for this somewhere?

MIDS level 2: material type, e.g. material entity from OBIB https://www.ebi.ac.uk/ols/ontologies/obib/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FBFO_0000040&viewMode=All&siblings=false
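The proposed three-way split could be represented as a small record type, with collection type checked against the NCD collectionType values listed in the proposal. This is a sketch, assuming (hypothetically) that level 1 requires collection type plus preservation type and level 2 additionally requires material type; `MaterialDescription` and `meets_level` are illustrative names, not MIDS definitions.

```python
from dataclasses import dataclass
from typing import Optional

# collectionType values as listed for NCD in the proposal.
NCD_COLLECTION_TYPES = frozenset({
    "Archival", "Art", "Audio", "Cell Cultures", "Electronic", "Facsimiles",
    "Fossils", "Genetic", "Geological", "Herbarium", "Living", "Manuscripts",
    "Mineralogical", "Observations", "Preserved", "Products", "Specimens",
    "Texts", "Tissue", "Visual",
})


@dataclass(frozen=True)
class MaterialDescription:
    """Hypothetical record shape for the proposed split into three terms."""
    collection_type: str                 # level 1, controlled by NCD collectionType
    preservation_type: Optional[str]     # level 1, vocabulary still to be chosen
    material_type: Optional[str] = None  # level 2, e.g. an OBIB material entity label

    def __post_init__(self) -> None:
        if self.collection_type not in NCD_COLLECTION_TYPES:
            raise ValueError(f"unknown collectionType: {self.collection_type!r}")


def meets_level(desc: MaterialDescription, level: int) -> bool:
    """Hypothetical completeness check; collection_type is already validated on creation."""
    if level <= 0:
        return True
    has_level1 = desc.preservation_type is not None
    if level == 1:
        return has_level1
    return has_level1 and desc.material_type is not None


sheet = MaterialDescription("Herbarium", "dried")
print(meets_level(sheet, 1), meets_level(sheet, 2))
```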

smrgeoinfo commented 3 years ago

I'm working on a cross-domain physical specimen (sample) metadata scheme for the iSamples project. Its scope includes earth/environmental science, archaeology/anthropology and biology. After studying various sample description schemes from these domains, we're focusing on a core metadata scheme with high-level categories for specimenType, materialType and sampledFeatureType, each with a controlled vocabulary of 10-20 classes. The design then accommodates domain-specific categories extending these types for more granular searching. SpecimenType is concerned with the kind of object (specimen), similar to MIDS level 1 proposed above, and materialType with what the object (specimen) is composed of, similar to MIDS level 3 proposed above. SampledFeature categories serve to identify the broad context of the collection event. This is a work in progress, and we're interested in as much alignment as possible with TDWG work.

Preservation type is pretty specific to biological samples. I looked at the GBIF oc.v_preparations from the query in the comment above, and found quite a variety of things there, many of which I'd suggest are specimen types, roughly: object, biological specimen, tissue, animal, whole animal, animal part, tooth, bone, bird, bird part, plant, plant part, seed, tree part, DNA, human remains, bird nest, egg, bird nest with egg, cast, slide (microscope?), drug, image
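A core vocabulary extended by domain-specific categories behaves like a small hierarchy: each granular term points to a broader one, so a record tagged "tooth" can still be found by a search on a core class. This is a minimal sketch; the core classes and the parent links below are hypothetical and are not the actual iSamples vocabulary, although the narrower terms are taken from the list above.

```python
from typing import Dict, Set

# Hypothetical core (top-level) specimenType classes.
CORE: Set[str] = {"whole organism", "organism part", "artifact"}

# Hypothetical domain-specific extensions: each narrower term points to its broader parent.
PARENT: Dict[str, str] = {
    "plant": "whole organism",
    "whole animal": "whole organism",
    "plant part": "organism part",
    "animal part": "organism part",
    "seed": "plant part",
    "tooth": "animal part",
    "bone": "animal part",
}


def broad_category(term: str) -> str:
    """Walk up the hierarchy from a granular term to its core class."""
    seen: Set[str] = set()
    while term not in CORE:
        if term in seen:
            raise ValueError(f"cycle in hierarchy at {term!r}")
        seen.add(term)
        term = PARENT[term]  # raises KeyError for a term outside the vocabulary
    return term


def narrower_terms(term: str) -> Set[str]:
    """The term plus every descendant, e.g. to expand a broad search to granular values."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


print(broad_category("tooth"))
print(sorted(narrower_terms("organism part")))
```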

wouteraddink commented 3 years ago

Hi Stephen, that is interesting. The approach based on decision trees may help in providing guidelines or automated identification of the term needed. However, the current schema lacks detail for biological specimens. For example, the feature of interest can also be a product of an organism, such as a bird's nest, and besides whole plants or leaves, other things such as pollen can also be sampled.
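A decision tree for assigning a material type can be encoded as nested yes/no questions, which is one way the "automated identification" idea could work. This is a sketch only: the questions, the record keys (`from_organism`, `mineralised`, `is_rock`) and the leaf terms are hypothetical and are not the iSamples decision tree.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Question:
    """An internal node: apply `test` to a record and follow the yes or no branch."""
    test: Callable[[dict], bool]
    yes: "Node"
    no: "Node"


# A node is either another question or a leaf holding the assigned material-type term.
Node = Union[Question, str]

# Hypothetical questions and leaf terms, for illustration only.
TREE: Node = Question(
    test=lambda r: r.get("from_organism", False),
    yes=Question(
        test=lambda r: r.get("mineralised", False),
        yes="biogenic non-organic material",  # e.g. bone or shell
        no="organic material",
    ),
    no=Question(
        test=lambda r: r.get("is_rock", False),
        yes="rock",
        no="other material",
    ),
)


def classify(record: dict, node: Node = TREE) -> str:
    """Follow the tree from the root until a leaf term is reached."""
    while isinstance(node, Question):
        node = node.yes if node.test(record) else node.no
    return node


print(classify({"from_organism": True, "mineralised": True}))
```

Keeping the tree as data rather than nested if-statements means the same structure could drive both a guideline document and an automated classifier.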

smrgeoinfo commented 3 years ago

@wouteraddink yes, things like bird's nests are a problem; it seems like they would belong in a sampled feature category, maybe with other things like cocoons and spider webs. Things made by humans are already accounted for (specimen type artifact). Also problematic is the material type for bone, egg shell, and mollusc/clam/snail shells. I was thinking pollen could be considered a part of a plant (specimen type organism part)?

smrgeoinfo commented 3 years ago

I added a new category to Sampled Feature for 'animal product' (Sampled feature is the product of an animal other than a human being, e.g. bird nest, egg, cocoon, fecal matter, dung ball.) to account for birds' nests etc.

hardistyar commented 3 years ago

It's been pointed out by @mswoodburn that I missed the material property from the CD standard work when I was preparing my comparison of terms used by different initiatives.

This is another direct cross-connection into the CIDOC-CRM standard via E57 Material.

smrgeoinfo commented 3 years ago

E57 Material does not seem to have a definition, and the inclusion of 'brick' (which by most reckoning would be an Object, I suspect) and 'gold' as examples looks incoherent.

wouteraddink commented 3 years ago

The iSamples version is now https://github.com/isamplesorg/metadata/blob/main/vocabulary/MaterialTypeDecisionTreev3.pdf, and IGSN is using http://vocabulary.odm2.org/medium/, which also seems to fit. Most specimens would be 'organism' in the IGSN one and 'organic material' in the iSamples version. 'Organism' seems closer to ObjectType (a countable thing). The iSamples approach seems cleaner; however, interoperability with IGSN would be nice to have. Both are missing a category for meteorites. GeoCASe has no field for this; however, it distinguishes between fossils, minerals, rocks and meteorites.

matdillen commented 2 years ago

This type of information is often currently available/known at a dataset or collection level, although not in a machine-readable manner. Specimens are published in sets that have similar material types and these types are considered to be evident (to humans) based on the dataset title, keywords or provenance.

Implementations of Latimer Core, which is currently in development, would enable machine readability and facilitate MIDS calculation using data at the dataset/collection level rather than from individual specimen records. The downside of this approach is that heterogeneous collections/datasets would not be covered, although it could be argued that such sets, in the absence of any material (or other) type information at record level, are not worthy of MIDS > 0.
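The fallback described here, using the record's own value when present and otherwise inheriting from the dataset only when the dataset is homogeneous, can be sketched in a few lines. The field names `materialType` (record) and `materialTypes` (dataset) are hypothetical placeholders, not Latimer Core terms.

```python
from typing import Optional


def effective_material_type(record: dict, dataset: dict) -> Optional[str]:
    """Material type for one record, falling back to dataset-level metadata.

    The keys "materialType" and "materialTypes" are hypothetical, not Latimer Core terms.
    """
    own = record.get("materialType")
    if own:
        return own                  # a record-level value always wins
    declared = dataset.get("materialTypes", [])
    if len(declared) == 1:
        return declared[0]          # homogeneous set: inherit its single declared type
    return None                     # heterogeneous or undeclared: nothing to inherit


print(effective_material_type({}, {"materialTypes": ["herbarium sheet"]}))
```

A None result corresponds to the heterogeneous case above, where neither the record nor the dataset supplies a usable type.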

Some sets are also mostly homogeneous, but with a few artifacts/curiosities/errors. In this case, the MIDS score will at least be 'mostly' correct, but this may not be where we want to go. That said, MIDS has been defined from the start to be agnostic about data quality, so herbarium sheets mislabeled as preserved birds would not violate MIDS conditions.
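How "mostly" correct an inherited value would be can at least be quantified as the share of records carrying the dominant type. This is a sketch; the 0.95 default threshold in `is_mostly_homogeneous` is a hypothetical choice, not a MIDS rule.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def dominant_share(types: Sequence[Optional[str]]) -> Tuple[Optional[str], float]:
    """Most common type and the fraction of all records (including untyped ones) that carry it."""
    counts = Counter(t for t in types if t is not None)
    if not counts:
        return None, 0.0
    term, n = counts.most_common(1)[0]
    return term, n / len(types)


def is_mostly_homogeneous(types: Sequence[Optional[str]], threshold: float = 0.95) -> bool:
    """Hypothetical rule: treat a set as homogeneous if the dominant type reaches the threshold."""
    _, share = dominant_share(types)
    return share >= threshold


records = ["herbarium sheet"] * 98 + ["preserved bird"] * 2
print(dominant_share(records), is_mostly_homogeneous(records))
```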