Open RBGE-Herbarium opened 3 years ago
MIDS information element | MaterialType |
Definition | The material the object is composed of. |
DwC term (latest, 2014-11-08) | |
ABCD term name (3.0) | |
Applicable standard(s)/recommendation(s) | |
Element identifier | |
Required | Yes |
Repeatable | No |
Constraints | Controlled vocabulary |
Examples | To be added |
Element specification status | agreed; accepted in specification |
Notes | Definition of controlled vocabulary is needed. |
CETAF DWG Discussion:
Preparation is not searchable in GBIF. The element is required to find a specimen in the institute. It is also used to determine digitisation cost and pipeline.
GBIF data in gbif_export_20200611_v2
SELECT count(*)
FROM `elspeth-mids-gbif-280011.gbif_export_20200611_v2.occurrence` AS oc
WHERE oc.v_preparations IS NOT NULL
Result: 66,648,857 records
SELECT oc.v_preparations, oc.institutionCode, count(oc.v_preparations)
FROM `elspeth-mids-gbif-280011.gbif_export_20200611_v2.occurrence` AS oc
WHERE oc.v_preparations IS NOT NULL
AND oc.institutionCode IN ("W","MNHN","NHMD","TAMZ","TU(M)","MZH","MfN","BGBM","snmb","STU","ZFMK","National Museum of the Czech Republic","HNHM","MNHNL","Naturalis Biodiversity Center","NHMO","MIZPAN","MNCN","MA","GNM","GB","GBG","G","MHNG","NHMUK","National Museums Scotland","E","K","SAV")
GROUP BY oc.institutionCode, oc.v_preparations
Recommendations for controlled vocabulary and standards
We could consider creating a definition of this element that aligns as much as possible with the work of DiSSCo, iDigBio and GBIF, and provide recommendations for standards and controlled vocabularies such as the one below. This links with the work of the TDWG CD Group (https://github.com/tdwg/cd).
Join the Dots and the Collections Digitisation Dashboard use the list below as a draft. There is no consensus yet around definitions, e.g. object type vs preservation method.
preservationMethod list at the moment from CDD:
RBGE currently submit the following to GBIF:
v_preparations | institutionCode | f0_
wood sample | E | 60
other | E | 85
photograph of unspecified type (including photocopy) | E | 4586
DNA sample | E | 430
liquid-preserved material | E | 6855
herbarium specimen of unspecified type | E | 887309
seed | E | 36
fruit collection/cone collection (unmounted) (carpological collection) | E | 3310
chromosome / cytological specimen | E | 3
medicine/drug (prepared or semi-prepared sample used medicinally) | E | 21
bark | E | 52
herbarium sheet | E | 6
I think that the comment above about the confusion between object type and preservation method is significant. This needs to be discussed by the CD groups.
In terms of use cases the following are relevant for this element:
Finding the specimen - specimens are filed by taxonomic group, object type and preservation method. Thus a conifer specimen may be filed as:
Determining the suitability of the specimen for research (e.g. DNA extraction)
However, I think that RBGE could look at using a system that would either follow the list above or be easily mapped to it.
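As a rough illustration of what "easily mapped" could mean, the sketch below maps a few of the preparation strings RBGE currently submits onto a separate object type and preservation method. Only the source strings come from the RBGE table above. The target labels, the `RBGE_TO_DRAFT` name and both helper functions are hypothetical placeholders, not terms from the CDD list or from any agreed vocabulary.

```python
# Hypothetical mapping: the keys are values RBGE submits to GBIF; the
# (object type, preservation method) pairs are invented placeholders
# for illustration only, not an agreed controlled vocabulary.
RBGE_TO_DRAFT = {
    "herbarium sheet": ("herbarium specimen", "dried and pressed"),
    "liquid-preserved material": ("unspecified", "liquid-preserved"),
    "DNA sample": ("DNA sample", "unspecified"),
    "wood sample": ("wood sample", "dried"),
    "seed": ("seed", "dried"),
}


def map_preparation(raw):
    """Return (object_type, preservation_method), or None if unmapped."""
    return RBGE_TO_DRAFT.get(raw.strip())


def unmapped(values):
    """List the distinct raw values that still need a mapping decision."""
    return sorted({v for v in values if map_preparation(v) is None})
```

Running `unmapped` over the full set of submitted values would give a worklist of strings, such as "other" or "bark", for which a decision is still needed.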
GeoCase Specimen Type
SpecimenTypes in GeoCASe
Please be aware that these terms are not yet based on a controlled vocabulary. The list is just a facet of indexed raw terms as provided by the different institutions.
Could be split into 3 terms:
MIDS level1: collection type - e.g. collectionType from NCD: Archival | Art | Audio | Cell Cultures | Electronic | Facsimiles | Fossils | Genetic | Geological | Herbarium | Living | Manuscripts | Mineralogical | Observations | Preserved | Products | Specimens | Texts | Tissue | Visual
MIDS level1: preservation type - e.g. the way in which the material has been preserved. There must be some controlled vocabularies for this somewhere?
MIDS level2: material type - e.g. material entity from OBIB https://www.ebi.ac.uk/ols/ontologies/obib/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FBFO_0000040&viewMode=All&siblings=false
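To make the three-way split concrete, here is a minimal sketch of a record that carries the three terms separately and checks the collection type against the NCD values listed above. The class name, field names and `problems` method are assumptions made for illustration. The proposal itself only names the three concepts, and no vocabulary is assumed for preservation type or material type.

```python
from dataclasses import dataclass
from typing import Optional

# collectionType values as listed in the proposal above (from NCD).
NCD_COLLECTION_TYPES = frozenset({
    "Archival", "Art", "Audio", "Cell Cultures", "Electronic", "Facsimiles",
    "Fossils", "Genetic", "Geological", "Herbarium", "Living", "Manuscripts",
    "Mineralogical", "Observations", "Preserved", "Products", "Specimens",
    "Texts", "Tissue", "Visual",
})


@dataclass
class SpecimenTypeInfo:
    # Hypothetical field names for the three proposed terms.
    collection_type: str                      # proposed for MIDS level 1
    preservation_type: Optional[str] = None   # proposed for MIDS level 1
    material_type: Optional[str] = None       # proposed for MIDS level 2

    def problems(self):
        """Return a list of human-readable validation problems."""
        issues = []
        if self.collection_type not in NCD_COLLECTION_TYPES:
            issues.append(f"unknown collection type: {self.collection_type!r}")
        return issues
```

For example, `SpecimenTypeInfo("Herbarium").problems()` returns an empty list, while a raw value such as "herbarium sheet" is flagged because it is an object type rather than a collection type.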
I'm working on a cross-domain physical specimen (sample) metadata scheme for the iSamples project. Scope includes earth/environmental science, archaeology/anthropology and biology. After studying various sample description schemes from these domains, we're focusing on a core metadata scheme with high-level categories for specimenType, materialType, and sampledFeatureType, with a controlled vocabulary of 10-20 classes for each type. The design then accommodates domain-specific categories extending these types for more granular searching. SpecimenType is concerned with the kind of object (specimen), similar to MIDS level 1 proposed above, and materialType with what the object (specimen) is composed of, similar to MIDS level 3 proposed above. SampledFeature categories serve to identify broad context for the collection event. This is a work in progress, and we're interested in as much alignment as possible with TDWG work.
Preservation type is pretty specific to biological samples. I looked at the GBIF oc.v_preparations from the query in the comment above, and found quite a variety of things there, many of which I'd suggest are specimen types, roughly: object, biological specimen, tissue, animal, whole animal, animal part, tooth, bone, bird, bird part, plant, plant part, seed, tree part, DNA, human remains, bird nest, egg, bird nest with egg, cast, slide (microscope?), drug, image
Hi Stephen, that is interesting. The approach based on decision trees may help in providing guidelines or automated identification of the term needed. However, the current schema lacks detail for biological specimens: for example, the feature of interest can also be a product of an organism, such as a bird's nest, and besides whole plants or leaves, other things such as pollen can be sampled.
@wouteraddink yes, things like a bird's nest are a problem; it seems like that would be in a sampled feature category, maybe with other things like cocoon, spider web, etc. Stuff made by human organisms is already accounted for (specimen type artifact). Also problematic is the material type for bone, egg shell, and mollusc/clam/snail type shells. I was thinking pollen could be considered a part of a plant (specimen type organism part)?
I added a new category on Sampled Feature for 'animal product' (Sampled feature is the product of an animal other than human being, e.g. bird nest, egg, cocoon, fecal matter, dung ball.) to account for bird's nests etc.
It's been pointed out by @mswoodburn that I missed the material property from the CD standard work when I was preparing my comparison of terms used by different initiatives.
This is another direct cross-connection into the CIDOC-CRM standard via E57 Material.
E57 Material does not seem to have a definition, and inclusion of 'brick' (which by most reckoning would be an Object I suspect) and 'gold' as examples looks to be incoherent.
The iSamples version is now: https://github.com/isamplesorg/metadata/blob/main/vocabulary/MaterialTypeDecisionTreev3.pdf IGSN is using http://vocabulary.odm2.org/medium/, which also seems to fit. Most specimens would be 'organism' in that one, and 'organic material' in the iSamples version. 'Organism' seems closer to ObjectType (a countable thing). The iSamples approach seems cleaner; however, interoperability with IGSN would be nice to have. Both are missing a category for meteorites. GeoCASe has no field for this, although it distinguishes between fossils, minerals, rocks and meteorites.
This type of information is often currently available/known at a dataset or collection level, although not in a machine-readable manner. Specimens are published in sets that have similar material types and these types are considered to be evident (to humans) based on the dataset title, keywords or provenance.
Implementations of Latimer Core, which is currently in development, would enable machine readability and facilitate MIDS calculation using data from the dataset/collection level rather than from individual specimen records. The downside of this approach is that some collections/datasets are heterogeneous and would not be covered, although it could be argued that these sets, in the absence of any material-and-other-type information at record level, are not worthy of MIDS > 0.
Some sets are also mostly homogeneous, but with a few artifacts/curiosities/errors. In this case, the MIDS score will at least be 'mostly' correct, but this may not be where we want to go. That said, MIDS has been defined from the start to be agnostic about data quality, so herbarium sheets mislabeled as preserved birds would not violate MIDS conditions.
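The dataset-level fallback discussed here could be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary keys `materialType` and `materialTypes` are hypothetical and are not Latimer Core or MIDS term names, and the rule "inherit only from a dataset that declares exactly one type" is one possible reading of the heterogeneity concern, not an agreed MIDS condition.

```python
def resolve_material_type(record, dataset):
    """Pick a material type for one record.

    Both arguments are plain dicts with hypothetical keys:
    record["materialType"] (a string) and dataset["materialTypes"] (a list).
    Returns (value, source), where source is "record", "dataset" or None.
    """
    value = record.get("materialType")
    if value:
        # Record-level information always wins.
        return value, "record"
    dataset_types = dataset.get("materialTypes") or []
    # Only inherit when the dataset declares exactly one type; a
    # heterogeneous dataset cannot tell us which type this record has.
    if len(dataset_types) == 1:
        return dataset_types[0], "dataset"
    return None, None
```

Returning the source alongside the value keeps it visible whether a MIDS calculation relied on record-level or dataset-level information. Consistent with MIDS being agnostic about data quality, the function does not check whether the inherited value is correct for the individual specimen.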