baskaufs closed this issue 2 years ago.
reference #14
Ping @Jegelewicz
@baskaufs THANK YOU! Can we take a deep dive tomorrow?
@baskaufs I will comment on the definition of FossilSpecimen as "A preserved specimen that is a fossil". I advocate removing the term "preserved". First, fossilization is itself a process of preservation. It is also a natural process, not an anthropogenic one like those described under most of dwc:preparations (which also uses the term "preserved"; many paleo workers object to fossils being covered by that term as well, because for fossils, preparation has nothing to do with how the item is stored). The concept of a SkinSpecimen or SkeletalSpecimen representing skin or skeletal preservation/preparation does not make sense. There may be fringe instances where a recently living organism is naturally preserved, such as a desiccated mammal or bird, or a frozen mammoth body that has not been "turned to stone" the way fossils are usually envisioned. Perhaps these fringe instances can be included in PreservedSpecimen with a preparation type such as "naturally desiccated" or "naturally frozen". They do not represent fossils (although some would argue the mammoth is a fossil). Speaking of fossils, the examples include items that are not fossils but are instead evidence of behavior: coprolites, gastroliths, and ichnofossils. These are all forms of ichnofossils and as such cannot (usually) be attributed directly to any particular species, so perhaps they should be treated as specimens of a "preserved observation"?
I would like to understand why, in order to specify values for dwc:MaterialSampleType, a new concept scheme with newly defined resources (a.k.a. terms) is preferred, including concept scheme infrastructure such as namespaces and IRIs. At first glance, it seems that what is informative about the newly minted resources can also be expressed with the existing terms, e.g., using http://rs.tdwg.org/dwc/terms/version/LivingSpecimen-2018-09-06 or its associated label ("Living Specimen") or adaptations thereof. Couldn't those be used as values?
Perhaps add a requested term?
Also, GGBN will be concerned when they look for "tissue" and don't find it.
I will comment on the definition of FossilSpecimen as "A preserved specimen that is a fossil". I advocate removing the term "preserved".
I agree with removing preserved from this definition.
However, won't we have some FossilSpecimens that are also PreservedSpecimen? Like these fossil scutes prepared as thin sections? https://arctos.database.museum/guid/NMMNH:Paleo:16545
I guess this also means there will need to be a whole other term for the description of the material "scute"? Should we be looking at that here or passing that down to the next task group?
the examples include items that are not fossils but are instead examples of behavior. These are coprolites, gastroliths, and ichnofossils. These are all forms of ichnofossils and as such cannot (usually) be directly attributed to any particular species, and should perhaps be a form of specimens of a "preserved observation"?
I suggest we probably need a type for this in the controlled vocabulary, but I would also suggest "trace" rather than "PreservedObservation", which seems like it could also be used for a photograph. Trace could also cover things like scat, molds of footprints, and such.
However, won't we have some FossilSpecimens that are also PreservedSpecimen
I think there's a subtle distinction between "preserved" and "prepared" (or "curated"). When you think about it, every physical object is in some way preserved. There are varying degrees of the duration of preservation. Fossils, through mineralization, are preserved for millions of years. Specimens treated with formaldehyde and/or alcohol are preserved for (potentially) centuries. Tissue samples stored in DMSO are preserved for decades(?) A fresh carcass in an air-conditioned room is preserved for days or perhaps weeks.
I guess my point is that the word "preserved" is a bit meaningless and/or implied by the word "specimen". What I think people are actually interested in is the current state and history of "preparations". That is, what sorts of actions have been performed on a physical thing? Some of these actions are intended to extend the duration of preservation (e.g., formalin, alcohol, etc.). Some of them are intended to allow examination or analysis (e.g., thin sections, tissue extractions for DNA, etc.).
I get that we want to distinguish "Preserved" specimens from "Living" specimens, but it seems to me that the opposite of "Living" is actually "Dead", not "Preserved". Yeah, I know what we mean by "Preserved" specimen; but given that we're taking the time to completely restructure how these terms are applied, perhaps now is a good time to rethink how we parse out the different kinds of MaterialSample instances?
One option is to do what we have been doing, which is to "overload" a basic term like materialSampleType to capture clues about preservation method, living vs. dead, mineralized vs. actual biological material, whole organism vs. part of an organism vs. aggregate of multiple organisms, which part of an organism it is, etc. I worry that trying to capture all these disparate properties in a controlled vocabulary squeezed into materialSampleType might make things more complicated, rather than simpler.
Maybe a good topic of discussion for today's chat would be "What parameters are we trying to represent in the values of materialSampleType?"
Maybe a good topic of discussion for today's chat would be "What parameters are we trying to represent in the values of materialSampleType?
Added to the agenda!
The question in https://github.com/tdwg/material-sample/issues/24#issuecomment-1104068355, "What parameters are we trying to represent in the values of materialSampleType?", is important. For a controlled vocabulary, it is very useful to have a clear definition of the use case for the vocabulary; its scope (biological samples, any material sample, Earth materials, ...); the criteria for differentiating the terms; whether the terms are hierarchical; whether the terms cover the scope (covering); and whether terms can have overlapping meanings (unique, unambiguous).
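Some of these criteria can be checked mechanically once a draft vocabulary exists. Here is a minimal sketch in Python, assuming a hypothetical list of (controlled value, broader value) pairs. The values, including MaterialSample as a top concept, are illustrative only and not an agreed vocabulary. The sketch flags duplicate values (uniqueness), broader links to undefined concepts, and cycles in the hierarchy. Coverage and overlapping meaning still need human judgment.

```python
from collections import Counter

# Hypothetical rows: (controlled value string, broader controlled value or None).
# These values are illustrative only, not an agreed vocabulary.
VOCAB = [
    ("MaterialSample", None),
    ("PreservedSpecimen", "MaterialSample"),
    ("FossilSpecimen", "PreservedSpecimen"),
    ("LivingSpecimen", "MaterialSample"),
]


def check_vocabulary(rows):
    """Return a list of problems: duplicate values, dangling broader links, cycles."""
    problems = []

    # Uniqueness: each controlled value string should appear exactly once.
    counts = Counter(value for value, _ in rows)
    for value, n in counts.items():
        if n > 1:
            problems.append(f"duplicate controlled value: {value}")

    # Every broader link must point to a concept defined in the vocabulary.
    broader = dict(rows)
    for value, parent in broader.items():
        if parent is not None and parent not in broader:
            problems.append(f"{value}: broader concept {parent} is not defined")

    # Walk up from each concept; revisiting a concept means the hierarchy has a cycle.
    for start in broader:
        seen = set()
        current = start
        while current is not None and current in broader:
            if current in seen:
                problems.append(f"cycle detected starting from {start}")
                break
            seen.add(current)
            current = broader[current]
    return problems


print(check_vocabulary(VOCAB))  # → []
```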
Perhaps add a requested term? environmentalSample
I think this is too broad. I would like to use the examples from https://github.com/tdwg/dwc/issues/40, e.g., envo:soil, envo:sediment, envo:saline water.
I think being able to distinguish a soil sample vs. a saline water sample vs. a freshwater sample will be important to eDNA data providers.
'What parameters are we trying to represent in the values of materialSampleType?' is important. For a controlled vocabulary, it is very useful to have a clear definition of the use case for the vocabulary, its scope (biological samples, any material sample, Earth Materials....), what are the criteria for differentiating the terms, are the terms hierarchical, do the terms cover the scope (covering), can terms have overlapping meaning (Unique, unambiguous).
In the second meeting yesterday, we discussed this. Those present could see the need to think beyond the currently used "GBIF basisOfRecord" terms, and @albenson-usgs suggested that we take a step back and start by creating a list of terms we think we might find or want to place in materialSampleType. So, I have started a Google Sheet, and I would like everyone to think about what they might place in this vocabulary. Just add your terms to the bottom of "suggested vocabulary". We can then deduplicate the list and start categorizing to see if we can build a broader and more useful vocabulary. In addition, I think it would be helpful for each of us to think about the quote above. What do we expect from the vocabulary for this term?
@cboelling To respond to your question
I would like to understand why, in order to specify values for dwc:MaterialSampleType a new concept scheme with newly defined resources (a.k.a terms) is preferred (including concept scheme infrastructure like name spaces, IRIs). At first glance it seems that what is informative about the newly minted resources can also be expressed with the existing terms, e.g. using http://rs.tdwg.org/dwc/terms/version/LivingSpecimen-2018-09-06 or its associated label ("Living Specimen") or adaptations thereof. Couldn't those be used as values?
The controlled vocabulary as I generated it follows the conventions that have been established within TDWG for ratified controlled vocabularies. One of the goals of that system is to eliminate longstanding confusion between term labels, IRI local names, and the controlled value strings that people should use in spreadsheets or tables. These three things have been badly conflated in the past. That's a problem because TDWG is an international organization and labels are (or should be) available in many languages, whereas there should be a single controlled value string used by everyone as a value for the property. You can see examples under the three existing controlled vocabularies within Darwin Core (for establishmentMeans, pathway, and degreeOfEstablishment), available from the top navigation bar on the Darwin Core website. The intent is for this vocabulary to follow the same pattern. These controlled vocabularies now have some label translations available at https://tdwg.github.io/rs.tdwg.org/ .
The IRI local names are intentionally opaque so that no one is tempted to try to use them as controlled value strings. But since there are IRIs and JSON-LD using them, one can encode SKOS relationships among concepts (such as skos:broader) in a machine-readable way. See https://tdwg.github.io/rs.tdwg.org/cvJson/pathway.json for example.
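To illustrate the separation between IRIs, labels, and controlled value strings, here is a minimal sketch in Python. The IRIs, dictionary keys, and Spanish labels are hypothetical and do not reflect the structure of the published TDWG JSON-LD. The broader link from FossilSpecimen to PreservedSpecimen mirrors the draft metadata, which is itself still under discussion. A label in any language resolves to the single controlled value, and the broader links can be followed by machine.

```python
# Hypothetical concept records, loosely modeled on SKOS. The IRIs and keys are
# illustrative only; they are not the published TDWG JSON-LD structure.
CONCEPTS = {
    "http://example.org/matter/1001": {
        "controlled_value": "PreservedSpecimen",
        "prefLabel": {"en": "Preserved Specimen", "es": "Espécimen preservado"},
        "broader": None,
    },
    "http://example.org/matter/1002": {
        "controlled_value": "FossilSpecimen",
        "prefLabel": {"en": "Fossil Specimen", "es": "Espécimen fósil"},
        "broader": "http://example.org/matter/1001",
    },
}


def value_for_label(label):
    """Map a label in any language to the one controlled value string, or None."""
    for concept in CONCEPTS.values():
        if label in concept["prefLabel"].values():
            return concept["controlled_value"]
    return None


def broader_values(iri):
    """Follow broader links from a concept IRI; return ancestor controlled values."""
    chain = []
    parent = CONCEPTS[iri]["broader"]
    while parent is not None:
        chain.append(CONCEPTS[parent]["controlled_value"])
        parent = CONCEPTS[parent]["broader"]
    return chain
```

For example, `value_for_label("Espécimen fósil")` and `value_for_label("Fossil Specimen")` both return `"FossilSpecimen"`. The opaque IRI local names never appear in a spreadsheet cell.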
Coming from GRSciColl and working on describing "Institutions" and "Collections", I added a couple of terms to the end of the list, as well as an additional sheet with the two existing vocabularies for the fields/properties describing "Collection": "Content types" and "Preparation types".
Neither input field works: a CSV download of the information stored in GRSciColl shows that both fields are generally empty, or that users add information that doesn't make much sense compared with the rest of the entered information. Obviously they need to be redesigned. Nevertheless, they give some idea of the dimensions and granularity involved in describing materialSampleType.
For further background, since there is some overlap, here is my proposal for how to describe "Institution": GRSciColl_Vocabs. Comments are very welcome, but since they are out of scope here, please send them to me directly.
Refer to the existing draft controlled vocabulary for organism parts here, and organized by organism group here. The terms are intended to be used as values for ac:subjectPart, which indicates the part of the organism being photographed, but they could refer to organism parts in other contexts as well.
@baskaufs ... no fungi ... (e.g., thallus, fruiting body, vegetative reproductive structure, mycelium, symbiont)
@jbstatgen
no fungi ... (eg. thallus, fruiting body, vegetative reproductive structure, mycelium, symbiont)
We begged people to participate in this task group and no fungi experts joined. So we only have values for organism groups where someone suggested them.
The controlled vocabulary is intended to be extensible, so we'd be happy to add fungi if someone will suggest the terms, test with images, etc.
@dr-shorthair
I'd suggest being more clear about which strings are keys, in what context; and which strings are being stored as 'annotations' related to some prior context.
I don't understand what you are saying. Please refer to the governing specification, Sections 3.3.3.1 ("Controlled value") and 4.5.4 and offer suggestions on how they need to be clarified.
The approach taken there was a compromise between how concept metadata are described in "pure" SKOS thesauri and the actual practice within TDWG of simply using a certain plain text string as a value from a "controlled vocabulary".
Apologies; my comment was intended for the context of IDs. I'll try to find the thread I thought I was responding to. We can delete these parts of the conversation so that this issue does not have a confusing sub-thread.
... no fungi ... (eg. thallus, fruiting body, vegetative reproductive structure, mycelium, symbiont) ... The controlled vocabulary is intended to be extensible, so we'd be happy to add fungi if someone will suggest the terms, test with images, etc.
@baskaufs What would it take to add the above terms to your vocabulary?
A) If it is a matter of the amount of information present in this overview and the first two links in your initial post, I could provide this for the above terms and learn along the way about how to construct and publish vocabularies correctly.
B) However, there wouldn't be any testing or community agreement supporting the contributed terms. For that, the vocabularies need mycologists and lichenologists, e.g., from the citizen science initiatives for fungi.
C) This is the Task Group you were mentioning. Your report for 2021 suggests that you are wrapping up and might not want to reopen the process.
Not sure where the balance in all of this is right now.
@dr-shorthair No worries, just copy and repost wherever you want to comment!
I spent some time studying the draft controlled vocabulary (tabular form) and have some thoughts. First, as a geologist and engineer, I don't know what a lot of the terms mean and didn't have time to look them all up, so this analysis is based on the terms I think I understand.
Perhaps a next step here is looking for some more general categories to lump categories into a vocabulary with a manageable number of classes, say on the order of 100 or so, and make them hierarchical. Maybe something like Organism > plant organism > plant organism part along one branch.
Factoring specimen type along dimensions such as object type, material type, sampled feature, taxonomic class, and anatomic class would allow defining a smaller set of categories, and then allow users to build detailed vocabularies that map into combinations of those high-level categories.
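Here is a minimal sketch of the factoring idea in Python. The facet names are a subset of those listed above. The `SampleFacets` class, the local terms, and their facet values are hypothetical and only illustrate how detailed local terms could map onto combinations of a few high-level categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleFacets:
    """Hypothetical high-level facets; names and values are illustrative only."""
    object_type: str                       # e.g. "whole organism", "organism part"
    material_type: str                     # e.g. "biological material", "mineralized"
    taxonomic_class: Optional[str] = None
    anatomic_class: Optional[str] = None


# A hypothetical local, detailed vocabulary mapped onto combinations of facets.
LOCAL_TERMS = {
    "fossil scute thin section": SampleFacets("organism part", "mineralized", None, "scute"),
    "frozen tissue": SampleFacets("organism part", "biological material", None, "tissue"),
    "herbarium sheet": SampleFacets("whole organism", "biological material", "Plantae"),
}


def terms_matching(**criteria):
    """Return local terms whose facets match every given facet=value pair."""
    return sorted(
        term
        for term, facets in LOCAL_TERMS.items()
        if all(getattr(facets, name) == value for name, value in criteria.items())
    )
```

A query on one facet, such as `terms_matching(material_type="mineralized")`, then cuts across local terms that a flat list would keep separate.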
@smrgeoinfo Many of the terms you cite as adjectives are indeed individual bones (angular, articular, basibranchial, basioccipital, exoccipital, frontal) and may be important, especially for vertebrate paleontology, where a complete skeleton is not found or only isolated bones are known. I do agree the list is painfully long, yet as it stands it is still incomplete with respect to all of the possible terms. The list I use in my CMS is hierarchical and has a "modifier" to handle adjectives like anterior partial, left lateral partial, etc., because not everything is complete in the paleo realm.
So in a hierarchical vocabulary, one might have something like: whole organism > vertebrate organism > vertebrate body part > vertebrate bone > endochondral bone > basibranchial bone > Gymnura micrura basibranchial medial plate. For a TDWG materialSampleType vocabulary, the question is what is the useful level of granularity in this hierarchy; more detailed categorization would then fall in some free text field, or use a local, more granular vocabulary specific to some sub-community.
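To make the granularity question concrete, here is a minimal sketch in Python with a hypothetical helper, `at_granularity`, that splits the example path at a chosen level. The concept at that level would be the shared materialSampleType value, and the narrower remainder would go to free text or a local vocabulary. Which level to stop at is exactly the open question; the code does not answer it.

```python
# The hierarchy path from the example above, broadest concept first (level 0).
PATH = [
    "whole organism",
    "vertebrate organism",
    "vertebrate body part",
    "vertebrate bone",
    "endochondral bone",
    "basibranchial bone",
    "Gymnura micrura basibranchial medial plate",
]


def at_granularity(path, level, sep=" > "):
    """Return (shared term, local remainder) when a shared vocabulary stops at `level`.

    The shared term is the concept at `level`; everything narrower is joined into a
    string that could go into a free-text field or a community-specific vocabulary.
    """
    if not 0 <= level < len(path):
        raise ValueError(f"level must be between 0 and {len(path) - 1}")
    return path[level], sep.join(path[level + 1:])
```

For example, stopping at level 3 keeps "vertebrate bone" as the shared value and leaves "endochondral bone > basibranchial bone > Gymnura micrura basibranchial medial plate" for the local record.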
Rather than try to build the vocabulary for anatomical parts, I would recommend the use of a SKOS-ified version of UBERON, the construction of which could be scripted and updated at any time.
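Here is one way such a conversion could be scripted, sketched in Python with only the standard library. It assumes UBERON's OBO flat-file distribution, in which each `[Term]` stanza has `id:`, `name:`, and `is_a:` lines. It maps `name` to a SKOS-like preferred label and `is_a` to broader. The sample stanzas use made-up `EX:` identifiers rather than real UBERON IDs. The parser ignores synonyms, `relationship: part_of`, obsolete terms, and everything else a real conversion would have to handle, and it produces plain dictionaries rather than RDF.

```python
# Made-up sample in OBO flat-file style; the EX: identifiers are hypothetical.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: bone element

[Term]
id: EX:0000002
name: endochondral bone
is_a: EX:0000001 ! bone element

[Typedef]
id: part_of
name: part of
"""


def obo_to_skos(text):
    """Convert [Term] stanzas to SKOS-like dicts keyed by term id (sketch only)."""
    concepts = {}
    current = None  # header lines before the first stanza are skipped
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas become concepts; [Typedef] and others are skipped.
            current = {"prefLabel": None, "broader": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" ! ")[0].strip()  # drop the trailing "! label" comment
        if tag == "id":
            concepts[value] = current
        elif tag == "name":
            current["prefLabel"] = value
        elif tag == "is_a":
            current["broader"].append(value)  # is_a treated as a broader link
    return concepts
```

Because the script is rerun against each new UBERON release, the SKOS-ified copy can be updated at any time without hand edits.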
@tucotuco UBERON works for the living, not so well for the fossil groups. It is a great start, and I will explore further.
Perhaps a next step here is looking for some more general categories to lump categories into a vocabulary with a manageable number of classes, say on the order of a 100 or so.
I want to make clear that when I suggested this task, this is what I intended would happen. In the How Did It Die Task Group, this is what we did to come up with the vocabulary for causeOfDeath. See here, where we have a full slate of what's currently in some of the databases for cause of death, followed by the lumping categories of Natural - abiotic, Natural - biotic, Anthropogenic, and Unknown. Personally, I would hope we could get to a lumped list of 10 or so :-) We are going to overwhelm data providers if we make the list too long.
@jbstatgen I've started a new issue https://github.com/tdwg/ac/issues/240 in the Audubon Core repository regarding fungal parts to avoid getting this one off the track. We can continue the discussion there.
The categories from GRSciColl Collection ContentType seem broad and relevant. Could these terms also be used as materialSampleType?
That may seem repetitive, but since any given collection probably includes more than one ContentType, adding this "tag" to every record seems potentially useful. However, some of the ContentTypes still seem oddly specific. How about the broader categorical terms?
Archaeological Biological Human Derived Earth Planetary Paleontological Record
Really, it seems like the broader terms belong with the collection description and the more detailed values with the individual records, but I could see it going either way...
This mapping includes the GRSciColl terms.
Really, it seems like the broader terms belong with the collection description and the more detailed values with the individual records, but I could see it going either way...
Wouldn't this be the perfect situation for an ontology, i.e., a hierarchical classification? That way, one could automatically generate the aggregate of a collection's contents at any level.
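Here is a minimal sketch of that roll-up in Python. It assumes, hypothetically, that each record's value is stored as a path from the broadest category down. Counting the path element at a given level then summarizes a collection at that level. The sample records reuse category names suggested in this thread and are illustrative only.

```python
from collections import Counter

# Hypothetical record-level values, each a path from broadest to narrowest category.
RECORDS = [
    ("Biological", "Zoology", "Vertebrate"),
    ("Biological", "Zoology", "Invertebrate"),
    ("Biological", "Botany"),
    ("Geological", "Planetary/Terrestrial/Earth"),
]


def summarize(records, level):
    """Count records by their category at `level` (0 = broadest).

    Records whose path is shorter than `level` are left out of that level's summary.
    """
    return Counter(path[level] for path in records if len(path) > level)
```

At level 0 this gives three Biological records and one Geological record; at level 1 it splits them into Zoology, Botany, and Planetary/Terrestrial/Earth.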
Archaeological Biological Human Derived Earth Planetary Paleontological Record
I like this high-level approach, though there are a couple of reasons why I would like to see the list of terms modified.
Geological
Biological
Anthropogenic
[Record (what does "Record" refer to? Is that a subclass of Anthropogenic?)]
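In a hierarchical approach, this could be Level 1, with Level 0 being "material sample" vs. "information artifact".
Level 2 within "Biological" will be most informative for many of our use cases. Here I am suggesting: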
Virology
Microbiology _(Would one want to split Bacteriology from Microbiology? That is, Bacteria, Archaebacteria versus the rest of all those evolutionary dispersed lineages of microorganisms?)_
Mycology
Zoology _(How important is an immediate split into invertebrates - vertebrates?)_
Botany
Paleontology _(human remains go into Anthropology, right?)_
Biomedical _(or any term referring to human biology - and yes, actually this is Zoology)_
Planetary/Terrestrial/Earth
Extraterrestrial with WithinSolarSystem vs. ExtrasolarSystem
Alternatively, would it be "Geology" vs. "Astronomy"?
Anthropology/Archaeology
Cultural Artifacts
Library/Literature
This mapping includes the GRSciColl terms.
@smrgeoinfo Could you please change the share settings for the file? Currently I can't access it and might not be the only one. Thanks a lot, Jutta
Jutta, sorry! Permissions updated; anyone with the link should be able to comment.
@smrgeoinfo can we just add this to the original file? I'd prefer to just have one.
Done
I would like to add saline water, non-saline water (?), soil, and sediment, but I'm not sure where to add them in the document. They aren't necessarily in use in databases, but I see them as materialSampleTypes that eDNA collectors would want to use. Should I add them to both the database uses tab and the iSamples mapping tab?
I don't think it matters that they aren't currently in use - just add them to the database uses tab.
Level 2 within "Biological" will be most informative for many of our use cases. Here I am suggesting
Virology Microbiology _(Would one want to split Bacteriology from Microbiology? That is, Bacteria, Archaebacteria versus the rest of all those evolutionary dispersed lineages of microorganisms?)_ Mycology Zoology _(How important is an immediate split into invertebrates - vertebrates?)_ Botany Paleontology _(human remains go into Anthropology, right?)_ Biomedical _(or any term referring to human biology - and yes, actually this is Zoology)_
But aren't these things really part of identification (with the exception of "Paleontology")? Would we be duplicating whatever is held in dwc:higherClassification?
A list (concatenated and separated) of taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record.
While the terms in the list will not be found exactly in dwc:higherClassification, they can be inferred from there. Or are we to assume that any given dwc:MaterialSample may not have an associated dwc:Identification? If they do, how would this list be more informative than dwc:Identification plus dwc:higherClassification?
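Here is a minimal sketch of that inference in Python, assuming dwc:higherClassification uses the vertical-bar separator Darwin Core recommends for lists. The `CATEGORY_BY_TAXON` table and `infer_category` function are hypothetical and deliberately incomplete. As noted above, nothing in the classification would yield "Paleontology", which is the exception.

```python
# Hypothetical lookup from higher taxon names to the suggested Level 2 categories.
# Deliberately incomplete; a real mapping would need a curated taxonomic backbone.
CATEGORY_BY_TAXON = {
    "Viruses": "Virology",
    "Bacteria": "Microbiology",
    "Archaea": "Microbiology",
    "Fungi": "Mycology",
    "Animalia": "Zoology",
    "Plantae": "Botany",
}


def infer_category(higher_classification):
    """Return the category of the first recognized taxon in dwc:higherClassification.

    Splits on "|" and strips whitespace, so both "A | B" and "A|B" are accepted.
    Returns None when no listed taxon is found.
    """
    for name in (part.strip() for part in higher_classification.split("|")):
        if name in CATEGORY_BY_TAXON:
            return CATEGORY_BY_TAXON[name]
    return None
```

For example, `infer_category("Animalia | Chordata | Aves")` returns `"Zoology"`.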
Some other vocabularies to consider:
ggbn:materialSampleType - https://rs.gbif.org/extension/ggbn/materialsample.xml
dwc:preparations - https://dwc.tdwg.org/terms/#dwc:preparations
ABCD KindOfUnit - https://terms.tdwg.org/wiki/abcd2:KindOfUnit (504 Gateway Time-out)
Closing as discussion has now moved to #26 #27 and #28
As requested in the 2022-03-16 meeting, I have created a draft controlled vocabulary for the proposed materialSampleType term based on the existing specimen types. It can be viewed as a list of terms document and in tabular form. I believe the decision was to start with the existing specimen types, with the option of adding other values if we could agree upon what they should be. The vocabulary is easy to expand by just adding more rows to the source CSV (linked above).
Additional things to be resolved:
Is PreservedSpecimen actually a broader concept of FossilSpecimen? The definition suggests that, so I put it in the metadata, but I'm not sure if that's right. It is allowed, but not required, to have hierarchical relationships among SKOS concepts, and there is precedent within TDWG controlled vocabularies for doing that if desired.
The controlled values are in UpperCamelCase (e.g., FossilSpecimen), rather than lowerCamelCase (which would be fossilSpecimen) as has become somewhat standard for other controlled vocabularies. However, since UpperCamelCase is to some extent already in use or assumed for these terms, I thought it better to stick with that.
The namespace abbreviation is dwcmatter. I think it's best to include the "dwc" part if the namespace gets used beyond TDWG, and I used "matter" instead of something longer because places that track namespace abbreviations, such as Linked Open Vocabularies (LOV), won't accept namespace abbreviations that are too long (e.g., tdwgutility is too long). I'm not sure what the character limit is. But this seemed to encapsulate what the CV is talking about (things that are matter as opposed to information), and the abbreviation doesn't really have any normative meaning; it's just for convenience. Also, most people will use the controlled value strings rather than IRIs anyway. But we could make it something else if desired.