Open dannymandel opened 9 months ago
I cheated some when I was making those example instances and did some inferencing to get the environment that the organism inhabits and used that as the sampled feature. In this case the sample is a coral, so the inference is 'marine environment'. I don't seem to have saved the original raw JSON for this record, so I'm not sure what all was there. I probably also inferred 'coral reef' as the sampled feature. I was (am) hoping that we could use the machine learning tools Sarah S. is working on to train this kind of inferencing.
In the meantime... It looks like, in the code above, the has_context_category key, which takes a value from the sampledfeature vocabulary, was being set from the taxon rank ("kingdom", "phylum", "genus")?
The sampledfeature vocabulary has "Biological entity", with definition "Sampled feature is an organism. Use for samples that represent some species of organism as the proximate sampled feature for which the focus is not the environment that the organism inhabits." This might well apply to many GEOME samples, and the simple default for now might be to use that if we can't figure out the environment sampled.
There is a biologicalEntityExtension vocabulary (https://github.com/isamplesorg/vocabularies/blob/main/src/extensions/biologicEntityExtension.ttl) that has the kingdom-level subclasses of biological entity. The next level would be to get the kingdom name from the GEOME record and match that to the extension vocabulary and add that as a has_context_category value.
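A rough sketch of that fallback plus kingdom matching, in case it helps the discussion. Everything named here is an assumption for illustration: the function name, the "kingdom" field on the GEOME record, and the kingdom labels in the lookup table, which are placeholders rather than the actual terms in the extension vocabulary. It also assumes the kingdom term is added alongside the environment or default value rather than replacing it.

```python
from typing import List, Optional

# Default from the sampledfeature vocabulary when no environment is known.
DEFAULT_CONTEXT = "Biological entity"

# Placeholder labels (assumption): the real terms should be loaded from the
# biologic entity extension vocabulary instead of being hardcoded here.
KINGDOM_TO_CONTEXT = {
    "animalia": "Animalia",
    "plantae": "Plantae",
    "fungi": "Fungi",
}


def context_categories(
    record: dict, inferred_environment: Optional[str] = None
) -> List[str]:
    """Build has_context_category values for one GEOME-like record.

    Uses the inferred environment when there is one, otherwise falls back
    to the generic biological-entity term, then appends a kingdom-level
    term if the record's kingdom matches the lookup table.
    """
    categories = [inferred_environment or DEFAULT_CONTEXT]

    # "kingdom" as the record key is an assumption about the GEOME JSON.
    kingdom = str(record.get("kingdom") or "").strip().lower()
    if kingdom in KINGDOM_TO_CONTEXT:
        categories.append(KINGDOM_TO_CONTEXT[kingdom])
    return categories
```

With this shape, a record with no usable kingdom and no inferred environment still gets a valid value from the sampledfeature vocabulary, and the inferencing step can be swapped in later without touching the kingdom lookup.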
@datadavev this was the issue we were discussing in this morning's standup around the kingdom vocabularies. Whatever you implement should support this use case as well.
The previous implementation of the GEOME has_context_categories method looked like this:
It looks like the examples all now have marinewaterbody set, e.g. https://github.com/isamplesorg/metadata/blob/a59d9b35062643928f868f85da5b32bb02a6b357/examples/GEOME/test1.0Valid/ark-21547-AvL2C02_201705281001-v1.json#L9
Should this just be hardcoded to marinewaterbody? FWIW, the taxon ranks are now included in the keywords, so we haven't lost this information.