Open mslarae13 opened 9 months ago
@turbomam mixs.yaml has no separation of which environmental extensions are associated with which slots, correct?
That's only done at the submissions schema, correct?
Well, there are a couple of mixs.yaml files. The one in the nmdc-schema repo does not make that connection. It's just a collection of terms, which are associated with the monolithic Biosample class and the OmicsProcessing class in nmdc.yaml.
But the previous collection of MIxS YAML files does.
And so does the 6.2 release candidate.
But you'll see that those two LinkML versions do it in slightly different ways. We could make all of that more transparent if necessary.
And yes, the submission schema does it too.
@turbomam why are some descriptions in single quotes ('X') but some aren't? See tot_phosp.
If a description has a colon (:) in it, Mark writes it in single quotes, because an unquoted colon can be misread as a YAML key/value separator.
Need to decide: when are single quotes required? Should we just use them all the time? Always quoting is the safer option.
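A minimal sketch of the quoting behaviour discussed above. The slot names and descriptions are made up for illustration and are not taken from mixs.yaml.

```yaml
slots:
  example_plain:
    # No ": " in the text, so a plain (unquoted) scalar parses fine.
    description: Total phosphorus concentration in the sample

  example_quoted:
    # The text contains ": ", so it is single-quoted to keep the colon literal.
    description: 'Total nitrogen, calculated by: dissolved plus particulate nitrogen'

  example_apostrophe:
    # Inside single quotes, a literal apostrophe is written as two apostrophes.
    description: 'The sample''s pH, measured as: the mean of three readings'

  # Left unquoted, the line below is typically rejected by YAML parsers,
  # because "by: " reads like the start of a nested key:
  #   description: Total nitrogen, calculated by: dissolved plus particulate nitrogen
```

For ordinary sentence-like descriptions, adding the quotes does not change the parsed string, so quoting everything only affects how the file reads.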
Checked the descriptions of all the "In NMDC schema, add to plant-associated" terms for sample type exclusivity. Only "tot_nitro" would exclude plants:
name: tot_nitro
description: 'Total nitrogen concentration of water samples, calculated by: total
  nitrogen = total dissolved nitrogen + particulate nitrogen. Can also be measured
  without filtering, reported as nitrogen'
domain_of:
  - HydrocarbonResourcesCores
  - HydrocarbonResourcesFluidsSwabs
  - WastewaterSludge
  - Water
@bmeluch @mslarae13 and friends: how would you feel about GlbrcSample?
We could continue to use GlbrcSample as a mixin, or migrate the slots onto Biosample. But I hope our efforts in December will leave us with a smaller, more modular Biosample overall.
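As a rough sketch, the two options might look like this in LinkML YAML. The slot name `glbrc_example_slot` is hypothetical and stands in for whatever GLBRC-specific slots end up being defined.

```yaml
# Option 1: keep GlbrcSample as a mixin that Biosample pulls in.
classes:
  GlbrcSample:
    mixin: true
    slots:
      - glbrc_example_slot   # hypothetical slot name

  Biosample:
    mixins:
      - GlbrcSample

# Option 2: drop the mixin and list the slot directly on Biosample.
# classes:
#   Biosample:
#     slots:
#       - glbrc_example_slot

slots:
  glbrc_example_slot:
    description: Placeholder for a GLBRC-specific field
```

With the mixin, the GLBRC slots stay grouped in one place and Biosample gains a single `mixins` entry. With direct assignment, every slot is added to Biosample's own slot list.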
@turbomam how would you like aliases added to existing slots? They don't need to be CURIEs, right? But I want to be able to record that this is what the term is called in GLBRC.
Yes, I am suggesting that strategy. In the cases you mentioned, we define slots in those separate YAML modules and then assign them to Biosample in nmdc.yaml.
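A sketch of that layout, assuming a hypothetical module file named glbrc.yaml and a hypothetical slot name. The two files are shown as two YAML documents separated by `---`, and nmdc.yaml is abridged to the relevant parts.

```yaml
# File: glbrc.yaml (hypothetical module that only defines slots)
id: https://example.org/glbrc      # placeholder schema id
name: glbrc
slots:
  glbrc_example_slot:              # hypothetical slot name
    description: Placeholder for a GLBRC-specific field
---
# File: nmdc.yaml (abridged)
imports:
  - glbrc                          # resolves to glbrc.yaml next to this file
classes:
  Biosample:
    slots:
      - glbrc_example_slot         # defined in the module, assigned here
```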
But I have raised the general question to our team: do we want more or fewer YAML modules? At one point, @mslarae13, I think you found the multiple modules difficult to search through. Have you become more comfortable with "find-in-files" in PyCharm or some other tool? I think the PyCharm functionality can be accessed with shift-command-f. @mbthornton-lbl has confirmed that he finds the practice of separate, thematic modules helpful.
I am really concerned about the number of slots we are adding onto Biosample. @mbthornton-lbl has opened an issue to propose a refactoring. I don't know if we can or should work on that before, during or after our December meeting in Berkeley.
Aliases can be assigned to a schema element with attribution with
I'm ok with separate organized files. I can figure it out. I just want to make sure we're making the right decision for the right reasons. We'll make a GLBRC yaml to capture the new slots & can merge it in depending on the decision.
Will the BRCs capture this information consistently? Are these slots specific to this study? Do we make mappings and aliases, or provide a tool for converting their metadata to the term NMDC would use? For close mappings and more specific mappings, a CURIE is required. Can we discuss seeing their DH implementation, model, and schema at the next GLBRC meeting?
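To illustrate the difference between aliases and mappings, here is a hedged sketch using LinkML's `aliases` and `close_mappings` slot annotations. The slot name, the alias text, the `GLBRC` prefix and its URL are all hypothetical. LinkML also offers `narrow_mappings` and `broad_mappings` for mappings that are more or less specific.

```yaml
prefixes:
  GLBRC: https://example.org/glbrc/   # hypothetical prefix, needed only for the CURIE below

slots:
  example_slot:                        # hypothetical slot name
    description: Placeholder for an existing NMDC slot
    # Aliases are free-text alternative names; they do not need to be CURIEs.
    aliases:
      - example GLBRC column label     # hypothetical GLBRC wording
    # Mappings point at identifiers, so each entry must be a URI or CURIE.
    close_mappings:
      - GLBRC:example_term             # hypothetical identifier
```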
For now, pull in the submission and pause on mapping and additional/new metadata fields until we meet. Ingest the mapped metadata & skip schema mapping for now until . Provide a spreadsheet with one column for the GLBRC term and one for the schema term.
In working with Adina, it's been identified that there are some data as metadata slots that should be added to the NMDC schema to support this project.
Some of the slots below are already in the NMDC schema, but may not be in the plant-associated package (need to confirm).
In NMDC, add alias
In NMDC, but measurement was on the SOIL. NOT Plant
not in NMDC schema
Terms that will be added to Biosample, but should discuss putting on Site
phenotypic evaluation of plants?