Closed mslarae13 closed 5 months ago
I think @bmeluch or @pkalita-lbl can make this change to the enum? This enum only needs to be updated in the submission portal, if possible.
The samples with no analysis type are still ingested into the data portal (@aclum confirm?). But there's no workflow generated data.
The samples with no analysis type are still ingested into the data portal (@aclum confirm?)
I think we need a definitive answer here. If samples with analysis_type = none are to go into Mongo, then the "none" permissible value needs to go into nmdc-schema. If samples with analysis_type = none are to be dropped by the submission portal-to-Mongo pipeline, then it's okay to have the "none" permissible value exist only in submission-schema.
Related comment, it seems like we largely need this for the submission portal. For data in mongo it is much better to use OmicsProcessing records to see what was actually done instead of looking at analysis_type on class Biosample. I don't believe this is used by the data portal at all so maybe this is a field that should just live in the submission schema.
The metadata meeting recommendation is to change the slot to recommended and enforce the requirement via linkml-rules.
How does using linkml-rules solve the problem on the submission portal?
The proposal from the metadata meeting was to make the slot recommended for now so it would show up in DH as a different color. If that isn't an acceptable solution it needs to be discussed again.
The color isn't the problem... the problem is you can't submit unless you have something entered there. I don't want to make this field optional. Then, when there is data, people will leave it blank and we won't have what we needed. We should discuss again at the metadata meeting.
No NMDC authored slots are currently required and have an enum range, where one of the permissible values is "none". That is the case for a few MIxS provided enums.
Since there's a concern about this field being left blank, we could set it as optional for some period of time on one of our portals and assess how frequently it is left blank. Decisions like this shouldn't be made based on intuition alone.
"None" isn't a great answer to "what type of analysis has been or will be performed". One solution would be to use a value of "Not analyzed directly". If we did a trial of making the slot optional like I described above, we could also do a trial with that value and see how often it is used as intended.
Another solution is to add a required analysed_directly slot to the submission schema. Hopefully we could add a rule that says: if analysed_directly is true, then the analysis type field must have a value set.
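As a rough sketch of that conditional rule, here it is in plain Python rather than actual linkml-rules syntax. The analysed_directly slot is only the proposal above, and the function name and error strings are hypothetical, so treat this as an illustration of the logic, not of how the submission portal would implement it.

```python
from typing import Optional


def check_analysis_type(row: dict) -> Optional[str]:
    """Return an error message if the row breaks the proposed rule, else None."""
    analysed = row.get("analysed_directly")  # hypothetical required boolean slot
    analysis_type = row.get("analysis_type") or []  # treat missing/None as empty

    if analysed is None:
        return "analysed_directly is required"
    if analysed and not analysis_type:
        return "analysis_type must be set when analysed_directly is true"
    return None


# A pooled-only sample passes without an analysis type.
print(check_analysis_type({"analysed_directly": False}))
# A directly analysed sample with no analysis type is rejected.
print(check_analysis_type({"analysed_directly": True}))
```

With a rule like this, analysis_type itself would not need a "none" permissible value, because the "not analysed" case is carried by the boolean slot.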
5616 out of the 8158 Biosamples in the production MongoDB have analysis_type values: {analysis_type:{$exists:true}}
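For anyone reading along, here is a hedged sketch of what that filter matches, written against plain Python dicts instead of a live MongoDB connection. The three sample documents are made up for illustration. Only the 5616 and 8158 totals come from the query above.

```python
def has_analysis_type(doc: dict) -> bool:
    # Mirrors {analysis_type: {$exists: true}}: the key is present,
    # regardless of its value (even None or an empty list).
    return "analysis_type" in doc


def percent(part: int, total: int) -> float:
    return round(100 * part / total, 1)


# Made-up Biosample documents, only to show which ones the filter matches.
docs = [
    {"id": "example-1", "analysis_type": ["metagenomics"]},
    {"id": "example-2"},
    {"id": "example-3", "analysis_type": []},
]
matched = [d["id"] for d in docs if has_analysis_type(d)]
print(matched)  # ['example-1', 'example-3']

# Share of production Biosamples with the field, using the counts reported above.
print(percent(5616, 8158))  # 68.8
```

So roughly 31% of Biosamples currently have no analysis_type at all. Note that an $exists check also counts records where the field is present but empty.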
We had a good discussion yesterday about providing other analysis_type permissible values, like biogeochemistry, bulk_chemistry, etc.
I think that's a great solution. If a user wants us to be aware of some sample that isn't a direct input into our current core "omics" analyses, they should at least let us know why we need to be aware of it.
Decision made at metadata meeting on 3-20
Add a new analysis type to the Enum: AnalysisTypeEnum that describes “bulk chemistry” or some other way of describing ‘general non-omic analyses’
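To illustrate the decision, a minimal sketch of the enum after the change, written as a Python Enum rather than the actual LinkML YAML. The existing omics values listed here are an assumption about what the enum currently holds, and "bulk chemistry" is the placeholder wording from the decision, not necessarily the final permissible value.

```python
from enum import Enum


class AnalysisTypeEnum(Enum):
    # Assumed existing omics values (check nmdc-schema for the real list).
    METABOLOMICS = "metabolomics"
    METAGENOMICS = "metagenomics"
    METAPROTEOMICS = "metaproteomics"
    METATRANSCRIPTOMICS = "metatranscriptomics"
    NATURAL_ORGANIC_MATTER = "natural organic matter"
    # Proposed addition for general non-omic analyses (placeholder wording).
    BULK_CHEMISTRY = "bulk chemistry"


def is_valid_analysis_type(value: str) -> bool:
    """True if value is one of the permissible values."""
    try:
        AnalysisTypeEnum(value)
        return True
    except ValueError:
        return False


print(is_valid_analysis_type("bulk chemistry"))  # True
print(is_valid_analysis_type("none"))            # False
```

The point of the design is that the slot can stay required: a sample that is not a direct input to an omics workflow still gets a meaningful value instead of "none".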
Per discussion in https://github.com/microbiomedata/submission-schema/pull/169, this change needs to be made in nmdc-schema first. See https://github.com/microbiomedata/nmdc-schema/pull/1858. This issue is currently pending an nmdc-schema release that includes this PR, and a submission-schema release that uses that nmdc-schema version.
Tested. Looks good and ready for release! I saw no issues.
Closed by updating schema version in https://github.com/microbiomedata/submission-schema/pull/193
NEON use case / example: id:6f99e199-c6c3-4d79-aac6-57446fde26f4
3 soil samples are collected, then pooled into 1 for the DNA extraction. This pooling means the 3 individual samples are NOT sent for any analysis or data generation
Need a 'none' option so the metadata will validate.
As NMDC expands, and depending on how we capture preparation metadata (like pooling) we might develop a better way to validate the 'no data generated' samples. But for now, this needs to happen to allow for submissions.