add "bulk chemistry" to slot `analysis/data type`

mslarae13 commented 9 months ago

NEON use case / example: id:6f99e199-c6c3-4d79-aac6-57446fde26f4

3 soil samples are collected, then pooled into 1 for the DNA extraction. This pooling means the 3 individual samples are NOT sent for any analysis or data generation

Need a 'none' option so the metadata will validate.

As NMDC expands, and depending on how we capture preparation metadata (like pooling) we might develop a better way to validate the 'no data generated' samples. But for now, this needs to happen to allow for submissions.

mslarae13 commented 9 months ago

I think @bmeluch or @pkalita-lbl can make this change to the enum? This enum only needs to be updated in the submission portal, if possible.

The samples with no analysis type are still ingested into the data portal (@aclum confirm?). But there's no workflow generated data.

pkalita-lbl commented 9 months ago

> The samples with no analysis type are still ingested into the data portal (@aclum confirm?)

I think we need a definitive answer here. If samples with analysis_type = none are to go into Mongo, then the none permissible value needs to go into nmdc-schema. If samples with analysis_type = none are to be dropped by the submission portal-to-Mongo pipeline, then it's okay to have the none permissible value exist only in submission-schema.

aclum commented 8 months ago

Related comment, it seems like we largely need this for the submission portal. For data in mongo it is much better to use OmicsProcessing records to see what was actually done instead of looking at analysis_type on class Biosample. I don't believe this is used by the data portal at all so maybe this is a field that should just live in the submission schema.

aclum commented 6 months ago

metadata meeting recommendation is to update the slot to recommended and enforce required via linkml-rules

mslarae13 commented 6 months ago

How does using linkml-rules solve the problem on the submission portal?

aclum commented 6 months ago

The proposal from the metadata meeting was to make the slot recommended for now so it would show up in DH as a different color. If that isn't an acceptable solution it needs to be discussed again.

mslarae13 commented 6 months ago

The color isn't the problem... the problem is you can't submit unless you have something entered there. I don't want to make this field optional. Then, when there is data, people will leave it blank and we won't have what we needed. We should discuss again at the metadata meeting.

turbomam commented 6 months ago

No NMDC-authored slot is currently both required and ranged over an enum that includes "none" as a permissible value. That combination does occur in a few MIxS-provided enums.

Since there's a concern about this field being left blank, we could set it as optional for some period of time on one of our portals and assess how frequently it is left blank. Decisions like this shouldn't be made based on intuition alone.

"None" isn't a great answer to "what type of analysis has been or will be performed". One solution would be to use a value of "Not analyzed directly". If we did a trial of making the slot optional like I described above, we could also do a trial with that value and see how often it is used as intended.

Another solution is to add a required analysed_directly slot to the submission schema. Hopefully we could add a rule that says, if analysed_directly is true, then the analysis type field must have a value set.
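A rough sketch of what that conditional rule would check, written as plain Python rather than as an actual linkml rule. The function name `validate_sample` and the dict-shaped sample are hypothetical, and `analysed_directly` is only the slot name proposed above, not an existing slot:

```python
def validate_sample(sample: dict) -> list[str]:
    """Return a list of validation errors for one submitted sample (illustrative only)."""
    errors = []
    analysed = sample.get("analysed_directly")  # proposed slot, assumed boolean
    if analysed is None:
        # The proposal makes analysed_directly itself required.
        errors.append("analysed_directly is required")
    elif analysed and not sample.get("analysis_type"):
        # Only samples that are analysed directly must declare an analysis type.
        errors.append("analysis_type must be set when analysed_directly is true")
    return errors


pooled_input = {"analysed_directly": False}
sequenced = {"analysed_directly": True, "analysis_type": ["metagenomics"]}
incomplete = {"analysed_directly": True}
```

Under this sketch, `pooled_input` and `sequenced` would both pass, while `incomplete` would be rejected, so a pooled NEON-style sample could be submitted without analysis_type becoming optional for everyone.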

turbomam commented 6 months ago

5616 out of the 8158 Biosamples in the production MongoDB have analysis_type values

{analysis_type:{$exists:true}}
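For scale, here is the same count worked through in Python. The first part only does the arithmetic on the two numbers quoted above. The second part mimics the `$exists` filter on a made-up in-memory list (the three documents are invented for illustration, not real Biosample records):

```python
with_type, total = 5616, 8158
missing = total - with_type   # Biosamples without analysis_type
share = with_type / total     # fraction that have the field, roughly 0.69


def has_field(doc: dict, field: str) -> bool:
    # Like {field: {$exists: true}} for a top-level field:
    # the key only has to be present, even if its value is None.
    return field in doc


# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "analysis_type": ["metagenomics"]},
    {"id": "b"},
    {"id": "c", "analysis_type": None},
]
matched = [d["id"] for d in docs if has_field(d, "analysis_type")]
```

So a bit under a third of the production Biosamples carry no analysis_type at all, which is some evidence for how often the field gets left out when it isn't enforced.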

turbomam commented 6 months ago

We had a good discussion yesterday about providing other analysis_type permissible values, like biogeochemistry, bulk_chemistry, etc.

I think that's a great solution. If a user wants us to be aware of some sample that isn't a direct input into our current core "omics" analyses, they should at least let us know why we need to be aware of it.

mslarae13 commented 6 months ago

Decision made at metadata meeting on 3-20

Add a new analysis type to the Enum: AnalysisTypeEnum that describes “bulk chemistry” or some other way of describing ‘general non-omic analyses’

bmeluch commented 6 months ago

Per discussion in https://github.com/microbiomedata/submission-schema/pull/169, this change needs to be made in nmdc-schema first. See https://github.com/microbiomedata/nmdc-schema/pull/1858 - this issue is currently pending an nmdc-schema release including this PR, and a submission-schema that uses that nmdc-schema version

mslarae13 commented 5 months ago

Tested. Looks good and ready for release! I saw no issues.

bmeluch commented 5 months ago

Closed by updating schema version in https://github.com/microbiomedata/submission-schema/pull/193

microbiomedata / submission-schema

add "bulk chemistry" to slot `analysis/data type` #166