Develop Pacbio for NMDC submission

mslarae13 commented 1 year ago

Deliverable this task is associated with

_See Deliverables tab here: https://docs.google.com/spreadsheets/d/1z_b6WbuTk4pI0Q-Z-rfCgC-8R3m3F2_JDevYuK8CjYE/edit?usp=sharing_

3

RACI

Tag people in their roles

Responsible: Montana
Accountable:
Consulted: @aclum , @emileyfadrosh
Informed:

Describe the task?

[x] microbiomedata/issues#433
[x] microbiomedata/submission-schema#168
[ ] Dev test
[ ] Deploy to prod

Criteria for completion

Users can submit metadata via NMDC for JGI Pacbio analysis

Estimate people time

8

Completion Date (Goal)

~Oct 20~
Rescheduled to Feb 23

Target Sprint Start & End Dates

Start: Sept 11
End: ~Oct 20~ Feb 23

Tag Blocker/Contingent upon issues

[Tag issues]

mslarae13 commented 9 months ago

See the issue above: the slots for metagenome long reads have different requirements from those for metagenome short reads.

To support this check for JGI sample submission, long and short reads will be split apart in the "multi omics" data selection: if metaG is selected, additional checkboxes will appear for long or short reads.

In the metadata file, long and short reads will be added as options for the analysis type.

When selected, samples assigned to long reads will appear in a "JGI Metagenomics - Long Reads" template tab, and those assigned to short reads will appear in a "JGI Metagenomics - Short Reads" tab.
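A minimal sketch of the routing described above, in Python, assuming each sample records the read type(s) chosen under metaG as a list of hypothetical "long"/"short" values. The two tab names are the ones quoted in this comment; `route_samples_to_tabs` and `TAB_FOR_READ_TYPE` are illustrative names only, not part of the submission portal code.

```python
from collections import defaultdict

# Hypothetical mapping from the read type chosen under metaG to the template tab
# names quoted in this issue; the real submission-schema values may differ.
TAB_FOR_READ_TYPE = {
    "long": "JGI Metagenomics - Long Reads",
    "short": "JGI Metagenomics - Short Reads",
}


def route_samples_to_tabs(samples):
    """Group sample names by template tab, based on each sample's selected read types.

    Each sample is a dict like {"name": "sample1", "read_types": ["long", "short"]}.
    A sample selected for both read types is listed in both tabs.
    """
    tabs = defaultdict(list)
    for sample in samples:
        for read_type in sample["read_types"]:
            if read_type not in TAB_FOR_READ_TYPE:
                raise ValueError(f"unknown read type {read_type!r} for {sample['name']}")
            tabs[TAB_FOR_READ_TYPE[read_type]].append(sample["name"])
    return dict(tabs)


if __name__ == "__main__":
    example = [
        {"name": "sample1", "read_types": ["long"]},
        {"name": "sample2", "read_types": ["short", "long"]},
    ]
    print(route_samples_to_tabs(example))
```

Note that a sample selected for both long and short reads lands in both tabs, so the same biosample ends up with two rows.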

@pkalita-lbl, when do you think we can work on this? Could @bmeluch or I make the updates to add the interface and the requirements?

pkalita-lbl commented 9 months ago

@mslarae13, can I turn the question around and ask when we need to have this done?

ssarrafan commented 8 months ago

At least the questions are in progress. Moving to the next sprint. @mslarae13, let me know if this should be in the backlog instead.

mslarae13 commented 8 months ago

We should do this as part of the expansion / updates to the submission portal interface. See https://github.com/microbiomedata/issues/issues/433

I think this rolls into the updating tabs task.

mslarae13 commented 8 months ago

In the schema, the pacbio instrument will capture that the data are long reads.

mslarae13 commented 7 months ago

Decided to separate long and short reads for metaGs at step 4, Multi-Omics Data (for JGI), and in the analysis slot. When a user selects metaG, they can choose long or short reads.

mslarae13 commented 4 months ago

@pkalita-lbl

The functionality on the submission portal is great and works with no issues. I did have a realization/question about a potential problem.

Mark previously asked whether the dna_slot and rna_slot(s) could be consolidated into a single slot. You concluded here that they could not, because the data goes into Mongo associated with a single biosample.

If sample 1 has long and short read data, don't we have the same issue?
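To make the concern concrete, here is a toy Python model (not how NMDC actually loads data into Mongo) in which rows from the long-read and short-read tabs are collapsed into one record per biosample. `merge_rows_into_biosamples` and the row layout are hypothetical; `dna_isolate_meth` is used only as an example slot name, with made-up values.

```python
def merge_rows_into_biosamples(rows):
    """Collapse per-tab rows into one record per biosample and report slot collisions.

    Each row is a dict like
    {"sample": "sample1", "tab": "long", "slots": {"dna_isolate_meth": "..."}}.
    The first value seen for a slot is kept; a later, different value for the same
    slot on the same sample is reported as a conflict instead of overwriting it.
    """
    records = {}
    conflicts = []
    for row in rows:
        record = records.setdefault(row["sample"], {})
        for slot, value in row["slots"].items():
            if slot in record and record[slot] != value:
                # (sample, slot, kept value, rejected value, tab the rejected value came from)
                conflicts.append((row["sample"], slot, record[slot], value, row["tab"]))
            else:
                record[slot] = value
    return records, conflicts


if __name__ == "__main__":
    # Hypothetical values: one sample submitted with both long- and short-read data.
    rows = [
        {"sample": "sample1", "tab": "long", "slots": {"dna_isolate_meth": "method A"}},
        {"sample": "sample1", "tab": "short", "slots": {"dna_isolate_meth": "method B"}},
    ]
    records, conflicts = merge_rows_into_biosamples(rows)
    print(records)
    print(conflicts)
```

With a single record per biosample, the second value has nowhere to go: it must be dropped, overwrite the first, or the slot has to be restructured.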

pkalita-lbl commented 4 months ago

🤦🏻 🤦🏻 🤦🏻

Yes, you're absolutely right. That's my bad for not thinking of that. I'll make a new issue to deal with that.

In the meantime, it doesn't really hurt anything to collect data like this in the submission portal. But if we get any submissions with data like that, we'll just need to hold off on bringing that data into Mongo until the issue is resolved.

EDIT: Here's the new issue https://github.com/microbiomedata/nmdc-schema/issues/1937

mslarae13 commented 4 months ago

Thanks @pkalita-lbl! Let's check with @aclum.

Alicia, of these JGI-specific DNA vs. RNA slots, do we need to store any of them in NMDC/Mongo? Or can they be considered "submission portal & UF specific"?

See the slots in MGInterface in the submission schema: https://github.com/microbiomedata/submission-schema/blob/0b9413915f63bd7fa9be70f32061db49dc422009/src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml#L34798

ssarrafan commented 4 months ago

@aclum, can you respond to this when you get a chance?
I'll remove this from the sprint and add the backlog label, since it hasn't been updated in 2 weeks.

aclum commented 4 months ago

I would like to keep dna_isolate_meth and map it to a slot on NMDC's Extraction class. However, looking at those slots, we've conflated the extraction target and how the extraction was done into one permissible value. If we were to store the values JGI has, we'd need to allow this to be a plain string, because JGI doesn't place any controlled vocabulary (CV) on it. This needs further discussion with @turbomam.

mslarae13 commented 2 months ago

"we've conflated extraction target and how the extraction was done into one permissible value." Fixed in nmdc-schema and merged into berk-schema.

Make the short and long read split now, and deal with mapping back the one field we care about later.

mslarae13 commented 2 months ago

Schema change, post berk.

microbiomedata / issues

Develop Pacbio for NMDC submission #413