Submission Portal - Add DataHarmonizer tab for supplying external (sequencing) data

aclum commented 3 months ago

Problem: We'd like to replace the current system of emailed spreadsheets with a DataHarmonizer tab for ingesting data. The suggestion is to prototype this with sequencing data, since we've already done this manually for 1000 soils and TRiP.

Proposed logic/requirements: In the Submission Portal 'Submission Context' section, if a user checks 'Yes' for 'Have data already been generated for your study?' and does not check 'Data was generated by a DOE user facility', a 'Data' tab would appear. The column headers for that tab would be: sample_name, file_size_bytes, md5_checksum, data_object_type, compression_type, url, file name, description, alternative_identifiers
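A minimal sketch of the proposed visibility rule and column list, assuming the two Submission Context answers are available as booleans. The names show_data_tab, data_already_generated, generated_by_doe_facility and DATA_TAB_HEADERS are hypothetical and do not come from nmdc-server.

```python
# Column headers proposed for the 'Data' tab, in the order listed in the issue.
DATA_TAB_HEADERS = [
    "sample_name",
    "file_size_bytes",
    "md5_checksum",
    "data_object_type",
    "compression_type",
    "url",
    "file name",
    "description",
    "alternative_identifiers",
]


def show_data_tab(data_already_generated: bool, generated_by_doe_facility: bool) -> bool:
    """Hypothetical rule: show the 'Data' tab only when the user answered 'Yes' to
    'Have data already been generated for your study?' and did NOT check
    'Data was generated by a DOE user facility'."""
    return data_already_generated and not generated_by_doe_facility


if __name__ == "__main__":
    print(show_data_tab(True, False))   # True: external data, tab appears
    print(show_data_tab(True, True))    # False: DOE user facility data
    print(show_data_tab(False, False))  # False: no data generated yet
```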

The sample name column would map to the nmdc-schema Biosample name, the file name column would map to the nmdc-schema DataObject name, and all other column headers map to nmdc-schema DataObject slots.
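One way to picture this mapping is a function that splits a single spreadsheet row into a Biosample dict and a DataObject dict. This is a sketch under stated assumptions: map_row and DATA_OBJECT_PASSTHROUGH are hypothetical names, the row is a plain dict keyed by the headers above, blank cells are dropped, file_size_bytes is converted to an int, and any splitting of multi-value cells is left out. It does not use the real nmdc-schema classes.

```python
from typing import Any, Dict, Tuple

# Hypothetical: headers assumed to share their name with a DataObject slot.
# 'sample_name' and 'file name' are handled separately below.
DATA_OBJECT_PASSTHROUGH = [
    "file_size_bytes",
    "md5_checksum",
    "data_object_type",
    "compression_type",
    "url",
    "description",
    "alternative_identifiers",
]


def map_row(row: Dict[str, str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one 'Data' tab row into Biosample and DataObject slot dicts."""
    biosample = {"name": row["sample_name"]}        # sample name -> Biosample.name
    data_object: Dict[str, Any] = {"name": row["file name"]}  # file name -> DataObject.name
    for header in DATA_OBJECT_PASSTHROUGH:
        value = row.get(header, "").strip()
        if not value:
            continue  # leave optional slots unset rather than storing ""
        data_object[header] = int(value) if header == "file_size_bytes" else value
    return biosample, data_object


if __name__ == "__main__":
    biosample, data_object = map_row({
        "sample_name": "soil-01",
        "file name": "soil-01_R1.fastq.gz",
        "data_object_type": "Metagenome Raw Read 1",
        "file_size_bytes": "1024",
        "md5_checksum": "",
    })
    print(biosample)
    print(data_object)
```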

The sample name, file name, and data_object_type columns would be required; md5_checksum would be recommended.
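The required/recommended split could be checked per row roughly as follows, treating a blank required cell as an error and a blank recommended cell as a warning. This is an illustrative sketch only: check_row, REQUIRED and RECOMMENDED are hypothetical names, and the headers are taken from the column list in the proposal.

```python
from typing import Dict, List, Tuple

REQUIRED = ["sample_name", "file name", "data_object_type"]
RECOMMENDED = ["md5_checksum"]


def check_row(row: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one 'Data' tab row."""

    def blank(header: str) -> bool:
        # A missing key and a whitespace-only cell both count as blank.
        return not row.get(header, "").strip()

    errors = [f"missing required value: {h}" for h in REQUIRED if blank(h)]
    warnings = [f"missing recommended value: {h}" for h in RECOMMENDED if blank(h)]
    return errors, warnings


if __name__ == "__main__":
    print(check_row({
        "sample_name": "soil-01",
        "file name": "",
        "data_object_type": "Metagenome Raw Reads",
    }))
```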

data_object_type would be a drop-down restricted to the subset of FileTypeEnum values that correspond to outputs from Omics processing. If we start with sequencing data only, this could be limited to 'Metagenome Raw Reads', 'Metagenome Raw Read 1', and 'Metagenome Raw Read 2'.
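In a DataHarmonizer template the drop-down would presumably be driven by the schema; the sketch below only illustrates the restriction itself, assuming the initial sequencing-only subset. SEQUENCING_DATA_OBJECT_TYPES and check_data_object_type are hypothetical names, and the tuple would need to be replaced by the relevant FileTypeEnum values once more data types are supported.

```python
from typing import Optional

# Initial drop-down options: the sequencing subset of FileTypeEnum named in the issue.
SEQUENCING_DATA_OBJECT_TYPES = (
    "Metagenome Raw Reads",
    "Metagenome Raw Read 1",
    "Metagenome Raw Read 2",
)


def check_data_object_type(value: str) -> Optional[str]:
    """Return an error message if value is not an allowed option, else None."""
    if value in SEQUENCING_DATA_OBJECT_TYPES:
        return None
    allowed = ", ".join(repr(v) for v in SEQUENCING_DATA_OBJECT_TYPES)
    return f"data_object_type {value!r} is not one of: {allowed}"


if __name__ == "__main__":
    print(check_data_object_type("Metagenome Raw Read 1"))  # None
    print(check_data_object_type("Assembly Contigs"))       # error message
```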

@mslarae13 @pkalita-lbl

FY25 Q1 goal: Deploy support for automated data staging (non-JGI/EMSL data)

ssarrafan commented 2 months ago

@aclum @pkalita-lbl does this issue relate to a milestone? If so, can you please add the milestone number (from the proposal) to the title?

aclum commented 2 months ago

This would be one piece of what is needed for milestone 6.7. I've linked this ticket to a ticket for the milestone.

microbiomedata / nmdc-server

Submission Portal - Add DataHarmonizer tab for supplying external (sequencing) data #1207