Open aclum opened 3 months ago
@aclum @pkalita-lbl does this issue relate to a milestone? Can you add the milestone number (from the proposal) to the title please?
This would be one piece of what is needed for milestone 6.7. I've linked this ticket to a ticket for the milestone.
Problem: We'd like to replace the system of emailed spreadsheets with a DataHarmonizer tab to ingest data. The suggestion is to prototype this with sequencing data since we've done this manually for 1000 soils and TRiP.
Proposed logic/requirements: In the Submission Portal 'Submission Context' section, if a user checks 'Yes' to 'Have data already been generated for your study?' and does not check 'Data was generated by a DOE user facility', a 'Data' tab would appear. The column headers for that tab would be: sample_name, file_size_bytes, md5_checksum, data_object_type, compression_type, url, file_name, description, alternative_identifiers
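A minimal sketch of the visibility rule above, in Python for illustration only (the portal itself may implement this differently). The function and parameter names are hypothetical, and the `file_name` spelling assumes the header is meant to be snake_case like the others.

```python
# Proposed column headers for the 'Data' tab, in the order listed in this issue.
# "file_name" is an assumed snake_case spelling of the "file name" header.
DATA_TAB_COLUMNS = [
    "sample_name",
    "file_size_bytes",
    "md5_checksum",
    "data_object_type",
    "compression_type",
    "url",
    "file_name",
    "description",
    "alternative_identifiers",
]


def show_data_tab(data_already_generated: bool, generated_by_doe_facility: bool) -> bool:
    """Return True when the 'Data' tab should appear.

    Both arguments are hypothetical names for the two 'Submission Context'
    answers described in this issue.
    """
    return data_already_generated and not generated_by_doe_facility
```

Under this reading, the tab is hidden both when no data has been generated yet and when the data came from a DOE user facility.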
sample name would map to nmdc-schema Biosample name. file name would map to nmdc-schema DataObject name. All other column headers map to nmdc-schema DataObject slots.
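As a rough illustration of that mapping, the sketch below splits one tab row into a Biosample name and a dict of DataObject slot values. It is written in Python with a hypothetical `split_row` helper, uses plain dicts rather than nmdc-schema classes, and assumes the snake_case keys `sample_name` and `file_name`.

```python
from __future__ import annotations


def split_row(row: dict) -> tuple[str, dict]:
    """Split one 'Data' tab row into (Biosample name, DataObject slot values).

    Hypothetical helper: sample_name identifies the Biosample, file_name
    becomes the DataObject 'name' slot, and every other column is passed
    through under its own header as a DataObject slot.
    """
    biosample_name = row["sample_name"]
    data_object = {}
    for column, value in row.items():
        # sample_name belongs to the Biosample; blank cells are skipped.
        if column == "sample_name" or value in (None, ""):
            continue
        slot = "name" if column == "file_name" else column
        data_object[slot] = value
    return biosample_name, data_object
```

How the Biosample name is then resolved to an existing Biosample record is not specified here and is left out of the sketch.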
sample name, file name, and data_object_type would be required. md5_checksum would be recommended.
data_object_type would be a drop-down limited to the subset of FileTypeEnum values that correspond to outputs from Omics processing. If we are just starting with sequencing data, this could be limited to 'Metagenome Raw Reads', 'Metagenome Raw Read 1', 'Metagenome Raw Read 2'.
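The required/recommended rules and the restricted drop-down could be checked per row along these lines. This is a sketch in Python, not DataHarmonizer's own validation; `validate_row` and the constant names are hypothetical, and the snake_case keys `sample_name` and `file_name` are assumed.

```python
from __future__ import annotations

REQUIRED_COLUMNS = ("sample_name", "file_name", "data_object_type")
RECOMMENDED_COLUMNS = ("md5_checksum",)

# Initial subset of FileTypeEnum proposed in this issue for sequencing data.
ALLOWED_DATA_OBJECT_TYPES = {
    "Metagenome Raw Reads",
    "Metagenome Raw Read 1",
    "Metagenome Raw Read 2",
}


def validate_row(row: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a single 'Data' tab row."""
    errors: list[str] = []
    warnings: list[str] = []

    def is_blank(column: str) -> bool:
        return not str(row.get(column) or "").strip()

    # Required columns produce errors when empty.
    for column in REQUIRED_COLUMNS:
        if is_blank(column):
            errors.append(f"missing required value: {column}")

    # data_object_type must come from the allowed drop-down subset.
    data_object_type = row.get("data_object_type")
    if data_object_type and data_object_type not in ALLOWED_DATA_OBJECT_TYPES:
        errors.append(f"unsupported data_object_type: {data_object_type}")

    # Recommended columns only produce warnings when empty.
    for column in RECOMMENDED_COLUMNS:
        if is_blank(column):
            warnings.append(f"missing recommended value: {column}")

    return errors, warnings
```

Keeping errors and warnings separate mirrors the required versus recommended distinction: a row without md5_checksum is still accepted, just flagged.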
@mslarae13 @pkalita-lbl
FY25 Q1 goal: Deploy support for automated data staging (non-JGI/EMSL data)