microbiomedata / nmdc_automation

Prototype automation

`berkeley` issue with DataObject records from import_automation #267

Closed: aclum closed this issue 3 weeks ago

aclum commented 1 month ago

Example record from import automation:

[DataObject(id='nmdc:abcd', type='nmdc:DataObject', name='Metagenome bin tarfiles archive', description='Metagenome Bins for nmdc:omprc-11-importT', alternative_identifiers=[], compression_type=None, data_category=None, data_object_type=FileTypeEnum(text='Metagenome Bins', description='Metagenome bin contigs fasta'), file_size_bytes=14065, insdc_experiment_identifiers=[], md5_checksum='3ba455a6a73bdd37cfdc81299fea942c']

This will not pass API validation with either json:validate or the workflow_executions POST endpoint, because data_category must be one of its enumeration values and the record sets it to None.

Either fix https://github.com/microbiomedata/nmdc_automation/issues/259 to remove empty keys, or add support for specifying the correct enumeration values. If we do the former, we should still spin off a ticket to add support for data_category later. If we do the latter, outputs of NucleotideSequencing would have a permissible value of 'instrument_data' and outputs of WorkflowExecution would have a permissible value of 'processed_data'. A permissible value of 'workflow_parameter_data' is only valid for mass spec data at this time.
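As a rough illustration of the two options, here is a minimal sketch that operates on a plain dict version of the record. The function names, the `producer` argument, and the mapping are hypothetical and are not part of nmdc_automation; only the permissible values come from the description above.

```python
from typing import Any, Dict

# Hypothetical mapping from the kind of process that produced the data object
# to the data_category value proposed in this issue.
DATA_CATEGORY_BY_PRODUCER = {
    "NucleotideSequencing": "instrument_data",
    "WorkflowExecution": "processed_data",
}


def drop_empty_keys(record: Dict[str, Any]) -> Dict[str, Any]:
    """Option 1: return a copy without keys whose value is None or an empty list."""
    return {
        key: value
        for key, value in record.items()
        if value is not None and value != []
    }


def with_data_category(record: Dict[str, Any], producer: str) -> Dict[str, Any]:
    """Option 2: return a copy with data_category set from the producer kind."""
    if producer not in DATA_CATEGORY_BY_PRODUCER:
        raise ValueError(f"no data_category known for producer {producer!r}")
    updated = dict(record)
    updated["data_category"] = DATA_CATEGORY_BY_PRODUCER[producer]
    return updated


# Trimmed-down dict form of the example record, for illustration only.
example = {
    "id": "nmdc:abcd",
    "type": "nmdc:DataObject",
    "alternative_identifiers": [],
    "compression_type": None,
    "data_category": None,
    "file_size_bytes": 14065,
}

cleaned = drop_empty_keys(example)
categorized = with_data_category(example, "WorkflowExecution")
```

With option 1 the `data_category` key is simply absent from the submitted record; with option 2 it carries a valid enumeration value. The two can be combined by calling `drop_empty_keys(with_data_category(...))`.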