microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Migrations: Update migrators to raise Exception instead of printing error message #2004

Closed: eecavanna closed this issue 1 month ago

eecavanna commented 1 month ago

Tasks

Related: https://github.com/microbiomedata/nmdc-runtime/issues/474

Alternatively, instead of raising an Exception, the migrator could call a callback function passed in by whatever runs the migrator, e.g. on_error or report_error. That reminds me: the migrator initializer already accepts a logger parameter. One caveat with a callback is that unplanned exceptions wouldn't trigger it (unless the migrator does its work inside a try/except block, or is itself called within one). Which suggests another option: whatever runs the migrator could run it within a try/except block.
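The callback-plus-try/except idea above could look roughly like this. It is a minimal sketch, not the nmdc-schema migrator API: `ExampleMigrator`, `MigrationError`, `ErrorCallback`, `on_error`, `report_error`, and `run_migrator` are hypothetical names, and only the `logger` constructor parameter is mentioned in this issue. It also demonstrates the caveat: a planned error goes through the callback, while an unplanned `KeyError` bypasses it and is only caught by the runner's try/except.

```python
import logging
from typing import Callable, List, Optional, Tuple


class MigrationError(Exception):
    """Hypothetical error type for data a migrator cannot handle."""


# Hypothetical callback type: receives an error message.
ErrorCallback = Callable[[str], None]


class ExampleMigrator:
    """Toy migrator with an optional, hypothetical on_error callback.

    The logger parameter mirrors the one the real migrator initializer
    is said to accept; everything else is illustrative.
    """

    def __init__(
        self,
        logger: Optional[logging.Logger] = None,
        on_error: Optional[ErrorCallback] = None,
    ) -> None:
        self.logger = logger or logging.getLogger(__name__)
        self.on_error = on_error

    def report_error(self, message: str) -> None:
        # Planned errors go to the caller's callback if one was given;
        # otherwise they are raised as exceptions.
        if self.on_error is not None:
            self.on_error(message)
        else:
            raise MigrationError(message)

    def migrate_study(self, study: dict) -> dict:
        if "id" not in study:
            # Planned error: routed through report_error.
            self.report_error("Study is missing an 'id'")
            return study
        # Unplanned error: a missing 'name' raises KeyError here, which
        # bypasses report_error entirely.
        study["title"] = study["name"].strip()
        return study


def run_migrator(
    migrator: ExampleMigrator, docs: List[dict]
) -> Tuple[List[dict], List[Tuple[Optional[str], Exception]]]:
    """Run the migrator inside try/except so that both planned and
    unplanned exceptions are collected instead of aborting the run."""
    migrated: List[dict] = []
    failures: List[Tuple[Optional[str], Exception]] = []
    for doc in docs:
        try:
            migrated.append(migrator.migrate_study(doc))
        except Exception as exc:
            failures.append((doc.get("id"), exc))
    return migrated, failures
```

For example, `ExampleMigrator(on_error=errors.append)` appends the missing-id message to `errors`, while a study that has an `id` but no `name` still raises `KeyError`. Whether the runner collects failures (as here) or re-raises them is a design choice the issue leaves open.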

eecavanna commented 1 month ago

There's only one occurrence of this (printing an error message rather than raising an exception) among the migrators in the nmdc-schema repo. That is:

I think there are several occurrences in the berkeley-schema-fy24 repo.

eecavanna commented 1 month ago

Here are all the occurrences in the berkeley-schema-fy24 repo:

self.logger.error(f"The instrument_name {omics_doc['id']} has a value of {omics_doc['instrument_name']}, but did not have any matches with the cutoff set at 0.25")
self.logger.error(f"The workflow {workflow_id} has an execution_resource value of {workflow_resource_value}, but did not have any matches with the cutoff set at 0.8")
print(f"Error: Collection '{current_collection_name}' not found in the adapter.")
self.logger.error(f"ERROR: analyte_category for {data_gen_doc['id']} is {data_gen_doc['analyte_category']}, which is not one of {nucleotide_seqs} or {mass_spec}")
self.logger.error(f"Workflow doc {doc['id']} with instrument: {doc['used']} does not match {omics_processing_doc['instrument_name']}")
self.logger.error(f"omics type does not match any analyte categories for {omics_proc['id']}")
self.logger.error(f"No WorkflowChain ID available for OmicsProcessing: {omics_processing_id}")
self.logger.error(
    f"WorkflowExecution doc with id {workflow_doc['id']} was_informed_by slot does not match "
    f"its workflow chain doc with id {workflow_chain_doc['id']} was_informed_by slot"
)
self.logger.error(f'ERROR: Unexpected value in {slot_name} of {study["id"]} skipping slot deletion')
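
To illustrate the change this issue proposes, here is a sketch contrasting the current log-and-continue behavior with raising an exception, using the analyte_category check above as the example. The constant values and function names are hypothetical stand-ins; the real migrator compares against its own `nucleotide_seqs` and `mass_spec` values. The sketch raises `ValueError` (a subclass of `Exception`) so callers can catch it more selectively than a bare `Exception`; that choice is an assumption, not something the issue specifies.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the migrator's nucleotide_seqs / mass_spec values.
NUCLEOTIDE_SEQS = "nucleotide sequencing"
MASS_SPEC = "mass spectrometry"


def check_analyte_category_logging(data_gen_doc: dict) -> bool:
    """Current pattern: log the problem and let the migration continue."""
    category = data_gen_doc["analyte_category"]
    if category not in (NUCLEOTIDE_SEQS, MASS_SPEC):
        logger.error(
            f"ERROR: analyte_category for {data_gen_doc['id']} is {category}, "
            f"which is not one of {NUCLEOTIDE_SEQS} or {MASS_SPEC}"
        )
        return False
    return True


def check_analyte_category_raising(data_gen_doc: dict) -> None:
    """Proposed pattern: raise, so whatever runs the migrator must handle it."""
    category = data_gen_doc["analyte_category"]
    if category not in (NUCLEOTIDE_SEQS, MASS_SPEC):
        raise ValueError(
            f"analyte_category for {data_gen_doc['id']} is {category}, "
            f"which is not one of {NUCLEOTIDE_SEQS} or {MASS_SPEC}"
        )
```

With the raising version, a bad document stops the migration (or reaches the caller's try/except) instead of producing a log line that is easy to miss.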

eecavanna commented 1 month ago

Since all the migrators that exist in nmdc-schema also exist in berkeley-schema-fy24, I will create a branch in the latter repo and make these changes there.