Migrations: Update migrators to raise Exception instead of printing error message

eecavanna commented 1 month ago

Tasks

[x] Update existing migrators
[ ] Update migrator-writing documentation accordingly

Alternatively, instead of raising an Exception, the migrator could call a callback function passed in by whatever runs the migrator; e.g. on_error or report_error. That reminds me: the migrator initializer already accepts a logger parameter. One caveat with a callback is that unplanned exceptions wouldn't trigger it, unless the migrator does its work in a try/except block or is itself called within one. That reminds me: the thing that runs the migrator could run it within a try/except block.
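
To make the try/except idea concrete, here is a minimal sketch of a runner that wraps the migrator call and forwards any exception, planned or not, to a callback. `ExampleMigrator`, its `upgrade` method, and `run_migrator` are hypothetical names made up for illustration; they are not the actual nmdc-schema classes or runner. Only the `logger` parameter and the `on_error` name come from the discussion above.

```python
import logging
from typing import Callable, Optional


class ExampleMigrator:
    """Hypothetical migrator, standing in for a real one."""

    def __init__(self, logger: Optional[logging.Logger] = None):
        # The real initializer accepts a logger; the default here is an assumption.
        self.logger = logger or logging.getLogger(__name__)

    def upgrade(self, docs: list) -> list:
        for doc in docs:
            if "id" not in doc:
                # Raise instead of printing or logging an error message.
                raise ValueError(f"Document is missing an id: {doc!r}")
            doc["migrated"] = True
        return docs


def run_migrator(migrator, docs: list, on_error: Callable[[Exception], None]) -> bool:
    """Run the migrator; report any exception to on_error and return False."""
    try:
        migrator.upgrade(docs)
    except Exception as error:
        # Both deliberate raises and unplanned exceptions end up here.
        on_error(error)
        return False
    return True


reported = []
ok = run_migrator(ExampleMigrator(), [{"id": "a"}, {}], on_error=reported.append)
```

With this shape, the migrator itself never needs to know about the callback, so an exception it did not anticipate still reaches `on_error`.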

eecavanna commented 1 month ago

There's only one occurrence of this in the migrators in the nmdc-schema repo. That is:

[ ] 1. https://github.com/microbiomedata/nmdc-schema/blob/aa028108b4450f2ae0c8cff486f3b38a026725af/nmdc_schema/migrators/migrator_from_8_1_to_9_0.py#L154-L155

I think there are several occurrences in the berkeley-schema-fy24 repo.

eecavanna commented 1 month ago

Here are all the occurrences in the berkeley-schema-fy24 repo:

[ ] 1. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR19_and_PR70.py#L83-L84

self.logger.error(f"The instrument_name {omics_doc['id']} has a value of {omics_doc['instrument_name']}, but did not have any matches with the cutoff set at 0.25")

[ ] 2. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR23.py#L64-L65

self.logger.error(f"The workflow {workflow_id} has an execution_resource value of {workflow_resource_value}, but did not have any matches with the cutoff set at 0.8")

[ ] 3. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR2_and_PR24.py#L34C17-L34C98

print(f"Error: Collection '{current_collection_name}' not found in the adapter.")

[ ] 4. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR3.py#L41-L42

self.logger.error(f"ERROR: analyte_category for {data_gen_doc['id']} is {data_gen_doc['analyte_category']}, which is not one of {nucleotide_seqs} or {mass_spec}")

[ ] 5. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR31.py#L80-L81

self.logger.error(f"Workflow doc {doc['id']} with instrument: {doc['used']} does not match {omics_processing_doc['instrument_name']}")

[ ] 6. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR4.py#L48

self.logger.error(f"omics type does not match any analyte categories for {omics_proc['id']}")

[ ] 7. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR9.py#L110

self.logger.error(f"No WorkflowChain ID available for OmicsProcessing: {omics_processing_id}")

[ ] 8. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_X_to_PR9.py#L219-L222

self.logger.error(
    f"WorkflowExecution doc with id {workflow_doc['id']} was_informed_by slot does not match"
    f"its workflow chain doc with id {workflow_chain_doc['id']} was_informed_by slot"
)

[ ] 9. https://github.com/microbiomedata/berkeley-schema-fy24/blob/a4419bd3ccf6f8f1ad7dfb91f4687306b9235980/nmdc_schema/migrators/migrator_from_8_1_to_9_0.py#L154-L155 (note: this file also exists in the nmdc-schema repo)

self.logger.error(f'ERROR: Unexpected value in {slot_name} of {study["id"]} skipping slot deletion')
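
For reference, here is a rough before/after sketch of the kind of change each occurrence above would need, using the message from item 7. The function names and the `workflow_chain_id` parameter are hypothetical, and `ValueError` is just one possible choice of exception type; the real migrators may warrant something different.

```python
import logging
from typing import Optional

logger = logging.getLogger("migrator_sketch")


def check_logging_only(workflow_chain_id: Optional[str], omics_processing_id: str) -> None:
    """Current pattern: log the problem, then carry on as if nothing happened."""
    if workflow_chain_id is None:
        logger.error(f"No WorkflowChain ID available for OmicsProcessing: {omics_processing_id}")


def check_raising(workflow_chain_id: Optional[str], omics_processing_id: str) -> None:
    """Proposed pattern: raise, so whatever runs the migrator cannot miss the failure."""
    if workflow_chain_id is None:
        raise ValueError(f"No WorkflowChain ID available for OmicsProcessing: {omics_processing_id}")
```

The message text stays the same; only the way it is surfaced changes, which should keep each edit small.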

eecavanna commented 1 month ago

Since all the migrators that exist in nmdc-schema also exist in berkeley-schema-fy24, I will create a branch in the latter repo and make these changes there.

microbiomedata / nmdc-schema

Migrations: Update migrators to raise Exception instead of printing error message #2004
