microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Migrations: Implement `do_for_each_document` adapter method #2006

Closed eecavanna closed 1 month ago

eecavanna commented 1 month ago

Summary

In this branch, I implemented a new adapter method named `do_for_each_document`.

I also made the `make migration-doctests` command more flexible by letting the user specify whether its output is verbose. In practice, verbose output from the `doctest` module can produce a "wall of text" that makes failures hard to notice. The non-verbose mode is more of a "just tell me if anything fails" mode, as opposed to "tell me everything you do."
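To illustrate the difference between the two modes, here is a minimal sketch that uses only the standard library `doctest` module. The `double` function and the `run_doctests` helper are hypothetical names made up for this example. How the `make migration-doctests` target actually passes the verbosity setting is not shown here.

```python
import doctest
import io


def double(x):
    """Return twice the given number.

    >>> double(2)
    4
    >>> double(-1)
    -2
    """
    return 2 * x


def run_doctests(func, verbose):
    """Run the doctests in func's docstring; return (results, captured report)."""
    parser = doctest.DocTestParser()
    globs = {func.__name__: func}  # names visible to the doctest examples
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    report = io.StringIO()
    runner = doctest.DocTestRunner(verbose=verbose)
    results = runner.run(test, out=report.write)  # capture instead of printing
    return results, report.getvalue()


# Non-verbose: the report stays empty unless an example fails.
quiet_results, quiet_report = run_doctests(double, verbose=False)

# Verbose: every example is echoed ("Trying: ... Expecting: ... ok").
loud_results, loud_report = run_doctests(double, verbose=True)
```

Both runs execute the same examples and yield the same pass/fail counts. Only the amount of report text differs, which is why the quiet mode makes a failure easier to spot.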

About the do_for_each_document method

It differs from the existing `process_each_document` method in one respect. Both methods require the function passed in to accept a document as a parameter, but this new method does not require that function to return a document. With `process_each_document`, the returned document is either passed to the next function in the processing pipeline or, if the function is the final one in the pipeline, written back to the database.

This new method is designed to make iterating over all documents in a collection, without updating those documents, more intuitive for migrator authors. It also improves performance by eliminating an unnecessary write operation to the database for each document in the collection.
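The contrast between the two methods can be sketched with a small in-memory stand-in. This is an illustration under assumptions, not the adapter code from this branch: the `InMemoryAdapter` class, its `num_writes` counter, the parameter names, and the `study_set` sample data are all hypothetical. Only the two method names and their described behavior come from the description above.

```python
from typing import Callable, Dict, List


class InMemoryAdapter:
    """Hypothetical stand-in for a migration adapter (not the real class)."""

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections
        self.num_writes = 0  # counts simulated "write" operations

    def process_each_document(
        self, collection_name: str, pipeline: List[Callable[[dict], dict]]
    ) -> None:
        """Pass each document through the pipeline, then write the result back."""
        documents = self.collections.get(collection_name, [])
        for index, document in enumerate(documents):
            for function in pipeline:
                document = function(document)  # each function must return a document
            documents[index] = document  # simulated write, one per document
            self.num_writes += 1

    def do_for_each_document(
        self, collection_name: str, action: Callable[[dict], None]
    ) -> None:
        """Call the action on each document; ignore its result and write nothing."""
        for document in self.collections.get(collection_name, []):
            action(document)


adapter = InMemoryAdapter({"study_set": [{"id": "s1"}, {"id": "s2"}]})

# Read-only pass: collect identifiers without writing anything back.
seen_ids: List[str] = []
adapter.do_for_each_document("study_set", lambda doc: seen_ids.append(doc["id"]))
writes_after_read_only_pass = adapter.num_writes

# Updating pass: the function returns a document, which is then written back.
adapter.process_each_document("study_set", [lambda doc: {**doc, "checked": True}])
writes_after_update_pass = adapter.num_writes
```

In this sketch the read-only pass leaves the write counter unchanged, while the updating pass performs one simulated write per document. That mirrors the performance point made above.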

eecavanna commented 1 month ago

I'm ready for this to be merged into the nmdc-schema repo. I want to use it in the berkeley-schema-fy24 fork repo.