microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Migrations: Implement `set_field_of_each_document` adapter method #2009

Closed eecavanna closed 1 month ago

eecavanna commented 1 month ago

Summary

In this branch, I implemented a new adapter method named `set_field_of_each_document`.

I designed the method with the upcoming Berkeley schema migrations in mind. Specifically, one of the migrators assigns the same `type` value to every document in a collection. That migrator currently uses the `process_each_document` method, which runs an ETL (extract, transform, load) process on each document. It's a relatively general-purpose method.

In contrast, the new method always sets the specified field to the specified value, regardless of what the original document contains. So, instead of ETL, it's just "L": loading the specified value into the existing document. Because this method's responsibility is narrower, it can use a more optimized query under the hood. I expect that this method will speed up the Berkeley schema migration.
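To illustrate the difference between the two methods, here is a minimal sketch using an in-memory dictionary as the "database". The `InMemoryAdapter` class, the method signatures, the `study_set` collection name, and the sample documents are assumptions made up for this example; they are not copied from the adapter code in this branch.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


class InMemoryAdapter:
    """Hypothetical stand-in for a migration adapter; collections live in a dict."""

    def __init__(self, database: Dict[str, List[Document]]) -> None:
        self._db = database

    def process_each_document(
        self, collection_name: str, pipeline: List[Callable[[Document], Document]]
    ) -> None:
        """General-purpose ETL: read each document, transform it, write it back."""
        docs = self._db.get(collection_name, [])
        for index, doc in enumerate(docs):
            for transform in pipeline:
                doc = transform(doc)
            docs[index] = doc

    def set_field_of_each_document(
        self, collection_name: str, field_name: str, value: Any
    ) -> None:
        """Load-only: put the same value in the same field of every document."""
        for doc in self._db.get(collection_name, []):
            doc[field_name] = value


def build_database() -> Dict[str, List[Document]]:
    # Made-up sample data; one document has no `type`, one has a stale `type`.
    return {"study_set": [{"id": "nmdc:sty-1"}, {"id": "nmdc:sty-2", "type": "old"}]}


def set_type(doc: Document) -> Document:
    # Transformation function that ignores what the document already contains.
    doc["type"] = "nmdc:Study"
    return doc


# Approach 1: the general-purpose ETL method with a transformation function.
via_etl = build_database()
InMemoryAdapter(via_etl).process_each_document("study_set", [set_type])

# Approach 2: the narrower method, which needs no transformation function.
via_set = build_database()
InMemoryAdapter(via_set).set_field_of_each_document("study_set", "type", "nmdc:Study")
```

Both approaches leave the collection in the same state. In this in-memory sketch there is no speed difference to speak of; the benefit shows up with a database-backed adapter, where the narrower method does not need to read each document first. For MongoDB, one plausible form of the optimized query is a single `update_many` call with a `$set` operator, though that is a guess about the implementation rather than something stated above.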

eecavanna commented 1 month ago

I'm ready for this to be merged into the nmdc-schema repo. I want to use it in the berkeley-schema-fy24 fork repo.

turbomam commented 1 month ago

I won't delete the branch until you've checked whether my commit did anything nasty. If it did, it can easily be resolved.

eecavanna commented 1 month ago

@turbomam, thanks for bringing this additional commit to my attention. I looked at its contents (diff) and don't have any concerns. I'll delete the branch now.