Closed eecavanna closed 3 weeks ago
Here are links to the schema documentation pages for the classes I think will be involved here.
In terms of existing adapter methods, here's what I expect this migrator to do for each of those child classes (written here in pseudocode):
# Move all documents from the "pooling_set" collection into the "material_processing_set" collection,
# then delete the "pooling_set" collection.
self.adapter.do_for_each_document(
    collection_name="pooling_set",
    # Copy each document, unmodified, into the combined collection.
    action=lambda document: self.adapter.insert_document(
        collection_name="material_processing_set", document=document
    ),
)
# Drop the source collection only after every document has been copied.
self.adapter.delete_collection(collection_name="pooling_set")
DataGeneration subclasses are already in a combined collection, so there's no action needed there. All existing collections from children of MaterialProcessing should be combined into a new collection called material_processing_set; same for WorkflowExecution. I'm assuming you want to put this migrator at the end, in which case you can use commit id https://github.com/microbiomedata/berkeley-schema-fy24/commit/ca304e47916f9ff2825dcd854a7a936dfdd5b07f to determine what the starting Database slot names are. If this is running earlier, you may need to use the nmdc-schema Database slot names. Note that some of these subclasses never had a collection in Mongo, so the code should be able to handle that.
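For illustration, here is a minimal sketch of the merge described above, including the case where a source collection never existed. It is not the actual migrator. `InMemoryAdapter` and `merge_collections` are hypothetical names made up for this example, and the stand-in assumes that iterating over or deleting a missing collection is a no-op, which should be checked against the real adapter. Apart from "pooling_set" and "material_processing_set", the collection names are placeholders; the real list would come from the Database slot names at the commit linked above.

```python
from typing import Callable, Dict, List


class InMemoryAdapter:
    """Hypothetical stand-in for the migrator's adapter; collections are lists of dicts."""

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections

    def do_for_each_document(self, collection_name: str, action: Callable[[dict], None]) -> None:
        # Assumption: a collection that does not exist behaves like an empty one.
        for document in list(self.collections.get(collection_name, [])):
            action(document)

    def insert_document(self, collection_name: str, document: dict) -> None:
        # Create the target collection on first insert.
        self.collections.setdefault(collection_name, []).append(document)

    def delete_collection(self, collection_name: str) -> None:
        # Assumption: deleting a collection that does not exist is a no-op.
        self.collections.pop(collection_name, None)


def merge_collections(adapter: InMemoryAdapter, source_names: List[str], target_name: str) -> None:
    """Copy every document from each source collection into the target, then drop the source."""
    for source_name in source_names:
        adapter.do_for_each_document(
            source_name,
            lambda document: adapter.insert_document(target_name, document),
        )
        adapter.delete_collection(source_name)


# Example run with placeholder data; "never_created_set" is deliberately absent.
adapter = InMemoryAdapter({
    "pooling_set": [{"id": "p1"}, {"id": "p2"}],
    "extraction_set": [{"id": "e1"}],
})
merge_collections(
    adapter,
    source_names=["pooling_set", "extraction_set", "never_created_set"],
    target_name="material_processing_set",
)
```

The same helper could be called a second time with the WorkflowExecution child collections and their combined target collection.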
Thanks! That was very helpful to me. I'm operating under the assumption that this will run after all migrators that have been implemented so far, so I'll refer to that commit you linked to.
P.S. I'll be out until about 9:45pm PT.
I implemented this migrator. It's in this PR: https://github.com/microbiomedata/berkeley-schema-fy24/pull/196
Hi @aclum, I created this ticket to represent the task that came up during today's metadata meeting.
It sounded to me like you wanted all of the documents in one collection to be moved to another collection, and to have the first collection be deleted. Is there more to it than that (e.g. modifying fields within documents)?