Closed PeopleMakeCulture closed 2 months ago
The `alldocs` collection that you are creating as part of your referential integrity checking and validation notebook/PR here: metadata-translation/notebooks/repl_validation_referential_integrity-1715162638.ipynb could be very useful for a use case that we have in one of the existing squads, the NCBI Export squad.
There is a PR on runtime that is implementing all the requirements laid out for the above squad. See here: https://github.com/microbiomedata/nmdc-runtime/pull/518
One of the requirements/blockers to continue the development on the above PR/squad is that we need a way to retrieve the URLs of DataObject records (in `data_object_set`) given a Biosample record/id. Here are the two cases that need to be handled/covered:
1. The Biosample is an input (`has_input` key) to OmicsProcessing, the "output" (`has_output` key) of which is/are DataObjects, or
2. The Biosample is an input (`has_input` key) to a variety of lab processing classes (Pooling, Extraction, LibraryPreparation), the output of which will be input to OmicsProcessing, and that will have an output of one or more DataObjects.

So now we need a method (`@op` / API endpoint / etc.) to achieve the above. I need to be able to plug in a Biosample/id and retrieve DataObjects from it.
It's not realistic to do this search in realtime by iterating over all the different collections. Instead, it would be nice to have one materialized collection (like `alldocs`) that a method can use to follow the inputs and outputs and return the desired DataObjects.
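To make the request concrete, here is a rough sketch of the lookup I have in mind. It is not tied to the actual `alldocs` schema: it assumes each document is a plain dict with `id` and `type` fields, that process documents carry `has_input` / `has_output` lists of ids, and that DataObject documents carry a `url`. The function name, the `ProcessedSample` intermediate type, and all ids and URLs below are hypothetical placeholders, and an in-memory list stands in for the materialized collection.

```python
from collections import defaultdict, deque


def data_object_urls_for_biosample(biosample_id, docs):
    """Return sorted URLs of DataObjects produced by OmicsProcessing
    downstream of the given Biosample id (both cases described above)."""
    # Index every document by id, and every process by the ids it consumes.
    by_id = {doc["id"]: doc for doc in docs}
    consumers = defaultdict(list)
    for doc in docs:
        for input_id in doc.get("has_input", []):
            consumers[input_id].append(doc)

    urls = set()
    seen = {biosample_id}
    queue = deque([biosample_id])
    while queue:
        current = queue.popleft()
        for process in consumers.get(current, []):
            for output_id in process.get("has_output", []):
                if output_id in seen:
                    continue
                seen.add(output_id)
                if process.get("type") == "OmicsProcessing":
                    # End of the chain: collect DataObject URLs and stop here.
                    output = by_id.get(output_id, {})
                    if output.get("type") == "DataObject" and "url" in output:
                        urls.add(output["url"])
                else:
                    # Lab processing step (Pooling, Extraction, ...): keep following.
                    queue.append(output_id)
    return sorted(urls)


# Hypothetical documents, assumed shape only.
docs = [
    {"id": "bsm-1", "type": "Biosample"},
    {"id": "bsm-2", "type": "Biosample"},
    # Case 1: Biosample -> OmicsProcessing -> DataObject
    {"id": "omprc-1", "type": "OmicsProcessing",
     "has_input": ["bsm-1"], "has_output": ["dobj-1"]},
    # Case 2: Biosample -> Extraction -> LibraryPreparation -> OmicsProcessing -> DataObject
    {"id": "extr-1", "type": "Extraction",
     "has_input": ["bsm-2"], "has_output": ["procsm-1"]},
    {"id": "libprep-1", "type": "LibraryPreparation",
     "has_input": ["procsm-1"], "has_output": ["procsm-2"]},
    {"id": "omprc-2", "type": "OmicsProcessing",
     "has_input": ["procsm-2"], "has_output": ["dobj-2"]},
    {"id": "procsm-1", "type": "ProcessedSample"},
    {"id": "procsm-2", "type": "ProcessedSample"},
    {"id": "dobj-1", "type": "DataObject", "url": "https://example.org/dobj-1.fastq.gz"},
    {"id": "dobj-2", "type": "DataObject", "url": "https://example.org/dobj-2.fastq.gz"},
]

case_1_urls = data_object_urls_for_biosample("bsm-1", docs)
case_2_urls = data_object_urls_for_biosample("bsm-2", docs)
```

The traversal stops at the outputs of OmicsProcessing on purpose, so DataObjects produced by later steps are not returned. Against a real materialized collection, the two in-memory indexes would presumably be replaced by indexed queries on `has_input` and `id`.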
The above usage is a specific use case of the Database roll-up described in #551
@sujaypatil96 Could you describe the use case you had in mind?