Given a Biosample id we need to be able to retrieve the associated DataObjects

sujaypatil96 commented 1 month ago

As part of the efforts in the NCBI Export squad, one of the requirements that has come up is the need to be able to retrieve DataObjects (ids and URLs) given a Biosample id.

Ideally, this would be a specific case of NMDC Database roll-up, but since we don't have the "machinery" for that just yet, we will need to implement something custom for this use case in the meantime.

The code for the NCBI Export squad is being developed in PR #518.

The two cases that we need to cover are:

1. The given Biosample id may be a direct input (through the has_input key) of an OmicsProcessing record whose output (through the has_output key) is one or two DataObject ids. We need to retrieve the DataObject records for those ids.
2. The given Biosample id may be an input to a lab processing class (Pooling, Extraction, LibraryPreparation) whose output (through the has_output key) is a ProcessedSample. That ProcessedSample is in turn an input (through the has_input key) to an OmicsProcessing record, whose DataObjects we then retrieve as in case 1.
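The two cases above can be sketched as a plain-Python traversal over in-memory lists standing in for the OmicsProcessing, lab-process, and DataObject collections. This is a minimal sketch under stated assumptions, not the nmdc-runtime implementation: the function name `data_objects_for_biosample` is hypothetical, records are assumed to be dicts with `id`, `has_input`, and `has_output` as lists of id strings, and only the single lab-process hop described in case 2 is followed.

```python
from typing import Any, Dict, List

# Hypothetical record shape: plain dicts mirroring the fields named in the issue.
Record = Dict[str, Any]


def data_objects_for_biosample(
    biosample_id: str,
    omics_processing: List[Record],
    lab_processes: List[Record],
    data_objects: List[Record],
) -> List[Record]:
    """Return DataObject records reachable from a Biosample via cases 1 and 2."""
    # Case 1: the Biosample id itself may be a has_input of OmicsProcessing.
    candidate_inputs = {biosample_id}

    # Case 2: a lab process (Pooling, Extraction, LibraryPreparation) consumes
    # the Biosample and outputs a ProcessedSample, which may then be a
    # has_input of OmicsProcessing.
    for process in lab_processes:
        if biosample_id in process.get("has_input", []):
            candidate_inputs.update(process.get("has_output", []))

    # Collect DataObject ids output by any OmicsProcessing record that
    # consumes one of the candidate inputs.
    data_object_ids = set()
    for op in omics_processing:
        if candidate_inputs.intersection(op.get("has_input", [])):
            data_object_ids.update(op.get("has_output", []))

    # Resolve ids to full records (id, url, ...), skipping unknown ids.
    by_id = {do["id"]: do for do in data_objects}
    return [by_id[i] for i in sorted(data_object_ids) if i in by_id]
```

Against MongoDB, each membership check corresponds to a filter such as `{"has_input": biosample_id}`, because MongoDB matches a scalar query value against any element of an array field.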

Implementation details:

- We can develop this either as an API endpoint or just as an @op and use it in code. Which would be better?
- We can use the get_mongo_db() method or the mongo resource. Which would be better?
- I'm also thinking that the method I implement will iterate over all the records in the alldocs collection.
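If the lookup iterates over alldocs, one option (an assumption, not something this thread settled on) is to make a single pass that indexes every record by the ids in its has_input, then walk outward from the Biosample breadth-first until DataObject ids are reached. This also covers chains longer than one lab process, such as Extraction followed by LibraryPreparation. The helper names `build_indexes` and `data_object_ids_for`, the `type` field, and the `"nmdc:DataObject"` tag are hypothetical and should be checked against actual alldocs documents.

```python
from collections import defaultdict, deque
from typing import Any, Dict, Iterable, List, Set, Tuple

Doc = Dict[str, Any]

# Assumed type tag for DataObject records in alldocs (hypothetical; verify).
DATA_OBJECT_TYPE = "nmdc:DataObject"


def build_indexes(alldocs: Iterable[Doc]) -> Tuple[Dict[str, List[Doc]], Dict[str, str]]:
    """Single pass over alldocs.

    Returns (consumers, type_of): consumers maps an id to every record that
    lists it under has_input; type_of maps each record id to its type.
    """
    consumers: Dict[str, List[Doc]] = defaultdict(list)
    type_of: Dict[str, str] = {}
    for doc in alldocs:
        type_of[doc["id"]] = doc.get("type", "")
        for input_id in doc.get("has_input", []):
            consumers[input_id].append(doc)
    return consumers, type_of


def data_object_ids_for(
    biosample_id: str,
    consumers: Dict[str, List[Doc]],
    type_of: Dict[str, str],
) -> Set[str]:
    """Breadth-first walk along has_input -> has_output edges from a Biosample.

    Non-DataObject outputs (e.g. ProcessedSamples) are queued and followed;
    DataObject outputs are collected and not followed further.
    """
    found: Set[str] = set()
    seen: Set[str] = {biosample_id}
    queue = deque([biosample_id])
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty lists into the defaultdict.
        for doc in consumers.get(current, []):
            for output_id in doc.get("has_output", []):
                if type_of.get(output_id) == DATA_OBJECT_TYPE:
                    found.add(output_id)
                elif output_id not in seen:
                    seen.add(output_id)
                    queue.append(output_id)
    return found
```

The documents could come from a projected cursor such as `db.alldocs.find({}, {"id": 1, "type": 1, "has_input": 1, "has_output": 1})`, assuming `db` is a pymongo Database obtained from get_mongo_db() or the mongo resource. Building the index once and reusing it across Biosample ids means the full scan is paid a single time.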

sujaypatil96 commented 1 month ago

CC: @PeopleMakeCulture @dwinston

PeopleMakeCulture commented 1 month ago

@sujaypatil96 We actually built out an API endpoint for this, but it never got pushed to prod. Let me find the issue for you.

PeopleMakeCulture commented 1 month ago

Duplicate of #401

microbiomedata / nmdc-runtime, issue #545