microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Cannot find or update documents by `_id` ObjectId via `/queries:run` endpoint #575

Open mbthornton-lbl opened 3 days ago

mbthornton-lbl commented 3 days ago

The overall goal here is to fix malformed workflow `ID.version` values in the database (e.g. `nmdc:wfrbt-11-t0xw3k43.1.1`) that were created by a bug in part of the re-ID tool chain. We would like to be able to do this by submitting requests to the `/queries:run` endpoint.

Description / Steps to Recreate. Note that I am using the "Napa" instance of our runtime/database https://api-napa.microbiomedata.org/docs#/queries/run_query_queries_run_post

Updating a record via _id

Finding Records based on _id

- Response

{ "ok": 1, "cursor": { "firstBatch": [], "partialResultsReturned": null, "id": 0, "ns": "nmdc.read_based_taxonomy_analysis_activity_set" } }

eecavanna commented 3 days ago

I checked the endpoint's code and didn't notice any special handling for the _id field.

I assume the Runtime is not converting the submitted string into an ObjectId instance before sending the query to Mongo.

That conversion step is referenced in the pymongo docs. In those docs, its absence is referred to as a "common mistake." I don't know whether it was a mistake in this case or whether it was done (er... not done) intentionally. If it was intentional, I would recommend documenting that user-impacting limitation (i.e. "cannot filter by `_id`") in the API endpoint's documentation (i.e. the docstring that appears on the Swagger UI).
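For illustration, here is a minimal sketch of what such a conversion step could look like. This is not the Runtime's actual code; the helper name `coerce_id_filter` and its `to_object_id` parameter are hypothetical, and the sketch only handles a top-level `_id` given as a plain string.

```python
import re
from typing import Any, Callable

# An ObjectId's string form is exactly 24 hexadecimal characters.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def coerce_id_filter(filter_doc: dict, to_object_id: Callable[[str], Any]) -> dict:
    """Return a copy of `filter_doc` in which a string `_id` that looks like
    an ObjectId has been passed through `to_object_id`.

    Hypothetical helper: `to_object_id` is injected so the sketch does not
    depend on any particular driver being installed.
    """
    value = filter_doc.get("_id")
    if isinstance(value, str) and _OBJECT_ID_HEX.fullmatch(value):
        # Copy the filter and swap in the converted value.
        return {**filter_doc, "_id": to_object_id(value)}
    # Anything else (missing _id, non-string, non-hex string) is left as-is.
    return dict(filter_doc)
```

With pymongo installed, `bson.ObjectId` could be passed as `to_object_id`, since its constructor accepts a 24-character hex string. One caveat with this approach: a collection whose `_id` values are genuinely 24-character hex *strings* would be mismatched by an automatic conversion, so an explicit opt-in might be safer than guessing.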

mbthornton-lbl commented 2 hours ago

Is blocking: https://github.com/microbiomedata/nmdc_automation/issues/198