Cannot find or update documents by `_id` ObjectId via `/queries:run` endpoint

mbthornton-lbl commented 3 days ago

The overall goal here is to fix malformed workflow IDs (including their version suffixes) in the database, e.g. `nmdc:wfrbt-11-t0xw3k43.1.1`, which were created by a bug in part of the re-ID tool chain. We would like to be able to do this by submitting requests to the `/queries:run` endpoint.

Description / Steps to Recreate

Note that I am using the "Napa" instance of our runtime/database: https://api-napa.microbiomedata.org/docs#/queries/run_query_queries_run_post

Updating a record via _id

request body

{
"update": "read_based_taxonomy_analysis_activity_set",
"updates": [
    {
        "q": {
            "_id": "6647b8ffe95c3fa0447675bd"
        },
        "u": {
            "$set": {
                "id": "nmdc:wfrbt-11-t0xw3k43.1"
            }
        }
    },
    {
        "q": {
            "_id": "6647b8ffe95c3fa0447675bd"
        },
        "u": {
            "$addToSet": {
                "alternative_identifiers": "nmdc:wfrbt-11-t0xw3k43.1.1"
            }
        }
    }
]
}

curl

curl -X 'POST' \
'https://api-napa.microbiomedata.org/queries:run' \
-H 'accept: application/json' \
-H 'Authorization: Bearer redacted' \
-H 'Content-Type: application/json' \
-d '{
"update": "read_based_taxonomy_analysis_activity_set",
"updates": [
    {
        "q": {
            "_id": "6647b8ffe95c3fa0447675bd"
        },
        "u": {
            "$set": {
                "id": "nmdc:wfrbt-11-t0xw3k43.1"
            }
        }
    },
    {
        "q": {
            "_id": "6647b8ffe95c3fa0447675bd"
        },
        "u": {
            "$addToSet": {
                "alternative_identifiers": "nmdc:wfrbt-11-t0xw3k43.1.1"
            }
        }
    }
]
}'

response

{
"detail": [
"update command modified zero documents. I'm guessing that's not what you expected. Check the syntax of your request. But what do I know? I'm just a teapot."
]
}

Finding records based on _id

request body

{
"find": "read_based_taxonomy_analysis_activity_set",
"filter": {"_id": "6647b8ffe95c3fa0447675bd"}
}

response

{ "ok": 1, "cursor": { "firstBatch": [], "partialResultsReturned": null, "id": 0, "ns": "nmdc.read_based_taxonomy_analysis_activity_set" } }

eecavanna commented 3 days ago

I checked the endpoint's code and didn't notice any special handling for the _id field.

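I assume the Runtime is not converting the submitted string into an ObjectId instance before sending the query to Mongo. If the stored `_id` values are ObjectIds, a plain string filter would match nothing, which would explain both the empty `find` result and the zero-document update.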

That conversion step is referenced in the pymongo docs, which refer to its absence as a "common mistake." I don't know whether it was a mistake in this case or whether it was done (er... not done) intentionally. If it was intentional, I would recommend documenting that user-impacting limitation (i.e. "cannot filter by `_id`") in the endpoint's docstring, which appears on the Swagger UI.
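
The missing conversion could look roughly like the sketch below. It is not the Runtime's code: `coerce_id_filter` and its `to_object_id` parameter are hypothetical names, and the sketch assumes every `_id` in the target collection is an ObjectId, so any 24-character hex string found under `_id` is safe to convert. In practice `to_object_id` would be `bson.ObjectId` (shipped with pymongo); it is passed in as an argument here so the sketch runs without pymongo installed.

```python
import re
from typing import Any, Callable

# A 24-character hex string is the textual form of a MongoDB ObjectId.
_OBJECT_ID_HEX = re.compile(r"^[0-9a-fA-F]{24}$")


def _convert(value: Any, to_object_id: Callable[[str], Any]) -> Any:
    """Convert ObjectId-looking strings inside the value of an `_id` key."""
    if isinstance(value, str) and _OBJECT_ID_HEX.match(value):
        return to_object_id(value)
    if isinstance(value, list):
        return [_convert(item, to_object_id) for item in value]
    if isinstance(value, dict):
        # Operator documents such as {"$in": [...]} or {"$ne": "..."}.
        return {key: _convert(item, to_object_id) for key, item in value.items()}
    return value


def coerce_id_filter(query: Any, to_object_id: Callable[[str], Any]) -> Any:
    """Return a copy of a Mongo filter with string `_id` values converted.

    Hypothetical helper (an assumption, not part of nmdc-runtime).
    `to_object_id` would normally be `bson.ObjectId`.
    """
    if isinstance(query, list):
        # Branches of logical operators such as "$or" / "$and".
        return [coerce_id_filter(item, to_object_id) for item in query]
    if not isinstance(query, dict):
        return query
    result = {}
    for key, value in query.items():
        if key == "_id":
            result[key] = _convert(value, to_object_id)
        else:
            result[key] = coerce_id_filter(value, to_object_id)
    return result
```

With pymongo available, calling `coerce_id_filter({"_id": "6647b8ffe95c3fa0447675bd"}, ObjectId)` after `from bson import ObjectId` would yield a filter that Mongo can match against ObjectId-typed `_id` values. The same call could be applied to each `q` in an `update` request's `updates` list. A collection that stores 24-character hex strings as plain-string `_id`s would need a way to opt out of the conversion.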

mbthornton-lbl commented 2 hours ago

Is blocking: https://github.com/microbiomedata/nmdc_automation/issues/198

microbiomedata / nmdc-runtime

Cannot find or update documents by `_id` ObjectId via `/queries:run` endpoint #575