jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies. https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

Bug: Csets not uploading (invalid concept IDs) #901

Closed joeflack4 closed 1 month ago

joeflack4 commented 1 month ago

Overview

Siggie was uploading some csets using the API. The request passed the validate step but failed on apply (slack).

Examples

Cset 1000016833. This had a concept 45492135 which was "dead" / "not a primary key for an OMOP concept" (slack). This cset does not appear in the Enclave; was not successfully uploaded.

Sub-tasks

1. Fix issue

Possible causes/solutions:

i. ~Enclave support? It's possible that this might be a problem that they can only solve on the Enclave side.~
ii. ~Illegal concepts? See task "2".~
iii. Siggie passed invalid concept IDs. Just need to use correct concept IDs.

2. How to avoid selecting "dead" concepts

Questions

i. What does 'not primary key' mean? Don't concepts only have 1 ID? Did he just mean "not in the concept table"?
ii. How did this happen? New vocab update?
iii. How to know if concept is invalid? Just by "PK"?
iv. How did Siggie select an illegal concept? Cuz in previous set?

When Enclave issues are fixed:

a. Delete existing ones and upload anew
b. Possible to fix existing ones?

Additional details

Upload error log

```sh
[I] ➜ curl -H "authorization: $PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN" -H "Content-type: application/json" \
  'https://unite.nih.gov/api/v1/ontologies/ri.ontology.main.ontology.00000000-0000-0000-0000-000000000000/actions/add-selected-concepts-as-omop-version-expressions/validate' \
  --data '{"parameters": {"sourceApplication": "TermHub", "concepts": [3450381, 45492135, 40794012, 4042051, 4198748, 4209251, 36304662, 3042837, 3019800, 40653873, 3446476, 40769783, 3033745, 40797633, 4020703, 4005525, 40653875, 40653874, 4007805, 3021337, 3451471, 46235895, 3441257, 4192937, 3005227, 3019572, 3476485, 4021291, 40783217, 36305612, 37076644, 3468321, 3437011, 36304114, 37066693, 37075202, 37071238, 2212610, 3048529, 36303871, 3032971, 37072384, 44805124, 37045968, 3448781, 37040477, 3052931, 37063873, 2212605, 36306105], "includeMapped": false, "includeDescendants": false, "version": 1000016833, "isExcluded": false, "optional-annotation": ""}}' | jq
{
  "result": "VALID",
  "submissionCriteria": [],
  "parameters": {
    "sourceApplication": {
      "result": "VALID",
      "evaluatedConstraints": [],
      "required": true
    },
    "concepts": {
      "result": "VALID",
      "evaluatedConstraints": [
        {
          "type": "objectQueryResult"
        },
        {
          "type": "arraySize",
          "gte": 1
        }
      ],
      "required": true
    },
    "includeMapped": {
      "result": "VALID",
      "evaluatedConstraints": [],
      "required": true
    },
    "includeDescendants": {
      "result": "VALID",
      "evaluatedConstraints": [],
      "required": true
    },
    "version": {
      "result": "VALID",
      "evaluatedConstraints": [
        {
          "type": "objectQueryResult"
        }
      ],
      "required": true
    },
    "isExcluded": {
      "result": "VALID",
      "evaluatedConstraints": [],
      "required": true
    },
    "optional-annotation": {
      "result": "VALID",
      "evaluatedConstraints": [],
      "required": false
    }
  }
}
[I] ➜ curl -H "authorization: $PALANTIR_ENCLAVE_AUTHENTICATION_BEARER_TOKEN" -H "Content-type: application/json" \
  'https://unite.nih.gov/api/v1/ontologies/ri.ontology.main.ontology.00000000-0000-0000-0000-000000000000/actions/add-selected-concepts-as-omop-version-expressions/apply' \
  --data '{"parameters": {"sourceApplication": "TermHub", "concepts": [3450381, 45492135, 40794012, 4042051, 4198748, 4209251, 36304662, 3042837, 3019800, 40653873, 3446476, 40769783, 3033745, 40797633, 4020703, 4005525, 40653875, 40653874, 4007805, 3021337, 3451471, 46235895, 3441257, 4192937, 3005227, 3019572, 3476485, 4021291, 40783217, 36305612, 37076644, 3468321, 3437011, 36304114, 37066693, 37075202, 37071238, 2212610, 3048529, 36303871, 3032971, 37072384, 44805124, 37045968, 3448781, 37040477, 3052931, 37063873, 2212605, 36306105], "includeMapped": false, "includeDescendants": false, "version": 1000016833, "isExcluded": false, "optional-annotation": ""}}' | jq
{
  "errorCode": "INVALID_ARGUMENT",
  "errorName": "FunctionInvalidInput",
  "errorInstanceId": "49967844-ab98-4482-be79-ba94e46e9432",
  "parameters": {
    "functionRid": "ri.function-registry.main.function.459fcf12-750c-4653-9db8-e32bfa1c6f65",
    "functionVersion": "1.17.0"
  }
}
```
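The log above shows that a `VALID` result from `/validate` did not guarantee that `/apply` would succeed with the same body. A minimal Python sketch of how a client might build that body once and check either response for failure is below. The function names `build_action_payload` and `response_error` are hypothetical (not TermHub's actual code), and the shape of a successful `/apply` response is not shown in this issue, so the sketch only assumes that a response without `errorCode` is not an error.

```python
from typing import Iterable, Optional


def build_action_payload(version_id: int, concept_ids: Iterable[int]) -> dict:
    """Build the JSON body sent to both /validate and /apply.

    Field names and fixed values are copied from the curl log in this issue.
    """
    return {
        "parameters": {
            "sourceApplication": "TermHub",
            "concepts": list(concept_ids),
            "includeMapped": False,
            "includeDescendants": False,
            "version": version_id,
            "isExcluded": False,
            "optional-annotation": "",
        }
    }


def response_error(response: dict) -> Optional[str]:
    """Return an error message for a parsed response, or None if none is seen.

    Assumption: failures carry an "errorCode" key (as in the /apply log), and
    /validate responses carry a top-level "result" that is "VALID" on success.
    """
    if "errorCode" in response:
        return f'{response["errorCode"]}: {response.get("errorName", "unknown")}'
    if "result" in response and response["result"] != "VALID":
        return f'validation result: {response["result"]}'
    return None
```

With a helper like this, the caller would check `response_error(...)` after both steps instead of treating a passing validate as proof that apply will work.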

Related

joeflack4 commented 1 month ago

@Sigfried I checked and there are no deleted concepts found. See:

So consider this one case from slack:

> Cset 1000016833. This had a concept 45492135 which was "dead" / "not a primary key for an OMOP concept" (slack). This cset does not appear in the Enclave; was not successfully uploaded.

It's not that 45492135 was deleted; it appears to have never been a concept. Perhaps Siggie made a typo?

Sigfried commented 1 month ago

I fixed the upload code so it just leaves out any definition concept_ids that don't appear in the concept table. (These would only be from very old concept set versions being copied.)
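A rough sketch of that kind of filter, for illustration only; this is not the actual upload code. The function name `split_known_concept_ids` is hypothetical, and `known_concept_ids` stands in for the set of `concept_id` values in the concept table, however it is loaded.

```python
from typing import Iterable, List, Tuple


def split_known_concept_ids(
    concept_ids: Iterable[int], known_concept_ids: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Split concept IDs into (kept, dropped) by membership in the concept table.

    Order is preserved so that the upload and any log of dropped IDs are stable.
    """
    known = set(known_concept_ids)
    kept: List[int] = []
    dropped: List[int] = []
    for cid in concept_ids:
        (kept if cid in known else dropped).append(cid)
    return kept, dropped


# Made-up stand-in for the concept table, just to show the behavior.
concept_table_ids = {3450381, 40794012}
kept, dropped = split_known_concept_ids(
    [3450381, 45492135, 40794012], concept_table_ids
)
print(kept)     # [3450381, 40794012]
print(dropped)  # [45492135]
```

Returning the dropped IDs as well as the kept ones makes it easy to log what was left out of an upload, rather than dropping it silently.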

joeflack4 commented 1 month ago

Mmm, I see. Well if those concepts weren't in our previous vocab from 2023, I guess they were even older than that then!

It is a bit disconcerting that concepts got deleted. That doesn't seem like something that should happen in OMOP...