AlanSimmons closed this issue 1 year ago.
The endpoint requires one parameterized, correlated subquery in Cypher.
The following example returns codes for the child concepts of HUBMAP C000530, using the ordered list of SABs ['HUSAT','SNOMEDCT_US'] to decide which code to report for each child.
// Correlated subquery:
// Reference: neo4j Operations Manual/Managing Databases/Composite Databases/Querying - Example 7 (Correlated subqueries)
// 1. Find the child concepts that have an isa relationship with the parent HUBMAP concept (identified by code).
// 2. Rank each child concept's codes by the position of their SAB in a list (rather than alphabetically).
// 3. Identify, for each child concept, the earliest SAB in the list for which it has a code. For example, if codes from SNOMEDCT_US are preferred to those from NCI, the list would include [...,'SNOMEDCT_US','NCI',...].
CALL {
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)-[:isa]->(conceptParent:Concept)-[:CODE]->(codeParent:Code)
  WHERE codeParent.SAB = 'HUBMAP' AND codeParent.CODE = 'C000530' AND codeChild.SAB IN ['HUSAT', 'SNOMEDCT_US']
  RETURN conceptChild.CUI AS conceptChildCUI, min(CASE codeChild.SAB WHEN 'HUSAT' THEN 1 WHEN 'SNOMEDCT_US' THEN 2 END) AS minSAB
  ORDER BY conceptChildCUI
}
// 4. For each child concept, keep the code from its earliest SAB; this SAB can differ from one child concept to another. LIMIT 1 handles concepts with more than one code in that SAB (e.g., UMLS C0026018, which maps to 2 NCI codes).
CALL {
  WITH conceptChildCUI, minSAB
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)
  WHERE conceptChild.CUI = conceptChildCUI
    AND CASE codeChild.SAB WHEN 'HUSAT' THEN 1 WHEN 'SNOMEDCT_US' THEN 2 END = minSAB
  RETURN codeChild
  ORDER BY codeChild.CODE
  LIMIT 1
}
// 5. Get the preferred term (PT) of the selected child code.
WITH codeChild
MATCH (termChild:Term)<-[:PT]-(codeChild:Code)
RETURN termChild.name, codeChild.CODE, codeChild.SAB
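To check endpoint output against expected values, the selection logic of the query (rank by SAB position, keep the best rank per child concept, break ties by lowest CODE, report the term) can be mirrored in plain Python. This is a minimal sketch, not the endpoint's implementation: it assumes the candidate child codes have already been fetched as hypothetical `ChildCode` records with one preferred term per code, and the name `pick_preferred_codes` is illustrative.

```python
from typing import Dict, List, NamedTuple, Tuple


class ChildCode(NamedTuple):
    """One code of a child concept (hypothetical shape of the step 1 results)."""
    concept_cui: str
    sab: str
    code: str
    term: str  # assumes a single preferred term (PT) per code


def pick_preferred_codes(rows: List[ChildCode], sab_order: List[str]) -> List[Tuple[str, str, str]]:
    """Return (term, code, sab) per child concept, preferring SABs earlier in sab_order."""
    rank = {sab: i for i, sab in enumerate(sab_order)}  # step 2: SAB position, not alphabetic
    best: Dict[str, ChildCode] = {}
    for row in rows:
        if row.sab not in rank:
            continue  # mirrors codeChild.SAB IN [...]: other SABs are ignored
        current = best.get(row.concept_cui)
        # Steps 3-4: lowest SAB rank wins; ties go to the lowest CODE
        # (ORDER BY codeChild.CODE LIMIT 1).
        if current is None or (rank[row.sab], row.code) < (rank[current.sab], current.code):
            best[row.concept_cui] = row
    # Step 5: report the term; results are ordered by concept CUI.
    return [(r.term, r.code, r.sab) for _, r in sorted(best.items())]


if __name__ == "__main__":
    # Hypothetical rows: CUI_A has codes in both SABs; CUI_B has two SNOMEDCT_US codes.
    rows = [
        ChildCode("CUI_A", "SNOMEDCT_US", "S1", "term from SNOMEDCT_US"),
        ChildCode("CUI_A", "HUSAT", "H1", "term from HUSAT"),
        ChildCode("CUI_B", "SNOMEDCT_US", "S3", "B via S3"),
        ChildCode("CUI_B", "SNOMEDCT_US", "S2", "B via S2"),
        ChildCode("CUI_B", "NCI", "N1", "ignored: NCI is not in the list"),
    ]
    print(pick_preferred_codes(rows, ["HUSAT", "SNOMEDCT_US"]))
    # [('term from HUSAT', 'H1', 'HUSAT'), ('B via S2', 'S2', 'SNOMEDCT_US')]
```

Reversing the list to ['SNOMEDCT_US', 'HUSAT'] flips CUI_A to its SNOMEDCT_US code, which is the behavior the ordered SAB parameter is meant to control.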
I compared the current knowledge graph with the expected values from the CEDAR Metadata Editor. Results here. The validation identified several changes required to the HUBMAP ontology, which will be described in the release notes.
Changes to the HUBMAP ontology to match CEDAR templates:

| Concept | Change | Reason |
|---|---|---|
| HUBMAP C000416 | Changed term from "None" to "No medium" | "None" is a reserved word in Python and was causing conflicts |
| HUBMAP C000433 | Changed dbxref to UMLS:C5575819 | Original was for "Calorie 4 Degrees Celsius" |
| HUBMAP C000432 | Added dbxref to UMLS:C5669848 | Concept introduced in UMLS2022AB |
| HUBMAP C000534, HUBMAP C000535, HUBMAP C000536, HUBMAP C000531, HUBMAP C000532, HUBMAP C000533, HUBMAP C000537, HUBMAP C000538 | Added | New storage versions of medium concepts |
| HUBMAP C000440 | Removed | No longer in CEDAR |
| HUBMAP C000539 | Added | Filtering concept for sample block area unit |
| HUBMAP C000461, HUBMAP C000461 | Changed isa to HUBMAP C000539 | Filtered list of area units |
| HUBMAP C000530 | Added | Filtering concept for sample block volume unit |
| HUBMAP C000454, HUBMAP C000455 | Changed isa to HUBMAP C000530 | Filtered list of volume units |
| HUBMAP C000416 | Changed dbxref to UMLS:C0442735 | Term is "nothing" instead of "none", which is a reserved word in Python |
| HUBMAP C000540 - C000555 | Added | For assay_category, analyte_class, is_targeted, library_yield_final_unit, library_concentration_unit, library_layout, library_indexing_type, is_technical_replicate |
The ontology graph has been updated to support CEDAR metadata templates. Comparison here.
@yuanzhou
Task list for the remaining work to implement the new API endpoint, as I see it:
- [ ] Decide in which repo this endpoint should be deployed:
  - the UBKG repo (under the API directory), which is supposed to be the base ontology API repo, OR
  - this repo
- [x] Instantiate a local version of the relevant ontology OpenAPI server. These instructions are a start.
- [x] Configure Postman, per the instructions here.
- [x] Add a new method to the neo4j manager that implements the Cypher query described in this thread.
- [x] Update ontology-api-spec.yaml, which is currently at the root level of the repo and not in the API directory.
- [x] Rebuild the local API server.
- [x] Test. Values for the query parameters, as well as the expected output, can be found in this document.
- [ ] Deploy the changes (manager, YAML, etc.) to the Dev and Prod API servers.
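The query in this thread hard-codes the parent code and repeats the SAB precedence in two CASE expressions. One way a new neo4j manager method could parameterize it is sketched below. It is an assumption, not the method that was added: the SAB list is passed as a single list parameter, and each code's rank is computed as the index of its SAB in that list with a Cypher list comprehension instead of CASE. The constant `CHILD_CODES_QUERY`, the function `build_child_codes_query`, and the parameter names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical parameterized version of the query in this thread.
# The hard-coded CASE is replaced by the index of codeChild.SAB in $sab_list,
# computed with a list comprehension; [...][0] on an empty list is null in Cypher,
# so codes from SABs outside the list never match minSAB.
CHILD_CODES_QUERY = """
CALL {
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)-[:isa]->(conceptParent:Concept)-[:CODE]->(codeParent:Code)
  WHERE codeParent.SAB = $parent_sab AND codeParent.CODE = $parent_code
    AND codeChild.SAB IN $sab_list
  RETURN conceptChild.CUI AS conceptChildCUI,
         min([i IN range(0, size($sab_list) - 1) WHERE $sab_list[i] = codeChild.SAB][0]) AS minSAB
  ORDER BY conceptChildCUI
}
CALL {
  WITH conceptChildCUI, minSAB
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)
  WHERE conceptChild.CUI = conceptChildCUI
    AND [i IN range(0, size($sab_list) - 1) WHERE $sab_list[i] = codeChild.SAB][0] = minSAB
  RETURN codeChild
  ORDER BY codeChild.CODE
  LIMIT 1
}
WITH codeChild
MATCH (termChild:Term)<-[:PT]-(codeChild:Code)
RETURN termChild.name AS term, codeChild.CODE AS code, codeChild.SAB AS sab
"""


def build_child_codes_query(
    parent_sab: str, parent_code: str, sab_list: List[str]
) -> Tuple[str, Dict[str, Any]]:
    """Return (query, parameters) for a hypothetical child-codes manager method."""
    if not sab_list:
        # An empty precedence list would match no child codes at all.
        raise ValueError("sab_list must name at least one SAB")
    params: Dict[str, Any] = {
        "parent_sab": parent_sab,
        "parent_code": parent_code,
        "sab_list": list(sab_list),  # order defines precedence: earlier = preferred
    }
    return CHILD_CODES_QUERY, params
```

With the neo4j Python driver, the pair could be executed as `session.run(query, params)`; the values used in this thread would correspond to `build_child_codes_query("HUBMAP", "C000530", ["HUSAT", "SNOMEDCT_US"])`. Passing values as parameters rather than splicing them into the Cypher string keeps the query text constant and avoids injection through user-supplied SABs.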
I tried to instantiate a local instance of the OpenAPI server, but was unable to get it to work. Issues that I encountered:
I think that most of these problems are package dependency issues.
@ChuckKollar helped me to set up a local instance of the API server.
@yuanzhou and @shirey,
I have a revised version of the ontology-api server with a new endpoint, and I want to deploy it to dev and production. I think that I need to rebuild the Docker container, but am not sure.
The "Rebuilding the Docker Container" section of this README document describes a process for updating the server. That process appears to assume that the Docker container on the server is current, because it pulls the source for composing from the server instead of from DockerHub. This in turn assumes that the source used to build the OpenAPI server in the server's container is current, which is not the case.
To deploy the new endpoint, I had to make changes to the ontology-api source that differ from those generated automatically by the OpenAPI process, in particular to requirements.txt and build-server.sh.
Should I build a new local Docker container for the ontology-api and push it to the DockerHub repo?
@yuanzhou
- [ ] Update the production ontology neo4j instance with the Archive found here:
@AlanSimmons I've rebuilt the DEV and PROD ontology neo4j images using these latest CSVs. Here are the details for PROD:
Creating network "ontology-api_ontology-neo4j-network" with the default driver
Creating volume "ontology-api_ontology-neo4j-data" with default driver
Building ontology-neo4j
Sending build context to Docker daemon 3.41GB
Step 1/12 : FROM hubmap/neo4j-image:4.2.5
---> 0a9c59a9d224
Step 2/12 : WORKDIR /usr/src/app
---> Running in ad28dea18a21
Removing intermediate container ad28dea18a21
---> 26c9c0868c1e
Step 3/12 : COPY start.sh .
---> c0f4debe033d
Step 4/12 : COPY set_constraints.cypher .
---> 4fab5e5e349d
Step 5/12 : COPY neo4j.conf /usr/src/app/neo4j/conf
---> 91838d00ce3f
Step 6/12 : ENV IMPORT=/usr/src/app/neo4j/import
---> Running in b7d2e88a03ea
Removing intermediate container b7d2e88a03ea
---> 9ae48dbad8fb
Step 7/12 : COPY import/current/*.csv ${IMPORT}/
---> 290b8ce76891
Step 8/12 : WORKDIR /usr/src/app/neo4j/bin
---> Running in 7b0983b09524
Removing intermediate container 7b0983b09524
---> b8df49ecf3f6
Step 9/12 : RUN ./neo4j-admin import --verbose --database=ontology --nodes=Semantic="${IMPORT}/TUIs.csv" --nodes=Concept="${IMPORT}/CUIs.csv" --nodes=Code="${IMPORT}/CODEs.csv" --nodes=Term="${IMPORT}/SUIs.csv" --nodes=Definition="${IMPORT}/DEFs.csv" --relationships=ISA_STY="${IMPORT}/TUIrel.csv" --relationships=STY="${IMPORT}/CUI-TUIs.csv" --relationships="${IMPORT}/CUI-CUIs.csv" --relationships=CODE="${IMPORT}/CUI-CODEs.csv" --relationships="${IMPORT}/CODE-SUIs.csv" --relationships=PREF_TERM="${IMPORT}/CUI-SUIs.csv" --relationships=DEF="${IMPORT}/DEFrel.csv" --skip-bad-relationships --skip-duplicate-nodes
---> Running in 04b14d67ced9
neo4j 4.2.5
VM Name: OpenJDK 64-Bit Server VM
VM Vendor: Red Hat, Inc.
VM Version: 11.0.11+9-LTS
JIT compiler: HotSpot 64-Bit Tiered Compilers
VM Arguments: [-XX:+UseParallelGC, -Dfile.encoding=UTF-8]
Neo4j version: 4.2.5
Importing the contents of these files into /usr/src/app/neo4j/data/databases/ontology:
Nodes:
[Concept]:
/usr/src/app/neo4j/import/CUIs.csv
[Semantic]:
/usr/src/app/neo4j/import/TUIs.csv
[Definition]:
/usr/src/app/neo4j/import/DEFs.csv
[Term]:
/usr/src/app/neo4j/import/SUIs.csv
[Code]:
/usr/src/app/neo4j/import/CODEs.csv
Relationships:
/usr/src/app/neo4j/import/CUI-CUIs.csv
/usr/src/app/neo4j/import/CODE-SUIs.csv
CODE:
/usr/src/app/neo4j/import/CUI-CODEs.csv
DEF:
/usr/src/app/neo4j/import/DEFrel.csv
STY:
/usr/src/app/neo4j/import/CUI-TUIs.csv
ISA_STY:
/usr/src/app/neo4j/import/TUIrel.csv
PREF_TERM:
/usr/src/app/neo4j/import/CUI-SUIs.csv
Available resources:
Total machine memory: 30.92GiB
Free machine memory: 2.389GiB
Max heap memory : 6.872GiB
Processors: 8
Configured max memory: 21.64GiB
High-IO: true
Nodes, started 2022-12-06 20:39:22.543+0000
[*Nodes:0B/s 1.160GiB-------------------------------------------------------------------------]21.2M ∆3.21M
Done in 15s 926ms
Prepare node index, started 2022-12-06 20:39:38.477+0000
[*DEDUPLICATE:1.240GiB------------------------------------------------------------------------]89.0M ∆21.5M
Done in 9s 293ms
DEDUP, started 2022-12-06 20:39:47.887+0000
[*DEDUP---------------------------------------------------------------------------------------] 0 ∆ 0
Done in 314ms
Relationships, started 2022-12-06 20:39:48.205+0000
[*Relationships:0B/s 1.240GiB-----------------------------------------------------------------]55.5M ∆ 100K
Done in 1m 2s 982ms
Node Degrees, started 2022-12-06 20:40:51.352+0000
[>(2)======================================|*CALCULATE:1.194GiB(5)============================]55.4M ∆21.6M
Done in 5s 232ms
Relationship --> Relationship 1-1738/1738, started 2022-12-06 20:40:56.719+0000
[>------------------------|*LINK(5)============================|v:149.9MiB/s------------------]55.4M ∆4.18M
Done in 12s 867ms
RelationshipGroup 1-1738/1738, started 2022-12-06 20:41:09.597+0000
[*>:??--------------------------------------------------------------------------------|v:??---]3.01M ∆3.01M
Done in 961ms
Node --> Relationship, started 2022-12-06 20:41:10.570+0000
[>:81.03MiB/s---------|>(2)======================|LINK-----|*v:146.6MiB/s(2)==================]20.5M ∆18.6M
Done in 2s 62ms
Relationship <-- Relationship 1-1738/1738, started 2022-12-06 20:41:12.686+0000
[>-------------------------------|*LINK(5)===========================|v:149.9MiB/s------------]55.4M ∆2.89M
Done in 12s 830ms
Count groups, started 2022-12-06 20:41:25.588+0000
[>|*>(6)=============================================|COUNT:1.036GiB--------------------------]3.01M ∆3.01M
Done in 318ms
Gather, started 2022-12-06 20:41:26.377+0000
[>------|*CACHE:1.410GiB----------------------------------------------------------------------]3.01M ∆ 712K
Done in 3s 297ms
Write, started 2022-12-06 20:41:29.683+0000
[*>:??----------------------------------------------------------------------------------||v:??]2.90M ∆2.90M
Done in 633ms
Node --> Group, started 2022-12-06 20:41:30.351+0000
[*>---------------------------------------|FIRST-----------------------|v:??(2)===============] 156K ∆9.82K
Done in 610ms
Node counts and label index build, started 2022-12-06 20:41:31.235+0000
[>(2)==========================|LABEL INDEX----------------|*COUNT:1.154GiB-------------------]21.2M ∆11.0M
Done in 2s 345ms
Relationship counts and relationship type index build, started 2022-12-06 20:41:33.595+0000
[*>(2)=========================================|RELATIONSH|COUNT(4)===========================]55.5M ∆15.5M
Done in 4s 450ms
IMPORT DONE in 2m 16s 907ms.
Imported:
21225536 nodes
55490055 relationships
82696223 properties
Peak memory usage: 1.336GiB
There were bad entries which were skipped and logged into /usr/src/app/neo4j/bin/import.report
Removing intermediate container 04b14d67ced9
---> eb809ebb0222
Step 10/12 : RUN yum install -y curl && chmod +x /usr/src/app/start.sh && rm -rf ${IMPORT}/*
---> Running in 3e28c920c93a
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
* base: download.cf.centos.org
* extras: download.cf.centos.org
* updates: download.cf.centos.org
Package curl-7.29.0-59.el7_9.1.x86_64 already installed and latest version
Nothing to do
Removing intermediate container 3e28c920c93a
---> dce845db89b5
Step 11/12 : EXPOSE 7474 7687
---> Running in 0ab0ed23ebf5
Removing intermediate container 0ab0ed23ebf5
---> 2c1546f73b84
Step 12/12 : CMD ["/usr/src/app/start.sh"]
---> Running in e8f86902167f
Removing intermediate container e8f86902167f
---> 5d33e31ad8ad
Successfully built 5d33e31ad8ad
Successfully tagged ontology-api_ontology-neo4j:latest
Issue
CEDAR metadata templates contain questions with categorical responses. The available responses are identified with concepts from standard biomedical ontologies such as UO and UBERON. Which concepts are offered as responses to a particular question is determined by business logic: for example, although every unit-of-measure concept in UO is a potential response to a question, only a few are actually available.
We use the term valueset for the set of concepts that encode the responses to a particular categorical question.
CEDAR valuesets are either defined manually (e.g., by explicit declaration) or obtained from the response of an API endpoint. CEDAR currently interacts with a REST API provided by NCBO BioPortal.
Solution
We need to provide similar functionality for HuBMAP.
The HuBMAP application ontology cross-references concepts in standard biomedical ontologies. Through bidirectional ontological assertions, it is possible to define a set of concepts in other ontologies that share a relationship with a HuBMAP concept. The relationship can be hierarchical (e.g., all concepts with an inverse_isa relationship with the HuBMAP concept) or non-hierarchical (e.g., all concepts that have a derives_from relationship with a HuBMAP concept).
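Because the hierarchical and non-hierarchical cases differ only in the relationship that links a concept to the HuBMAP concept, a valueset can be described by a (HuBMAP concept, relationship) pair. The following is a minimal in-memory sketch, assuming the assertions are available as hypothetical (subject, relationship, object) triples and that only direct, one-hop assertions count (as with the single isa hop in the Cypher query). The function name `valueset_members` is illustrative.

```python
from typing import Iterable, Set, Tuple

# (subject concept, relationship, object concept); a hypothetical flat form of
# the knowledge graph's concept-to-concept assertions.
Triple = Tuple[str, str, str]


def valueset_members(triples: Iterable[Triple], hubmap_concept: str, relationship: str) -> Set[str]:
    """Return the concepts that assert `relationship` toward `hubmap_concept`.

    Following "isa" toward the HuBMAP concept gives the hierarchical case
    (its direct children); any other relationship name gives a
    non-hierarchical valueset. Only direct (one-hop) assertions are used.
    """
    return {
        subject
        for subject, rel, obj in triples
        if rel == relationship and obj == hubmap_concept
    }
```

The members returned this way may still come from several ontologies, so a precedence rule over SABs is needed to pick one code per concept.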
When a HuBMAP concept cross-references a UMLS CUI, concepts from multiple standard biomedical ontologies may be available. For example, HUBMAP C000411 (methanol) maps to UMLS:C0001963, which corresponds to concepts in a number of other ontologies. It will therefore be necessary to specify an order of precedence for concepts available via cross-reference.
Requirements
It should be possible to call a RESTful endpoint that:
Example
CEDAR question: preparation medium