AlanSimmons commented 1 year ago

Issue

CEDAR metadata templates contain questions with categorical responses. The available responses are identified with concepts from standard biomedical ontologies such as UO and UBERON. Membership in the subset of concepts used as responses for a particular question is determined by business logic: i.e., while all units of measure concepts from UO are potential members of the set of concepts for responses to a question, only a few are actually available.

We use the term valueset for the set of concepts that encode the available responses to a particular categorical question.

CEDAR valuesets are defined either manually (by explicit declaration) or from the response of an API endpoint. CEDAR currently interacts with a REST API provided by NCBO BioPortal.

Solution

We need to provide similar functionality for HuBMAP.

The HuBMAP application ontology cross-references concepts in standard biomedical ontologies. By means of bi-directional ontological assertions, it is possible to define a set of concepts in other ontologies that share a relationship to a HuBMAP concept. The relationship can be hierarchical (e.g., all concepts with an _inverseisa relationship with the HuBMAP concept) or non-hierarchical (e.g., all concepts that have a _derivesfrom relationship with a HuBMAP concept).

When a HuBMAP concept cross-references a UMLS CUI, concepts from multiple standard biomedical ontologies are possible. For example, HUBMAP C000411 (methanol) maps to UMLS:C0001963, which corresponds to concepts in a number of other ontologies. It will be necessary to specify an order of precedence for concepts available via cross-reference.

Requirements

It should be possible to call a RESTful endpoint that:

Accepts:

the HuBMAP code, i.e., the code for a concept from the HuBMAP application ontology that corresponds to a categorical question used in a CEDAR metadata template (e.g., HUBMAP_C000442, which corresponds to "storage time unit")
the name of a relationship that the HuBMAP concept has with a set of concepts in standard biomedical ontologies (e.g., _inverseisa, _derivesfrom)
a list of SABs that defines the order of precedence for concepts available via cross-reference

Returns the following information related to concepts that share the specified relationship with the specified HuBMAP concept:

codes for the concepts in their source ontologies (SABs)
preferred terms for the concepts

The return for a particular concept is governed by the order of precedence for SAB.
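
The precedence rule can be illustrated with a minimal Python sketch that works on rows already held in memory. The CodeRow structure, the function name, and the placeholder identifiers in the usage section are hypothetical and are not part of the ontology API; ties within one SAB are broken by code, which is an assumption chosen to give a deterministic result.

```python
from typing import Dict, Iterable, List, NamedTuple


class CodeRow(NamedTuple):
    """One cross-referenced code for a concept (illustrative structure only)."""
    cui: str   # concept identifier
    sab: str   # source ontology (SAB) of the code
    code: str  # code within that source ontology
    term: str  # preferred term for the code


def select_by_sab_precedence(rows: Iterable[CodeRow], sab_order: List[str]) -> List[CodeRow]:
    """Return one row per concept: the code from the earliest SAB in sab_order."""
    rank = {sab: position for position, sab in enumerate(sab_order)}
    best: Dict[str, CodeRow] = {}
    for row in rows:
        if row.sab not in rank:
            continue  # SAB was not requested, so this code is not eligible
        current = best.get(row.cui)
        # Compare (SAB rank, code) so that ties within one SAB resolve deterministically.
        if current is None or (rank[row.sab], row.code) < (rank[current.sab], current.code):
            best[row.cui] = row
    return sorted(best.values(), key=lambda r: r.cui)


if __name__ == "__main__":
    # Placeholder data, not real ontology content.
    sample = [
        CodeRow("CUI_A", "NCI", "N1", "term a"),
        CodeRow("CUI_A", "SNOMEDCT_US", "S1", "term a"),
        CodeRow("CUI_B", "NCI", "N9", "term b"),
    ]
    for row in select_by_sab_precedence(sample, ["SNOMEDCT_US", "NCI"]):
        print(row.cui, row.sab, row.code, row.term)
```

With the order ['SNOMEDCT_US', 'NCI'], a concept that has codes in both SABs is reported with its SNOMEDCT_US code, while a concept that only has an NCI code is still returned.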

Example

CEDAR question: preparation medium

Corresponds to HUBMAP_C000402
relationship is _inverseisa, i.e., find all concepts a for which (c1:Code{SAB:'HUBMAP',CODE:'C000402'})<-[:CODE]-(p:Concept)-[:_inverseisa]->(p2:Concept)-[:CODE]->(c2:Code{CODE:'a'})
order of SAB precedence: HUSAT, NCIT, MESH

AlanSimmons commented 1 year ago

The endpoint requires one parameterized, correlated subquery in Cypher

Following is an example that returns codes for child concepts of HUBMAP C000402 from the ordered list of SABs ['HUSAT','SNOMEDCT_US'].

//Correlated subquery:
//Reference: neo4j Operations manual/Managing Databases/Composite Databases/Querying - Example 7 (Correlated subqueries)
//1. Find the child concepts with an isa relationship with the parent HUBMAP concept (identified by code).
//2. Order the child concepts based on the positions of the SABs for their codes in a list (as opposed to an alphabetic order).
//3. Identify the code from the SAB that is the earliest in the list. For example, if codes from SNOMEDCT_US are preferred to those from NCI, the list would include [...,'SNOMEDCT_US','NCI',...].

CALL
{
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)-[:isa]->(conceptParent:Concept)-[:CODE]->(codeParent:Code)
  WHERE codeParent.SAB = 'HUBMAP' AND codeParent.CODE = 'C000530' AND codeChild.SAB IN ['HUSAT','SNOMEDCT_US']
  RETURN conceptChild.CUI AS conceptChildCUI, min(CASE codeChild.SAB WHEN 'HUSAT' THEN 1 WHEN 'SNOMEDCT_US' THEN 2 END) AS minSAB
  ORDER BY conceptChildCUI
}
//4. Filter to the code for the child concepts with the "earliest" SAB. The "earliest" SAB will be different for each child concept. Limit to 1 to account for multiple cross-references (e.g., UMLS C0026018, which maps to 2 NCI codes).
CALL
{
  WITH conceptChildCUI, minSAB
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)
  WHERE conceptChild.CUI = conceptChildCUI
  AND CASE codeChild.SAB WHEN 'HUSAT' THEN 1 WHEN 'SNOMEDCT_US' THEN 2 END = minSAB
  RETURN codeChild
  ORDER BY codeChild.CODE
  LIMIT 1
}
//5. Get the term associated with the child concept code with the earliest SAB.
WITH codeChild
MATCH (termChild:Term)<-[:PT]-(codeChild:Code)
RETURN termChild.name, codeChild.CODE, codeChild.SAB
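
Because the CASE expression depends on the SAB list, the query text has to be generated per request. The following Python sketch shows one way to do that, under stated assumptions: the function names and parameter names ($parent_sab, $parent_code, $sabs) are hypothetical, and the identifier check is an assumed safeguard rather than anything taken from the ontology-api source. Cypher does not accept parameters for relationship types, so the relationship name is validated and interpolated into the text, while the parent code and the SAB list are passed as query parameters.

```python
import re
from typing import Dict, List, Tuple

# Assumed naming rule for SABs and relationship types: letters, digits, underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sab_rank_case(sabs: List[str], variable: str = "codeChild") -> str:
    """Build the CASE expression that maps each SAB to its 1-based list position."""
    if not sabs:
        raise ValueError("at least one SAB is required")
    for sab in sabs:
        if not _IDENTIFIER.match(sab):
            raise ValueError(f"unexpected SAB name: {sab!r}")
    whens = " ".join(f"WHEN '{sab}' THEN {i}" for i, sab in enumerate(sabs, start=1))
    return f"CASE {variable}.SAB {whens} END"


def build_valueset_query(parent_code: str, sabs: List[str],
                         relationship: str = "isa") -> Tuple[str, Dict[str, object]]:
    """Return (query text, parameters) for the correlated subquery in this thread."""
    if not _IDENTIFIER.match(relationship):
        raise ValueError(f"unexpected relationship name: {relationship!r}")
    case = sab_rank_case(sabs)
    query = f"""
CALL {{
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)-[:{relationship}]->(conceptParent:Concept)-[:CODE]->(codeParent:Code)
  WHERE codeParent.SAB = $parent_sab AND codeParent.CODE = $parent_code AND codeChild.SAB IN $sabs
  RETURN conceptChild.CUI AS conceptChildCUI, min({case}) AS minSAB
  ORDER BY conceptChildCUI
}}
CALL {{
  WITH conceptChildCUI, minSAB
  MATCH (codeChild:Code)<-[:CODE]-(conceptChild:Concept)
  WHERE conceptChild.CUI = conceptChildCUI AND {case} = minSAB
  RETURN codeChild
  ORDER BY codeChild.CODE
  LIMIT 1
}}
WITH codeChild
MATCH (termChild:Term)<-[:PT]-(codeChild:Code)
RETURN termChild.name, codeChild.CODE, codeChild.SAB
"""
    params = {"parent_sab": "HUBMAP", "parent_code": parent_code, "sabs": list(sabs)}
    return query, params


if __name__ == "__main__":
    text, parameters = build_valueset_query("C000530", ["HUSAT", "SNOMEDCT_US"])
    print(text)
    print(parameters)
```

The returned pair could then be handed to whatever method the neo4j manager uses to run queries; that call is deliberately left out here because its signature is not shown in this thread.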

AlanSimmons commented 1 year ago

Compared current knowledge graph with expected values from CEDAR Metadata Editor. Results here. The validation identified a number of changes required to the HUBMAP ontology, which will be described in release notes.

AlanSimmons commented 1 year ago

Release Notes

Changes to HUBMAP ontology to match CEDAR templates
Concept	Change	Reason
HUBMAP C000416	Changed term from "None" to "No medium"	"None" is a reserved word in Python, and was causing conflicts
HUBMAP C000433	Changed dbxref to UMLS:C5575819	Original was for "Calorie 4 Degrees Celsius"
HUBMAP C000432	Added dbxref to UMLS:C5669848	Concept introduced in UMLS2022AB
HUBMAP C000534, HUBMAP C000535, HUBMAP C000536, HUBMAP C000531, HUBMAP C000532, HUBMAP C000533, HUBMAP C000537, HUBMAP C000538	Added	New storage versions of medium concepts
HUBMAP C000440	Removed	No longer in CEDAR
HUBMAP C000539	Added	filtering concept for sample block area unit
HUBMAP C000461, HUBMAP C000461	Changed isa to HUBMAP C000539	filtered list of area units
HUBMAP C000530	Added	filtering concept for sample block volume unit
HUBMAP C000454, HUBMAP C000455	Changed isa to HUBMAP C000530	filtered list of volume units
HUBMAP C000416	Changed dbxref to UMLS:C0442735	term is "nothing" instead of "none", which is a reserved word in Python
HUBMAP C000540 - C000555	Added	For assay_category, analyte_class, is_targeted, library_yield_final_unit, library_concentration_unit, library_layout, library_indexing_type, is_technical_replicate

AlanSimmons commented 1 year ago

The ontology graph has been updated to support CEDAR metadata templates. Comparison here.

AlanSimmons commented 1 year ago

@yuanzhou

[x] Update the production ontology neo4j instance with the Archive found here:

AlanSimmons commented 1 year ago

Task list for remaining work to implement new API endpoint, as I see it:

[ ] Decide in which repo this endpoint should be deployed:
the UBKG repo (under the API directory), which is supposed to be the base ontology api repo OR
this repo
[x] Instantiate a local version of the relevant ontology OpenAPI server. These instructions are a start.
[x] Configure Postman, per instructions here.
[x] Add a new method to the neo4j manager that implements the Cypher query described in this thread.
[x] Update ontology-api-spec.yaml, which is currently at the root level of the repo and not in the API directory
[x] Rebuild the local API server
[x] Test. Values for parameters for the query, as well as expected output, can be found in this document.
[ ] Deploy the changes (manager, yaml, etc.) to the Dev and Prod API servers.

Note

I tried to instantiate a local instance of the OpenAPI server, but was unable to get it to work. Issues that I encountered:

The script build-server.sh obtains a dated version of a requirements.txt file from a github repo and overwrites the local copy. The archived requirements file has a few problems, such as:
- It installs a version of flask that has a dependency on jinja that uses a deprecated method (escape). I get the error described here.
- It does not install connexion, or at least the right version.
If you call the build-server.sh script with the -c argument, it tries to create a client application, but fails because it does not recognize an application that was apparently supposed to be installed from requirements.txt. (openapi-python-client).
If you run the build script without arguments and then run run_server_on_localhost.sh, you get a connection refused error. That might just be a port configuration error on the local machine.

I think that most of these problems are package dependency issues.
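
One way to check that suspicion is to list which versions of the packages named above are actually installed in the server's Python environment. This is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the function name is hypothetical, and the package list is simply the set of packages mentioned in this note.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    names = ["flask", "jinja2", "connexion", "openapi-python-client"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the environment that build-server.sh creates would show whether connexion and openapi-python-client are missing and which flask and jinja2 versions were installed together.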

AlanSimmons commented 1 year ago

@ChuckKollar helped me to set up a local instance of the API server.

AlanSimmons commented 1 year ago

@yuanzhou and @shirey ,

I have a revised version of the ontology-api server with a new endpoint, and want to deploy it in dev and production. I think that I need to rebuild the Docker container, but am not sure.

The "Rebuilding the Docker Container" section of this README document describes a process for updating the server. It appears that this process assumes that the Docker container on the server is current, because it pulls source for composing from the server instead of from DockerHub. This, in turn, assumes that the underlying source used to build the openapi server in the container on the server is current. This is not the case.

To deploy the new endpoint, I had to make changes to the ontology-api source that are different from those automatically generated by the OpenAPI process, in particular to requirements.txt and build-server.sh.

Should I build a new local Docker container for the ontology-api and push it to the DockerHub repo?

yuanzhou commented 1 year ago

@yuanzhou

[ ] Update the production ontology neo4j instance with the Archive found here:

@AlanSimmons I've rebuilt the DEV and PROD ontology neo4j images using these latest CSVs. The details for PROD follow:

Creating network "ontology-api_ontology-neo4j-network" with the default driver
Creating volume "ontology-api_ontology-neo4j-data" with default driver
Building ontology-neo4j
Sending build context to Docker daemon   3.41GB
Step 1/12 : FROM hubmap/neo4j-image:4.2.5
 ---> 0a9c59a9d224
Step 2/12 : WORKDIR /usr/src/app
 ---> Running in ad28dea18a21
Removing intermediate container ad28dea18a21
 ---> 26c9c0868c1e
Step 3/12 : COPY start.sh .
 ---> c0f4debe033d
Step 4/12 : COPY set_constraints.cypher .
 ---> 4fab5e5e349d
Step 5/12 : COPY neo4j.conf /usr/src/app/neo4j/conf
 ---> 91838d00ce3f
Step 6/12 : ENV IMPORT=/usr/src/app/neo4j/import
 ---> Running in b7d2e88a03ea
Removing intermediate container b7d2e88a03ea
 ---> 9ae48dbad8fb
Step 7/12 : COPY import/current/*.csv ${IMPORT}/
 ---> 290b8ce76891
Step 8/12 : WORKDIR /usr/src/app/neo4j/bin
 ---> Running in 7b0983b09524
Removing intermediate container 7b0983b09524
 ---> b8df49ecf3f6
Step 9/12 : RUN ./neo4j-admin import --verbose --database=ontology --nodes=Semantic="${IMPORT}/TUIs.csv" --nodes=Concept="${IMPORT}/CUIs.csv" --nodes=Code="${IMPORT}/CODEs.csv" --nodes=Term="${IMPORT}/SUIs.csv" --nodes=Definition="${IMPORT}/DEFs.csv" --relationships=ISA_STY="${IMPORT}/TUIrel.csv" --relationships=STY="${IMPORT}/CUI-TUIs.csv" --relationships="${IMPORT}/CUI-CUIs.csv" --relationships=CODE="${IMPORT}/CUI-CODEs.csv" --relationships="${IMPORT}/CODE-SUIs.csv" --relationships=PREF_TERM="${IMPORT}/CUI-SUIs.csv" --relationships=DEF="${IMPORT}/DEFrel.csv" --skip-bad-relationships --skip-duplicate-nodes
 ---> Running in 04b14d67ced9
neo4j 4.2.5
VM Name: OpenJDK 64-Bit Server VM
VM Vendor: Red Hat, Inc.
VM Version: 11.0.11+9-LTS
JIT compiler: HotSpot 64-Bit Tiered Compilers
VM Arguments: [-XX:+UseParallelGC, -Dfile.encoding=UTF-8]
Neo4j version: 4.2.5
Importing the contents of these files into /usr/src/app/neo4j/data/databases/ontology:
Nodes:
  [Concept]:
  /usr/src/app/neo4j/import/CUIs.csv

  [Semantic]:
  /usr/src/app/neo4j/import/TUIs.csv

  [Definition]:
  /usr/src/app/neo4j/import/DEFs.csv

  [Term]:
  /usr/src/app/neo4j/import/SUIs.csv

  [Code]:
  /usr/src/app/neo4j/import/CODEs.csv

Relationships:
  /usr/src/app/neo4j/import/CUI-CUIs.csv
  /usr/src/app/neo4j/import/CODE-SUIs.csv

  CODE:
  /usr/src/app/neo4j/import/CUI-CODEs.csv

  DEF:
  /usr/src/app/neo4j/import/DEFrel.csv

  STY:
  /usr/src/app/neo4j/import/CUI-TUIs.csv

  ISA_STY:
  /usr/src/app/neo4j/import/TUIrel.csv

  PREF_TERM:
  /usr/src/app/neo4j/import/CUI-SUIs.csv

Available resources:
  Total machine memory: 30.92GiB
  Free machine memory: 2.389GiB
  Max heap memory : 6.872GiB
  Processors: 8
  Configured max memory: 21.64GiB
  High-IO: true

Nodes, started 2022-12-06 20:39:22.543+0000
[*Nodes:0B/s 1.160GiB-------------------------------------------------------------------------]21.2M ∆3.21M
Done in 15s 926ms
Prepare node index, started 2022-12-06 20:39:38.477+0000
[*DEDUPLICATE:1.240GiB------------------------------------------------------------------------]89.0M ∆21.5M
Done in 9s 293ms
DEDUP, started 2022-12-06 20:39:47.887+0000
[*DEDUP---------------------------------------------------------------------------------------]    0 ∆    0
Done in 314ms
Relationships, started 2022-12-06 20:39:48.205+0000
[*Relationships:0B/s 1.240GiB-----------------------------------------------------------------]55.5M ∆ 100K
Done in 1m 2s 982ms
Node Degrees, started 2022-12-06 20:40:51.352+0000
[>(2)======================================|*CALCULATE:1.194GiB(5)============================]55.4M ∆21.6M
Done in 5s 232ms
Relationship --> Relationship 1-1738/1738, started 2022-12-06 20:40:56.719+0000
[>------------------------|*LINK(5)============================|v:149.9MiB/s------------------]55.4M ∆4.18M
Done in 12s 867ms
RelationshipGroup 1-1738/1738, started 2022-12-06 20:41:09.597+0000
[*>:??--------------------------------------------------------------------------------|v:??---]3.01M ∆3.01M
Done in 961ms
Node --> Relationship, started 2022-12-06 20:41:10.570+0000
[>:81.03MiB/s---------|>(2)======================|LINK-----|*v:146.6MiB/s(2)==================]20.5M ∆18.6M
Done in 2s 62ms
Relationship <-- Relationship 1-1738/1738, started 2022-12-06 20:41:12.686+0000
[>-------------------------------|*LINK(5)===========================|v:149.9MiB/s------------]55.4M ∆2.89M
Done in 12s 830ms
Count groups, started 2022-12-06 20:41:25.588+0000
[>|*>(6)=============================================|COUNT:1.036GiB--------------------------]3.01M ∆3.01M
Done in 318ms
Gather, started 2022-12-06 20:41:26.377+0000
[>------|*CACHE:1.410GiB----------------------------------------------------------------------]3.01M ∆ 712K
Done in 3s 297ms
Write, started 2022-12-06 20:41:29.683+0000
[*>:??----------------------------------------------------------------------------------||v:??]2.90M ∆2.90M
Done in 633ms
Node --> Group, started 2022-12-06 20:41:30.351+0000
[*>---------------------------------------|FIRST-----------------------|v:??(2)===============] 156K ∆9.82K
Done in 610ms
Node counts and label index build, started 2022-12-06 20:41:31.235+0000
[>(2)==========================|LABEL INDEX----------------|*COUNT:1.154GiB-------------------]21.2M ∆11.0M
Done in 2s 345ms
Relationship counts and relationship type index build, started 2022-12-06 20:41:33.595+0000
[*>(2)=========================================|RELATIONSH|COUNT(4)===========================]55.5M ∆15.5M
Done in 4s 450ms

IMPORT DONE in 2m 16s 907ms. 
Imported:
  21225536 nodes
  55490055 relationships
  82696223 properties
Peak memory usage: 1.336GiB
There were bad entries which were skipped and logged into /usr/src/app/neo4j/bin/import.report
Removing intermediate container 04b14d67ced9
 ---> eb809ebb0222
Step 10/12 : RUN yum install -y curl &&     chmod +x /usr/src/app/start.sh &&     rm -rf ${IMPORT}/*
 ---> Running in 3e28c920c93a
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
 * base: download.cf.centos.org
 * extras: download.cf.centos.org
 * updates: download.cf.centos.org
Package curl-7.29.0-59.el7_9.1.x86_64 already installed and latest version
Nothing to do
Removing intermediate container 3e28c920c93a
 ---> dce845db89b5
Step 11/12 : EXPOSE 7474 7687
 ---> Running in 0ab0ed23ebf5
Removing intermediate container 0ab0ed23ebf5
 ---> 2c1546f73b84
Step 12/12 : CMD ["/usr/src/app/start.sh"]
 ---> Running in e8f86902167f
Removing intermediate container e8f86902167f
 ---> 5d33e31ad8ad
Successfully built 5d33e31ad8ad
Successfully tagged ontology-api_ontology-neo4j:latest

hubmapconsortium / ontology-api

EPIC - Ontology: build API endpoints for valuesets used by CEDAR metadata templates #175
