x-atlas-consortia / hs-ontology-api

Ontology API built on top of the UBKG API and shared by HuBMAP and SenNet projects

Endpoints: celltypes, celltype #8

Closed AlanSimmons closed 6 months ago

AlanSimmons commented 1 year ago

References

  1. Request
  2. [Specification](https://docs.google.com/document/d/1gW53B9YL7P3Qnm0sTvMuzz9DFvd4j80qDZFWtHwNavQ/edit?usp=sharing)

Request

See cell_type_detail in specification.

Dependencies:

  1. @AlanSimmons will provide the Cypher query for the endpoint.
  2. Dependent on #14 being finished
  3. See specification.
AlanSimmons commented 1 year ago

Parameters from the cells API

Based on this Slack message, the cells API will provide cell types as an array of terms corresponding to AZ codes.

The list mentioned in the Slack channel, which dates from 9 months ago, is reproduced at the end of this comment.

Azimuth codes are ingested into UBKG using this mapping. AZ codes are cross-referenced to codes in Cell Ontology. Because AZ has a higher resolution than CL (e.g., it distinguishes B cells based on organ), there is a 1:many relationship between CL and AZ codes.

Based on the Azimuth mapping, it appears that the response from the cells API corresponds to terms of type SY in the AZ SAB.

For example, consider "Ascending Thin Limb Cell" (now apparently "Ascending Thin Limb"). In the UBKG, "Ascending Thin Limb" is a synonym for code AZ:0000023 (Kidney_L3_Ascending Thin Limb).

['Afferent Arteriole Endothelial Cell',
 'Ascending Thin Limb Cell',
 'Ascending Vasa Recta Endothelial Cell',
 'Connecting Tubule Cell',
 'Connecting Tubule Intercalated Cell Type A',
 'Connecting Tubule Principal Cell',
 'Cortical Collecting Duct Principal Cell',
 'Cortical Thick Ascending Limb Cell',
 'Dendritic Cell (classical)',
 'Descending Thin Limb Cell Type 1',
 'Descending Thin Limb Cell Type 2',
 'Descending Thin Limb Cell Type 3',
 'Descending Vasa Recta Endothelial Cell',
 'Distal Convoluted Tubule Cell Type 1',
 'Fibroblast',
 'Glomerular Capillary Endothelial Cell',
 'Inner Medullary Collecting Duct Cell',
 'Intercalated Cell Type B',
 'Juxtaglomerular granular cell (Renin positive)',
 'Lymphatic Endothelial Cell',
 'M2-Macrophage',
 'Macula Densa cell',
 'Mast cell',
 'Medullary Thick Ascending Limb Cell',
 'Mesangial Cell',
 'Monocyte',
 'Natural Killer T Cell',
 'Neutrophil',
 'non Classical Monocyte',
 'Outer Medullary Collecting Duct Intercalated Cell Type A',
 'Outer Medullary Collecting Duct Principal Cell',
 'Parietal Epithelial Cell',
 'Peritubular Capillary Endothelial Cell',
 'Plasma cell',
 'Podocyte',
 'Proximal Tubule Epithelial Cell Segment 1',
 'Proximal Tubule Epithelial Cell Segment 2',
 'Proximal Tubule Epithelial Cell Segment 3',
 'unknown',
 'Vascular Smooth Muscle Cell/Pericyte (general)']
AlanSimmons commented 1 year ago

Limitation: cell type-biomarker

The UBKG imports information from the HRA regarding how cell types associate with biomarkers. The HRA information in UBKG is limited to mappings from cell type (CL) to gene (HGNC). These mappings are general and not specific to the cell types identified in a particular dataset.

Biomarker information for a particular cell type identified in actual data comes from the cells API.

AlanSimmons commented 1 year ago

Cypher query for endpoint

Method

POST

File

cell_type_detail_cypher.txt

Cypher

// Return detailed information on cell types, based on an input list of AZ terms.

CALL
// Get CUIs of concepts for cell types that match the criteria.  

{

// Criteria: set of terms for Azimuth cell types, which correspond to synonyms for AZ codes. These are cross-referenced to CL codes.
//WITH ['Adipocyte','Fibroblast'] AS ids
WITH [''] AS ids
OPTIONAL MATCH (pAZ:Concept)-[:CODE]->(cAZ:Code)-[r]->(tAZ:Term)
WHERE r.CUI=pAZ.CUI AND type(r) IN ['SY'] AND cAZ.SAB='AZ'
AND CASE WHEN ids[0]<>'' THEN ANY(id in ids WHERE tAZ.name=id) ELSE 1=1 END
RETURN DISTINCT pAZ.CUI AS AZCUI
}

CALL
{

// CL codes and preferred term

// Cell types - CL Code|preferred term
// CL codes can be ingested as part of the ingestion of other ontologies in UBKG (e.g. UBERON).
// A CL code may have multiple terms of type "PT".  
// If a CL code was ingested as part of CL, then there will be a term of type PT; if not, then there may be one or more terms of type PT_SAB--e.g., PT_UBERON is the preferred term for a CL code ingested with UBERON.
// The preferred term will be the term of type PT; if there is no PT, then any of the terms of type PT_SAB will do.

// First, order the preferred terms by whether they are the PT or a PT_SAB. 
WITH AZCUI  
CALL{ 
WITH AZCUI 
OPTIONAL MATCH (pCL:Concept)-[:CODE]->(cCL:Code)-[rCL]->(tCL:Term)
WHERE pCL.CUI=AZCUI AND cCL.SAB='CL' AND rCL.CUI=pCL.CUI AND type(rCL) STARTS WITH 'PT'
RETURN cCL.CodeID AS CLID, MIN(CASE WHEN type(rCL)='PT' THEN 0 ELSE 1 END) AS mintype
ORDER BY CLID,mintype
} 

// Next, filter to either the PT or one of the PT_SABs. 
WITH CLID, mintype 
OPTIONAL MATCH (cCL:Code)-[rCL]->(tCL:Term) 
WHERE cCL.CodeID = CLID AND type(rCL) STARTS WITH 'PT' 
AND CASE WHEN type(rCL)='PT' THEN 0 ELSE 1 END=mintype 
RETURN cCL.CodeID AS CLID, 'cell_types_name' AS ret_key, CASE WHEN tCL.name IS NULL THEN '' ELSE tCL.name END AS ret_value  
ORDER BY CLID

UNION

// Cell types - CL code|definition
// Because definitions link to Concepts and multiple CL codes can match to the same concept, there will be duplicate and extraneous definitions. 
// There is currently no way to link the definition to the code, so collect the definitions and take the first one.

WITH AZCUI  
OPTIONAL MATCH (pCL:Concept)-[:CODE]->(cCL:Code),(pCL:Concept)-[:DEF]->(dCL:Definition)
WHERE pCL.CUI=AZCUI AND cCL.SAB='CL' AND dCL.SAB='CL'
RETURN cCL.CodeID AS CLID,'cell_types_definition' AS ret_key, COLLECT(DISTINCT dCL.DEF)[0] AS ret_value
ORDER BY CLID

UNION
//CL-HGNC mappings via HRA

//HGNC ID
WITH AZCUI
OPTIONAL MATCH (cCL:Code)<-[:CODE]-(pCL:Concept)-[:HRA_has_marker_component]->(pGene:Concept)-[:CODE]->(cGene:Code)-[r]->(tGene:Term)
WHERE pCL.CUI=AZCUI AND cGene.SAB='HGNC' AND r.CUI=pGene.CUI AND cCL.SAB='CL' AND type(r) IN ['ACR','PT']
RETURN distinct cCL.CodeID as CLID, 'cell_types_hgnc' as ret_key, cGene.CodeID + '|' + apoc.text.join(COLLECT(tGene.name),'|') AS ret_value
ORDER BY CLID, cGene.CodeID + '|' + apoc.text.join(COLLECT(tGene.name),'|')

}

//Pivot results

WITH CLID, ret_key, COLLECT(ret_value) AS values  
WITH CLID,apoc.map.fromLists(COLLECT(ret_key),COLLECT(values)) AS map  
RETURN CLID, 
map['cell_types_name'] AS cell_types_code_name,
map['cell_types_definition'] AS cell_types_definition,
map['cell_types_hgnc'] AS cell_types_hgnc_id

order by CLID
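
The query above filters on the list bound by `WITH [''] AS ids`, where a single empty string means "no filter". As a minimal sketch only, assuming the endpoint loads the query text from cell_type_detail_cypher.txt and substitutes the posted terms into that line, the binding could look like the following. The helper names `cypher_string_list` and `bind_ids` are hypothetical and are not part of hs-ontology-api.

```python
def cypher_string_list(terms):
    """Render a list of Python strings as a Cypher list of single-quoted literals."""
    # Escape backslashes first, then single quotes, so a term cannot
    # terminate the string literal early.
    escaped = [t.replace("\\", "\\\\").replace("'", "\\'") for t in terms]
    return "[" + ",".join(f"'{t}'" for t in escaped) + "]"


def bind_ids(query_text, terms):
    """Replace the placeholder `WITH [''] AS ids` with the requested terms.

    An empty `terms` list keeps the placeholder, which the query treats as
    "return all AZ cell types".
    """
    placeholder = "WITH [''] AS ids"
    if placeholder not in query_text:
        raise ValueError("placeholder line not found in query text")
    literal = cypher_string_list(terms) if terms else "['']"
    # Replace only the first occurrence; the commented-out example line in the
    # query does not match the placeholder exactly, so it is left alone.
    return query_text.replace(placeholder, f"WITH {literal} AS ids", 1)
```

For example, `bind_ids(query_text, ['Adipocyte', 'Fibroblast'])` yields the same line as the commented-out example in the query. Passing the list as a driver-level query parameter instead would avoid the string escaping altogether.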
AlanSimmons commented 1 year ago

Working with response: biomarkers

The cell_types_hgnc field is an array of genes associated with the cell type. Each element of the array has the format

HGNC ID|approved name|approved symbol

e.g.,

["HGNC:21411|apoptosis inducing factor mitochondria associated 2|AIFM2", "HGNC:2211|collagen type VI alpha 1 chain|COL6A1", "HGNC:2212|COL6A2|collagen type VI alpha 2 chain", "HGNC:2705|decorin|DCN", "HGNC:2731|discoidin domain receptor tyrosine kinase 2|DDR2", "HGNC:4620|GSN|gelsolin", "HGNC:8803|PDGFRA|platelet derived growth factor receptor alpha"]

These values should be translated to the biomarkers element in the response JSON.

Because HRA currently only maps CL codes to HGNC codes, the type and vocabulary keys are hard-coded.

Example

"biomarkers": [
  {
    "type": "gene",
    "vocabulary": "hgnc",
    "id": "7178",
    "symbol": "MMRN1",
    "name": "multimerin-1"
  }
]
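
A minimal sketch of that translation follows; it is an illustration under stated assumptions, not the endpoint's actual code. In the sample array above, the approved name and approved symbol do not always appear in the same order (compare the COL6A1 and COL6A2 entries), so the sketch guesses which term is the symbol: it assumes the symbol is the term that is unchanged by upper-casing. That heuristic, the stripping of the `HGNC:` prefix to match the `id` in the example, and the function names `parse_hgnc_entry` and `to_biomarkers` are all assumptions.

```python
def parse_hgnc_entry(entry):
    """Convert one 'HGNC:<id>|<term>|<term>' string into a biomarker dict."""
    code, *terms = entry.split("|")
    if len(terms) != 2:
        raise ValueError(f"expected an ID and two terms, got: {entry!r}")
    first, second = terms
    # Heuristic (assumption): the approved symbol is the all-uppercase term.
    # If the first term is not clearly the symbol, fall back to the documented
    # order, in which the symbol comes last.
    if first == first.upper() and second != second.upper():
        symbol, name = first, second
    else:
        symbol, name = second, first
    return {
        "type": "gene",        # hard-coded: HRA only maps CL codes to genes
        "vocabulary": "hgnc",  # hard-coded: the genes are HGNC codes
        "id": code.split(":", 1)[-1],  # drop the 'HGNC:' prefix, as in the example
        "symbol": symbol,
        "name": name,
    }


def to_biomarkers(cell_types_hgnc):
    """Translate the cell_types_hgnc array into the biomarkers element."""
    return [parse_hgnc_entry(entry) for entry in cell_types_hgnc or []]
```

Calling `to_biomarkers` on the sample array gives one dictionary per gene, with `symbol` and `name` assigned consistently regardless of the order in which they appear in the source string.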
AlanSimmons commented 11 months ago

Endpoints documented in SmartAPI for review.

AlanSimmons commented 11 months ago

Initial scope

The initial scope of these calls will be UBKG.