obophenotype / bio-attribute-ontology

source files for OBA (Ontology of Biological Attributes)
https://obophenotype.github.io/bio-attribute-ontology
Creative Commons Zero v1.0 Universal

Create infrastructure and SOP for mapping chemicals to CHEBI & swissLipids for generation of chemical measurement terms #243

Closed by dosumis 1 year ago

dosumis commented 1 year ago

Background - The GWAS catalog needs to curate associations between variants and changes in the levels of many metabolites in, for example, blood. Studies often measure large numbers of metabolites and record these using standard chemical identifiers. We need to map these into chemical ontologies that we can use - this means CHEBI right now, and possibly SwissLipids in the near future. Both of these ontologies are rich sources of ID mappings.

Proposed strategy - ensure that chemical ID mappings from CHEBI & SwissLipids are available in OxO (loaded as SSOM) and that they can be used easily for mapping lists of IDs from GWAS curators to CHEBI/SwissLipids. From there we need SOPs/pipelines to get these IDs into TSVs for generation of OBA terms, with the OBA terms added to lists for import to EFO, plus a mechanism to inform GWAS curators of term availability.
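To make the mapping step concrete, here is a minimal sketch of looking up curator-supplied IDs against an SSOM-style TSV. It is only an illustration under assumptions: the column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`) follow the SSOM TSV format, but the mapping rows, the CURIE prefixes and the helper names `load_index` and `map_ids` are hypothetical placeholders, not real CHEBI data or an existing OxO API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SSOM-style mapping set. The CHEBI IDs and external IDs below
# are placeholders, NOT real mappings; the prefixes are assumptions.
SSOM_TSV = """\
# mapping_set_id: https://example.org/hypothetical-chebi-xrefs
subject_id\tpredicate_id\tobject_id\tmapping_justification
CHEBI:0000001\tskos:exactMatch\tHMDB:HMDB0000001\tsemapv:UnspecifiedMatching
CHEBI:0000001\tskos:exactMatch\tKEGG.COMPOUND:C99999\tsemapv:UnspecifiedMatching
CHEBI:0000002\tskos:exactMatch\tCAS:0000-00-0\tsemapv:UnspecifiedMatching
"""


def load_index(text):
    """Build a lookup from external ID (object_id) to ontology IDs (subject_id)."""
    # SSOM TSVs may start with a commented metadata header; skip those lines.
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    index = defaultdict(set)
    for row in csv.DictReader(rows, delimiter="\t"):
        index[row["object_id"]].add(row["subject_id"])
    return index


def map_ids(curies, index):
    """Return every input ID with its (possibly empty) sorted list of ontology IDs."""
    return {curie: sorted(index.get(curie, ())) for curie in curies}


index = load_index(SSOM_TSV)
result = map_ids(["HMDB:HMDB0000001", "CAS:1-2-3"], index)
```

Keeping unmapped IDs in the result with an empty list makes it easy to report back to curators which identifiers still need attention.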

Potential issue: Chemicals not in CHEBI. Can we devise a strategy for mapping up? Worth discussing with experts on CHEBI channel on OBO foundry.

As well as clear SOPs, everyone involved in mapping and term generation should have a good understanding of the whole process of generating and using mappings, so that they can take ownership in future.

dosumis commented 1 year ago

Test: Sheet 2 of this spreadsheet has a list of metabolites measured in a GWAS study. https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-022-01270-1/MediaObjects/41588_2022_1270_MOESM4_ESM.xlsx

The table is very rich in chemical identifiers:

| Metabolites | Compound ID (Metabolon) | CHRO_LIB_ENTRY_ID | COMP_ID | LIB_ID | SUPER_PATHWAY | SUB_PATHWAY | PATHWAY_SORTORDER | TYPE | INCHIKEY | SMILES | CAS | CHEMSPIDER | HMDB | HMDB_curated | KEGG | PUBCHEM | PUBCHEM_curated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S-1-pyrroline-5-carboxylate | X35 | 166164 | 42370 | 400 | Amino Acid | Glutamate Metabolism | 62 | NAMED | DWAKNKKXGALPNW-REWHXWOFAV | OC(C1CCC=N1)=O | 2906-39-0 | 10140206 | HMDB0001301 | HMDB0001301 | C04322 | 11966181 | 11966181 |
| spermidine | X50 | 157851 | 485 | 402 | Amino Acid | Polyamine Metabolism | 545 | NAMED | ATHGHQPFGPMSJY-UHFFFAOYAK | NCCCCNCCCN | 124-20-9 | 1071 | HMDB0001257 | HMDB0001257 | C00315 | 1102 | 1102 |
| 1-methylnicotinamide | X55 | 155829 | 27665 | 400 | Cofactors and Vitamins | Nicotinate and Nicotinamide Metabolism | 4316 | NAMED | LDHMAVIPBRSVRG-UHFFFAOYAE | C[N+]1=CC(C([NH-])=O)=CC=C1 | 1005-24-9 | 8305504 | HMDB0000699 | HMDB0000699 | C02918 | 457 | 457 |

Given the identifier mappings extracted from CHEBI & SwissLipids, how many terms from this table can we map to an ontology identifier? (=> separate breakdowns for CHEBI & SwissLipids)
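One way to get the per-ontology breakdown asked for here is sketched below. This is a rough illustration, not an agreed pipeline: the column names (`CAS`, `HMDB`, `KEGG`, `PUBCHEM`) are taken from the table above, but the CURIE prefixes, the example rows, the lookup contents and the function names `row_curies` and `coverage` are all hypothetical placeholders. A metabolite counts as mapped if any one of its identifiers is found in that ontology's lookup.

```python
# Table column -> CURIE prefix. The prefixes are an assumption for this sketch.
ID_COLUMNS = {
    "CAS": "CAS",
    "HMDB": "HMDB",
    "KEGG": "KEGG.COMPOUND",
    "PUBCHEM": "PUBCHEM.COMPOUND",
}


def row_curies(row):
    """Turn the non-empty identifier cells of one table row into CURIEs."""
    return [
        f"{prefix}:{row[col]}"
        for col, prefix in ID_COLUMNS.items()
        if row.get(col)
    ]


def coverage(rows, indices):
    """For each ontology, count rows with at least one identifier in its lookup."""
    report = {}
    for name, index in indices.items():
        mapped = sum(
            1 for row in rows if any(c in index for c in row_curies(row))
        )
        report[name] = (mapped, len(rows))
    return report


# Placeholder rows and lookups, NOT real metabolites or real mappings.
rows = [
    {"Metabolites": "metabolite A", "HMDB": "HMDB0000001", "KEGG": "C99999"},
    {"Metabolites": "metabolite B", "CAS": "0000-00-0"},
    {"Metabolites": "metabolite C", "PUBCHEM": "0"},
]
indices = {
    "CHEBI": {"HMDB:HMDB0000001": {"CHEBI:0000001"}, "CAS:0000-00-0": {"CHEBI:0000002"}},
    "SwissLipids": {"KEGG.COMPOUND:C99999": {"SLM:000000001"}},
}

report = coverage(rows, indices)
for name, (mapped, total) in report.items():
    print(f"{name}: {mapped}/{total} metabolites mapped")
```

Rows that match nothing in either lookup (like "metabolite C" here) are the candidates for the "mapping up" discussion.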

If many do not map, I would be interested to discuss with cheminformatics experts whether there are good strategies for mapping up. I'm hoping it would be enough to have a short discussion on OBO Slack with some examples.

Motivating issue: https://github.com/EBISPOT/efo/issues/1905

Note - Santhi originally requested 415 terms (although the abstract mentions associations for 690 metabolites). Following discussion of identifiers, she has reduced her list to just 28 terms, but I think that is just because we communicated back that IDs are needed and she chose a subset of possible IDs (CAS ID/KEGG ID/PubChem ID), not because the science or needs of the GWAS catalog dictate adding such a small set.

cmungall commented 1 year ago

related: