monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Synonyms missing from `components/ncit.owl` #612

Closed: joeflack4 closed this issue 4 weeks ago

joeflack4 commented 1 month ago

Overview

While working on the synonym sync, I noticed that `components/ncit.owl` has no synonyms.

Additional info

`tmp/component-download-ncit.owl.owl` does have synonyms, though.

Example case code snippets

Example: `NCIT:C157344`

`components/ncit.owl`:

```owl
Stage I Differentiated Thyroid Gland Carcinoma 45 Years and Older AJCC v7
```

`tmp/component-download-ncit.owl.owl`:

```owl
A preparation of autologous T-lymphocytes transduced with a retroviral vector encoding a T-cell receptor (TCR) sequence specific for CT-RCC-1, a tumor-associated antigen (TAA) and HLA-A11-restricted peptide encoded by human endogenous retrovirus (HERV) type E as well as a truncated CD34 chain (CD34t), with potential antineoplastic activity. Upon isolation, transduction, expansion ex vivo and re-introduction into the patient, the autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells bind to and induce selective toxicity in tumor cells expressing both the HLA-A11 allele and the CT-RCC-1 HERV-E antigen. The CD34t protein allows the transduced cells to be identified with an anti-CD34 antibody, and facilitates monitoring of the genetically modified T-cells following adoptive transfer. CT-RCC-1 HERV-E is a TAA found in a high percentage of clear cell renal cell carcinoma (ccRCC) cells.
C157344 Cell Pharmacologic Substance Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells CL936969 CTRP 796813 796813 796813 Anti-CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted Autologous CD8+/CD34t+ T-cells (SY) Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-lymphocytes Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells A preparation of autologous T-lymphocytes transduced with a retroviral vector encoding a T-cell receptor (TCR) sequence specific for CT-RCC-1, a tumor-associated antigen (TAA) and HLA-A11-restricted peptide encoded by human endogenous retrovirus (HERV) type E as well as a truncated CD34 chain (CD34t), with potential antineoplastic activity. Upon isolation, transduction, expansion ex vivo and re-introduction into the patient, the autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells bind to and induce selective toxicity in tumor cells expressing both the HLA-A11 allele and the CT-RCC-1 HERV-E antigen. The CD34t protein allows the transduced cells to be identified with an anti-CD34 antibody, and facilitates monitoring of the genetically modified T-cells following adoptive transfer. CT-RCC-1 HERV-E is a TAA found in a high percentage of clear cell renal cell carcinoma (ccRCC) cells. NCI Anti-CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted Autologous CD8+/CD34t+ T-cells (SY) SY NCI Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells DN CTRP Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells PT NCI Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-lymphocytes SY NCI
```

Here's the component goal. At a glance, I don't see why synonyms are being stripped, but I haven't looked closely:

```make
$(COMPONENTSDIR)/ncit.owl: $(TMPDIR)/ncit_relevant_signature.txt | component-download-ncit.owl
	if [ $(SKIP_HUGE) = false ] && [ $(COMP) = true ]; then $(ROBOT) remove -i $(TMPDIR)/component-download-ncit.owl.owl --select imports \
		rename --mappings config/property-map.sssom.tsv --allow-missing-entities true --allow-duplicates true \
		query --update ../sparql/rm_xref_by_prefix.ru \
		remove -T $(TMPDIR)/ncit_relevant_signature.txt --select complement --select "classes individuals" --trim false \
		remove -T config/properties.txt --select complement --select properties --trim true \
		remove --term "http://purl.obolibrary.org/obo/NCIT_C179199" --axioms "equivalent" \
		annotate --ontology-iri $(URIBASE)/mondo/sources/ncit.owl --version-iri $(URIBASE)/mondo/sources/$(TODAY)/ncit.owl -o $@; fi
```
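
One way to find the step that drops the synonyms is to run the chain one step at a time and inspect the output after each prefix, since a ROBOT chain can end with `-o` after any of its commands. The sketch below only builds those command lines. The steps are copied from the goal above, but the Make variables are replaced by assumed concrete paths (`tmp/`, `robot` on `PATH`), the final `annotate` step is left out on the assumption that it only sets IRIs, and the `tmp/bisect-N.owl` output names are hypothetical.

```python
import shlex

# Steps copied from the $(COMPONENTSDIR)/ncit.owl goal; Make variables are
# replaced with assumed concrete paths. The final `annotate` step is omitted.
STEPS = [
    ["remove", "-i", "tmp/component-download-ncit.owl.owl", "--select", "imports"],
    ["rename", "--mappings", "config/property-map.sssom.tsv",
     "--allow-missing-entities", "true", "--allow-duplicates", "true"],
    ["query", "--update", "../sparql/rm_xref_by_prefix.ru"],
    ["remove", "-T", "tmp/ncit_relevant_signature.txt", "--select", "complement",
     "--select", "classes individuals", "--trim", "false"],
    ["remove", "-T", "config/properties.txt", "--select", "complement",
     "--select", "properties", "--trim", "true"],
    ["remove", "--term", "http://purl.obolibrary.org/obo/NCIT_C179199",
     "--axioms", "equivalent"],
]


def bisect_commands(steps, robot="robot", outdir="tmp"):
    """Return one shell-quoted command per prefix of the chain (steps 1..k)."""
    commands = []
    for k in range(1, len(steps) + 1):
        args = [robot]
        for step in steps[:k]:
            args.extend(step)
        # Hypothetical output name, one file per prefix length.
        args += ["-o", f"{outdir}/bisect-{k}.owl"]
        commands.append(shlex.join(args))
    return commands


for cmd in bisect_commands(STEPS):
    print(cmd)
```

Checking each `tmp/bisect-N.owl` for `oboInOwl:hasExactSynonym` should point to the first prefix at which the synonyms disappear.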

matentzn commented 1 month ago

The NCIT synonym types need to be mapped to `oio:hasExactSynonym` and friends.
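
In the goal above, the `rename --mappings config/property-map.sssom.tsv` step appears to be where property IRIs get remapped. As a rough illustration of the idea (not of how ROBOT implements it), the sketch below rewrites predicate IRIs in a list of triples using an old-to-new map. The entry mapping `NCIT_P90` to `hasExactSynonym` is a hypothetical example; the real mappings live in `config/property-map.sssom.tsv`.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

OBO = "http://purl.obolibrary.org/obo/"
OIO = "http://www.geneontology.org/formats/oboInOwl#"

# Hypothetical mapping for illustration only; the real mappings are in
# config/property-map.sssom.tsv.
PROPERTY_MAP: Dict[str, str] = {
    OBO + "NCIT_P90": OIO + "hasExactSynonym",
}


def rename_predicates(triples: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Replace each predicate found in `mapping`; leave all other triples unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


triples = [
    (OBO + "NCIT_C157344", OBO + "NCIT_P90",
     "Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-lymphocytes"),
    (OBO + "NCIT_C157344", OBO + "NCIT_P108",
     "Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells"),
]

for triple in rename_predicates(triples, PROPERTY_MAP):
    print(triple)
```

Note that a rename like this moves assertions from the old predicate to the new one rather than copying them, so the old predicate no longer appears afterwards.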

joeflack4 commented 1 month ago

@matentzn Hmm, it looks like that may already be done. In the `component-download-ncit.owl.owl` snippet from the OP, the example class has both the NCIT-specific properties and an `oboInOwl` synonym property:

```xml
<obo:NCIT_P107>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</obo:NCIT_P107>
<obo:NCIT_P108>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</obo:NCIT_P108>
<oboInOwl:hasExactSynonym>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</oboInOwl:hasExactSynonym>
```
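
To see which predicates carry the same string, one can group a class's annotation values by text. This sketch parses the three lines quoted above, wrapped in an `owl:Class` element with the standard `obo` and `oboInOwl` namespace declarations (the wrapper is an assumption; the thread only shows the inner lines), and reports every value that appears under more than one predicate. `shared_values` and `compact` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The three lines quoted above, wrapped in an assumed owl:Class element.
SNIPPET = """<owl:Class xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:obo="http://purl.obolibrary.org/obo/"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <obo:NCIT_P107>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</obo:NCIT_P107>
  <obo:NCIT_P108>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</obo:NCIT_P108>
  <oboInOwl:hasExactSynonym>Autologous CT-RCC-1 HERV-E-TCR-transduced-HLA-A11-restricted CD8+/CD34t+ T-cells</oboInOwl:hasExactSynonym>
</owl:Class>"""

PREFIXES = {
    "http://purl.obolibrary.org/obo/": "obo:",
    "http://www.geneontology.org/formats/oboInOwl#": "oboInOwl:",
}


def compact(tag):
    """Turn '{namespace}local' into 'prefix:local' for the known prefixes."""
    if not tag.startswith("{"):
        return tag
    ns, local = tag[1:].split("}", 1)
    return PREFIXES.get(ns, ns) + local


def shared_values(class_xml):
    """Map each literal value to its sorted predicates, if more than one carries it."""
    by_value = defaultdict(set)
    for child in ET.fromstring(class_xml):
        if child.text:
            by_value[child.text.strip()].add(compact(child.tag))
    return {v: sorted(p) for v, p in by_value.items() if len(p) > 1}


for value, preds in shared_values(SNIPPET).items():
    print(value, "->", ", ".join(preds))
```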

It does raise some questions worth investigating when we look at this, though:

  1. Is `oboInOwl` synonym coverage currently 100%?
  2. Does the NCIT release come with these `oio` predicates, or are they mapped or copied somewhere in the ODK pipeline?
  3. What are all of the `obo:NCIT_*` synonym predicates?
  4. Why does the example above have a synonym that uses multiple `obo:NCIT_*` predicates (`P107` and `P108`)?
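
Questions 1 and 3 could be checked with a small script once the class annotations are loaded. This sketch assumes the data has already been extracted (for example with an RDF library) into a hypothetical `{class: {predicate: set_of_values}}` dict, and that the caller supplies which `obo:NCIT_*` predicates count as synonyms; the choice of `obo:NCIT_P90` and the toy data below are illustrative assumptions, not values taken from the actual files.

```python
from typing import Dict, Set

Annotations = Dict[str, Dict[str, Set[str]]]  # class -> predicate -> values

OIO_SYNONYMS = {
    "oboInOwl:hasExactSynonym", "oboInOwl:hasBroadSynonym",
    "oboInOwl:hasNarrowSynonym", "oboInOwl:hasRelatedSynonym",
}


def ncit_predicates(data: Annotations) -> Set[str]:
    """All obo:NCIT_* predicates used anywhere in the data (question 3)."""
    return {p for preds in data.values() for p in preds if p.startswith("obo:NCIT_")}


def oio_coverage(data: Annotations, synonym_preds: Set[str]) -> float:
    """Fraction of (class, value) pairs under `synonym_preds` that also appear
    under some oboInOwl synonym predicate on the same class (question 1)."""
    total = covered = 0
    for preds in data.values():
        oio_values = set().union(*(preds.get(p, set()) for p in OIO_SYNONYMS))
        ncit_values = set().union(*(preds.get(p, set()) for p in synonym_preds))
        total += len(ncit_values)
        covered += len(ncit_values & oio_values)
    return covered / total if total else 1.0


# Toy data; treating obo:NCIT_P90 as the synonym predicate is a hypothetical choice.
TOY: Annotations = {
    "class-1": {"obo:NCIT_P90": {"syn A", "syn B"},
                "oboInOwl:hasExactSynonym": {"syn A"}},
    "class-2": {"obo:NCIT_P90": {"syn C"}, "obo:NCIT_P108": {"label C"},
                "oboInOwl:hasExactSynonym": {"syn C"}},
}

print(sorted(ncit_predicates(TOY)))
print(f"{oio_coverage(TOY, {'obo:NCIT_P90'}):.2f}")
```

A coverage below 1.0 would list exactly the kind of gap question 1 asks about; the per-class sets could also be diffed to print the missing values.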

twhetzel commented 1 month ago

I looked into this a bit. Oddly (since this step is used in many goals), when this line is removed from the `$(COMPONENTSDIR)/ncit.owl` goal, the synonyms are present in the `ncit.owl` file.

matentzn commented 1 month ago

Ping me on slack if you need help with this!

twhetzel commented 4 weeks ago

Closed with https://github.com/monarch-initiative/mondo-ingest/pull/613