monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

General guidelines for importing genes #48

Open matentzn opened 2 years ago

matentzn commented 2 years ago
  1. https://github.com/monarch-initiative/mondo-ingest/issues/47
  2. Gene associations must be confirmed by OMIM (open question: @nicolevasilevsky says ClinGen also suggests disease-to-gene associations)
nicolevasilevsky commented 2 years ago
sabrinatoro commented 2 years ago

"We will import all gene links that are connected to one disease and one disease only."

@nicolevasilevsky I don't think we can do this: we have branches called "gene-related diseases", so we already have cases where 1 gene maps to N diseases.

nicolevasilevsky commented 2 years ago

I will need to look at this more closely, but I'm not sure what you mean @sabrina. We have branches for gene-related diseases, and their children are 1 gene - 1 disease.

At any rate, I'll work on this and share the draft with you two for review. :)

nicolevasilevsky commented 2 years ago

@sabrinatoro I misunderstood your comment. You are right, sometimes we have single genes associated with more than one disease.

nicolevasilevsky commented 2 years ago

Draft:

  1. We slurp the gene association from OMIM whenever a disease has only a single disease-to-gene association. We implicitly assume that these cases are monogenic diseases, which may not be fully optimal but, based on our knowledge of OMIM, will mostly be OK. We won't import disease-to-gene associations if a disease has more than one gene.
  2. Genes can be associated with more than one disease. For example, APC is implicated in MONDO:0016613 'APC-related attenuated familial adenomatous polyposis' and MONDO:0021056 'familial adenomatous polyposis 1'.
  3. We may get disease-to-gene associations from external groups as well, such as ClinGen, which often requests new gene-related disease terms.

@sabrinatoro @matentzn

nicolevasilevsky commented 2 years ago

Discussed on the curation call; we still need to discuss this with the technical group.

nicolevasilevsky commented 2 years ago

Revised text:

  1. We slurp the gene association from OMIM whenever a disease has only 1 gene. We implicitly assume that these cases are monogenic diseases, which may not be fully optimal but, based on our knowledge of OMIM, will mostly be OK. We won't import disease-to-gene associations if a disease has more than one gene.
  2. Genes can be associated with more than one disease. For example, APC is implicated in MONDO:0016613 'APC-related attenuated familial adenomatous polyposis' and MONDO:0021056 'familial adenomatous polyposis 1'.
  3. We may get disease-to-gene associations from external groups as well, such as ClinGen, which often requests new gene-related disease terms.
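
For discussion, here is a minimal sketch of how rules 1 and 2 above could be applied to a list of disease-to-gene pairs. It is not the actual slurp pipeline: the function name `select_single_gene_associations`, the input shape (a list of `(disease_id, gene_id)` tuples), and the placeholder identifiers `EX:0000001`, `GENE_A`, and `GENE_B` are all assumptions made for illustration. Only the two MONDO IDs and the APC gene come from the text above.

```python
from collections import defaultdict


def select_single_gene_associations(associations):
    """Return {disease_id: gene_id} for diseases linked to exactly one gene.

    `associations` is an iterable of (disease_id, gene_id) pairs.
    This input shape is an assumption, not the pipeline's real format.
    """
    # Collect the distinct genes reported for each disease.
    genes_by_disease = defaultdict(set)
    for disease_id, gene_id in associations:
        genes_by_disease[disease_id].add(gene_id)

    # Rule 1: keep a disease only if it has exactly one gene.
    # Rule 2 needs no special handling: the same gene may appear
    # under several diseases, because we group by disease, not by gene.
    return {
        disease_id: next(iter(genes))
        for disease_id, genes in genes_by_disease.items()
        if len(genes) == 1
    }


example = [
    ("MONDO:0016613", "APC"),   # from the thread
    ("MONDO:0021056", "APC"),   # same gene, second disease (rule 2)
    ("EX:0000001", "GENE_A"),   # hypothetical disease with two genes
    ("EX:0000001", "GENE_B"),   # so it is skipped under rule 1
]
selected = select_single_gene_associations(example)
print(selected)
```

Grouping by disease rather than by gene means the 1 gene - N diseases case raised earlier in the thread passes through unchanged, while any disease with more than one gene is dropped.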

@nicolevasilevsky add to Mondo documentation

Edit: Done - added here: https://mondo.readthedocs.io/en/latest/editors-guide/synching/#omim-syncslurp

nicolevasilevsky commented 2 years ago

Assigning to @matentzn to work on updating the slurp pipeline.