monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

refactor OMIM for non-direct-causation gene-disease association #127

Closed by nlwashington 9 years ago

nlwashington commented 9 years ago

From the OMIM FAQ:

Brackets, "[ ]", indicate "nondiseases," mainly genetic variations that lead to apparently abnormal laboratory test values (e.g., dysalbuminemic euthyroidal hyperthyroxinemia).

Braces, "{ }", indicate mutations that contribute to susceptibility to multifactorial disorders (e.g., diabetes, asthma) or to susceptibility to infection (e.g., malaria).

A question mark, "?", before the disease name indicates an unconfirmed or possibly spurious mapping.

The number in parentheses after the name of each disorder indicates the following: (1) the disorder was positioned by mapping of the wildtype gene; (2) the disease phenotype itself was mapped; (3) the molecular basis of the disorder is known; (4) the disorder is a chromosome deletion or duplication syndrome. Move the cursor over the number to display this information.
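a minimal sketch of how the notation quoted above could be pulled apart before deciding what to do with an association. it assumes a simplified label of the form `[?]` + optional brackets/braces + optional trailing `(N)`; real OMIM export rows carry more fields (for example a MIM number before the key), and `parse_phenotype` and `ParsedPhenotype` are hypothetical names, not part of dipper.

```python
import re
from typing import NamedTuple, Optional

# Meanings of the number in parentheses, as given in the OMIM FAQ text above.
MAPPING_KEYS = {
    1: "disorder positioned by mapping of the wildtype gene",
    2: "disease phenotype itself was mapped",
    3: "molecular basis of the disorder is known",
    4: "chromosome deletion or duplication syndrome",
}


class ParsedPhenotype(NamedTuple):
    name: str
    nondisease: bool        # label was wrapped in [ ]
    susceptibility: bool    # label was wrapped in { }
    unconfirmed: bool       # label started with ?
    mapping_key: Optional[int]


def parse_phenotype(label: str) -> ParsedPhenotype:
    """Split a simplified OMIM phenotype label into name and flags (hypothetical helper)."""
    text = label.strip()

    # Trailing "(N)" is the phenotype mapping key, if present.
    mapping_key = None
    match = re.search(r"\((\d)\)\s*$", text)
    if match:
        mapping_key = int(match.group(1))
        text = text[:match.start()].strip()

    # Assumption: the question mark, when present, is the first character.
    unconfirmed = text.startswith("?")
    if unconfirmed:
        text = text[1:].strip()

    nondisease = text.startswith("[") and text.endswith("]")
    susceptibility = text.startswith("{") and text.endswith("}")
    if nondisease or susceptibility:
        text = text[1:-1].strip()

    return ParsedPhenotype(text, nondisease, susceptibility, unconfirmed, mapping_key)
```

the example labels used to exercise this are made up for illustration and are not real OMIM records.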

we need to refactor our OMIM ingest to either:

  1. completely remove these associations
  2. change the relationship between the gene and the "disease" (my preferred method), but we will need those other relationships that i've requested from @cmungall.

therefore, we need relationships for general kinds of associations/correlations (such as for QTLs), and for susceptibility toward a disease.

note that suscep* is not in RO.

some relationships might work for susceptibility, such as: capable of regulating (RO:0002596) or contributes_to (RO:0002326)

as for correlation, we could use something like participates in (RO:0000056), but not everything that is correlated with a disease necessarily participates in its causation, so this would be making an assumption.

then, we should create a description for the association that includes some detail taken from the [] or {} or ? definition above.
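one possible shape for that description step, as a rough sketch only: the note texts are paraphrased from the OMIM FAQ definitions quoted above, and `NOTATION_NOTES` and `describe_association` are hypothetical names, not existing dipper code. it assumes the "?" (if any) comes before the opening bracket or brace.

```python
# Short notes paraphrased from the OMIM FAQ definitions quoted in this issue.
NOTATION_NOTES = {
    "[": ('OMIM brackets: "nondisease", mainly a genetic variation that leads '
          "to apparently abnormal laboratory test values"),
    "{": ("OMIM braces: mutation that contributes to susceptibility to a "
          "multifactorial disorder or to infection"),
    "?": "OMIM question mark: unconfirmed or possibly spurious mapping",
}


def describe_association(label: str) -> str:
    """Build a free-text description from the notation on a phenotype label."""
    text = label.strip()
    notes = []
    # Assumption: "?" precedes any bracket or brace.
    if text.startswith("?"):
        notes.append(NOTATION_NOTES["?"])
        text = text[1:].lstrip()
    # text[:1] is "" for an empty label, so this is safe on empty input.
    if text[:1] in ("[", "{"):
        notes.append(NOTATION_NOTES[text[0]])
    return "; ".join(notes)
```

a label with no special notation yields an empty string, so the caller can skip adding a description in that case.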

nlwashington commented 9 years ago

@cmungall did we decide on these? can you comment?

nlwashington commented 9 years ago

for the bracket entries, we can use the 'is_marker_for' relationship RO:0002607.

for now, i'm going to use the "contributes to" RO:0002326 for the susceptibility relationships (curly braces). @cmungall redirect me if you disagree.
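roughly what that choice looks like, as a sketch rather than the actual dipper change: `pick_relation` is a hypothetical helper, and `default_rel` stands in for whatever relation the ingest already uses for ordinary gene-disease associations, which this thread does not name. the two RO IDs are the ones given above.

```python
RO_IS_MARKER_FOR = "RO:0002607"    # bracket entries ("nondiseases")
RO_CONTRIBUTES_TO = "RO:0002326"   # curly-brace entries (susceptibility)


def pick_relation(label: str, default_rel: str) -> str:
    """Choose a gene-to-phenotype relation from the OMIM notation on the label."""
    # Ignore a leading "?"; the unconfirmed-mapping case is handled separately.
    text = label.strip().lstrip("?").strip()
    if text.startswith("["):
        return RO_IS_MARKER_FOR
    if text.startswith("{"):
        return RO_CONTRIBUTES_TO
    # No special notation: keep the relation the ingest already uses.
    return default_rel
```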

cmungall commented 9 years ago

+1

nlwashington commented 9 years ago

i figure the "possible spurious mapping" with the question mark should have some kind of different evidence code? not sure what this should be.

nlwashington commented 9 years ago

moving the "?" ones to another ticket #154.