monarch-initiative / mckb

Monarch Cancer Knowledge Base

curate variant types #3

Open nlwashington opened 9 years ago

nlwashington commented 9 years ago

the following sequence variant types need to be mapped to SO:

+-----------------------------+
| description                 |
+-----------------------------+
| Conversion                  |
| Deletion                    |
| Deletion/insertion (indels) |
| Duplication                 |
| Insertion                   |
| Inversion                   |
| Substitution                |
| Translocation               |
+-----------------------------+

additionally, there are CNVs and gene fusions that are not found in the "variant_class" table, so we'll also need to map: copy number gain, copy number loss, and gene_fusion
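One way to track this as a worklist is a plain dictionary keyed by the source descriptions. This is only a sketch: the name `VARIANT_TYPE_TO_SO` and the helper `unmapped` are hypothetical, the few SO IDs filled in are illustrative and should be checked against the current SO release, and everything else is deliberately left as `None` until it is curated.

```python
# Worklist of source descriptions -> SO term ID (None = still to be curated).
# The IDs shown are illustrative only; verify them against the current SO
# release before relying on them.
VARIANT_TYPE_TO_SO = {
    "Conversion": None,
    "Deletion": "SO:0000159",        # deletion
    "Deletion/insertion (indels)": None,
    "Duplication": None,
    "Insertion": "SO:0000667",       # insertion
    "Inversion": "SO:1000036",       # inversion
    "Substitution": None,
    "Translocation": "SO:0000199",   # translocation
    # Not in the "variant_class" table, but still need mapping:
    "copy number gain": None,
    "copy number loss": None,
    "gene_fusion": None,
}


def unmapped(mapping):
    """Return the source descriptions that have no ontology term yet, sorted."""
    return sorted(term for term, so_id in mapping.items() if so_id is None)


if __name__ == "__main__":
    for term in unmapped(VARIANT_TYPE_TO_SO):
        print("needs SO term:", term)
```

Running the script prints the terms that still need a curator's decision, which makes it easy to see when the list is done.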

there are also functional consequences:

+-------------+
| description |
+-------------+
| Stop gain   |
| Stop loss   |
+-------------+

+---------------------------------+
| description                     |
+---------------------------------+
| gain-of-function                |
| gain-of-function (low activity) |
| loss-of-function                |
| reduced kinase activity         |
| switch-of-function              |
+---------------------------------+

and from the "protein variant type":

+--------------------------+
| description              |
+--------------------------+
| frameshift               |
| in-frame                 |
| nonsynonymous - missense |
| nonsynonymous - nonsense |
| synonymous               |
+--------------------------+

mbrush commented 9 years ago

General comment here - should we create a central document to house all mappings of terms/value lists from source data to ontology terms in our import chain (e.g. SO, GENO)? It would facilitate coordinated efforts here, inform new source mappings, and ensure consistent mappings across sources. Could such a document be a Google Doc for now?
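As a rough sketch of what a central mapping table could buy us, the check below flags source terms that different sources map to different ontology IDs. The row layout, the source names `source_a` and `source_b`, and the function `find_conflicts` are all assumptions made up for illustration, and the rows are example data rather than real curated mappings.

```python
from collections import defaultdict

# Hypothetical central mapping rows: (source, source_term, ontology, term_id).
# These rows are illustrative, not real curated mappings.
MAPPINGS = [
    ("source_a", "Deletion", "SO", "SO:0000159"),
    ("source_b", "deletion", "SO", "SO:0001059"),   # mapped to a broader term
    ("source_a", "Insertion", "SO", "SO:0000667"),
    ("source_b", "insertion", "SO", "SO:0000667"),
]


def find_conflicts(rows):
    """Return {normalized source term: sorted term IDs} for terms that were
    mapped to more than one ontology ID across sources."""
    ids_by_term = defaultdict(set)
    for _source, source_term, _ontology, term_id in rows:
        # Normalize case and whitespace so "Deletion" and "deletion" compare equal.
        ids_by_term[source_term.strip().lower()].add(term_id)
    return {
        term: sorted(ids)
        for term, ids in ids_by_term.items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    for term, ids in find_conflicts(MAPPINGS).items():
        print(f"inconsistent mapping for {term!r}: {', '.join(ids)}")
```

The same rows could live in a shared spreadsheet and be exported as TSV; the point is only that a single table makes cross-source inconsistencies mechanically detectable.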

Specific to the issue at hand - basic sequence variant type mappings to SO should be straightforward. Some classification of variants based on function and protein consequences is found in SO, but there is also VariO which deals with these types of terms in much greater detail. I think we would prefer to stick to SO for all variant terminology, but @mellybelly should weigh in about additional use of VariO (as she has been asked to perform a formal review of how this ontology is being applied). An alternate or complementary approach here is to separate the functional consequence from the mutation, perhaps by creating an additional mapping from a more generic variant type to some GO MF term (e.g. protein kinase activity).
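To make the "separate the functional consequence from the mutation" idea concrete, here is a minimal sketch of how a source label might be decomposed into a generic effect plus an optional GO molecular function term. The class `FunctionalConsequence`, its field names, the effect vocabulary, and the `DECOMPOSED` table are hypothetical; the only ontology ID used is GO:0004672 (protein kinase activity), taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionalConsequence:
    """A source label split into a generic effect and an optional GO MF term."""
    source_label: str
    effect: str                   # hypothetical vocabulary: gain / loss / reduced / switch
    go_mf: Optional[str] = None   # GO molecular function ID, when the label names one


# Illustrative decomposition of the functional labels listed in this issue.
DECOMPOSED = {
    label: FunctionalConsequence(label, effect, go_mf)
    for label, effect, go_mf in [
        ("gain-of-function", "gain", None),
        ("gain-of-function (low activity)", "gain", None),
        ("loss-of-function", "loss", None),
        ("reduced kinase activity", "reduced", "GO:0004672"),  # protein kinase activity
        ("switch-of-function", "switch", None),
    ]
}


if __name__ == "__main__":
    fc = DECOMPOSED["reduced kinase activity"]
    print(fc.source_label, "->", fc.effect, fc.go_mf)
```

Under this layout the variant type itself would still be mapped to SO, while the affected function is carried separately, so a label such as "reduced kinase activity" does not need its own variant term.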

Finally, I recently contacted SO to ask about the best approach for new term requests, given the extensive refactoring efforts and recent inactivity on the SourceForge tracker. It might be that I am able to edit directly within an assigned ID range to implement terms we need and get IRIs back for mapping.

nicolevasilevsky commented 9 years ago

I finished a pass at mapping these terms to SO terms.