monarch-initiative / mckb

Monarch Cancer Knowledge Base

curate gene-fusion partners #2

Open nlwashington opened 9 years ago

nlwashington commented 9 years ago

from what i can tell, only one of the two genes in a gene fusion is curated in oscar's database. for example, selection of therapy_variant.id = 724 gives:

+--------------------+----------------------+--------+---------------+---------+------+
| therapy_variant_id | comment              | aa_var | transcript_id | tv_gene | gene |
+--------------------+----------------------+--------+---------------+---------+------+
|                724 | BCR ABL1 fusion gene | NULL   | NULL          | BCR     | NULL |
+--------------------+----------------------+--------+---------------+---------+------+

only linked to BCR. i don't see anywhere else that the ABL1 partner is referenced.

nlwashington commented 9 years ago

these may be accessible through the schema:

As you noticed, the therapy_variant table is sparsely populated. That is because it implements the single table inheritance pattern: http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html. The table is only used for specifying "rules" that map variants to the treatment/phenotype/evidence in, or linked by, the therapy_genotype table.

The basic idea is that there are 6 rule types that can be stored in the therapy_variant table. A given row in the table represents only 1 of those rule types and would be invalid if it had the wrong combination of fields filled out. The 6 types are:

  1. Protein Variant – links directly to a Protein Variant in the CGD database via the protein_variant field
  2. Copy Number – describes a copy number variant using the copy_gene and copy_number_result (GAIN/LOSS). NOTE: should have used the gene field here instead of creating copy_gene.
  3. Gene Fusion – describes a gene fusion by specifying both fusion partners in the gene and gene_fusion fields. If gene is blank, the rule matches any rearrangement of the gene in the gene_fusion field. For example, for “rearrangement of ALK” you would set gene_fusion to ALK. For the specific BCR-ABL1 fusion you would set gene to BCR and gene_fusion to ABL1.
  4. Functional Impact – Any mutation on the gene that produces the given functional_impact (e.g. gain-of-function)
  5. Protein Coordinates – Any protein_variant_type on the given transcript between amino_acid_start and amino_acid_end.
  6. Genomic Coordinates – Any variant_type for the given genome_build on chromosome between genomic_start and genomic_end (inclusive)
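
To make the "one rule type per row" idea concrete, here is a minimal sketch of how a row could be classified by which fields are filled in. The field names are taken from the list above, but the exact column names, the required/optional split, and the `classify_rule` helper are assumptions for illustration, not code from the CGD or mckb.

```python
from typing import Any, Mapping

# Hypothetical rule table: rule type -> (required fields, optional fields).
# Field names follow the list above; the real schema may differ.
RULES = {
    "protein_variant": ({"protein_variant"}, set()),
    "copy_number": ({"copy_gene", "copy_number_result"}, set()),
    # gene may be blank, meaning "any rearrangement of gene_fusion"
    "gene_fusion": ({"gene_fusion"}, {"gene"}),
    "functional_impact": ({"gene", "functional_impact"}, set()),
    "protein_coordinates": (
        {"protein_variant_type", "transcript",
         "amino_acid_start", "amino_acid_end"},
        set(),
    ),
    "genomic_coordinates": (
        {"variant_type", "genome_build", "chromosome",
         "genomic_start", "genomic_end"},
        set(),
    ),
}

# Every field that takes part in some rule; other columns (id, comment) are ignored.
ALL_FIELDS = set().union(*(req | opt for req, opt in RULES.values()))


def classify_rule(row: Mapping[str, Any]) -> str:
    """Return the rule type of a row, or raise ValueError if the
    combination of populated fields matches no rule type."""
    populated = {k for k, v in row.items() if v is not None and k in ALL_FIELDS}
    matches = [
        name
        for name, (required, optional) in RULES.items()
        if required <= populated <= required | optional
    ]
    if not matches:
        raise ValueError(f"invalid field combination: {sorted(populated)}")
    return matches[0]
```

Under these assumptions, a row with gene = BCR and gene_fusion = ABL1 classifies as `gene_fusion`, as does a row with only gene_fusion = ALK, while a row with only gene set is rejected.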

kshefchek commented 9 years ago

_get_fusion_copy_any_mutation_genotypes() will return both genes
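
For anyone reading along, a rough sketch of the idea of reading both partners from one row. This is not the body of the function named above, which is not shown in this thread. The toy table, its text-valued gene and gene_fusion columns, and the `fusion_partners` helper are all hypothetical, and the real database may store gene references rather than symbols.

```python
import sqlite3


def fusion_partners(conn: sqlite3.Connection, therapy_variant_id: int) -> list:
    """Return the fusion partners recorded on one row of the toy table.

    Gives [gene, gene_fusion] for a specific fusion, [gene_fusion] when
    gene is blank (any rearrangement), and [] if the row is missing or
    is not a fusion rule.
    """
    row = conn.execute(
        "SELECT gene, gene_fusion FROM therapy_variant WHERE id = ?",
        (therapy_variant_id,),
    ).fetchone()
    if row is None or row[1] is None:
        return []
    return [g for g in row if g is not None]


# Toy in-memory table, for illustration only (ids and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE therapy_variant ("
    "id INTEGER PRIMARY KEY, gene TEXT, gene_fusion TEXT)"
)
conn.executemany(
    "INSERT INTO therapy_variant VALUES (?, ?, ?)",
    [(1, "BCR", "ABL1"), (2, None, "ALK")],
)

print(fusion_partners(conn, 1))
print(fusion_partners(conn, 2))
```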