openvax / varcode

Library for manipulating genomic variants and predicting their effects
Apache License 2.0

Support loading structural variants #123

Open iskandr opened 9 years ago

iskandr commented 9 years ago

Currently we assume that every variant is a focal change with a nucleotide alt sequence. This covers only a subset of the VCF spec and doesn't allow us to load structural variants with symbolic alt alleles, such as:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  synthetic.challenge.set4.normal synthetic.challenge.set4.tumour
1   959526  MantaDEL:42:0:1:0:0:0   T   <DEL>   .   MinSomaticScore END=962016;SVTYPE=DEL;SVLEN=-2490;IMPRECISE;CIPOS=-86,86;CIEND=-65,66;SOMATIC;SOMATICSCORE=18   PR  12,0    8,5
1   4932289 MantaDUP:TANDEM:263:0:1:0:0:0   A   <DUP:TANDEM>    .   PASS    END=4943947;SVTYPE=DUP;SVLEN=11658;SOMATIC;SOMATICSCORE=73  PR:SR   24,0:60,0   14,6:28,6
1   8411313 MantaDUP:TANDEM:461:0:1:0:0:0   A   <DUP:TANDEM>    .   PASS    END=8419870;SVTYPE=DUP;SVLEN=8557;SOMATIC;SOMATICSCORE=55   PR:SR   11,0:20,0   5,5:14,8
1   10856998    MantaINV:563:0:1:0:0:0  C   <INV>   .   PASS    END=10864731;SVTYPE=INV;SVLEN=7733;CIPOS=0,5;CIEND=-5,0;HOMLEN=5;HOMSEQ=CCCCC;INV3;EVENT=MantaINV:563:0:1:0:0:0;SOMATIC;SOMATICSCORE=79;JUNCTION_SOMATICSCORE=32    PR:SR   9,0:15,0    2,5:10,4

Though we probably can't do much to predict the protein sequence for most of these variants, we should still be able to load them and distinguish them from focal variants.
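As a rough illustration of what "distinguish them from focal variants" could mean, here is a minimal sketch that classifies an ALT field and reads the INFO column of the records above. The names `classify_alt` and `parse_info` are hypothetical and not part of varcode's API; the only assumption about the format is that symbolic alleles are written in angle brackets, as in the examples.

```python
import re

# Symbolic ALT alleles are wrapped in angle brackets, e.g. <DEL> or <DUP:TANDEM>.
SYMBOLIC_ALT = re.compile(r"^<([^<>]+)>$")
# A focal change has an explicit nucleotide sequence as its ALT.
NUCLEOTIDE_ALT = re.compile(r"^[ACGTNacgtn]+$")


def classify_alt(alt):
    """Hypothetical helper: return a (kind, value) pair for one ALT string.

    kind is "symbolic" for <...> alleles, "focal" for nucleotide
    sequences, and "other" for anything else (missing value, breakends, ...).
    """
    match = SYMBOLIC_ALT.match(alt)
    if match:
        return ("symbolic", match.group(1))
    if NUCLEOTIDE_ALT.match(alt):
        return ("focal", alt)
    return ("other", alt)


def parse_info(info):
    """Hypothetical helper: split an INFO column into a dict.

    Keys without a value (flags such as IMPRECISE) map to True;
    all other values are kept as strings.
    """
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


kind, sv_type = classify_alt("<DUP:TANDEM>")
info = parse_info("END=962016;SVTYPE=DEL;SVLEN=-2490;IMPRECISE")
```

With something like this, a loader could keep symbolic records as a separate variant type (carrying `SVTYPE` and `END`) instead of failing on them, and skip effect prediction for those records.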

armish commented 9 years ago

This surfaced because of an error in Cycledash: https://github.com/hammerlab/cycledash/issues/840