julia326 closed this issue 7 years ago
Variant sequence:
VariantSequence(prefix='TCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', alt='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', sequence='TCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', reads=frozenset([AlleleRead(prefix='CTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTT', name='D00543:132:CA2WYANXX:7:2301:17630:72894', sequence='CTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTT'), AlleleRead(prefix='CTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG', name='D00543:132:CA2WYANXX:7:2301:17630:72894', sequence='CTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG'), AlleleRead(prefix='TTGGAATATAGCATGATTAGATTATGTTAGATATCATCAATGTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATT', allele='C', suffix='TGGC', name='D00543:132:CA2WYANXX:7:1210:19312:79146', sequence='TTGGAATATAGCATGATTAGATTATGTTAGATATCATCAATGTCTTCATCATCCTCTCCAATATCATCAAACACGATTTCATCATCATCCCCAGCACCAAATGTGTCCATTTCATTGATTCTGGC')]))
Reference context:
ReferenceContext(strand='-', sequence_before_variant_locus='ATCCAGATGAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA', sequence_at_variant_locus='A', sequence_after_variant_locus='AATCAATGAAATGGACACATTTGGTGCTGGGGATGATGATGAAATCGTGTTTGA', offset_to_first_complete_codon=2, overlaps_start_codon=False, contains_start_codon=False, contains_five_prime_utr=False, amino_acids_before_variant='PDEARSLKAYGELPEHA', variant=Variant(contig='12', start=88271648, ref='T', alt='C', reference_name='GRCm38'), transcripts=(Transcript(transcript_id=ENSMUST00000101168, name=Gm5662-001, gene_id=ENSMUSG00000079029, gene_name=Gm5662, biotype=protein_coding, location=12:88270640-88274500),))
That variant sequence results in a cdna_prefix of 'CTAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA' and is matched to a reading frame with offset 1, with 2 mismatches before the variant (which fits under the threshold). The first codon is then TAA (a stop codon), so we get an empty amino acid sequence. We need to figure out the right way to deal with unexpected stop codons that a variant sequence introduces into the reading frame. @iskandr thoughts?
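To make the failure easier to reproduce, here is a minimal standalone sketch (not isovar's actual code) that rebuilds the cdna_prefix from the logged suffix and translates it at offset 1. The helper names `reverse_complement` and `translate_until_stop` are hypothetical, and the sketch assumes the standard genetic code and that the cDNA prefix on the '-' strand is the reverse complement of the variant sequence's suffix.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Values copied from the logs in this issue.
SUFFIX = "TGGCATGTTCTGGAAGTTCTCCATAGGCCTTCAGACTTCTTGCTTAG"
REF_BEFORE = "ATCCAGATGAAGCAAGAAGTCTGAAGGCCTATGGAGAACTTCCAGAACATGCCA"


def reverse_complement(seq):
    """Hypothetical helper: reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_until_stop(seq, offset):
    """Hypothetical helper: translate seq starting at offset.

    Returns (amino_acids, stop_index), where stop_index is the position
    of the first in-frame stop codon in seq, or None if there is none.
    """
    amino_acids = []
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(amino_acids), i
        amino_acids.append(aa)
    return "".join(amino_acids), None


# Variant is on the '-' strand, so the read suffix becomes the cDNA prefix.
cdna_prefix = reverse_complement(SUFFIX)

# Compare against the end of the reference sequence before the variant.
ref_tail = REF_BEFORE[-len(cdna_prefix):]
mismatches = sum(a != b for a, b in zip(cdna_prefix, ref_tail))

protein, stop_index = translate_until_stop(cdna_prefix, offset=1)
print(cdna_prefix[1:4], repr(protein), stop_index, mismatches)
```

Returning the stop position alongside the (possibly empty) amino acid string lets the caller tell "no sequence because of an early stop" apart from other failures, which may help when deciding how such variant sequences should be handled.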
Issue moved to hammerlab/isovar #85 via ZenHub
Possibly relevant logs (maybe problematic variant?):