Open krabapple opened 2 years ago
I know there is a previous bugfix on this #65 , and it was marked a closed issue, but I still see it, using 0.12.7
Here's the test input, a transcript with an incorrect translation start coordinate (off by 1)
$cat badstart.fa >chr_I:c7772805-7771137 TATGCAGGCAAGTGCAGATATTCGCACATTAGAGCGCAATGTCTTTATCAATGCCATTACAAAGATCAAGAGCGAGAATA AAGGATTCAAATTCATTCTCGTTGTCGATACAATGACATTGCCAGCAATTTCTGCTTTAATCAGCACAACAGAGCTCATT GAGCACAATGTCATTCTGACAGAACTTTACGAGAAGAAGCGTGAACCCTTCCCAATGTATCAAGCTATTTATTTACTTCA TCCTTGCGTTGATATCAAACAATTTATTGCTGATTTCAGTGATAAAACACCATTATACGCGGCAGCTCACCTTTTATTCA CATACACATGCCTTCCATCAACAATGAGTGCATTAGAATCACATCCTGAAGTTGTCAATCACATTTTAACATTTCTTGAT CTCTGGATCCATTATCAGCCTACACAACCAGGTGTTTTCATGGTACCATCACCACGAGCCTACAAATTCCTCTATTCTCC AAACTCTCCCGTTGATTTCAACGCGATCTCCGACGAGATCAACTACGGCCTTATCAGCTTCTTCTTGACCATCAAGGCCA TCCCAAGTATCGCCTATCCAAAATCCATGGATAAAATCAAGAGATTCTGCACAAAATTCAACGATCAATTAATGGGCGTT ACTCGCGAGCTAGATGATGGCGGACGATCCCTTAACAAGGACAACACCCTTCTTCTCATCATTCCTCGTGGTGCCGACCC AATTACGCCATTACTCCACAGATTCACCTACCAATCCATGATTTACGAGAACTTCCCAGTTGAAAACGACTGTGTCTACC TTAAGGAAGGCGATCCGAACAGCGTTGTCGCACTCAATCCATACGAAGACAAAATCTTCAACGATTACTGCTACAAACAC TTCCAGAAGTTCATCGAGATCCGTGATCTCTCATCGCAAGTCACACAGACCCACAAAATCATTACAGATCCAAATATCGA CAAAACATCTCAAAAATATGCGGAAGCAATGAAGCGCTACATCAGAGATCAGTCGTATGCTACAACAGTATCCAACCACC TCGACATTTGTATACGTCTGGATGAAGCATTAACTAAGCGTTGCCTTGCAGATATTTCGAAATACGAACAACAGCTTGCC GCCAAGGAGAAAGACGGCTCCAGCTACAAGCCGAACCTTTCCGACCTCCAGGACATCATTATCAAGCAGGGACCAGAGCC AATGGACAAGTTAAGAGCATTGGCCGTTTACCAACTTTCCGGTGGAAAAATTTCTGACACAGAATTGGAGAGGCTTTTAG AGGCAGGATCACTTACAACGGGCAACTGGAAAGATGCGATCTTGAACCTCAAATACCTTGGAGATGCAGCTCCAAGACAA TCCTTGAGAGAAGTAAATCCGGTCGAAGAACCATTTGTTACTGACATTTACTACCCTTACGCTGCACAGGCTTGCATGGA CGCAATCGATGGAAAACTCGACAAGAAGTGCTTCGAAATTCCAACACACACAGGAAGATACACAAACGTTGTTTTGTTCT TCTTCGGAGGTGTTTCGTTCGCTGAATTAGAGAAGGTAGAGTTGCTCAGACAGAGAATGAAGAGCACAAAGATCTACGTC GGTGCAACAAACACTTTGACGCCAAGACAGTTCTTGGCTCAGACAAGAGGTTTAACTTCAACTCAGTAA cat short.badstart.gff chr_I AUGUSTUS transcript 7771137 7772805 . - . ID=mrna-0948210 chr_I AUGUSTUS CDS 7771137 7772805 . - ID=cds-0948210;Parent=mrna-0948210
gffread correctly recognizes it as having in frame stop codons (at phase 0).
Command line was: ./gffread -g genome.fasta -P -v -E short.badstart.gff .. loaded 1 genomic features from short.badstart.gff Warning: In-frame STOP found for 'mrna-TVAGG3_0948210' ##gff-version 3 # gffread v0.12.7 # ./gffread -g genome.fasta -P -v -E short.badstart.gff chr_I AUGUSTUS transcript 7771137 7772805 . - . ID=mrna-0948210;InFrameStop=true chr_I AUGUSTUS CDS 7771137 7772805 . - 0 Parent=mrna-0948210
-y output has the numerous in-frame stops resulting from phase 0 translation
$cat short.badstart.outfasta >mrna-0948210 InFrameStop=true YAGKCRYSHIRAQCLYQCHYKDQERE.RIQIHSRCRYNDIASNFCFNQHNRAH.AQCHSDRTLREEA.TL PNVSSYLFTSSLR.YQTIYC.FQ..NTIIRGSSPFIHIHMPSINNECIRITS.SCQSHFNIS.SLDPLSA YTTRCFHGTITTSLQIPLFSKLSR.FQRDLRRDQLRPYQLLLDHQGHPKYRLSKIHG.NQEILHKIQRSI NGRYSRAR.WRTIP.QGQHPSSHHSSWCRPNYAITPQIHLPIHDLRELPS.KRLCLP.GRRSEQRCRTQS IRRQNLQRLLLQTLPEVHRDP.SLIASHTDPQNHYRSKYRQNISKICGSNEALHQRSVVCYNSIQPPRHL YTSG.SIN.ALPCRYFEIRTTACRQGERRLQLQAEPFRPPGHHYQAGTRANGQVKSIGRLPTFRWKNF.H RIGEAFRGRITYNGQLERCDLEPQIPWRCSSKTILERSKSGRRTICY.HLLPLRCTGLHGRNRWKTRQEV LRNSNTHRKIHKRCFVLLRRCFVR.IREGRVAQTENEEHKDLRRCNKHFDAKTVLGSDKRFNFNSV
adding the -H argument removes the inframestop warning, and corrects the protein output, but does not correct the gff coordinates or phase:
Command line was: ./gffread -g genome.fasta -H -P -v -E short.badstart.gff .. loaded 1 genomic features from short.badstart.gff ##gff-version 3 # gffread v0.12.7 # ./gffread -g genome.fasta -H -P -v -E short.badstart.gff chr_I AUGUSTUS transcript 7771137 7772805 . - . ID=mrna-0948210 chr_I AUGUSTUS CDS 7771137 7772805 . - 0 Parent=mrna-0948210
-y output is correct
$cat short.badstart.H.out.fasta >mrna-0948210 MQASADIRTLERNVFINAITKIKSENKGFKFILVVDTMTLPAISALISTTELIEHNVILTELYEKKREPF PMYQAIYLLHPCVDIKQFIADFSDKTPLYAAAHLLFTYTCLPSTMSALESHPEVVNHILTFLDLWIHYQP TQPGVFMVPSPRAYKFLYSPNSPVDFNAISDEINYGLISFFLTIKAIPSIAYPKSMDKIKRFCTKFNDQL MGVTRELDDGGRSLNKDNTLLLIIPRGADPITPLLHRFTYQSMIYENFPVENDCVYLKEGDPNSVVALNP YEDKIFNDYCYKHFQKFIEIRDLSSQVTQTHKIITDPNIDKTSQKYAEAMKRYIRDQSYATTVSNHLDIC IRLDEALTKRCLADISKYEQQLAAKEKDGSSYKPNLSDLQDIIIKQGPEPMDKLRALAVYQLSGGKISDT ELERLLEAGSLTTGNWKDAILNLKYLGDAAPRQSLREVNPVEEPFVTDIYYPYAAQACMDAIDGKLDKKC FEIPTHTGRYTNVVLFFFGGVSFAELEKVELLRQRMKSTKIYVGATNTLTPRQFLAQTRGLTSTQ
I would think the -H corrected gff output should look like:
chr_I AUGUSTUS transcript 7771137 7772804 . - . ID=mrna-0948210 chr_I AUGUSTUS CDS 7771137 7772804 . - 0 Parent=mrna-0948210
or, if only the phase is corrected
chr_I AUGUSTUS transcript 7771137 7772805 . - . ID=mrna-0948210 chr_I AUGUSTUS CDS 7771137 7772805 . - 1 Parent=mrna-0948210
I know there is a previous bugfix on this #65 , and it was marked a closed issue, but I still see it, using 0.12.7
Here's the test input, a transcript with an incorrect translation start coordinate (off by 1)
gffread correctly recognizes it as having in frame stop codons (at phase 0).
-y output has the numerous in-frame stops resulting from phase 0 translation
adding the -H argument removes the inframestop warning, and corrects the protein output, but does not correct the gff coordinates or phase:
-y output is correct
I would think the -H corrected gff output should look like:
or, if only the phase is corrected