jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

gff/write_fasta_from_gff.pl mistranslates some Augustus -> GFF3 output #10

Open jorvis opened 10 years ago

jorvis commented 10 years ago

Some classes of genes in Augustus output are a bit puzzling, such as the one below. The transcript starts with an intron, and the CDS fragment that follows (the first one) has a non-zero phase value, which goes against the GFF specification as I understand it. The script needs to check for this case and handle it correctly.

# start gene g856
NODE_4651_length_1024_cov_24.708984     AUGUSTUS        gene    1       1068    0.91    +       .       g856
NODE_4651_length_1024_cov_24.708984     AUGUSTUS        transcript      1       1068    0.91    +       .       g856.t1
NODE_4651_length_1024_cov_24.708984     AUGUSTUS        intron  1       106     0.91    +       .       transcript_id "g856.t1"; gene_id "g856";
NODE_4651_length_1024_cov_24.708984     AUGUSTUS        CDS     107     1068    0.91    +       2       transcript_id "g856.t1"; gene_id "g856";
NODE_4651_length_1024_cov_24.708984     AUGUSTUS        stop_codon      1066    1068    .       +       0       transcript_id "g856.t1"; gene_id "g856";
# protein sequence = [TQTSTAQSQAMDAESNTSTDPKNGDSQSALVQQLCQTVERLTNELSQARHEIQHLQERINTINSTTTPLSPLEFPTLQ
# ESQIRSTAFPDAPWNNPSKIQALKQPSIQRSEQRRMQREATAARFFQPPSENQGFKYLYIPTKARIPVGTIRTTFRKLGVNNARLLDIHYPARNTVAV
# LIHNDYEAEFVELLTRKNVHIRTDFTPFNGKILADPKYTSLPQEERDSIAIRLQKLRLSRALDYIRSPVKYAVARYFLDQEWISRTRYEEIMADRYNT
# KLTSIFDQTSQQQTTQDTFNDVSDNDLNMEAIDELPTGTSSPALH]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 0
# CDS exons: 0/1
# CDS introns: 0/1
#5'UTR exons and introns: 0/0
#3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 0
# end gene g856
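
One possible way to handle this case, sketched in Python rather than in the Perl of write_fasta_from_gff.pl: treat the phase of the first CDS fragment as the number of bases to skip before the first complete codon, which is how the GFF3 specification defines the phase column. Whether Augustus intends exactly that meaning here is an assumption. The function names `trim_first_cds` and `coding_length` are hypothetical and are not part of biocode. For g856, skipping 2 bases leaves 960 bases (320 codons), which by my count matches the 319 residues in the reported protein plus the stop codon.

```python
def trim_first_cds(fragments, strand):
    """Drop the leading partial codon from a transcript's CDS fragments.

    fragments: list of (start, end, phase) tuples, 1-based inclusive
               coordinates, with phase already parsed as an int (0, 1 or 2).
    strand:    '+' or '-'.

    Returns a list of (start, end) tuples in transcription order.
    Assumption: phase follows the GFF3 definition, i.e. the number of
    bases to remove from the 5' end of the fragment to reach the first
    base of the next complete codon.
    """
    if strand not in ('+', '-'):
        raise ValueError("strand must be '+' or '-'")
    if not fragments:
        return []

    # Order fragments 5' -> 3' with respect to the transcript.
    ordered = sorted(fragments, key=lambda f: f[0], reverse=(strand == '-'))

    start, end, phase = ordered[0]
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")

    # Only the first fragment is trimmed; later phases describe how
    # codons are split across introns and need no coordinate change.
    if strand == '+':
        start += phase
    else:
        end -= phase
    if start > end:
        raise ValueError("phase consumes the whole first CDS fragment")

    return [(start, end)] + [(s, e) for s, e, _ in ordered[1:]]


def coding_length(fragments):
    """Total number of bases covered by (start, end) fragments."""
    return sum(e - s + 1 for s, e in fragments)


# The g856.t1 CDS from the Augustus output above.
trimmed = trim_first_cds([(107, 1068, 2)], '+')
print(trimmed, coding_length(trimmed) % 3)
```

If the trimmed length is still not a multiple of three, the transcript is probably truncated at the 3' end as well, and a script could warn instead of translating silently.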