jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

Incorrect parent features from convert_tRNAScanSE_to_gff3.pl #70

Open 14zac2 opened 3 years ago

14zac2 commented 3 years ago

Hi there,

First off, thank you so much for making this script! I'm trying to incorporate filtered tRNAScanSE results into my genome annotation. However, the results from this script are giving me some issues. It looks like the only Parent attribute that exists does not point to the correct feature. The biggest problem is with multi-exon tRNAs. Here's an example of a few lines from the resulting GFF file:

WCK01_AAF20200214_F8-ctg250     tRNAScan-SE     gene    905646  905751  75.6    +       .       ID=tRNA-Leu39_gene
WCK01_AAF20200214_F8-ctg250     tRNAScan-SE     tRNA    905646  905751  75.6    +       .       ID=tRNA-Leu39_tRNA;Name=tRNA-Leu;anticodon=CAA
WCK01_AAF20200214_F8-ctg250     tRNAScan-SE     exon    905646  905683  75.6    +       .       ID=tRNA-Leu39_exon;Note=contains predicted Intron
WCK01_AAF20200214_F8-ctg250     tRNAScan-SE     exon    905706  905751  75.6    +       .       ID=tRNA-Leu39_exon;Parent=tRNA-Leu39_exon

As you can see, the only Parent attribute belongs to the second exon, and it points to the exon ID, which is identical for both exons. Do you think you might be able to modify the script so that exon features have unique IDs and their Parent attributes point to the tRNA?
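
In case it helps to show what I'm after, here is a rough post-processing sketch in Python (not a change to the Perl script itself). It assumes the features always come in gene, tRNA, exon order as in the lines above, and the function name `fix_trna_parents` and the `_exon1`, `_exon2` ID suffixes are just things I made up for illustration. It also adds a Parent on the tRNA pointing to the gene, which I'm assuming is the intended hierarchy.

```python
def fix_trna_parents(lines):
    """Return GFF3 lines with unique exon IDs and Parent attributes filled in.

    Assumes each gene line is followed by its tRNA line and then that
    tRNA's exon lines (the order seen in the example output above).
    """
    fixed = []
    gene_id = None    # ID of the most recent gene feature
    trna_id = None    # ID of the most recent tRNA feature
    exon_count = 0    # exons seen so far for the current tRNA

    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass through comments, directives, and anything that is not 9 columns.
        if line.startswith("#") or len(cols) != 9:
            fixed.append(line)
            continue

        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        ftype = cols[2]

        if ftype == "gene":
            gene_id = attrs.get("ID")
            trna_id = None
            exon_count = 0
        elif ftype == "tRNA":
            trna_id = attrs.get("ID")
            exon_count = 0
            if gene_id:
                attrs["Parent"] = gene_id   # assumption: tRNA is a child of the gene
        elif ftype == "exon" and trna_id:
            exon_count += 1
            attrs["ID"] = f"{trna_id}_exon{exon_count}"   # hypothetical naming scheme
            attrs["Parent"] = trna_id

        cols[8] = ";".join(f"{key}={value}" for key, value in attrs.items())
        fixed.append("\t".join(cols))
    return fixed


if __name__ == "__main__":
    import sys

    # Usage: python fix_parents.py < input.gff3 > output.gff3
    for fixed_line in fix_trna_parents(sys.stdin):
        print(fixed_line)
```

With something like this, both exons of tRNA-Leu39 would get distinct IDs and a Parent of tRNA-Leu39_tRNA, which is the structure I'd hope the converter could emit directly.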

Thanks so much!

jorvis commented 3 years ago

Thanks for the report - do you think you can attach at least a partial test input file?

14zac2 commented 3 years ago

Of course! My tRNA file is very small, as it's the output of EukHighConfidenceFilter, the internal tRNAScanSE script that filters for high-confidence tRNAs. I'm thinking now that perhaps it's the extra columns in the input that are messing up the resulting GFF file, as EukHighConfidenceFilter requires that certain extra columns be included in the regular tRNAScanSE output. I've attached the input and the resulting GFF; they had to have the .txt suffix to attach properly: take2_filtered.txt, take2_filtered_gff.txt

Thanks again!