Closed rbutleriii closed 8 years ago
@rbutleriii Thank you for sending this. It is indeed concerning. I am wondering if it is something wrong with my parsing of the aragorn output, or my choice of options. Would you be able to email me your contig file so I can test?
torsten.seemann@gmail.com
I've had similar problems with Aragorn (independently of Prokka). I think it's definitely the Aragorn options and not the parsing of the output.
I have that problem with Aragorn only with highly fragmented genomes, and only when Aragorn scans for tRNAs and tmRNAs simultaneously (the default when using the standalone tool without arguments or with the "-m" argument, NOT the default when using the web service).
No idea what is actually behind this problem, but when I run aragorn with the "-t" argument (but without the "-m" argument) to only scan for tRNAs I seem to get all of them also for fragmented genomes.
I also already contacted the Aragorn developers about this. They had a look into it but apparently could not reproduce the results (I guess they were not using fragmented genomes), so I would be interested to know if you are using Prokka and Aragorn on highly fragmented genomes as well.
I checked the browser based version and found that this version only checks for tRNAs ("-t" argument) and not for tmRNAs by default so this may explain why you get all tRNAs with the web-interface version but not with the standalone tool.
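For anyone who wants to try the tRNA-only run described above, here is a minimal Python sketch that only builds the command line. It assumes an `aragorn` binary on the PATH and the invocation form `aragorn [flags] <fasta>`; the helper name `aragorn_command` and the file name `contigs.fasta` are hypothetical. Only the "-t", "-m" and "-l" flags mentioned in this thread are used.

```python
def aragorn_command(fasta_path, trna=True, tmrna=False, linear=False):
    """Build an aragorn argument list (hypothetical helper, not part of Prokka).

    Assumption: passing neither -t nor -m makes aragorn scan for both
    tRNAs and tmRNAs, as described in this thread.
    """
    if not (trna or tmrna):
        raise ValueError("select at least one of trna / tmrna")
    cmd = ["aragorn"]
    if linear:
        cmd.append("-l")      # treat sequences as linear
    if trna and not tmrna:
        cmd.append("-t")      # tRNA genes only
    elif tmrna and not trna:
        cmd.append("-m")      # tmRNA genes only
    cmd.append(fasta_path)
    return cmd


if __name__ == "__main__":
    # Print the tRNA-only command rather than running it.
    print(" ".join(aragorn_command("contigs.fasta")))
```

The list can be handed to `subprocess.run` once you have confirmed the flags against your installed Aragorn version.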
@jvollme Thank you for this thorough analysis! I think having tmRNA is less important than ensuring we don't miss any tRNA. Perhaps I could scan for them separately. I will think about this.
@rbutleriii I've gone through 100s of contig sets from various genomes and I can't reproduce this behaviour. Can you send me a FASTA file that does this? I can then examine the aragorn C code and track it down.
Thanks for sending your data files. I did a combinatorial experiment with aragorn: circular mode (old Prokka) versus linear mode (more recent Prokka), crossed with the choice of default (tRNA + tmRNA), just tRNA, or just tmRNA.
UW_1k.fasta : >end 48 sequences 51 tRNA genes 0 tmRNA genes
UW_1k.fasta -t : >end 48 sequences 65 tRNA genes
UW_1k.fasta -m : >end 48 sequences 0 tmRNA genes
UW_1k.fasta -l : >end 48 sequences 65 tRNA genes 0 tmRNA genes
UW_1k.fasta -l -t : >end 48 sequences 65 tRNA genes
UW_1k.fasta -l -m : >end 48 sequences 0 tmRNA genes
You can see that with -l linear mode (as Prokka has used for a while now) you get the correct result. It's only in circular mode (which you shouldn't use with contigs!) that problems occur.
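The six runs listed above are the product of two modes and three target choices. A rough sketch of how that grid could be enumerated in Python follows. It assumes that omitting "-l" means circular mode, as the results in this thread imply, and that the binary is called `aragorn`; the function name `experiment_commands` is hypothetical.

```python
from itertools import product

# Flag sets taken from this thread; the dictionary labels are only descriptive.
MODES = {"circular": [], "linear": ["-l"]}
TARGETS = {"tRNA + tmRNA": [], "tRNA only": ["-t"], "tmRNA only": ["-m"]}


def experiment_commands(fasta_path):
    """Return one argument list per (mode, target) combination."""
    return [
        ["aragorn", *mode_flags, *target_flags, fasta_path]
        for (_, mode_flags), (_, target_flags) in product(
            MODES.items(), TARGETS.items()
        )
    ]


if __name__ == "__main__":
    for cmd in experiment_commands("UW_1k.fasta"):
        print(" ".join(cmd))
```

The commands come out in the same order as the result lines above: the three circular runs first, then the three linear ones.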
UW_5k.fasta : >end 36 sequences 59 tRNA genes 0 tmRNA genes
UW_5k.fasta -t : >end 36 sequences 59 tRNA genes
UW_5k.fasta -m : >end 36 sequences 0 tmRNA genes
UW_5k.fasta -l : >end 36 sequences 59 tRNA genes 0 tmRNA genes
UW_5k.fasta -l -t : >end 36 sequences 59 tRNA genes
UW_5k.fasta -l -m : >end 36 sequences 0 tmRNA genes
For 5k there is no problem because you don't have lots of small fragment contigs trying to force a tRNA match to wrap around from one end of the contig to the other.
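To spot the short-counting run mechanically, the summary lines can be parsed and each run compared with the best tRNA count seen for the same file. This is a minimal sketch: the line format is assumed from the results pasted in this thread, and `parse_summary` and `short_runs` are hypothetical helper names.

```python
import re

# Summary lines copied from the results in this thread.
RESULTS = """\
UW_1k.fasta : >end 48 sequences 51 tRNA genes 0 tmRNA genes
UW_1k.fasta -t : >end 48 sequences 65 tRNA genes
UW_1k.fasta -m : >end 48 sequences 0 tmRNA genes
UW_1k.fasta -l : >end 48 sequences 65 tRNA genes 0 tmRNA genes
UW_1k.fasta -l -t : >end 48 sequences 65 tRNA genes
UW_1k.fasta -l -m : >end 48 sequences 0 tmRNA genes
UW_5k.fasta : >end 36 sequences 59 tRNA genes 0 tmRNA genes
UW_5k.fasta -t : >end 36 sequences 59 tRNA genes
UW_5k.fasta -m : >end 36 sequences 0 tmRNA genes
UW_5k.fasta -l : >end 36 sequences 59 tRNA genes 0 tmRNA genes
UW_5k.fasta -l -t : >end 36 sequences 59 tRNA genes
UW_5k.fasta -l -m : >end 36 sequences 0 tmRNA genes
"""


def parse_summary(line):
    """Split '<file> [flags] : >end N sequences ...' into (file, flags, counts)."""
    left, right = line.split(":", 1)
    parts = left.split()
    counts = {
        kind: int(n)
        for n, kind in re.findall(r"(\d+) (tRNA|tmRNA) genes", right)
    }
    return parts[0], tuple(parts[1:]), counts


def short_runs(lines):
    """List runs whose tRNA count is below the best count for the same file."""
    runs = [parse_summary(line) for line in lines if line.strip()]
    best = {}
    for fname, _, counts in runs:
        if "tRNA" in counts:
            best[fname] = max(best.get(fname, 0), counts["tRNA"])
    return [
        (fname, flags, counts["tRNA"], best[fname])
        for fname, flags, counts in runs
        if "tRNA" in counts and counts["tRNA"] < best[fname]
    ]


if __name__ == "__main__":
    for fname, flags, got, best in short_runs(RESULTS.splitlines()):
        print(fname, " ".join(flags) or "(no flags)", got, "of", best)
```

On the numbers above, only the flag-less UW_1k.fasta run falls short (51 of 65); none of the UW_5k.fasta runs do.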
SUMMARY: I think it is the lack of -l mode which is the problem. I think this was fixed after the release of Prokka 1.11, so the fix is in the git version but not yet in a release. I am struggling to find time to make a 1.12, but before Xmas would be my goal.
When I double check the tRNA annotations with tRNAscan-SE, many of my assemblies (8) are lacking 10-30 tRNAs. I tried using the included Aragorn binary, and also went to http://mbioserv2.mbioekol.lu.se/ARAGORN/ and installed that version (both came up short) with both prokka 1.10 and 1.11. However, when I run the same fasta file through the browser-based Aragorn I get all of them. I tried contacting the Aragorn developer but the email bounced back.