hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

Metagenome samples with multiple translation tables #7

Closed tseemann closed 9 years ago

tseemann commented 9 years ago

Doug

I wanted to check whether you are developing code to handle metagenomic samples that contain a mixture of organisms with different translation tables?

Torsten

hyattpd commented 9 years ago

Prodigal currently does this with genetic codes 4 and 11 (it uses preset training files that include a couple of Mycoplasma). In v3.0, I will go through GenBank again and redo the clustering from the 2012 paper, likely picking up at least one genetic code 25 representative. (Very hard to distinguish 4 and 25, though.) As for other, more exotic codes, I am not sure if I will add any representatives.
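To illustrate why codes 4 and 25 are hard to tell apart, here is a minimal Python sketch. It is not Prodigal's code; the function name and the toy sequence are made up for illustration. Under the standard NCBI tables, code 11 treats TGA as a stop, code 4 reads it as Trp, and code 25 reads it as Gly, so 4 and 25 have the same stop codon set and produce the same ORF boundaries.

```python
# Stop codons under the NCBI translation tables discussed above.
# Tables 4 and 25 both read TGA as an amino acid (Trp and Gly respectively),
# so their stop sets are identical.
STOPS = {
    11: {"TAA", "TAG", "TGA"},
    4: {"TAA", "TAG"},
    25: {"TAA", "TAG"},
}


def codons_before_stop(seq, table):
    """Count codons read in frame 0 before the first stop codon (or sequence end).

    Hypothetical helper for illustration only.
    """
    stops = STOPS[table]
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in stops:
            break
        count += 1
    return count


# Toy sequence (invented): ATG GCT TGA GCT GCT AAA GGT TAA
toy = "ATGGCTTGAGCTGCTAAAGGTTAA"

for table in (11, 4, 25):
    print(table, codons_before_stop(toy, table))
```

With code 11 the reading frame is cut short at the in-frame TGA, while codes 4 and 25 read through to the TAA. Telling 4 from 25 therefore has to rely on something other than ORF length, such as which amino acid fits at the TGA positions.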

JGI recently published a Science paper that used a modified version of Prodigal to find weird genetic codes: http://www.sciencemag.org/content/344/6186/909

They found viruses that translated the other two stop codons, but no bacteria/archaea. (That doesn't mean they don't exist, however.) It's difficult to build a viral training file, since those genomes are so small, but I may give it some thought if it's a useful feature for viral metagenomics.

hyattpd commented 9 years ago

Should point out that 3.0 is changing the training files a lot, so redoing the metagenomic stuff will be the last thing I do (since I need to be sure I'm happy with the training file structure before recreating a bunch of them).