jolespin closed this issue 4 years ago
@jolespin What is the use case? Do you have data from something not in the official genetic code list?
I was thinking about the situation where researchers suspected that Gracilibacteria used one of their stop codons as an amino acid during translation and created a custom table. I believe that was the finding that introduced translation table 25, but I may be mistaken. For my data, there was a situation in which I thought one of my draft genomes had a similar property, and I wanted to call the ORFs using a custom genetic code. If it would be easy to implement, I think it could be really beneficial for users studying microbial "dark matter". I'm not sure what the best input format would be, but at first glance a tab-delimited table would be easy to generate. I'm not sure, though, how easy it would be to support custom codes in the actual gene calling on the backend.
Maybe, I'll think about it, if it's something multiple people would want. Right now, you can still find most everything since there are enough codes that have different combinations of stop codons. It would only really be an issue for truly weird tables that use non-TAA/TGA/TAG stops. Even 25 would still "work" with 4; it would just mistranslate a codon.
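To make the point about tables 4 and 25 concrete, here is a minimal Go sketch that compares the two amino acid strings codon by codon. The two `ncbieaa` strings are copied from my reading of NCBI's gc.prt and should be checked against the current file; `codonAt` and `diffTables` are hypothetical helper names, not anything from Prodigal.

```go
package main

import "fmt"

// Codon order used by the NCBI tables: base 1, 2 and 3 each cycle through T, C, A, G.
const bases = "TCAG"

// ncbieaa strings for translation tables 4 and 25 (assumed to match gc.prt).
const (
	table4  = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
	table25 = "FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)

// codonAt returns the codon at index i (0..63) in NCBI order.
func codonAt(i int) string {
	return string([]byte{bases[i/16], bases[(i/4)%4], bases[i%4]})
}

// diffTables lists the codons whose translation differs between two tables.
func diffTables(a, b string) []string {
	var out []string
	for i := 0; i < 64 && i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			out = append(out, fmt.Sprintf("%s: %c -> %c", codonAt(i), a[i], b[i]))
		}
	}
	return out
}

func main() {
	for _, d := range diffTables(table4, table25) {
		fmt.Println(d) // prints "TGA: W -> G"
	}
}
```

Both tables keep TAA and TAG as stops, so ORF boundaries are the same; only the residue assigned to TGA changes.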
"Identification of reassigned contigs in assembled metagenome data: Prodigal (19) software has been modified to add one non-standard genetic code, in which TAA is reassigned to Gln."
http://science.sciencemag.org/content/sci/suppl/2014/05/21/344.6186.909.DC1/Ivanova.SM.pdf
This JGI paper made a hack to incorporate the ochre codon for dark matter microbes.
I've decided to support this in the Go version as a differential from an existing genetic code.
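One way to picture a "differential from an existing genetic code" is a base table plus a small set of codon overrides. The Go sketch below is only an illustration of that idea, not the planned implementation; `applyDiff` and `codonIndex` are hypothetical names, and the base string is the table 11 `ncbieaa` string as I understand it from gc.prt.

```go
package main

import (
	"fmt"
	"strings"
)

// Codon order used by the NCBI tables.
const bases = "TCAG"

// ncbieaa string for translation table 11 (assumed to match gc.prt).
const table11 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// codonIndex maps a codon such as "TAA" to its position (0..63) in NCBI order.
func codonIndex(codon string) (int, error) {
	if len(codon) != 3 {
		return 0, fmt.Errorf("codon %q is not 3 bases long", codon)
	}
	idx := 0
	for i := 0; i < 3; i++ {
		b := strings.IndexByte(bases, codon[i])
		if b < 0 {
			return 0, fmt.Errorf("codon %q contains a base other than T, C, A, G", codon)
		}
		idx = idx*4 + b
	}
	return idx, nil
}

// applyDiff copies the base table and applies codon -> residue overrides.
func applyDiff(base string, overrides map[string]byte) (string, error) {
	if len(base) != 64 {
		return "", fmt.Errorf("base table has %d entries, want 64", len(base))
	}
	table := []byte(base)
	for codon, aa := range overrides {
		i, err := codonIndex(strings.ToUpper(codon))
		if err != nil {
			return "", err
		}
		table[i] = aa
	}
	return string(table), nil
}

func main() {
	// The reassignment quoted above: TAA read as Gln (Q) instead of stop.
	custom, err := applyDiff(table11, map[string]byte{"TAA": 'Q'})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(custom)
}
```

A user would then only need to state the base code and the handful of codons that differ, rather than supplying all 64 entries.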
@hyattpd will the Go version just use the machine readable genetic code tables?
This is in ASN.1 text format: https://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt
It has the latest tables:
--
-- Version 4.5
-- Added Cephalodiscidae mitochondrial genetic code 33
--
Ideally the user could provide a custom one in that format:
name "Mars Rover Microbe" ,
id 42 ,
ncbieaa "FFLLSSSSYYYYCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
-- Base2 TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
-- Base3 TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
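As a rough illustration of how little parsing a flat entry like the one above needs, here is a Go sketch that reads the four fields and skips the `--` comment lines. It is deliberately not a real ASN.1 parser: the actual gc.prt wraps entries in braces and can carry several `name` fields, which this ignores. `GeneticCode` and `parseEntry` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is the custom entry from the comment above.
const sample = `name "Mars Rover Microbe" ,
id 42 ,
ncbieaa "FFLLSSSSYYYYCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
sncbieaa "--------------*--------------------M----------------------------"
-- Base1 TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG`

// GeneticCode holds one table: residues and start/stop markers in NCBI codon order.
type GeneticCode struct {
	Name   string
	ID     int
	AAs    string // ncbieaa
	Starts string // sncbieaa
}

// parseEntry reads a single flat entry; lines starting with "--" are comments.
func parseEntry(text string) (GeneticCode, error) {
	var gc GeneticCode
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "--") {
			continue
		}
		line = strings.TrimSpace(strings.TrimSuffix(line, ","))
		parts := strings.SplitN(line, " ", 2)
		if len(parts) != 2 {
			return gc, fmt.Errorf("cannot parse line %q", line)
		}
		val := strings.Trim(strings.TrimSpace(parts[1]), `"`)
		switch parts[0] {
		case "name":
			gc.Name = val
		case "id":
			n, err := strconv.Atoi(val)
			if err != nil {
				return gc, fmt.Errorf("bad id %q: %v", val, err)
			}
			gc.ID = n
		case "ncbieaa":
			gc.AAs = val
		case "sncbieaa":
			gc.Starts = val
		}
	}
	if len(gc.AAs) != 64 || len(gc.Starts) != 64 {
		return gc, fmt.Errorf("ncbieaa and sncbieaa must both have 64 entries")
	}
	return gc, nil
}

func main() {
	gc, err := parseEntry(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d %s\n", gc.ID, gc.Name) // prints "42 Mars Rover Microbe"
}
```

The length check catches the most likely user mistake, a string with a dropped or extra character, before any gene calling starts.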
That's a good idea.
For performance reasons, I will probably hard-code 1, 4, 11, and 25, and only read genetic codes from this file if the user specifies a weird one (or passes a flag that says to use the file).
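The lookup order described here (built-in tables first, the file only for unusual ids or when a flag forces it) could be sketched in Go as follows. This is an assumption about the shape of such code, not the actual design; `lookup`, `Loader`, and `forceFile` are hypothetical names, and the table strings should be checked against gc.prt.

```go
package main

import "fmt"

// builtin holds the hard-coded ncbieaa strings (assumed to match gc.prt).
var builtin = map[int]string{
	1:  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
	4:  "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
	11: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
	25: "FFLLSSSSYY**CCGWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}

// Loader fetches a table that is not built in, for example from a gc.prt-style file.
type Loader func(id int) (string, error)

// lookup prefers the built-in table unless forceFile is set, then falls back to load.
func lookup(id int, forceFile bool, load Loader) (string, error) {
	if !forceFile {
		if t, ok := builtin[id]; ok {
			return t, nil
		}
	}
	if load == nil {
		return "", fmt.Errorf("genetic code %d is not built in and no table file was given", id)
	}
	return load(id)
}

func main() {
	t, err := lookup(11, false, nil)
	fmt.Println(t, err)

	// Code 33 is not built in, so without a loader this reports an error.
	_, err = lookup(33, false, nil)
	fmt.Println(err)
}
```

Keeping the file reader behind a function value means the common codes never touch the disk, while the generic path stays available for everything else.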
@hyattpd on a related note: will the new Go version also give users the option to build their own models? Excited to hear there's a new version on the horizon!
Sorry I think I missed something. What is the “Go” version exactly?
On Sep 30, 2019, at 4:24 PM, Marcel Huntemann notifications@github.com wrote:
@hyattpd on a related note: will the new Go version also give users the option to build their own models? Excited to hear there's a new version on the horizon!
@jolespin The Go programming language (Golang). I'm writing the next version in Go.
@mhuntemann Not sure what's meant by "build your own models"...
Hi @hyattpd, maybe I am remembering it wrong, but I thought that in meta mode Prodigal uses models built from the genomes (and their genes) that were publicly available when you wrote the last version. If that's the case, I assume those models will be updated for the new version, but there are probably people (including us) who have databases with genomes that are not publicly available yet. It would be nice if there were a way to create models from a specific set of currently private genomes that I am interested in and add them to the set of models the new Prodigal uses in meta mode. Or am I remembering it incorrectly and it works completely differently? Thanks, Marcel
Ah ok, I get it. Yeah, I think I can support that.
Awesome! Thanks for really listening to the community. :-) Really looking forward to the new version. Do you have a rough release roadmap yet?
By the way, if you need a beta tester or any kind of feedback on a new version, I am happy to help. We run your version 2.6.3 on several hundred genomes and metagenomes every month, so there should be enough data to hit edge cases. :-)
I imagine the way I will implement this is just to have single or multiple modes, where the user passes in a list of files they've made themselves in addition to or in place of the preset models. I can also give the preset models shorthand ids, provide a complete list, and allow the user to specify whichever of those they want.
@hyattpd "For performance reasons" you will hard code tables?
Modern compilers are amazingly good at optimizing, especially if you use const properly.
I would write the first version completely generically, then optimize later.
In future versions, are there plans to set an option to input custom translation tables?
Possibly in the format of:
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi