mlin / PhyloCSF

Phylogenetic analysis of multi-species genome sequence alignments to identify conserved protein-coding regions
http://compbio.mit.edu/PhyloCSF
GNU Affero General Public License v3.0

Mitochondrial genomes #31

Open issue, opened by martinChCh 1 year ago

martinChCh commented 1 year ago (Jan 17, 2023)

Is it possible to use PhyloCSF on mitochondrial genomes, i.e. with an alternative genetic code?

iljungr commented 1 year ago

PhyloCSF does not make explicit use of the genetic code; the information is implicit in the data used to train the coding and non-coding models. Besides having a different genetic code, mitochondria have different codon and substitution frequencies, so in principle the models would need to be trained separately on mitochondrial alignments. This might be challenging, because the short length of the mitochondrial genome, particularly of its non-coding region, might not provide enough training data.

However, in practice the standard PhyloCSF matrices work pretty well. The attached image shows the PhyloCSF browser tracks in the UCSC Genome Browser for the human mitochondrial chromosome, using the 58-placental-mammal alignment. As you can see, there is a strong signal in the correct reading frame and strand, localized to the coding genes. (Attachment: hgt_genome_28fb2_ea3e10.pdf)
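For reference, the genetic-code difference mentioned above is small. The human mitochondrial genome uses the vertebrate mitochondrial code (NCBI translation table 2), which differs from the standard code (table 1) at only four codons. The sketch below is illustrative and is not part of PhyloCSF, which never consults a codon table. It builds both tables and lists the codons they translate differently. The names `STANDARD`, `VERT_MITO` and `code_differences` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code), codons ordered TTT, TTC, TTA, ..., GGG
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# NCBI translation table 2 (vertebrate mitochondrial code): identical to the
# standard code except at these four codons ("*" marks a stop).
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"})


def code_differences(a, b):
    """Return {codon: (aa_in_a, aa_in_b)} for codons the two tables translate differently."""
    return {c: (a[c], b[c]) for c in sorted(a) if a[c] != b[c]}


if __name__ == "__main__":
    for codon, (std, mito) in code_differences(STANDARD, VERT_MITO).items():
        print(f"{codon}: standard {std} -> mitochondrial {mito}")
```

Running it prints the four differing codons: AGA and AGG (Arg to stop), ATA (Ile to Met) and TGA (stop to Trp).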


martinChCh commented 1 year ago (Feb 1, 2023)

This is quite impressive. Here is an idea I have been thinking about for a while: the current paradigm is that no nuclear genes are translated in the mitochondrion, and, correspondingly, the nuclear genome is searched for CDSs using the nuclear (standard) genetic code. But what if there are sequences that are CDSs only under the mitochondrial genetic code? Could PhyloCSF be used to search the nuclear genome for candidate mitochondrial CDSs that are conserved across species?
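The first step of such a search could be to find stretches that are open reading frames only under the mitochondrial code. The minimal sketch below scans one strand of a sequence. In each frame it keeps runs with no vertebrate-mitochondrial stop codon (TAA, TAG, AGA, AGG) that contain at least one in-frame TGA, which would be a stop under the standard code. The function name, the `min_codons` threshold and the choice not to require a start codon are all assumptions made for illustration, not part of PhyloCSF.

```python
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}  # stops in NCBI table 2 (vertebrate mito)


def mito_only_orfs(seq, min_codons=50):
    """Return (frame, start, end) for forward-strand runs that contain no
    mitochondrial stop codon but do contain an in-frame TGA, i.e. frames that
    are open under table 2 yet interrupted under the standard code.
    Coordinates are 0-based and end-exclusive; no start codon is required."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        run_start = frame
        n_codons = (len(seq) - frame) // 3
        for k in range(n_codons + 1):
            i = frame + 3 * k
            # codon is None one step past the last complete codon (end of sequence)
            codon = seq[i:i + 3] if k < n_codons else None
            if codon is None or codon in MITO_STOPS:
                run = seq[run_start:i]
                codons = [run[j:j + 3] for j in range(0, len(run), 3)]
                if len(codons) >= min_codons and "TGA" in codons:
                    hits.append((frame, run_start, i))
                run_start = i + 3
    return hits
```

For example, `mito_only_orfs("ATGTGAGCCGCCGCCTAA", min_codons=3)` returns `[(0, 0, 15)]`. To cover the other strand, the same function could be run on the reverse complement of the sequence.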

iljungr commented 1 year ago

I would guess that such transcripts would look like lncRNAs, transcribed and spliced in the nucleus before heading to the mitochondrion for translation. I'd expect most of their codons to get a PhyloCSF signal except the UGA codons: UGA is a stop codon in the standard code but encodes tryptophan in the vertebrate mitochondrial code, so it is rarely coding in PhyloCSF's training data. Such transcripts could be found by looking in lncRNAs for sequences that would be open reading frames in mitochondria, then running PhyloCSF on their alignments with the UGA-containing columns excluded.
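The "excluding columns with UGAs" step could look roughly like the sketch below. It assumes the alignment is held as a mapping from species name to aligned DNA sequence, that it is codon-aligned and in frame from column 0 (frame-shifting gaps are not handled), and that a codon column is dropped if any species reads TGA there. Dropping only columns where the reference reads TGA would be an equally plausible reading. The function names are hypothetical, and the FASTA writer assumes the aligned-FASTA input PhyloCSF reads.

```python
def drop_tga_codon_columns(alignment):
    """Return a copy of `alignment` (name -> aligned sequence, uppercased)
    with every codon-column triplet removed in which any row reads TGA."""
    rows = {name: seq.upper() for name, seq in alignment.items()}
    lengths = {len(s) for s in rows.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    if length % 3:
        raise ValueError("alignment length must be a multiple of 3")
    # Start columns of codon triplets in which no species has TGA
    keep = [
        i for i in range(0, length, 3)
        if not any(s[i:i + 3] == "TGA" for s in rows.values())
    ]
    return {name: "".join(s[i:i + 3] for i in keep) for name, s in rows.items()}


def to_fasta(alignment):
    """Serialize name -> sequence as aligned FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())
```

For example, `drop_tga_codon_columns({"hg38": "ATGTGAGCC", "mm10": "ATGTGGGCC"})` removes the second codon column from both rows, because it reads TGA in `hg38`.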
