nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Why are some recombination events blue but at the same loci of red ones in the same pattern in 2 distant lineages? #382

Closed: Isoris closed this issue 10 months ago

Isoris commented 10 months ago

Hello,

I'm investigating a dataset of around 80 Streptococcus iniae isolates, split into two distinct lineages (79 in one lineage and 1 in the other). Despite being isolated from various locations, hosts, and times, the isolates within the main lineage appear highly clonal. This is consistent with the two lineages having separated thousands or even millions of years ago.

The one isolate in the distinct lineage is highly divergent from the others, with an unusually long branch length and around 8,000 SNPs, compared to variations of mostly 10 to 200 SNPs throughout the main lineage. Interestingly, this isolate was also flagged as unsuitable for gubbins analysis due to being too distant, as determined by a chi-square test.

What intrigues me is the pattern of recombination. The main lineage shows red bars in the gubbins analysis, indicating shared recombination events. The divergent isolate, on the other hand, shows blue bars at the exact same loci as the red bars of the main lineage.

Screenshot 2023-08-25 at 22-34-03 phandango

  1. Could these shared loci be indicative of recombination hotspots, or is it possible that the gubbins algorithm is interpreting these as standalone recombination events in the divergent isolate because it's too mutated? Might this be considered as evidence of a common ancestor, parallel evolution, or something else altogether?

  2. I'd appreciate any insights or alternative hypotheses to understand why this separate isolate of the same species would show blue bars at the exact loci of the red bars in the other lineage.

  3. Your thoughts on why this isolate was flagged as unsuitable would also be helpful. I understand that it's distant; I just wonder whether you have ever come across a dataset like this, with the exact same recombination events regardless of country, host, or date of sampling?

  4. Why would one single bacterium be so different and distant despite being sampled around the same period and from similar hosts as the others? Is it an assembly or sequencing error?

  5. Also, do I have to exclude repetitive regions like IS elements by masking them before running gubbins? I noticed recombination in an IS3 element, but I think it could be a false positive caused by poor mapping.

I am not asking you to do the analysis for me. I want to troubleshoot my dataset and understand whether these results make sense.

Thank you!

P.S.: Also, why do some isolates or nodes have 0 for r/m and rho/theta while others have high values? Is that normal?

nickjcroucher commented 10 months ago

This is an annoying artefact of phylogenetic reconstruction.

The long branch on which the phylogeny is rooted is artificially split in two by the root, and both halves are of equal length. A simple reconstruction therefore distributes the base substitutions randomly between the two halves, so recombinations are inferred on both. Even though only one recombination occurred, its base substitutions are split across two branches, making it look as if the same loci have recombined twice.
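The splitting effect described above can be illustrated with a toy simulation. This is only a sketch, not Gubbins's actual reconstruction: the locus coordinates, the number of substitutions, and the coin-flip assignment are all assumptions chosen for illustration.

```python
import random


def split_substitutions(positions, seed=0):
    """Assign each substitution at random to one of the two root half-branches."""
    rng = random.Random(seed)
    left, right = [], []
    for pos in positions:
        (left if rng.random() < 0.5 else right).append(pos)
    return left, right


def count_in_window(positions, start, end):
    """Count substitutions falling inside the half-open window [start, end)."""
    return sum(start <= p < end for p in positions)


# Hypothetical single recombination: 200 substitutions imported into a 10 kb locus.
recomb_start, recomb_end = 200_000, 210_000
imported = sorted(random.Random(1).sample(range(recomb_start, recomb_end), 200))

left, right = split_substitutions(imported)
left_hits = count_in_window(left, recomb_start, recomb_end)
right_hits = count_in_window(right, recomb_start, recomb_end)

# Both halves end up with a dense cluster at the same locus, so a density-based
# scan would flag a recombination on each half-branch.
print(f"left half: {left_hits} substitutions, right half: {right_hits} substitutions")
```

Each half of the root branch carries roughly half of the imported substitutions at the same coordinates, which is why one event can show up as two blocks at identical loci.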

I thought I had fixed this in the more recent versions of Gubbins - which version are you using? It probably makes sense to specify the more distantly-related isolate as an outgroup in the analysis.
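A minimal sketch of how the outgroup could be passed on the command line, written as a small Python wrapper. The `--outgroup` and `--prefix` options are listed in the help text of `run_gubbins.py`, but check `run_gubbins.py --help` for your installed version. The alignment filename and output prefix are hypothetical, and the isolate name QMA0140 is the one that comes up later in this thread.

```python
import shlex


def gubbins_command(alignment, outgroup, prefix):
    """Build a run_gubbins.py argument list that roots the tree on the given outgroup."""
    return [
        "run_gubbins.py",
        "--prefix", prefix,      # prefix for the output files
        "--outgroup", outgroup,  # isolate used to root the tree
        alignment,
    ]


# Hypothetical file name and prefix, for illustration only.
cmd = gubbins_command("alignment.aln", "QMA0140", "iniae_outgroup")
print(shlex.join(cmd))

# To actually run the analysis (requires Gubbins on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately makes it easy to inspect the exact command before launching a long run.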

Isoris commented 10 months ago

Hi nick!

When I run run_gubbins.py --version it reports 3.3.0. I downloaded the source from git and use it within the conda environment.

Do you think that I should remove this isolate from the analysis? This is what I get when I do so:

homologous_recombination_QMA0141_2

Isoris commented 10 months ago

Also, why do I get the same topology but with the branches in a different order when I open the tree in iTOL? Does gubbins change the branch order by flipping branches to make the matrix look cleaner?

Screenshot 2023-08-26 at 05-30-13 iTOL Interactive Tree Of Life

So, based on the information above, you would suggest that I specify QMA0140 as an outgroup?

Thank you for your answer.

nickjcroucher commented 10 months ago

iTOL has just ladderised the tree - I would try using that isolate as an outgroup and see if that clarifies the output.
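To show why ladderising changes the drawing but not the tree, here is a small sketch using nested tuples as a stand-in for a rooted tree. The tip names and the sort rule (fewest tips first) are assumptions for illustration; iTOL's own ordering rule may differ.

```python
def leaf_count(node):
    """Number of tips below a node (a tip is represented as a string)."""
    if isinstance(node, str):
        return 1
    return sum(leaf_count(child) for child in node)


def ladderise(node):
    """Reorder the children of every node so that smaller clades come first."""
    if isinstance(node, str):
        return node
    return tuple(sorted((ladderise(child) for child in node), key=leaf_count))


def leaves(node):
    """Set of tip names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """All clades (as sets of tip names); together they define the rooted topology."""
    found = {leaves(node)}
    if not isinstance(node, str):
        for child in node:
            found |= clades(child)
    return found


tree = (("A", ("B", "C")), "D")  # hypothetical four-tip tree
ladderised = ladderise(tree)

print(ladderised)                          # ('D', ('A', ('B', 'C')))
print(clades(tree) == clades(ladderised))  # True: same topology, different order
```

Only the left-to-right order of the children changes; every clade is preserved, so both drawings represent the same phylogeny.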

Isoris commented 10 months ago

OK, I got it! Here's the output. It looks better, right?

Screenshot 2023-08-26 at 06-40-57 phandango

Thank you very much.