jameshadfield / phandango

an interactive viewer for populations of bacterial genomes linked by a phylogeny
http://phandango.net
MIT License
113 stars, 27 forks

Graph of recombination events algorithm #142

Open valery-shap opened 2 years ago

valery-shap commented 2 years ago

Hello,

I have data from a Gubbins GFF file with coordinates (Start and Stop) and snp_count. I loaded this data and got the graph with peaks at the bottom. Could you please explain what the y-value is and how it is calculated? I tried to reproduce the same distribution and got a different result.

Best regards, Valery

jameshadfield commented 2 years ago

Hi Valery, the y-axis is the number of recombination events observed which cover that particular position in the genome. Note that an event may involve many strains but only be counted once (the red blocks indicate events which are inferred on ancestral nodes and therefore involve more than one tip).
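To make the description above concrete, here is a minimal sketch of counting, for every position, how many recombination events cover it. It is not taken from Phandango's source. It assumes each event is a 1-based, inclusive (start, end) pair, as in the GFF Start and Stop columns, and the function name `event_coverage` is hypothetical.

```python
from typing import Iterable, List, Tuple


def event_coverage(events: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Return, for each 1-based position, the number of events spanning it.

    Assumption: each event is a (start, end) pair, 1-based and inclusive, as in GFF.
    """
    # Difference array: +1 where an event starts, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in events:
        diff[start] += 1
        diff[end + 1] -= 1

    # A running sum over the difference array gives the per-position count.
    coverage = []
    running = 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage  # coverage[i] is the count at position i + 1


if __name__ == "__main__":
    # Three hypothetical events on a 10 bp toy genome.
    events = [(2, 5), (4, 8), (4, 4)]
    print(event_coverage(events, 10))  # [0, 1, 1, 3, 2, 1, 1, 1, 0, 0]
```

The difference array keeps the work proportional to the genome length plus the number of events instead of touching every position of every event. Note that each event adds 1 no matter how many strains it involves.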

valery-shap commented 2 years ago

Sorry, one more clarifying question. By the number of recombination events, do you mean exactly the number of strains that were detected in the regions of recombination at that position? That can't be the case, because I have nearly 300 isolates and there were positions where all isolates were involved, yet the maximum value on the graph is around 100. Is there perhaps some window within which it counts, or is it a relative value? I assumed that the y-axis is the mean number of SNPs, calculated as follows:

  1. compute the SNP density within each recombination region, so every position within that region has this density
  2. sum these densities across the isolates that have this recombination region
  3. obtain the summed density for every position

jameshadfield commented 2 years ago

> Do you mean exactly the number of strains that were detected in the regions of recombination at that position?

No, the number of recombination events. If a recombination event is inferred to happen at an ancestral node, then one event will involve multiple strains. The Gubbins paper details how these are inferred; phandango just visualises them.
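The difference between counting events and counting strains explains why the maximum can sit well below the number of isolates. The sketch below contrasts the two counts at one position. The event list, its `taxa` field (the tips below the node where the event was inferred), and the function names are all hypothetical, for illustration only.

```python
from typing import List, Tuple

# Hypothetical event: (start, end, taxa). An event on an ancestral node lists
# several taxa; an event on a single tip lists one.
Event = Tuple[int, int, List[str]]


def events_at(events: List[Event], pos: int) -> int:
    """Number of events whose region covers pos; each event counts once."""
    return sum(1 for start, end, _ in events if start <= pos <= end)


def strains_at(events: List[Event], pos: int) -> int:
    """Number of distinct strains affected by some event covering pos."""
    affected = set()
    for start, end, taxa in events:
        if start <= pos <= end:
            affected.update(taxa)
    return len(affected)


if __name__ == "__main__":
    events = [
        (100, 500, ["s1", "s2", "s3", "s4"]),  # one event on an ancestral node
        (300, 400, ["s5"]),                    # one event on a single tip
    ]
    print(events_at(events, 350), strains_at(events, 350))  # 2 5
```

Five strains are affected at position 350, but only two events cover it, so an events-based y-axis reads 2 there.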

> I assumed that the y-axis is the mean number of SNPs, calculated as follows:

No. It is the number of recombination events.

valery-shap commented 2 years ago

OK, thank you. I'll try; it's Gubbins-specific terminology. So, returning to the Gubbins output GFF file and the Phandango graph: does the graph count exactly the number of rows where the value of "node=" in the last column is an ancestral node?
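One way to check this against the graph yourself is to count, at a chosen position, both all GFF rows covering it and only the rows on ancestral nodes, then compare each number with the plotted value. This sketch does not say which count Phandango uses. It assumes the ninth GFF column holds attributes of the form `key="value";`, including a `taxa` attribute listing the affected tips, and it treats a row as ancestral when it lists more than one taxon. Check that layout against your own file. The function names and example rows are hypothetical, and columns other than 4, 5 and 9 are placeholders.

```python
import re
from typing import Dict, List, Tuple

# Assumed attribute layout, e.g. node="Node_1";taxa=" A B C";snp_count="12";
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

Row = Tuple[int, int, Dict[str, str]]


def parse_gff(lines: List[str]) -> List[Row]:
    """Parse tab-separated GFF lines into (start, end, attributes) tuples."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        start, end = int(cols[3]), int(cols[4])  # GFF columns 4 and 5, 1-based inclusive
        attrs = dict(ATTR_RE.findall(cols[8]))
        rows.append((start, end, attrs))
    return rows


def count_at(rows: List[Row], pos: int) -> Tuple[int, int]:
    """Return (all rows covering pos, rows covering pos with more than one taxon)."""
    total = ancestral = 0
    for start, end, attrs in rows:
        if start <= pos <= end:
            total += 1
            if len(attrs.get("taxa", "").split()) > 1:
                ancestral += 1
    return total, ancestral


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        'SEQ\tGUBBINS\tCDS\t100\t500\t0.000\t.\t0\tnode="Node_1";taxa=" s1 s2 s3";snp_count="12";',
        'SEQ\tGUBBINS\tCDS\t300\t400\t0.000\t.\t0\tnode="s5";taxa=" s5";snp_count="4";',
    ]
    print(count_at(parse_gff(example), 350))  # (2, 1)
```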