thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

How to interpret the genetic distances calculated by detect_duplicate_genomes? #94

Closed by clrbtl 4 years ago

clrbtl commented 4 years ago

Hi Thierry,

I have a question about the individuals.pairwise.dist.tsv file created while running filter_rad at the duplicate genomes step: what is the difference between distance and distance relative (in particular, what is the distance relative to)? I didn't quite understand the difference between the two or how each distance is estimated. Is it based on an allele count, or on the proportion of alleles that differ between 2 individuals?

I used the file to extract the distances for each of my strata and made a violin plot for each one to get an idea of the distribution (I'm working on a clonal species).

Thanks for the help!

thierrygosselin commented 4 years ago

Hi, filter_rad uses radiator's individual functions, and each one is documented separately. For your question, the function documentation is found here.

With a tidy dataset, the default is the Manhattan distance; with a GDS object (usually what's used inside filter_rad), the function uses SNPRelate::snpgdsIBS.

I re-read the documentation and realized that the relative distance was not explained. I'll update the doc. Basically, it's the reported distance calculated relative to the maximum observed in the entire dataset, not within each stratum.
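To make the two columns concrete, here is a minimal sketch in Python, not radiator's actual code. It assumes genotypes coded as alternate-allele counts (0, 1, 2), uses made-up sample names, ignores missing genotypes, and assumes that "relative" means the pairwise distance divided by the largest pairwise distance in the whole dataset. All of those are illustrative assumptions rather than documented behaviour.

```python
from itertools import combinations

# Hypothetical genotypes: alternate-allele counts (0, 1, 2) at 5 SNPs.
# Sample names and values are invented for illustration only.
genotypes = {
    "ind1": [0, 1, 2, 0, 1],
    "ind2": [0, 1, 2, 0, 1],  # identical to ind1, e.g. a clone or duplicate
    "ind3": [2, 1, 0, 2, 1],
    "ind4": [0, 0, 2, 1, 1],
}


def manhattan(a, b):
    """Sum of absolute differences in allele counts across markers."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Pairwise Manhattan distance for every pair of individuals.
distances = {
    (i, j): manhattan(genotypes[i], genotypes[j])
    for i, j in combinations(genotypes, 2)
}

# Assumption: the relative distance rescales by the maximum over the
# entire dataset (all pairs), not the maximum within each stratum.
max_distance = max(distances.values())
relative = {
    pair: (d / max_distance if max_distance > 0 else 0.0)
    for pair, d in distances.items()
}

for pair, d in distances.items():
    print(pair, d, round(relative[pair], 3))
```

Under these assumptions, a pair of clones gets a distance and a relative distance of 0, while the most divergent pair in the dataset gets a relative distance of 1, which makes values comparable across strata.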

I'm intrigued by your clonal species. Could you send me the Manhattan plot by email?

Best, Thierry

thierrygosselin commented 4 years ago

Re-open the issue if I have not properly handled your question.