simonhmartin / genomics_general

General tools for genomic analyses.

Why is the output matrix of distMat.py not a diagonal zero matrix? #118

Closed: NCtraveling closed this issue 3 months ago

NCtraveling commented 3 months ago

Hi Simon,

Since the output of distMat.py --windType cat is an n × n matrix (where n is the number of individuals), I expected every self-comparison on the diagonal to have a distance of zero. However, the diagonal of my matrix contains a range of non-zero values, and these values are not even the smallest in the matrix.

I am confused by these diagonal values. How should I interpret the non-zero self-distances? My guess is that they arise because the phased/unphased status of each site differs among individuals. Is that right?

In short, what explains the non-zero self-distances?

NCtraveling commented 3 months ago

Oh, I just noticed the sentence "For ploidy > 1, the pairwise distance will be the average distance among all haplotypes in the two individuals." So the diagonal values of the output matrix are the distances between the haplotypes within a single individual, which means they reflect that individual's heterozygosity.
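To make that averaging rule concrete, here is a minimal sketch (not the genomics_general code; the function names are hypothetical). It assumes haplotypes are written as equal-length strings with no missing data, and that a self-comparison averages only over pairs of *distinct* haplotypes. How distMat.py itself handles the self-comparison case is not stated in this thread. The sketch shows that a heterozygous individual gets a non-zero diagonal entry that can exceed its off-diagonal distances, and that the all-pairs average does not depend on how heterozygous sites are phased.

```python
from itertools import combinations, product


def hap_distance(h1, h2):
    """Proportion of sites at which two haplotypes differ (no missing data assumed)."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def pairwise_distance(ind1, ind2, same=False):
    """Average distance among all haplotype pairs of two individuals.

    ind1, ind2: lists of haplotype strings (one per haplotype).
    same=True: self-comparison; only pairs of distinct haplotypes of ind1
    are averaged (an assumption, not confirmed for distMat.py).
    """
    pairs = list(combinations(ind1, 2)) if same else list(product(ind1, ind2))
    if not pairs:  # e.g. a haploid self-comparison has no distinct pairs
        return 0.0
    return sum(hap_distance(a, b) for a, b in pairs) / len(pairs)


def dist_matrix(individuals):
    """n x n matrix of average haplotype distances; the diagonal is within-individual."""
    n = len(individuals)
    return [[pairwise_distance(individuals[i], individuals[j], same=(i == j))
             for j in range(n)] for i in range(n)]


A = ["AACG", "ATCG"]  # heterozygous at the second site
B = ["AACG", "AACG"]  # homozygous everywhere
print(dist_matrix([A, B]))  # [[0.25, 0.125], [0.125, 0.0]]

# Same genotypes, two different phasings of the heterozygous sites:
phase1 = ["AACG", "TTCG"]
phase2 = ["TACG", "ATCG"]
print(pairwise_distance(phase1, B), pairwise_distance(phase2, B))  # 0.25 0.25
```

In this toy example A's self-distance (0.25) is larger than its distance to B (0.125), which mirrors the observation that the diagonal values are not the minimum of the matrix. Because every haplotype of one individual is compared with every haplotype of the other, swapping alleles between haplotypes at a site leaves the average unchanged, so phasing alone would not explain the diagonal.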