Closed wleoncio closed 10 months ago
Looks like poorman managed to get a fixed version on CRAN, so this issue is no longer urgent for that reason. However, I am worried that the calculation of the Jaccard index might be wrong, as it takes the entire distance matrix into account. In other words, each distance pair is counted twice (once in the lower triangle, once in the upper triangle), and the diagonal (a vector of zeros) is also part of the mean.
If I only use one of the triangles, then the output of both the solutions below match:
philentropy::distance(t(d), method = "jaccard") # current solution
vegan::vegdist(d, method = "jaccard") # new solution proposed by Salim
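To illustrate the concern about averaging the whole matrix, here is a minimal base R sketch. The matrix `jac` holds made-up values (an assumption for illustration, not DIscBIO output). It suggests that the double counting alone does not change the mean, while the zero diagonal shrinks it by a factor of (n - 1) / n.

```r
# Toy symmetric distance matrix; the values are hypothetical
jac <- matrix(c(0.0, 0.5, 0.8,
                0.5, 0.0, 0.2,
                0.8, 0.2, 0.0), nrow = 3, byrow = TRUE)
n <- nrow(jac)

# Mean over the full matrix: every pair appears twice, plus n zeros on the diagonal
mean_full <- mean(jac)

# Mean over the lower triangle only: every pair appears once, no diagonal
mean_tri <- mean(jac[lower.tri(jac)])

# The two means differ exactly by the factor (n - 1) / n
all.equal(mean_full, mean_tri * (n - 1) / n)
```

With many cells the factor approaches 1, so the bias is small for large matrices but noticeable for small ones.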
Another question regarding the calculation is whether the returned value should be mean(1 - jac) or mean(jac), since vegdist() seems to already return the Jaccard distance, which is 1 - the Jaccard coefficient (a.k.a. the Jaccard index).
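As a reminder of how the two quantities relate, here is a small hand computation in base R for two presence/absence vectors. The vectors `a` and `b` are hypothetical, and the sketch deliberately avoids calling either package, so it only shows the definitions rather than what vegdist() or distance() return for real data.

```r
# Two hypothetical presence/absence vectors
a <- c(1, 1, 1, 1, 0)
b <- c(1, 1, 1, 0, 1)

# Jaccard index (coefficient): shared presences over total presences
jaccard_index <- sum(a & b) / sum(a | b)

# Jaccard distance: the complement of the index
jaccard_dist <- 1 - jaccard_index
```

If a function already returns the distance, then 1 minus its output recovers the index.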
From @SystemsBiologist:
Yes, we still need 1 to be subtracted from the output!
I agree with you, it is better to use the new method!
So the next step is to release a new version using vegdist(), which is a bugfix for the calculation of the Jaccard index.
Package poorman is scheduled for removal from CRAN, which affects the philentropy package and, by extension, DIscBIO. We use philentropy for calculation of the Jaccard distances, so the functionality can be rewritten.