ocbe-uio / DIscBIO

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics

Remove philentropy dependency #44

Closed wleoncio closed 10 months ago

wleoncio commented 11 months ago

The poorman package is scheduled for removal from CRAN, which affects the philentropy package and, by extension, DIscBIO. We use philentropy to calculate the Jaccard distances, so that functionality can be rewritten without it.

wleoncio commented 11 months ago

Looks like poorman managed to get a fixed version on CRAN, so this issue is no longer urgent for that reason. However, I am worried that the calculation of the Jaccard index might be wrong, because the mean is taken over the entire distance matrix. In other words, each pair's distance is counted twice (once in the lower triangle, once in the upper triangle), and the diagonal (a vector of zeros) is also included in the mean.

If I only use one of the triangles, then the output of both the solutions below match:

philentropy::distance(t(d), method = "jaccard")  # current solution
vegan::vegdist(d, method = "jaccard")  # new solution proposed by Salim
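
To illustrate the concern about averaging over the whole matrix, here is a minimal sketch in base R. It is not the DIscBIO code: the toy matrix `x` is made up, and `stats::dist(method = "binary")` stands in for the Jaccard distance on 0/1 data so that neither philentropy nor vegan is needed.

```r
# Hypothetical presence/absence data: 3 samples (rows) x 4 features (columns).
x <- rbind(
  a = c(1, 1, 0, 0),
  b = c(1, 0, 1, 0),
  c = c(1, 1, 1, 0)
)

# For 0/1 data, dist(method = "binary") is the Jaccard distance between rows.
full <- as.matrix(dist(x, method = "binary"))  # symmetric, zeros on the diagonal
n <- nrow(full)

# Mean over the full matrix: every pair is counted twice, plus n diagonal zeros.
mean_full <- mean(full)

# Mean over one triangle only: every pair is counted exactly once.
mean_lower <- mean(full[lower.tri(full)])

print(c(full = mean_full, lower = mean_lower))
```

With a zero diagonal, the full-matrix mean is always the one-triangle mean shrunk by a factor of (n - 1) / n, so the two only get close when the number of samples is large.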

wleoncio commented 11 months ago

Another question regarding the calculation is whether the returned value should be mean(1 - jac) or mean(jac), since vegdist() seems to already return the Jaccard distance, which is 1 - the Jaccard coefficient (a.k.a. the Jaccard index).
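
The difference between the two candidates can be checked on a small example. The sketch below again uses made-up 0/1 data and base R's `dist(method = "binary")` as a stand-in for the Jaccard distance; the variable names are illustrative only.

```r
# Hypothetical presence/absence data: 3 samples (rows) x 4 features (columns).
x <- rbind(
  a = c(1, 1, 0, 0),
  b = c(1, 0, 1, 0),
  c = c(1, 1, 1, 0)
)

# One Jaccard distance per pair of rows, in the order (a,b), (a,c), (b,c).
jac_dist <- as.vector(dist(x, method = "binary"))

# The Jaccard index (similarity) is the complement of the distance.
jac_index <- 1 - jac_dist

# Hand check for the pair (a, b): |intersection| / |union|.
index_ab <- sum(x["a", ] & x["b", ]) / sum(x["a", ] | x["b", ])

mean_dist <- mean(jac_dist)    # what mean(jac) gives if jac holds distances
mean_index <- mean(jac_index)  # what mean(1 - jac) gives

print(c(distance = mean_dist, index = mean_index))
```

Because the mean is linear, `mean(1 - jac)` equals `1 - mean(jac)`, so the choice decides whether the reported value is a mean similarity or a mean dissimilarity.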

wleoncio commented 11 months ago

From @SystemsBiologist:

Yes, we still need 1 to be subtracted from the output!

I agree with you, it is better to use the new method!

So the next step is to release a new version using vegdist(), which is a bugfix for the calculation of the Jaccard index.