waldronlab / bugsigdbr

R-side access to published microbial signatures from BugSigDB
https://bioconductor.org/packages/bugsigdbr
GNU General Public License v3.0

Calc pairwise overlaps #5

Closed lwaldron closed 3 years ago

lwaldron commented 3 years ago

@lgeistlinger I'm not sure this belongs here, but I'm also not sure where else to put it for now. We need to test some (dis)similarity options before implementing them in the wiki, so I've implemented calcPairwiseOverlaps(), which returns a data frame (with Jaccard index, overlap, and a few other things), and makeDist(), which returns a distance matrix for 1 - Jaccard or 1 - overlap. Feel free to merge, or try them out first, before we ask @tosfos to implement something. One thing I already found is that we get a lot of Jaccard indices of exactly 1, from pairs of signatures that both have length 1 and contain the same taxon.
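For illustration, here is a minimal base-R sketch of the two measures mentioned above and of turning them into a distance matrix. It is not the code from this pull request: the toy signatures and the helper names `jaccardIndex()`, `overlapCoef()`, and `toDist()` are hypothetical, and "overlap" is assumed here to mean the overlap coefficient (intersection size divided by the smaller set size).

```r
# Hypothetical toy signatures: a named list of character vectors of taxa.
sigs <- list(
  sigA = c("Bacteroides", "Prevotella", "Roseburia"),
  sigB = c("Bacteroides", "Prevotella"),
  sigC = c("Escherichia"),
  sigD = c("Escherichia")
)

# Jaccard index: size of the intersection divided by size of the union.
jaccardIndex <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# Overlap coefficient (assumed meaning of "overlap"): size of the
# intersection divided by the size of the smaller set.
overlapCoef <- function(x, y) length(intersect(x, y)) / min(length(x), length(y))

# One row per unordered pair of signatures.
pairs <- t(combn(names(sigs), 2))
lhs <- sigs[pairs[, 1]]
rhs <- sigs[pairs[, 2]]
res <- data.frame(
  sig1 = pairs[, 1],
  sig2 = pairs[, 2],
  length1 = unname(lengths(lhs)),
  length2 = unname(lengths(rhs)),
  jaccard = unname(mapply(jaccardIndex, lhs, rhs)),
  overlap = unname(mapply(overlapCoef, lhs, rhs)),
  stringsAsFactors = FALSE
)

# Convert one similarity column into a "dist" object holding 1 - similarity.
toDist <- function(res, column = "jaccard") {
  ids <- union(res$sig1, res$sig2)
  m <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  m[cbind(res$sig1, res$sig2)] <- 1 - res[[column]]
  m[cbind(res$sig2, res$sig1)] <- 1 - res[[column]]
  as.dist(m)
}

d <- toDist(res, "jaccard")
```

In this toy data, `sigC` and `sigD` are both single-taxon signatures containing the same taxon, so their Jaccard index is exactly 1 and their distance is 0, which is the degenerate case described above. Keeping the signature lengths in the data frame makes it easy to filter such pairs out before clustering.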

I'm not sure whether we should give any weight to descendants when calculating (dis)similarities, and I don't know offhand how to do that.

lgeistlinger commented 3 years ago

I think those would fit nicely into https://github.com/waldronlab/BugSigDBStats - would you mind opening the pull request there?

lwaldron commented 3 years ago

Done, see https://github.com/waldronlab/BugSigDBStats/pull/1

lgeistlinger commented 3 years ago

Thanks!