@lgeistlinger I'm not sure this belongs here, but for now I don't know where else to put it. We need to test some (dis)similarity options before implementing them in the wiki, so I've implemented calcPairwiseOverlaps(), which returns a data frame (with Jaccard index, overlap, and a few other things), and makeDist(), which returns a distance matrix for 1-Jaccard or 1-overlap. Feel free to merge, or try them out first, before we ask @tosfos to implement something. One thing I already found is that we get a lot of Jaccard indices of exactly 1, from pairs of signatures that each have length 1 and contain the same taxon.
I'm not sure whether we should give any weight to descendants when calculating (dis)similarities, and I don't know offhand how we would do that.
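
To make the two measures concrete, here is a minimal Python sketch of the arithmetic. It is not the code behind calcPairwiseOverlaps() or makeDist(); it only illustrates the idea. It assumes "overlap" means the overlap coefficient |A ∩ B| / min(|A|, |B|), and the function names, signature names, and taxa are made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: |A & B| / |A | B| (0.0 if both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Assumed meaning of 'overlap': |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def pairwise_distances(signatures, similarity):
    """Return a nested dict of 1 - similarity for every pair of signatures."""
    names = list(signatures)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = 1.0 - similarity(signatures[x], signatures[y])
        dist[x][y] = d
        dist[y][x] = d  # the distance is symmetric
    return dist


# Hypothetical signatures: two identical single-taxon sets and one larger set.
signatures = {
    "sig1": {"Escherichia coli"},
    "sig2": {"Escherichia coli"},
    "sig3": {"Escherichia coli", "Bacteroides fragilis", "Prevotella copri"},
}

jaccard_dist = pairwise_distances(signatures, jaccard)
overlap_dist = pairwise_distances(signatures, overlap_coefficient)
```

In this toy input, sig1 and sig2 are identical single-taxon signatures, so their Jaccard index is 1 and their distance is 0, which is the case described above. Under the assumed overlap coefficient, a single-taxon signature also gets distance 0 to any larger signature that contains that taxon (sig1 vs. sig3 here), which may be worth keeping in mind when comparing the two options.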