pneuvial / adjclust

Adjacency-constrained hierarchical clustering of a similarity matrix
https://pneuvial.github.io/adjclust/

test_hicClust is slow #25

Closed by pneuvial 6 years ago

pneuvial commented 6 years ago

The test typically takes more than 30 s to run.

pneuvial commented 6 years ago

To speed it up I've tried binning the data:

      hic_bin <- HiTC::binningC(hic_imr90_40_XX, binsize = 2e5)  ## bin (to speed up this test)
      fit1 <- hicClust(hic_bin)
      mat <- HiTC::intdata(hic_bin) 

But then the test fails because the results of hicClust applied to mat and to a text file generated from mat differ. This is due to the presence of empty bins in mat:

> sum(rowSums(as.matrix(mat))==0)
[1] 15
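
For readers who want to reproduce this kind of check on their own data, here is a minimal base-R sketch of how empty bins can be counted and located. The toy matrix and the variable names (`empty`, `n_empty`, `which_empty`, `mat_nonempty`) are made up for illustration and are not the IMR90 data. The last step only shows that any code path that drops empty bins ends up with a matrix of a different size, which is one plausible way two clusterings of "the same" data could disagree.

```r
## Toy symmetric contact matrix (made-up values, not the IMR90 data)
mat <- matrix(c(4, 1, 0, 2,
                1, 3, 0, 1,
                0, 0, 0, 0,
                2, 1, 0, 5), nrow = 4, byrow = TRUE)

## A bin is "empty" when its whole row sums to zero
empty <- rowSums(mat) == 0
n_empty <- sum(empty)        # number of empty bins
which_empty <- which(empty)  # indices of the empty bins

## Dropping empty bins changes the dimension of the matrix
mat_nonempty <- mat[!empty, !empty, drop = FALSE]
```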

I don't understand how we can get empty bins after binning if we did not have empty bins before binning, but this is what seems to happen here.

pneuvial commented 6 years ago

Subsetting the original object instead of binning (as suggested by @tuxette) addresses this issue.
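
As a rough illustration of the idea (not the actual change made to the test), subsetting keeps the original bins and simply uses fewer of them, so no bin is aggregated or redefined. The sketch below works on a plain simulated matrix rather than on the HiTC object. The size `n`, the index range `keep`, and the simulated counts are all assumptions chosen for the example.

```r
## Simulated symmetric count matrix with no empty bins (illustrative only)
set.seed(1)
n <- 20
counts <- matrix(rpois(n * n, lambda = 5) + 1, nrow = n)  # +1 keeps every entry positive
full <- counts + t(counts)                                # symmetrize

## Subset: keep the first 10 bins at the original resolution
keep <- 1:10
sub <- full[keep, keep]

## Guard one might add before clustering: symmetric input, no empty bins
stopifnot(isSymmetric(sub), all(rowSums(sub) > 0))
```

A guard like the final `stopifnot` makes the test fail early and explicitly if empty bins ever reappear in the reduced data.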

(I still don't understand why my binning solution does not work though.)