Closed hdetering closed 6 years ago
Hi Harry, I looked at your data and it seems there's a bug in treeCl. The Spectral object is supposed to automatically adjust a density parameter to fit your distance matrix, to improve clustering and protect against divide-by-zero errors. With your data this automatic adjustment isn't working - looks like the wrong binary search routine is being called.
There is a workaround. If you construct the Spectral object like this:
spclust = treeCl.Spectral(scale_option=treeCl.clustering.options.LOCAL_SCALE_MANUAL,
manual_scale=245)
you get a working object. If you're wondering where the 245 comes from, it's the result of calling the correct binary search routine.
Hope this helps for now. I'll work on making a proper fix for the next release.
Thanks for your reply, kgori!
Your workaround works for the distance matrix I uploaded in the initial post but seems to be specific to that data. For other matrices (e.g. example1.dm.pkl.zip) I still get the same error. But you probably knew this already.
Since I have some more data sets to run I will go with the MDS clustering which works without problems (according to your paper they perform equally well).
Anyway, thanks a lot for your effort! -- Harry
Hi kgori,
I also get this same error with my dataset (using geodesic distances and Spectral clustering). This dataset has a lot of missing data and a few potentially very incongruent tree topologies.
I have also found that when I do MDS clustering on this dataset I get a similar result to your manuscript's Supplementary Figure 12, ie two groups strongly separated by PC1 in a vertical line. This clustering found by MDS is not consistent with the partitions found using Ward's method (I cannot test Spectral since it gives the same assert error as Harry).
Any suggestions?
best, Alex
Hi Dr. Gori,
I also get this same assertion error with my data, using geodesic distances and spectral clustering. I have ~4000 gene trees, so unfortunately I cannot attached a zipped pickle of my distance matrix here.
Cheers, Rohan
Hi kgoryi,
when initializing a treeCl.Spectral object, I get an assertion error:
It seems that the rescaling of the distance matrix (which contains zeros) is not producing the expected result. I'll attach you a pickle'd version of the distance matrix used: example.dm.pkl.zip Distances were calculated using the 'geo' method.
A minimal working example to reproduce the error would be:
What seems to be the problem here? Are the input trees to similar (how could this be bad)? How can the problem be avoided?
Thanks a lot and cheers -- Harry