haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Non-monotonic cluster tree -- the linkage is probably not appropriate! #758

Closed: den-gr closed this issue 4 months ago

den-gr commented 5 months ago

Describe the bug: I use hierarchical clustering with the "upgmc" method (which should be centroid-based), but I get an exception. In your repository I found the code that throws this exception:

public int[] partition(double h) {
    for (int i = 0; i < height.length - 1; i++) {
        if (height[i] > height[i + 1]) {
            throw new IllegalStateException("Non-monotonic cluster tree -- the linkage is probably not appropriate!");
        }
    }
    ...

Using the debugger, I captured the internal clustering data just before the exception is raised. Heights:

[4.446954675256884, 5.254708313255595, 5.681905824852277, 5.21738511928247, 6.413680963168845, 7.090507135802847, 7.790216021890831, 8.131854173995553, 10.23174651659164, 11.306003791823292, 16.997071855474292, 18.867959837970442, 35.04605647171662, 972.1894748453102]

Merges:

0 = {int[2]@12560} [0, 3]
1 = {int[2]@12561} [5, 10]
2 = {int[2]@12562} [2, 12]
3 = {int[2]@12563} [7, 17]
4 = {int[2]@12564} [4, 9]
5 = {int[2]@12565} [11, 15]
6 = {int[2]@12566} [13, 16]
7 = {int[2]@12567} [1, 8]
8 = {int[2]@12568} [6, 22]
9 = {int[2]@12569} [19, 20]
10 = {int[2]@12570} [18, 21]
11 = {int[2]@12571} [23, 24]
12 = {int[2]@12572} [25, 26]
13 = {int[2]@12573} [14, 27]
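
The check quoted from `partition` can be reproduced outside Smile on the heights listed above. The following is a minimal, self-contained Java sketch (the class and method names are hypothetical and not part of Smile) that reports the first position where the heights decrease. On the reported data the drop is between merge 2 (about 5.68) and merge 3 (about 5.22).

```java
import java.util.OptionalInt;

final class HeightInversionCheck {
    // Heights copied verbatim from the debugger output in this report.
    public static final double[] REPORTED_HEIGHTS = {
        4.446954675256884, 5.254708313255595, 5.681905824852277, 5.21738511928247,
        6.413680963168845, 7.090507135802847, 7.790216021890831, 8.131854173995553,
        10.23174651659164, 11.306003791823292, 16.997071855474292, 18.867959837970442,
        35.04605647171662, 972.1894748453102
    };

    /** Returns the first index i with height[i] > height[i + 1], or empty if none exists. */
    public static OptionalInt firstInversion(double[] height) {
        for (int i = 0; i < height.length - 1; i++) {
            if (height[i] > height[i + 1]) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        OptionalInt inversion = firstInversion(REPORTED_HEIGHTS);
        if (inversion.isPresent()) {
            int i = inversion.getAsInt();
            System.out.println("Inversion between merge " + i + " and merge " + (i + 1)
                + ": " + REPORTED_HEIGHTS[i] + " > " + REPORTED_HEIGHTS[i + 1]);
        } else {
            System.out.println("Heights never decrease");
        }
    }
}
```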

Expected behavior: I do not expect any exception.

Actual behavior: java.lang.IllegalStateException: Non-monotonic cluster tree -- the linkage is probably not appropriate!

Code snippet

val clusteringLimit = 2.0812080199999996
val c = hclust(data, "upgmc")
c.partition(clusteringLimit)

Input data

0 = {double[2]@12577} [507.4412276587268, -724.3813542113928]
1 = {double[2]@12578} [511.4241329313626, -708.7316909569605]
2 = {double[2]@12579} [520.1643016699769, -768.7583900169105]
3 = {double[2]@12580} [511.21481233082994, -726.7341113769867]
4 = {double[2]@12581} [516.5322237351805, -739.9902131733364]
5 = {double[2]@12582} [518.1074502763483, -747.2831975139684]
6 = {double[2]@12583} [500.1664221644433, -713.8610022572665]
7 = {double[2]@12584} [521.49507359286, -762.8123460664691]
8 = {double[2]@12585} [509.22752433984266, -716.5612476346602]
9 = {double[2]@12586} [512.2826763360573, -735.1863966987556]
10 = {double[2]@12587} [513.0059477345203, -748.5428136341582]
11 = {double[2]@12588} [515.5540374804062, -728.9506662360001]
12 = {double[2]@12589} [525.5473522287389, -766.9399233920493]
13 = {double[2]@12590} [520.0332082325227, -754.2886096509344]
14 = {double[2]@12591} [-70.41683112459084, 37.80258788751804]

haifengl commented 5 months ago

It is not a bug. As the message states, upgmc is not appropriate for your data. Please try other linkages.
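
To illustrate why a centroid linkage can yield a non-monotonic tree on this data, here is a small sketch in plain Java that uses no Smile classes. It assumes the common convention that cluster id n + i refers to the cluster created by merge i (with n = 15 points), so merge 2 = [2, 12] creates cluster 17 and merge 3 = [7, 17] joins point 7 to it. The class and helper names are hypothetical. The centroid of points 2 and 12 lies closer to point 7 than points 2 and 12 lie to each other, so the later merge has the smaller height.

```java
final class CentroidInversionDemo {
    // Points 2, 7 and 12 copied from the "Input data" list in this report.
    public static final double[] P2  = {520.1643016699769, -768.7583900169105};
    public static final double[] P7  = {521.49507359286, -762.8123460664691};
    public static final double[] P12 = {525.5473522287389, -766.9399233920493};

    /** Euclidean distance between two 2-D points. */
    public static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** Centroid (mean) of two 2-D points. */
    public static double[] centroid(double[] a, double[] b) {
        return new double[] {(a[0] + b[0]) / 2, (a[1] + b[1]) / 2};
    }

    public static void main(String[] args) {
        // Height of merge 2: distance between the two singleton points.
        double merge2 = distance(P2, P12);
        // Height of merge 3 under centroid linkage: point 7 to the centroid of {2, 12}.
        double merge3 = distance(P7, centroid(P2, P12));

        System.out.println("merge 2 height (2 vs 12):         " + merge2);
        System.out.println("merge 3 height (7 vs centroid):   " + merge3);
        // For comparison: point 7 is farther from each member than from their centroid.
        System.out.println("point 7 to point 2:               " + distance(P7, P2));
        System.out.println("point 7 to point 12:              " + distance(P7, P12));
    }
}
```

The two computed heights come out at roughly 5.68 and 5.22, which agrees with heights 2 and 3 in the report. Single, complete, and average (UPGMA) linkages define the distance to a merged cluster from the pairwise point distances rather than from a centroid, so they do not produce this kind of inversion.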