scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Selected iteration is 0 in pruning pass #197

Open charleszxiong opened 5 years ago

Hi,

I'm using py-earth with multi-column input and multi-column output, with the pruning pass enabled. Sometimes the selected iteration looks wrong. As I understand it, the selected iteration should be the one with the minimum GCV among all iterations of the pruning pass. However, the selected iteration sometimes comes out as 0, which results in a large model, even though iteration 0 does not have the minimum GCV.

I ran the tests shown below. The problem seems to be related to multi-column output.

Thanks! Charles

import numpy as np
from pyearth import Earth
# Generate data: 1000 rows, 4 input columns, 4 output columns
np.random.seed(0)
X = np.random.rand(1000, 4)
y = np.cos(np.exp(2 * X[:, [1, 0, 3, 2]] + X[:, [2, 3, 0, 1]]**2))
# Minimal example I've found where the problem occurs: fit on two output columns
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X, y[:, [0, 1]])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.432705  1      0.434  0.000  0.000  
1     0       1    283   0.376711  3      0.381  0.129  0.121  
2     0       0    540   0.336250  5      0.344  0.223  0.207  
3     0       1    141   0.330969  7      0.342  0.235  0.212  
4     0       3    452   0.328305  9      0.343  0.241  0.210  
5     0       0    586   0.325640  11     0.343  0.247  0.208  
6     0       0    64    0.323022  13     0.344  0.253  0.207  
7     0       2    -1    0.322201  14     0.345  0.255  0.204  
8     0       2    957   0.320938  16     0.347  0.258  0.199  
9     0       2    718   0.320092  18     0.350  0.260  0.193  
10    0       2    137   0.318617  20     0.352  0.264  0.188  
11    0       2    993   0.317926  22     0.355  0.265  0.181  
12    0       2    417   0.317230  24     0.358  0.267  0.175  
13    0       2    229   0.316417  26     0.361  0.269  0.168  
14    0       2    878   0.315782  28     0.364  0.270  0.161  
15    0       2    3     0.315109  30     0.367  0.272  0.153  
16    0       2    172   0.314489  32     0.370  0.273  0.146  
17    0       2    942   0.313924  34     0.374  0.275  0.138  
18    0       2    548   0.313293  36     0.377  0.276  0.130  
19    0       2    460   0.312742  38     0.381  0.277  0.122  
20    0       2    255   0.312319  40     0.384  0.278  0.114  
---------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
Beginning pruning pass
--------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq   
--------------------------------------------
0     -   40     0.31  0.385  0.277  0.112  
1     13  39     0.31  0.382  0.278  0.119  
2     15  38     0.31  0.380  0.278  0.124  
3     39  37     0.31  0.378  0.278  0.128  
4     22  36     0.31  0.376  0.278  0.133  
5     34  35     0.31  0.374  0.278  0.138  
6     33  34     0.31  0.372  0.278  0.143  
7     11  33     0.31  0.370  0.278  0.147  
8     20  32     0.31  0.368  0.278  0.152  
9     26  31     0.31  0.366  0.278  0.156  
10    24  30     0.31  0.364  0.278  0.161  
11    1   29     0.31  0.362  0.278  0.165  
12    36  28     0.31  0.360  0.278  0.170  
13    31  27     0.31  0.358  0.278  0.174  
14    18  26     0.31  0.356  0.278  0.179  
15    29  25     0.31  0.354  0.278  0.183  
16    4   24     0.31  0.352  0.278  0.187  
17    17  23     0.31  0.351  0.278  0.191  
18    38  22     0.31  0.349  0.277  0.195  
19    16  21     0.31  0.348  0.276  0.198  
20    37  20     0.31  0.346  0.275  0.201  
21    10  19     0.31  0.345  0.274  0.204  
22    28  18     0.31  0.344  0.272  0.206  
23    25  17     0.32  0.344  0.270  0.208  
24    35  16     0.32  0.343  0.268  0.210  
25    14  15     0.32  0.342  0.266  0.212  
26    7   14     0.32  0.341  0.264  0.213  
27    30  13     0.32  0.341  0.261  0.214  
28    19  12     0.32  0.340  0.258  0.216  
29    21  11     0.32  0.340  0.255  0.216  
30    27  10     0.32  0.339  0.254  0.219  
31    32  9      0.32  0.338  0.252  0.221  
32    23  8      0.32  0.337  0.250  0.223  
33    8   7      0.33  0.337  0.246  0.223  
34    5   6      0.33  0.339  0.238  0.219  
35    9   5      0.34  0.344  0.223  0.207  
36    12  4      0.35  0.352  0.199  0.187  
37    3   3      0.37  0.380  0.133  0.125  
38    2   2      0.41  0.410  0.058  0.053  
39    6   1      0.43  0.434  -0.000  -0.000  
----------------------------------------------
Selected iteration: 0
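A quick way to check the trace above against the minimum-GCV rule is to parse the printed pruning table and take the argmin of the `gcv` column. The sketch below is only illustrative: `min_gcv_iteration` is a hypothetical helper, not part of py-earth, and it works on the rounded values printed with `verbose=2`, so ties between neighbouring iterations cannot be resolved exactly.

```python
def min_gcv_iteration(trace_text):
    """Return (iteration, gcv) of the row with the smallest printed GCV.

    Hypothetical helper: parses rows laid out as
    'iter  bf  terms  mse  gcv  rsq  grsq', as printed with verbose=2.
    """
    best = None
    for line in trace_text.splitlines():
        fields = line.split()
        # Data rows have 7 fields and start with an integer iteration number.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        iteration, gcv = int(fields[0]), float(fields[4])
        # Strict '<' keeps the earliest iteration when printed values tie.
        if best is None or gcv < best[1]:
            best = (iteration, gcv)
    return best


# A few rows copied from the pruning pass above (two-column output, max_degree=1).
trace = """
iter  bf  terms  mse   gcv    rsq    grsq
0     -   40     0.31  0.385  0.277  0.112
31    32  9      0.32  0.338  0.252  0.221
32    23  8      0.32  0.337  0.250  0.223
34    5   6      0.33  0.339  0.238  0.219
39    6   1      0.43  0.434  -0.000  -0.000
"""

print(min_gcv_iteration(trace))
```

On these rows the smallest printed GCV is at iteration 32 (0.337), not at iteration 0 (0.385), which is the mismatch reported here.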
# The selected iteration is also reflected in the attribute gcv_
print(mars.gcv_)

# and in summary().
# First, no term was pruned.
# Second, the coefficients are huge because a large model was selected.
print(mars.summary())
0.38522523845256895
Earth Model
-------------------------------------------------------
Basis Function   Pruned  Coefficient 0  Coefficient 1  
-------------------------------------------------------
(Intercept)      No      -5.74346e+13   -1.95383e+13   
h(x1-0.330441)   No      9.04795e+12    3.07796e+12    
h(0.330441-x1)   No      -9.04795e+12   -3.07796e+12   
h(x0-0.318403)   No      7.67987e+12    2.61257e+12    
h(0.318403-x0)   No      -7.67987e+12   -2.61257e+12   
h(x1-0.922348)   No      -9.04795e+12   -3.07796e+12   
h(0.922348-x1)   No      9.04795e+12    3.07796e+12    
h(x3-0.622968)   No      0.173292       0.543856       
h(0.622968-x3)   No      -0.0658372     0.415127       
h(x0-0.838227)   No      -8.21228e+12   -2.79368e+12   
h(0.838227-x0)   No      8.21228e+12    2.79368e+12    
h(x0-0.552192)   No      5.32415e+11    1.81119e+11    
h(0.552192-x0)   No      -5.32415e+11   -1.81119e+11   
x2               No      2.74186e+13    9.32734e+12    
h(x2-0.663457)   No      -6.57405e+12   -2.23638e+12   
h(0.663457-x2)   No      6.57405e+12    2.23638e+12    
h(x2-0.343067)   No      3.22098e+12    1.09572e+12    
h(0.343067-x2)   No      -3.22098e+12   -1.09572e+12   
h(x2-0.79084)    No      -1.04684e+13   -3.56118e+12   
h(0.79084-x2)    No      1.04684e+13    3.56118e+12    
h(x2-0.0910697)  No      1.09251e+13    3.71653e+12    
h(0.0910697-x2)  No      -1.09251e+13   -3.71653e+12   
h(x2-0.851198)   No      -1.23137e+13   -4.18892e+12   
h(0.851198-x2)   No      1.23137e+13    4.18892e+12    
h(x2-0.757364)   No      -9.44498e+12   -3.21303e+12   
h(0.757364-x2)   No      9.44498e+12    3.21303e+12    
h(x2-0.15454)    No      8.98467e+12    3.05644e+12    
h(0.15454-x2)    No      -8.98467e+12   -3.05644e+12   
h(x2-0.0710361)  No      1.15376e+13    3.92489e+12    
h(0.0710361-x2)  No      -1.15376e+13   -3.92489e+12   
h(x2-0.867167)   No      -1.28019e+13   -4.355e+12     
h(0.867167-x2)   No      1.28019e+13    4.355e+12      
h(x2-0.671516)   No      -6.82041e+12   -2.32019e+12   
h(0.671516-x2)   No      6.82041e+12    2.32019e+12    
h(x2-0.6502)     No      -6.16875e+12   -2.09851e+12   
h(0.6502-x2)     No      6.16875e+12    2.09851e+12    
h(x2-0.175276)   No      8.35073e+12    2.84078e+12    
h(0.175276-x2)   No      -8.35073e+12   -2.84078e+12   
h(x2-0.639622)   No      -5.84536e+12   -1.9885e+12    
h(0.639622-x2)   No      5.84536e+12    1.9885e+12     
-------------------------------------------------------
MSE: 0.3131, GCV: 0.3852, RSQ: 0.2765, GRSQ: 0.1115
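One thing worth noting about the summary above: the huge coefficients come in pairs of equal magnitude and opposite sign, for example 9.04795e+12 and -9.04795e+12 for h(x1-0.330441) and h(0.330441-x1). A mirrored hinge pair satisfies h(x-t) - h(t-x) = x - t, so once two such pairs on the same variable sit next to an intercept, the basis is linearly dependent. The sketch below illustrates that identity with plain NumPy. It is an illustration of why an unpruned model like this one can be ill-conditioned, not a diagnosis of the selection problem, and the `hinge` helper is hypothetical rather than a py-earth function.

```python
import numpy as np


def hinge(z):
    """Hinge function h(z) = max(z, 0), the building block of MARS terms."""
    return np.maximum(z, 0.0)


rng = np.random.default_rng(0)
x = rng.random(1000)
a, b = 0.330441, 0.922348  # the two x1 knots from the summary above

# Design matrix: intercept plus two mirrored hinge pairs on the same variable.
B = np.column_stack([
    np.ones_like(x),
    hinge(x - a), hinge(a - x),
    hinge(x - b), hinge(b - x),
])

# Each mirrored pair differs by a linear function: h(x-t) - h(t-x) = x - t.
assert np.allclose(B[:, 1] - B[:, 2], x - a)
assert np.allclose(B[:, 3] - B[:, 4], x - b)

# The two pairs therefore differ only by the constant (b - a), which the
# intercept column already spans, so the five columns are not independent.
print(np.linalg.matrix_rank(B))
```

With a rank-deficient basis, least squares can return very large coefficients that cancel each other, which is consistent with the four x1 coefficients above summing to a constant.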
# Same with max_degree=2
mars = Earth(max_degree=2, max_terms=100, verbose=2)
mars.fit(X, y[:, [0, 1]])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.432705  1      0.434  0.000  0.000  
1     0       1    283   0.376711  3      0.381  0.129  0.121  
2     0       0    540   0.336250  5      0.344  0.223  0.207  
3     2       2    -1    0.315479  6      0.324  0.271  0.252  
4     4       3    -1    0.297390  7      0.307  0.313  0.292  
5     0       1    141   0.292109  9      0.305  0.325  0.297  
6     8       2    -1    0.281995  10     0.296  0.348  0.318  
7     0       2    957   0.276784  12     0.293  0.360  0.324  
8     11      1    45    0.260287  14     0.279  0.398  0.357  
9     10      1    217   0.238636  16     0.258  0.449  0.405  
10    1       1    616   0.227117  18     0.248  0.475  0.427  
11    10      1    937   0.208339  20     0.230  0.519  0.469  
12    3       0    705   0.203523  22     0.227  0.530  0.476  
13    0       3    476   0.198965  24     0.224  0.540  0.482  
14    23      0    64    0.164085  26     0.187  0.621  0.568  
15    22      0    824   0.149508  28     0.172  0.654  0.603  
16    22      0    849   0.145704  30     0.170  0.663  0.609  
17    1       2    318   0.141926  32     0.167  0.672  0.615  
18    3       3    960   0.139680  34     0.166  0.677  0.616  
19    1       3    121   0.137431  36     0.165  0.682  0.618  
20    23      0    492   0.135419  38     0.165  0.687  0.620  
21    1       1    966   0.133580  40     0.164  0.691  0.621  
22    3       3    613   0.131756  42     0.164  0.696  0.622  
23    1       2    736   0.130233  44     0.164  0.699  0.622  
24    0       0    416   0.128911  46     0.164  0.702  0.622  
25    44      3    613   0.124917  48     0.161  0.711  0.629  
26    22      3    732   0.122916  50     0.160  0.716  0.631  
27    3       1    586   0.121711  52     0.160  0.719  0.630  
28    1       1    610   0.120826  54     0.161  0.721  0.629  
29    1       2    219   0.120031  56     0.162  0.723  0.627  
30    10      2    118   0.118796  58     0.162  0.725  0.627  
31    8       2    683   0.117377  60     0.162  0.729  0.627  
32    3       3    802   0.116671  62     0.163  0.730  0.624  
33    44      3    720   0.115855  64     0.164  0.732  0.623  
34    0       3    498   0.114898  66     0.164  0.734  0.621  
35    65      0    614   0.110237  68     0.159  0.745  0.632  
36    65      0    12    0.107699  70     0.158  0.751  0.636  
37    23      0    908   0.105582  72     0.156  0.756  0.639  
38    65      0    908   0.084437  74     0.127  0.805  0.708  
39    65      0    632   0.080874  76     0.123  0.813  0.717  
40    44      0    926   0.077058  78     0.118  0.822  0.727  
41    0       0    217   0.076130  80     0.119  0.824  0.727  
42    79      3    452   0.074296  82     0.117  0.828  0.730  
43    3       3    323   0.072917  84     0.116  0.831  0.732  
44    79      3    461   0.072057  86     0.116  0.833  0.731  
45    65      3    798   0.071163  88     0.117  0.836  0.731  
46    22      2    4     0.070408  90     0.117  0.837  0.731  
47    78      0    777   0.069694  92     0.117  0.839  0.730  
48    65      0    186   0.069140  94     0.118  0.840  0.729  
49    8       0    630   0.068592  96     0.118  0.841  0.727  
50    45      1    -1    0.068291  97     0.119  0.842  0.727  
---------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
Beginning pruning pass
--------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq   
--------------------------------------------
0     -   97     0.07  0.119  0.842  0.727  
1     80  96     0.07  0.118  0.842  0.728  
2     73  95     0.07  0.117  0.842  0.730  
3     54  94     0.07  0.116  0.842  0.732  
4     7   93     0.07  0.115  0.842  0.734  
5     22  92     0.07  0.115  0.842  0.735  
6     13  91     0.07  0.114  0.842  0.737  
7     64  90     0.07  0.113  0.842  0.739  
8     83  89     0.07  0.113  0.842  0.740  
9     43  88     0.07  0.112  0.842  0.742  
10    15  87     0.07  0.111  0.842  0.744  
11    70  86     0.07  0.110  0.842  0.745  
12    9   85     0.07  0.110  0.842  0.747  
13    52  84     0.07  0.109  0.842  0.749  
14    78  83     0.07  0.108  0.842  0.750  
15    47  82     0.07  0.108  0.842  0.752  
16    92  81     0.07  0.107  0.842  0.753  
17    60  80     0.07  0.106  0.842  0.755  
18    68  79     0.07  0.106  0.842  0.756  
19    75  78     0.07  0.105  0.842  0.758  
20    17  77     0.07  0.104  0.842  0.759  
21    23  76     0.07  0.104  0.842  0.761  
22    33  75     0.07  0.103  0.842  0.762  
23    3   74     0.07  0.102  0.843  0.764  
24    21  73     0.07  0.101  0.843  0.766  
25    89  72     0.07  0.101  0.843  0.768  
26    61  71     0.07  0.100  0.843  0.769  
27    35  70     0.07  0.100  0.842  0.770  
28    87  69     0.07  0.099  0.842  0.771  
29    91  68     0.07  0.099  0.841  0.771  
30    96  67     0.07  0.099  0.841  0.771  
31    95  66     0.07  0.099  0.841  0.773  
32    28  65     0.07  0.099  0.840  0.773  
33    34  64     0.07  0.099  0.839  0.773  
34    90  63     0.07  0.099  0.838  0.773  
35    94  62     0.07  0.099  0.837  0.772  
36    51  61     0.07  0.098  0.836  0.773  
37    50  60     0.07  0.098  0.835  0.773  
38    93  59     0.07  0.098  0.834  0.773  
39    77  58     0.07  0.098  0.834  0.774  
40    53  57     0.07  0.098  0.833  0.774  
41    88  56     0.07  0.098  0.831  0.773  
42    86  55     0.07  0.099  0.829  0.772  
43    48  54     0.07  0.099  0.828  0.771  
44    49  53     0.08  0.099  0.826  0.771  
45    29  52     0.08  0.100  0.825  0.770  
46    57  51     0.08  0.100  0.823  0.769  
47    16  50     0.08  0.101  0.821  0.768  
48    63  49     0.08  0.102  0.818  0.765  
49    81  48     0.08  0.103  0.815  0.763  
50    84  47     0.08  0.103  0.814  0.762  
51    85  46     0.08  0.103  0.813  0.763  
52    82  45     0.08  0.104  0.811  0.761  
53    32  44     0.08  0.104  0.809  0.761  
54    79  43     0.08  0.104  0.807  0.760  
55    46  42     0.08  0.105  0.804  0.757  
56    42  41     0.09  0.107  0.801  0.754  
57    20  40     0.09  0.108  0.797  0.751  
58    56  39     0.09  0.110  0.793  0.747  
59    55  38     0.09  0.110  0.791  0.746  
60    4   37     0.09  0.113  0.784  0.739  
61    25  36     0.09  0.114  0.781  0.737  
62    62  35     0.10  0.117  0.775  0.731  
63    39  34     0.10  0.121  0.765  0.721  
64    58  33     0.11  0.126  0.755  0.710  
65    10  32     0.11  0.126  0.752  0.709  
66    8   31     0.11  0.126  0.751  0.708  
67    30  30     0.11  0.129  0.744  0.703  
68    36  29     0.12  0.137  0.726  0.683  
69    19  28     0.13  0.148  0.703  0.658  
70    59  27     0.13  0.154  0.689  0.644  
71    11  26     0.14  0.156  0.684  0.640  
72    18  25     0.15  0.165  0.664  0.619  
73    14  24     0.15  0.167  0.657  0.614  
74    76  23     0.16  0.178  0.633  0.589  
75    67  22     0.17  0.187  0.612  0.568  
76    24  21     0.17  0.188  0.608  0.566  
77    44  20     0.17  0.190  0.602  0.561  
78    65  19     0.18  0.194  0.591  0.552  
79    69  18     0.18  0.199  0.579  0.541  
80    41  17     0.19  0.202  0.572  0.535  
81    66  16     0.19  0.208  0.555  0.520  
82    40  15     0.20  0.215  0.538  0.504  
83    26  14     0.20  0.215  0.535  0.503  
84    27  13     0.21  0.222  0.518  0.488  
85    6   12     0.21  0.223  0.513  0.485  
86    45  11     0.22  0.235  0.485  0.458  
87    5   10     0.24  0.255  0.437  0.411  
88    2   9      0.26  0.270  0.401  0.376  
89    38  8      0.28  0.286  0.364  0.341  
90    1   7      0.29  0.302  0.324  0.303  
91    31  6      0.30  0.312  0.299  0.281  
92    72  5      0.33  0.335  0.243  0.227  
93    37  4      0.35  0.355  0.193  0.181  
94    71  3      0.35  0.357  0.186  0.178  
95    74  2      0.39  0.391  0.102  0.098  
96    12  1      0.43  0.434  -0.000  -0.000  
----------------------------------------------
Selected iteration: 0
# There appears to be no issue if the output only has one column (I checked the other columns too)
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X, y[:, 0])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.441590  1      0.442  0.000  0.000  
1     0       1    283   0.333612  3      0.338  0.245  0.237  
2     0       1    332   0.323966  5      0.331  0.266  0.251  
3     0       2    548   0.320772  7      0.331  0.274  0.251  
4     0       3    -1    0.319791  8      0.332  0.276  0.250  
5     0       3    232   0.318910  10     0.334  0.278  0.244  
6     0       3    704   0.318466  12     0.337  0.279  0.237  
7     0       3    87    0.317710  14     0.340  0.281  0.231  
8     0       3    351   0.316300  16     0.342  0.284  0.227  
9     0       3    251   0.314989  18     0.344  0.287  0.222  
10    0       3    906   0.314378  20     0.347  0.288  0.215  
11    0       3    150   0.312958  22     0.349  0.291  0.210  
12    0       2    800   0.311299  24     0.351  0.295  0.206  
13    0       3    512   0.310804  26     0.354  0.296  0.199  
14    0       3    637   0.310363  28     0.358  0.297  0.192  
---------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
Beginning pruning pass
--------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq   
--------------------------------------------
0     -   28     0.31  0.358  0.297  0.192  
1     25  27     0.31  0.356  0.297  0.196  
2     4   26     0.31  0.354  0.297  0.200  
3     6   25     0.31  0.352  0.297  0.205  
4     11  24     0.31  0.350  0.297  0.209  
5     18  23     0.31  0.348  0.297  0.213  
6     27  22     0.31  0.346  0.297  0.217  
7     13  21     0.31  0.345  0.297  0.221  
8     17  20     0.31  0.343  0.297  0.225  
9     7   19     0.31  0.341  0.297  0.229  
10    21  18     0.31  0.339  0.297  0.233  
11    15  17     0.31  0.337  0.297  0.237  
12    12  16     0.31  0.336  0.297  0.241  
13    26  15     0.31  0.334  0.296  0.244  
14    24  14     0.31  0.333  0.295  0.247  
15    8   13     0.31  0.332  0.295  0.250  
16    10  12     0.31  0.331  0.293  0.252  
17    20  11     0.31  0.331  0.290  0.253  
18    19  10     0.31  0.330  0.288  0.255  
19    22  9      0.32  0.330  0.283  0.254  
20    5   8      0.32  0.330  0.280  0.254  
21    23  7      0.32  0.330  0.277  0.255  
22    14  6      0.32  0.331  0.270  0.252  
23    16  5      0.32  0.330  0.269  0.255  
24    9   4      0.32  0.330  0.266  0.255  
25    3   3      0.33  0.338  0.245  0.237  
26    2   2      0.37  0.368  0.172  0.168  
27    1   1      0.44  0.442  -0.000  -0.000  
----------------------------------------------
Selected iteration: 24
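The `gcv` column in these traces can be reproduced from `mse`, `terms` and the number of training rows, assuming Friedman's GCV with a smoothing penalty of 3 per knot. The penalty value and the helper name `gcv` are assumptions made for this sketch, not something shown in the output above. Under that assumption it matches the printed values for the single-output run on 1000 rows:

```python
def gcv(mse, n_terms, n_samples, penalty=3.0):
    """Generalized cross-validation score for a MARS model (sketch).

    Assumes the effective number of parameters is
    n_terms + penalty * (n_terms - 1) / 2.
    """
    effective_params = n_terms + penalty * (n_terms - 1) / 2.0
    return mse / (1.0 - effective_params / n_samples) ** 2


# (mse, terms) pairs from the forward pass above, fitted on 1000 rows.
for mse, terms in [(0.441590, 1), (0.333612, 3), (0.310363, 28)]:
    print(terms, round(gcv(mse, terms, 1000), 3))
```

The three rows give 0.442, 0.338 and 0.358, matching the forward-pass table. The denominator shrinks quickly as the number of terms approaches the number of rows, so large models are penalized much more heavily on small training sets.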
# Same with max_degree=2 (again, I checked the other columns too)
mars = Earth(max_degree=2, max_terms=100, verbose=2)
mars.fit(X, y[:, 0])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.441590  1      0.442  0.000  0.000  
1     0       1    283   0.333612  3      0.338  0.245  0.237  
2     2       2    -1    0.293116  4      0.298  0.336  0.326  
3     1       1    564   0.283081  6      0.291  0.359  0.343  
4     0       2    548   0.275747  8      0.286  0.376  0.353  
5     7       1    45    0.202558  10     0.212  0.541  0.520  
6     1       2    484   0.177536  12     0.188  0.598  0.575  
7     7       1    483   0.161850  14     0.173  0.633  0.608  
8     6       1    937   0.154724  16     0.167  0.650  0.622  
9     6       1    217   0.139965  18     0.153  0.683  0.654  
10    1       1    932   0.117796  20     0.130  0.733  0.706  
11    1       2    318   0.111583  22     0.125  0.747  0.719  
12    1       2    736   0.108973  24     0.123  0.753  0.722  
13    1       2    219   0.107042  26     0.122  0.758  0.724  
14    6       2    118   0.104794  28     0.121  0.763  0.727  
15    0       1    251   0.103390  30     0.120  0.766  0.728  
16    29      2    683   0.095707  32     0.113  0.783  0.745  
17    7       1    317   0.093584  34     0.111  0.788  0.748  
18    29      1    914   0.091782  36     0.110  0.792  0.750  
19    1       2    798   0.090493  38     0.110  0.795  0.751  
20    7       2    740   0.089771  40     0.110  0.797  0.750  
21    7       1    442   0.089120  42     0.111  0.798  0.749  
22    7       1    353   0.088376  44     0.111  0.800  0.749  
23    28      0    -1    0.087811  45     0.111  0.801  0.749  
24    7       1    363   0.087204  47     0.112  0.803  0.748  
25    1       2    160   0.086674  49     0.112  0.804  0.746  
26    29      1    165   0.086174  51     0.113  0.805  0.745  
27    29      1    74    0.085580  53     0.113  0.806  0.744  
28    7       0    -1    0.085436  54     0.114  0.807  0.743  
---------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
Beginning pruning pass
--------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq   
--------------------------------------------
0     -   54     0.09  0.114  0.806  0.743  
1     15  53     0.09  0.113  0.807  0.744  
2     33  52     0.09  0.112  0.807  0.746  
3     4   51     0.09  0.112  0.807  0.747  
4     52  50     0.09  0.111  0.807  0.749  
5     16  49     0.09  0.111  0.807  0.750  
6     23  48     0.09  0.110  0.807  0.752  
7     45  47     0.09  0.109  0.807  0.753  
8     43  46     0.09  0.109  0.807  0.754  
9     48  45     0.09  0.108  0.807  0.756  
10    35  44     0.09  0.107  0.807  0.757  
11    36  43     0.09  0.107  0.807  0.758  
12    40  42     0.09  0.106  0.807  0.760  
13    25  41     0.09  0.106  0.807  0.761  
14    7   40     0.09  0.105  0.807  0.762  
15    1   39     0.09  0.104  0.807  0.764  
16    5   38     0.09  0.104  0.807  0.766  
17    12  37     0.09  0.103  0.807  0.767  
18    39  36     0.09  0.103  0.807  0.768  
19    53  35     0.09  0.102  0.806  0.769  
20    20  34     0.09  0.102  0.805  0.768  
21    6   33     0.09  0.102  0.805  0.769  
22    46  32     0.09  0.102  0.803  0.769  
23    13  31     0.09  0.102  0.803  0.770  
24    47  30     0.09  0.102  0.801  0.769  
25    44  29     0.09  0.102  0.800  0.769  
26    51  28     0.09  0.103  0.799  0.768  
27    49  27     0.09  0.102  0.798  0.769  
28    38  26     0.09  0.103  0.796  0.768  
29    41  25     0.09  0.103  0.794  0.767  
30    42  24     0.09  0.103  0.794  0.768  
31    50  23     0.09  0.103  0.793  0.768  
32    27  22     0.09  0.103  0.790  0.767  
33    34  21     0.09  0.104  0.788  0.765  
34    29  20     0.09  0.104  0.787  0.765  
35    37  19     0.10  0.105  0.784  0.763  
36    32  18     0.10  0.107  0.778  0.758  
37    22  17     0.10  0.110  0.772  0.752  
38    26  16     0.11  0.114  0.760  0.741  
39    24  15     0.11  0.117  0.753  0.735  
40    19  14     0.12  0.124  0.739  0.721  
41    30  13     0.12  0.131  0.722  0.704  
42    21  12     0.13  0.140  0.701  0.683  
43    8   11     0.14  0.146  0.685  0.669  
44    2   10     0.15  0.153  0.670  0.654  
45    28  9      0.15  0.158  0.658  0.644  
46    14  8      0.17  0.178  0.612  0.598  
47    10  7      0.17  0.179  0.608  0.596  
48    18  6      0.20  0.210  0.537  0.525  
49    11  5      0.22  0.228  0.494  0.484  
50    3   4      0.26  0.269  0.400  0.391  
51    17  3      0.32  0.321  0.281  0.274  
52    9   2      0.43  0.436  0.019  0.014  
53    31  1      0.44  0.442  -0.000  -0.000  
----------------------------------------------
Selected iteration: 23
# Now, let's look at the effect of training size

# Somehow there's no problem with 490 examples
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X[:490], y[:490, [0, 1]])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.429640  1      0.431  0.000  0.000  
1     0       1    322   0.373809  3      0.383  0.130  0.112  
2     0       0    20    0.334051  5      0.350  0.222  0.190  
3     0       3    100   0.329206  7      0.352  0.234  0.184  
4     0       1    295   0.325294  9      0.355  0.243  0.177  
5     0       0    407   0.321383  11     0.358  0.252  0.169  
6     0       0    415   0.317286  13     0.362  0.262  0.162  
7     0       2    -1    0.314383  14     0.362  0.268  0.160  
8     0       2    231   0.311764  16     0.367  0.274  0.149  
9     0       2    264   0.309506  18     0.373  0.280  0.136  
10    0       2    48    0.307832  20     0.379  0.284  0.121  
11    0       2    439   0.306999  22     0.387  0.285  0.103  
12    0       2    210   0.306158  24     0.395  0.287  0.085  
13    0       2    344   0.305191  26     0.403  0.290  0.066  
14    0       2    293   0.304600  28     0.412  0.291  0.046  
15    0       2    95    0.303280  30     0.420  0.294  0.027  
16    0       2    213   0.302168  32     0.428  0.297  0.007  
17    0       2    463   0.300081  34     0.436  0.302  -0.011  
18    0       2    280   0.298779  36     0.445  0.305  -0.032  
19    0       2    232   0.297759  38     0.455  0.307  -0.054  
20    0       2    329   0.296838  40     0.465  0.309  -0.078  
21    0       2    156   0.295752  42     0.475  0.312  -0.102  
22    0       2    2     0.295125  44     0.487  0.313  -0.129  
23    0       2    3     0.294592  46     0.499  0.314  -0.157  
24    0       2    202   0.294104  48     0.512  0.315  -0.186  
25    0       2    428   0.293613  50     0.525  0.317  -0.217  
26    0       2    421   0.292742  52     0.538  0.319  -0.247  
27    0       2    473   0.291933  54     0.552  0.321  -0.278  
28    0       2    460   0.291208  56     0.566  0.322  -0.312  
29    0       2    365   0.290740  58     0.581  0.323  -0.348  
30    0       2    34    0.290190  60     0.597  0.325  -0.385  
31    0       2    380   0.289577  62     0.614  0.326  -0.423  
32    0       2    358   0.288617  64     0.631  0.328  -0.462  
33    0       2    338   0.287444  66     0.647  0.331  -0.501  
34    0       2    371   0.286554  68     0.666  0.333  -0.543  
35    0       2    73    0.284718  70     0.682  0.337  -0.582  
36    0       2    410   0.284086  72     0.703  0.339  -0.629  
37    0       2    254   0.283530  74     0.725  0.340  -0.680  
38    0       2    313   0.283011  76     0.748  0.341  -0.733  
39    0       2    218   0.282547  78     0.772  0.342  -0.789  
40    0       2    284   0.281181  80     0.795  0.346  -0.842  
41    0       2    38    0.280763  82     0.821  0.347  -0.904  
----------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
Beginning pruning pass
---------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq    
---------------------------------------------
0     -   82     0.29  0.848  0.325  -0.967  
1     15  81     0.28  0.807  0.347  -0.870  
2     36  80     0.28  0.793  0.347  -0.839  
3     31  79     0.28  0.780  0.346  -0.808  
4     68  78     0.28  0.766  0.347  -0.776  
5     51  77     0.28  0.753  0.347  -0.747  
6     3   76     0.28  0.741  0.347  -0.718  
7     42  75     0.28  0.729  0.347  -0.690  
8     41  74     0.28  0.717  0.347  -0.663  
9     73  73     0.28  0.706  0.347  -0.636  
10    19  72     0.28  0.694  0.347  -0.610  
11    44  71     0.28  0.683  0.347  -0.584  
12    2   70     0.28  0.672  0.347  -0.559  
13    57  69     0.28  0.662  0.347  -0.535  
14    81  68     0.28  0.652  0.347  -0.511  
15    22  67     0.28  0.642  0.347  -0.488  
16    75  66     0.28  0.632  0.347  -0.465  
17    9   65     0.28  0.623  0.347  -0.443  
18    78  64     0.28  0.613  0.347  -0.421  
19    17  63     0.28  0.604  0.347  -0.400  
20    25  62     0.28  0.595  0.347  -0.380  
21    28  61     0.28  0.586  0.347  -0.359  
22    20  60     0.28  0.578  0.347  -0.340  
23    49  59     0.28  0.570  0.347  -0.320  
24    53  58     0.28  0.561  0.347  -0.301  
25    77  57     0.28  0.553  0.347  -0.282  
26    26  56     0.28  0.546  0.347  -0.265  
27    71  55     0.28  0.538  0.347  -0.247  
28    47  54     0.28  0.530  0.347  -0.229  
29    55  53     0.28  0.523  0.347  -0.212  
30    62  52     0.28  0.516  0.347  -0.196  
31    66  51     0.28  0.509  0.347  -0.179  
32    35  50     0.28  0.502  0.347  -0.163  
33    64  49     0.28  0.495  0.347  -0.147  
34    58  48     0.28  0.488  0.347  -0.132  
35    60  47     0.28  0.482  0.347  -0.117  
36    33  46     0.28  0.474  0.349  -0.099  
37    48  45     0.28  0.468  0.349  -0.085  
38    59  44     0.28  0.462  0.348  -0.071  
39    80  43     0.28  0.457  0.347  -0.058  
40    70  42     0.28  0.452  0.346  -0.047  
41    56  41     0.28  0.447  0.344  -0.036  
42    63  40     0.28  0.442  0.343  -0.024  
43    43  39     0.28  0.437  0.342  -0.014  
44    52  38     0.28  0.433  0.340  -0.003  
45    45  37     0.28  0.428  0.339  0.007  
46    29  36     0.28  0.424  0.337  0.017  
47    38  35     0.29  0.420  0.335  0.026  
48    13  34     0.29  0.416  0.334  0.036  
49    40  33     0.29  0.412  0.332  0.045  
50    18  32     0.29  0.408  0.330  0.054  
51    74  31     0.29  0.404  0.328  0.063  
52    72  30     0.29  0.400  0.328  0.073  
53    34  29     0.29  0.396  0.326  0.081  
54    37  28     0.29  0.393  0.323  0.089  
55    23  27     0.29  0.390  0.320  0.095  
56    54  26     0.29  0.387  0.318  0.103  
57    16  25     0.29  0.385  0.313  0.108  
58    50  24     0.30  0.383  0.309  0.113  
59    24  23     0.30  0.379  0.308  0.121  
60    21  22     0.30  0.375  0.307  0.131  
61    76  21     0.30  0.373  0.304  0.136  
62    67  20     0.30  0.371  0.299  0.140  
63    6   19     0.30  0.369  0.294  0.144  
64    79  18     0.31  0.368  0.289  0.147  
65    46  17     0.31  0.364  0.288  0.156  
66    65  16     0.31  0.363  0.283  0.159  
67    61  15     0.31  0.359  0.282  0.167  
68    69  14     0.31  0.356  0.281  0.175  
69    32  13     0.31  0.355  0.275  0.177  
70    30  12     0.31  0.351  0.275  0.186  
71    39  11     0.31  0.350  0.270  0.189  
72    14  10     0.32  0.349  0.264  0.192  
73    11  9      0.32  0.348  0.258  0.193  
74    12  8      0.32  0.347  0.253  0.196  
75    5   7      0.32  0.346  0.246  0.197  
76    27  6      0.33  0.346  0.238  0.198  
77    7   5      0.33  0.349  0.224  0.191  
78    8   4      0.35  0.362  0.187  0.162  
79    4   3      0.37  0.380  0.137  0.119  
80    10  2      0.39  0.399  0.083  0.074  
81    1   1      0.43  0.431  0.000  0.000  
--------------------------------------------
Selected iteration: 76
# Since the selected model has only 6 terms, the coefficients have relatively small values
print(mars.summary())
Earth Model
-------------------------------------------------------
Basis Function   Pruned  Coefficient 0  Coefficient 1  
-------------------------------------------------------
(Intercept)      No      -2.10119       -0.56552       
h(x1-0.331269)   No      4.20773        1.60426        
h(0.331269-x1)   Yes     None           None           
h(x0-0.317983)   Yes     None           None           
h(0.317983-x0)   No      0.0475068      3.85766        
h(x3-0.945302)   Yes     None           None           
h(0.945302-x3)   Yes     None           None           
h(x1-0.925395)   No      -10.8064       -4.18702       
h(0.925395-x1)   No      2.42295        1.05407        
h(x0-0.845365)   Yes     None           None           
h(0.845365-x0)   No      -0.131217      -1.68163       
h(x0-0.562066)   Yes     None           None           
h(0.562066-x0)   Yes     None           None           
x2               Yes     None           None           
h(x2-0.725574)   Yes     None           None           
h(0.725574-x2)   Yes     None           None           
h(x2-0.715143)   Yes     None           None           
h(0.715143-x2)   Yes     None           None           
h(x2-0.739551)   Yes     None           None           
h(0.739551-x2)   Yes     None           None           
h(x2-0.207513)   Yes     None           None           
h(0.207513-x2)   Yes     None           None           
h(x2-0.107211)   Yes     None           None           
h(0.107211-x2)   Yes     None           None           
h(x2-0.219861)   Yes     None           None           
h(0.219861-x2)   Yes     None           None           
h(x2-0.983854)   Yes     None           None           
h(0.983854-x2)   Yes     None           None           
h(x2-0.951874)   Yes     None           None           
h(0.951874-x2)   Yes     None           None           
h(x2-0.919507)   Yes     None           None           
h(0.919507-x2)   Yes     None           None           
h(x2-0.930126)   Yes     None           None           
h(0.930126-x2)   Yes     None           None           
h(x2-0.962395)   Yes     None           None           
h(0.962395-x2)   Yes     None           None           
h(x2-0.90315)    Yes     None           None           
h(0.90315-x2)    Yes     None           None           
h(x2-0.774748)   Yes     None           None           
h(0.774748-x2)   Yes     None           None           
h(x2-0.764562)   Yes     None           None           
h(0.764562-x2)   Yes     None           None           
h(x2-0.791725)   Yes     None           None           
h(0.791725-x2)   Yes     None           None           
h(x2-0.0710361)  Yes     None           None           
h(0.0710361-x2)  Yes     None           None           
h(x2-0.357425)   Yes     None           None           
h(0.357425-x2)   Yes     None           None           
h(x2-0.154841)   Yes     None           None           
h(0.154841-x2)   Yes     None           None           
h(x2-0.188732)   Yes     None           None           
h(0.188732-x2)   Yes     None           None           
h(x2-0.112752)   Yes     None           None           
h(0.112752-x2)   Yes     None           None           
h(x2-0.175276)   Yes     None           None           
h(0.175276-x2)   Yes     None           None           
h(x2-0.458723)   Yes     None           None           
h(0.458723-x2)   Yes     None           None           
h(x2-0.652103)   Yes     None           None           
h(0.652103-x2)   Yes     None           None           
h(x2-0.555938)   Yes     None           None           
h(0.555938-x2)   Yes     None           None           
h(x2-0.493407)   Yes     None           None           
h(0.493407-x2)   Yes     None           None           
h(x2-0.57759)    Yes     None           None           
h(0.57759-x2)    Yes     None           None           
h(x2-0.531494)   Yes     None           None           
h(0.531494-x2)   Yes     None           None           
h(x2-0.511319)   Yes     None           None           
h(0.511319-x2)   Yes     None           None           
h(x2-0.519985)   Yes     None           None           
h(0.519985-x2)   Yes     None           None           
h(x2-0.588639)   Yes     None           None           
h(0.588639-x2)   Yes     None           None           
h(x2-0.598316)   Yes     None           None           
h(0.598316-x2)   Yes     None           None           
h(x2-0.316543)   Yes     None           None           
h(0.316543-x2)   Yes     None           None           
h(x2-0.329651)   Yes     None           None           
h(0.329651-x2)   Yes     None           None           
h(x2-0.12382)    Yes     None           None           
h(0.12382-x2)    Yes     None           None           
-------------------------------------------------------
MSE: 0.3272, GCV: 0.3460, RSQ: 0.2383, GRSQ: 0.1979
# Adding just one more example (491 examples in total) triggers the problem
mars = Earth(max_degree=1, max_terms=100, verbose=2)
mars.fit(X[:491], y[:491, [0, 1]])
Beginning forward pass
---------------------------------------------------------------
iter  parent  var  knot  mse       terms  gcv    rsq    grsq   
---------------------------------------------------------------
0     -       -    -     0.430390  1      0.432  0.000  0.000  
1     0       1    137   0.373551  3      0.383  0.132  0.114  
2     0       0    20    0.333996  5      0.349  0.224  0.191  
3     0       3    100   0.329206  7      0.352  0.235  0.186  
4     0       0    407   0.325306  9      0.355  0.244  0.178  
5     0       0    415   0.321239  11     0.358  0.254  0.171  
6     0       1    226   0.317639  13     0.362  0.262  0.163  
7     0       2    -1    0.314851  14     0.363  0.268  0.161  
8     0       2    236   0.312362  16     0.368  0.274  0.149  
9     0       2    114   0.310429  18     0.374  0.279  0.135  
10    0       2    147   0.308155  20     0.379  0.284  0.122  
11    0       2    13    0.307349  22     0.387  0.286  0.104  
12    0       2    436   0.306555  24     0.395  0.288  0.086  
13    0       2    103   0.305531  26     0.403  0.290  0.067  
14    0       2    293   0.304861  28     0.412  0.292  0.047  
15    0       2    95    0.303738  30     0.420  0.294  0.028  
16    0       2    213   0.302433  32     0.428  0.297  0.008  
17    0       2    463   0.300238  34     0.436  0.302  -0.009  
18    0       2    280   0.298927  36     0.445  0.305  -0.029  
19    0       2    232   0.298038  38     0.455  0.308  -0.052  
20    0       2    4     0.297077  40     0.465  0.310  -0.076  
21    0       2    444   0.296600  42     0.476  0.311  -0.102  
22    0       2    363   0.296129  44     0.488  0.312  -0.129  
23    0       2    485   0.295641  46     0.500  0.313  -0.157  
24    0       2    354   0.294809  48     0.512  0.315  -0.185  
25    0       2    214   0.294172  50     0.525  0.316  -0.215  
26    0       2    339   0.293754  52     0.539  0.317  -0.247  
----------------------------------------------------------------
Stopping Condition 2: Improvement below threshold
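The `gcv` column in the forward pass table above can be reproduced from the `mse` and `terms` columns. This is a minimal sketch, assuming the usual MARS definition of GCV, py-earth's default `penalty=3.0`, and the 491 samples used in this run. The helper name `gcv` is hypothetical and not part of the py-earth API.

```python
def gcv(mse, n_terms, n_samples, penalty=3.0):
    """GCV = MSE / (1 - C/N)^2, with C = M + penalty * (M - 1) / 2."""
    effective_params = n_terms + penalty * (n_terms - 1) / 2.0
    return mse / (1.0 - effective_params / n_samples) ** 2

# (mse, terms) pairs copied from the forward pass table above
rows = [(0.430390, 1), (0.373551, 3), (0.333996, 5), (0.293754, 52)]
for mse, terms in rows:
    # Rounded to three decimals for comparison with the logged gcv column
    print(terms, round(gcv(mse, terms, 491), 3))
```

The recomputed values agree with the logged ones to three decimals. They also show why the 52-term model scores worse on GCV (0.539) than the intercept-only model (0.432) despite its lower MSE: the complexity penalty grows quickly when the number of terms is large relative to 491 samples.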
Beginning pruning pass
---------------------------------------------
iter  bf  terms  mse   gcv    rsq    grsq    
---------------------------------------------
0     -   52     0.29  0.539  0.317  -0.248  
1     14  51     0.29  0.531  0.318  -0.229  
2     46  50     0.29  0.524  0.318  -0.213  
3     24  49     0.29  0.517  0.318  -0.197  
4     2   48     0.29  0.510  0.318  -0.181  
5     9   47     0.29  0.503  0.318  -0.165  
6     36  46     0.29  0.497  0.318  -0.150  
7     33  45     0.29  0.490  0.318  -0.134  
8     42  44     0.29  0.484  0.318  -0.120  
9     28  43     0.29  0.478  0.318  -0.105  
10    3   42     0.29  0.472  0.318  -0.091  
11    49  41     0.29  0.465  0.318  -0.077  
12    39  40     0.29  0.460  0.318  -0.063  
13    21  39     0.29  0.454  0.318  -0.050  
14    23  38     0.29  0.448  0.318  -0.037  
15    17  37     0.29  0.443  0.318  -0.024  
16    45  36     0.29  0.437  0.318  -0.011  
17    13  35     0.29  0.432  0.317  0.001  
18    26  34     0.29  0.426  0.318  0.013  
19    50  33     0.29  0.421  0.318  0.025  
20    41  32     0.29  0.416  0.317  0.037  
21    35  31     0.29  0.408  0.322  0.055  
22    40  30     0.29  0.402  0.324  0.069  
23    51  29     0.29  0.398  0.323  0.079  
24    22  28     0.29  0.394  0.322  0.088  
25    43  27     0.29  0.390  0.321  0.097  
26    29  26     0.29  0.387  0.319  0.105  
27    31  25     0.30  0.386  0.313  0.107  
28    34  24     0.30  0.382  0.311  0.115  
29    37  23     0.30  0.379  0.308  0.122  
30    38  22     0.30  0.376  0.306  0.130  
31    18  21     0.30  0.373  0.304  0.137  
32    25  20     0.30  0.370  0.302  0.144  
33    20  19     0.30  0.367  0.300  0.152  
34    47  18     0.30  0.364  0.298  0.159  
35    48  17     0.30  0.361  0.296  0.165  
36    44  16     0.30  0.358  0.294  0.172  
37    6   15     0.31  0.356  0.289  0.176  
38    7   14     0.31  0.355  0.283  0.177  
39    10  13     0.31  0.354  0.278  0.180  
40    15  12     0.31  0.354  0.270  0.180  
41    30  11     0.32  0.354  0.262  0.181  
42    32  10     0.32  0.350  0.262  0.189  
43    19  9      0.32  0.349  0.258  0.193  
44    16  8      0.32  0.347  0.253  0.196  
45    5   7      0.32  0.347  0.246  0.198  
46    27  6      0.33  0.346  0.239  0.199  
47    11  5      0.33  0.349  0.225  0.192  
48    12  4      0.35  0.362  0.189  0.163  
49    4   3      0.37  0.380  0.138  0.120  
50    8   2      0.39  0.399  0.086  0.076  
51    1   1      0.43  0.432  -0.000  -0.000  
----------------------------------------------
Selected iteration: 0
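As described at the top of this issue, the selected iteration is expected to be the one with the smallest GCV. A quick check, using a few rows copied from the pruning pass table above, is sketched below. The log shows only three decimals, so near-ties cannot be resolved from it alone, and the variable names are purely illustrative.

```python
# (iteration, gcv) pairs copied from the pruning pass table above
trace = [(0, 0.539), (40, 0.354), (44, 0.347), (45, 0.347),
         (46, 0.346), (47, 0.349), (51, 0.432)]

# Pick the row with the smallest logged GCV
best_iter, best_gcv = min(trace, key=lambda row: row[1])
print(best_iter, best_gcv)
```

Among the logged rows, iteration 46 (6 terms) has the lowest GCV, whereas the reported selection, iteration 0, has the highest GCV in the table.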
print(mars.summary())
Earth Model
-------------------------------------------------------
Basis Function   Pruned  Coefficient 0  Coefficient 1  
-------------------------------------------------------
(Intercept)      No      3.98638e+13    -2.37968e+13   
h(x1-0.333965)   No      -3.66554e+12   2.18815e+12    
h(0.333965-x1)   No      3.66554e+12    -2.18815e+12   
h(x0-0.317983)   No      -3.22491e+12   1.92512e+12    
h(0.317983-x0)   No      3.22491e+12    -1.92512e+12   
h(x3-0.945302)   No      -3.31348       13.2036        
h(0.945302-x3)   No      -0.163574      0.157959       
h(x0-0.845365)   No      3.38884e+12    -2.02298e+12   
h(0.845365-x0)   No      -3.38884e+12   2.02298e+12    
h(x0-0.562066)   No      -1.6393e+11    9.78586e+10    
h(0.562066-x0)   No      1.6393e+11     -9.78586e+10   
h(x1-0.918546)   No      3.66554e+12    -2.18815e+12   
h(0.918546-x1)   No      -3.66554e+12   2.18815e+12    
x2               No      -1.28533e+13   7.6728e+12     
h(x2-0.718626)   No      2.58545e+12    -1.54339e+12   
h(0.718626-x2)   No      -2.58545e+12   1.54339e+12    
h(x2-0.730709)   No      2.73698e+12    -1.63385e+12   
h(0.730709-x2)   No      -2.73698e+12   1.63385e+12    
h(x2-0.739884)   No      2.85204e+12    -1.70253e+12   
h(0.739884-x2)   No      -2.85204e+12   1.70253e+12    
h(x2-0.208877)   No      -3.80718e+12   2.27271e+12    
h(0.208877-x2)   No      3.80718e+12    -2.27271e+12   
h(x2-0.107301)   No      -5.08101e+12   3.03312e+12    
h(0.107301-x2)   No      5.08101e+12    -3.03312e+12   
h(x2-0.221161)   No      -3.65312e+12   2.18074e+12    
h(0.221161-x2)   No      3.65312e+12    -2.18074e+12   
h(x2-0.983854)   No      5.91159e+12    -3.52894e+12   
h(0.983854-x2)   No      -5.91159e+12   3.52894e+12    
h(x2-0.951874)   No      5.51055e+12    -3.28954e+12   
h(0.951874-x2)   No      -5.51055e+12   3.28954e+12    
h(x2-0.919507)   No      5.10464e+12    -3.04723e+12   
h(0.919507-x2)   No      -5.10464e+12   3.04723e+12    
h(x2-0.930126)   No      5.23781e+12    -3.12673e+12   
h(0.930126-x2)   No      -5.23781e+12   3.12673e+12    
h(x2-0.962395)   No      5.64248e+12    -3.3683e+12    
h(0.962395-x2)   No      -5.64248e+12   3.3683e+12     
h(x2-0.90315)    No      4.8995e+12     -2.92477e+12   
h(0.90315-x2)    No      -4.8995e+12    2.92477e+12    
h(x2-0.778157)   No      3.332e+12      -1.98905e+12   
h(0.778157-x2)   No      -3.332e+12     1.98905e+12    
h(x2-0.0741245)  No      -5.49707e+12   3.28149e+12    
h(0.0741245-x2)  No      5.49707e+12    -3.28149e+12   
h(x2-0.271551)   No      -3.0212e+12    1.80351e+12    
h(0.271551-x2)   No      3.0212e+12     -1.80351e+12   
h(x2-0.16026)    No      -4.41687e+12   2.63666e+12    
h(0.16026-x2)    No      4.41687e+12    -2.63666e+12   
h(x2-0.193236)   No      -4.00332e+12   2.38979e+12    
h(0.193236-x2)   No      4.00332e+12    -2.38979e+12   
h(x2-0.140316)   No      -4.66698e+12   2.78597e+12    
h(0.140316-x2)   No      4.66698e+12    -2.78597e+12   
h(x2-0.766591)   No      3.18696e+12    -1.90247e+12   
h(0.766591-x2)   No      -3.18696e+12   1.90247e+12    
-------------------------------------------------------
MSE: 0.2940, GCV: 0.5393, RSQ: 0.3169, GRSQ: -0.2481
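One observation on the summary above, offered as a possible lead rather than a confirmed diagnosis: the very large coefficients come in pairs of equal magnitude and opposite sign on mirrored hinges. Since h(x - t) - h(t - x) = x - t, two mirrored pairs on the same variable differ only by a constant, so they are linearly dependent once the intercept is in the basis. The sketch below checks that identity numerically on synthetic data. The helper `hinge` and the random inputs are assumptions for illustration; the two knots are taken from the `x2` rows of the summary.

```python
import random

def hinge(z):
    """MARS hinge function h(z) = max(0, z)."""
    return max(0.0, z)

t1, t2 = 0.208877, 0.718626  # two x2 knots from the summary above
random.seed(0)
xs = [random.random() for _ in range(491)]  # synthetic stand-in for x2

# (h(x - t1) - h(t1 - x)) - (h(x - t2) - h(t2 - x)) should equal the
# constant t2 - t1 for every x, up to floating-point rounding.
residuals = [
    (hinge(x - t1) - hinge(t1 - x)) - (hinge(x - t2) - hinge(t2 - x)) - (t2 - t1)
    for x in xs
]
max_abs = max(abs(r) for r in residuals)
print(max_abs)
```

A basis containing such combinations admits arbitrarily large, mutually cancelling coefficients without changing the fitted values, which is consistent with the paired values of order 1e+12 in the unpruned model. Whether this is related to the wrong iteration being selected is not established here.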
jcrudy commented 5 years ago

@charleszxiong Thanks for reporting this. It definitely looks like a bug. I'll try to figure out what's going on as soon as I can. In the meantime, if you have any additional thoughts or observations, please post them here.

To anyone reading this, additional reports are welcome, as are pull requests.