Closed · tawe141 closed this issue 4 months ago
I've been getting this error and have the same question.
When using `random_forest_error()` with a dataset in which the features range between 0 and 1 and are of datatype `float64`, I get a bunch of overflow errors like so:

```
/Users/erictaw/forest-confidence-interval/forestci/calibration.py:86: RuntimeWarning: overflow encountered in exp
  g_eta_raw = np.exp(np.dot(XX, eta)) * mask
/Users/erictaw/forest-confidence-interval/forestci/calibration.py:101: RuntimeWarning: overflow encountered in exp
  g_eta_raw = np.exp(np.dot(XX, eta_hat)) * mask
/Users/erictaw/forest-confidence-interval/forestci/calibration.py:102: RuntimeWarning: invalid value encountered in true_divide
  g_eta_main = g_eta_raw / sum(g_eta_raw)
```

Turning off calibration eliminates these errors, of course. Is this something I should be worried about?
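For what it's worth, the failing line exponentiates a dot product and then normalizes by the sum, which is a softmax-like computation: once any entry of `np.dot(XX, eta)` exceeds about 709, `np.exp` overflows `float64` to `inf`, and the subsequent `inf / inf` division produces NaN. A minimal numpy sketch of the failure mode and the standard log-sum-exp stabilization (an illustration only, not forestci's actual code):

```python
import numpy as np

def softmax_naive(z):
    # Direct exponentiation: overflows to inf for large z, and the
    # normalization then divides inf by inf, producing NaN.
    g = np.exp(z)
    return g / g.sum()

def softmax_stable(z):
    # Log-sum-exp trick: subtracting max(z) leaves the ratios
    # unchanged but keeps every exponent <= 0, so exp never overflows.
    g = np.exp(z - z.max())
    return g / g.sum()

z = np.array([10.0, 800.0, 1000.0])  # exp(800) already overflows float64

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(z)   # contains NaN
stable = softmax_stable(z)     # finite and sums to 1

print(naive)
print(stable)
```

The warnings are therefore a numerical-stability symptom rather than a problem with your 0-to-1 feature range itself.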
I have the same problem, and the errors are gone after turning off calibration. Have you found any other solutions?
@tawe141
When turning off calibration, the V_IJ_unbias array will contain negative values, which was mentioned in #25. Otherwise, all of the output is NaN. Do you have any solutions to this?
Thanks.
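As a quick way to see how bad the problem is, you can count the negative entries in the returned variance array; a crude workaround some people use is clipping at zero before taking square roots for error bars. A small sketch (the array and names here are illustrative, not forestci output):

```python
import numpy as np

# Hypothetical bias-corrected variance estimates, standing in for the
# V_IJ_unbias array returned with calibrate=False.
V_IJ_unbias = np.array([0.8, -0.05, 1.3, -0.3, 0.6])

# Negative entries are artifacts of the Monte Carlo bias correction
# when n or B is too small; they are not valid variances.
n_negative = int((V_IJ_unbias < 0).sum())
print(f"{n_negative} of {V_IJ_unbias.size} estimates are negative")

# Clipping at zero hides the symptom so you can still plot error bars,
# but it does not fix the underlying bias.
stderr = np.sqrt(np.clip(V_IJ_unbias, 0.0, None))
print(stderr)
```

Note that clipping silently assigns zero uncertainty to the affected points, so it should only be a stopgap while you grow the data or the forest.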
I am still experiencing this issue.
The method in this library estimates V, the infinitesimal jackknife variance. It is valid if you have a lot of data (large n) and a lot of trees in the forest (large B); the estimates are really only exact in the limit where n and B go to infinity.
When you have a finite number of trees, you get a bias. The library uses a bias correction to attempt to fix this bias. The bias correction is also valid only for large enough n and B. If n or B is too small, the bias correction can result in negative variance estimates for V.
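The mechanism behind the negative estimates can be seen in a toy model (this is not forestci's exact formula): the bias correction subtracts a Monte Carlo estimate of the finite-B bias from the raw estimate, and when B is small the subtracted term is noisy enough to overshoot the true variance.

```python
import numpy as np

rng = np.random.default_rng(42)

def corrected_estimate(B):
    # Toy model: the raw estimate carries a finite-B bias of order 1/B,
    # and the correction term is itself a noisy Monte Carlo estimate of
    # that bias (noise scale also shrinking with B).
    raw = 0.05 + 2.0 / B + rng.normal(0.0, 2.0 / B)
    correction = 2.0 / B + rng.normal(0.0, 2.0 / B)
    # Subtracting one noisy quantity from another can overshoot and
    # leave a negative "variance" when B is small.
    return raw - correction

small_B_neg = sum(corrected_estimate(20) < 0 for _ in range(1000))
large_B_neg = sum(corrected_estimate(2000) < 0 for _ in range(1000))
print(small_B_neg, large_B_neg)
```

With B = 20 a sizeable fraction of the corrected estimates come out negative, while with B = 2000 essentially none do, which mirrors the advice below to grow the forest.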
The calibration routine tries to fix this problem. It uses an empirical Bayes hierarchical model to adjust the variance estimates V. If the distribution of your uncalibrated V does not match the parametric modeling assumptions, the calibration routine will not help. And if n or B is too small, the empirical distribution of V likely does not follow the parametric model.
In conclusion: collect more data and increase the size of the random forest.
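The "increase B" part of that advice can be illustrated with a toy bagging experiment (a stand-in for the forest setting, not forestci itself): each "tree" is the mean of a bootstrap resample, and the Monte Carlo spread of the ensemble estimate shrinks as the number of trees grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_noise_of_bagged_mean(n, B, reps=200):
    """Spread (std over repetitions) of a bagged-mean estimate.

    Each repetition rebuilds the ensemble from fresh bootstrap
    resamples of the same data, so the spread measures pure
    Monte Carlo noise from the finite number of "trees" B.
    """
    data = rng.normal(size=n)
    estimates = []
    for _ in range(reps):
        tree_preds = [rng.choice(data, size=n, replace=True).mean()
                      for _ in range(B)]
        estimates.append(np.mean(tree_preds))
    return np.std(estimates)

small_B = mc_noise_of_bagged_mean(n=100, B=5)
large_B = mc_noise_of_bagged_mean(n=100, B=200)
print(small_B, large_B)
```

The spread falls roughly like 1/sqrt(B), which is why the variance estimates (and their bias correction) only become trustworthy for large forests.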