Bug description
The predictions matrix `all_pred`, initialised by `np.zeros(..., dtype=np.int)` in line 73 of `bias_variance_decomp()`, is truncating predictions by casting them to integers.

Example of `numpy` behaviour causing the issue:
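A minimal sketch of that behaviour (`np.int` is an alias of the builtin `int`, which is used here):

```python
import numpy as np

# an integer-typed array, as in the all_pred initialisation
all_pred = np.zeros(5, dtype=int)

# assigning float predictions into an integer array silently truncates them
all_pred[:] = [0.12, 0.98, 1.49, 0.55, 0.01]
print(all_pred)  # [0 0 1 0 0]
```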
This causes wildly inaccurate results if the target variable is small, as predictions are truncated to integers. Regardless, casting predictions to integers doesn't strike me as a desired feature of the `bias_variance_decomp()` function.

See this gist for a full reproducible example, but below are the differences in results in a regression case with a small target variable:
Unchanged function results:
Results after removing `dtype=np.int` from `np.zeros()` in the `all_pred` initialisation:

Steps/Code to Reproduce
See this gist.
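A minimal sketch along the lines of the gist (the synthetic data and `LinearRegression` estimator here are illustrative choices, not necessarily the gist's): a regression target whose values all lie well below 1, so the integer-typed `all_pred` matrix stores every prediction as 0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from mlxtend.evaluate import bias_variance_decomp

rng = np.random.RandomState(0)
X = rng.rand(500, 3)
# small target values: predictions fall between 0 and 1 and truncate to 0
y = X @ np.array([0.2, 0.1, 0.3]) + rng.normal(scale=0.01, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

avg_loss, avg_bias, avg_var = bias_variance_decomp(
    LinearRegression(), X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=50, random_seed=1)

# with the int cast in place, the decomposition is computed from all-zero
# predictions; without it, the three values reflect the actual model
print(avg_loss, avg_bias, avg_var)
```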
Versions
MLxtend 0.17.3
macOS-10.15.6-x86_64-i386-64bit
Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34) [Clang 6.0 (clang-600.0.57)]
Scikit-learn 0.23.2
NumPy 1.19.2
SciPy 1.5.2
Wow, good catch. Yeah, the examples and unit tests for the MSE loss were all with relatively large numbers, so I didn't notice that. That's going to be fixed via #749. Many thanks.