odedbd closed this issue 3 years ago.
I guess that the results of repeated CV are not considered a partition, since each sample appears multiple times?
Yup, you got it right. Since we use `cross_val_predict`, each sample must appear exactly once in the test sets for the split to be a partition. It would not make sense to have zero or more than one prediction for a given sample. If you have too few samples to get reliable estimates with `ensemble=False`, I'd suggest just using `ensemble=True`.
Ok, got it, thanks! I'm closing the issue.
Describe the bug
I have a script that optimizes the parameters of a classifier (HGBT) wrapped by `CalibratedClassifierCV`, using `RepeatedStratifiedKFold` cross-validation. This works fine, but when I tried the new `ensemble=False` option, I got the error below.
It works fine with `StratifiedKFold`, so I guess that the results of repeated CV are not considered a partition, since each sample appears multiple times? Can this be supported? I can change my code to not use repeated k-fold for the calibration, but for small datasets it may be useful to be able to do so.
Steps/Code to Reproduce
Expected Results
No error is thrown.
Actual Results
Traceback (most recent call last):
  File "C:\Users\foo\Miniconda3\envs\bar\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 4, in
    calibrated_clf.fit(X, y)
  File "C:\Users\foo\Miniconda3\envs\bar\lib\site-packages\sklearn\calibration.py", line 325, in fit
    predictions = _compute_predictions(pred_method, X, n_classes)
  File "C:\Users\foo\Miniconda3\envs\bar\lib\site-packages\sklearn\calibration.py", line 501, in _compute_predictions
    predictions = pred_method(X=X)
  File "C:\Users\foo\Miniconda3\envs\bar\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\foo\Miniconda3\envs\bar\lib\site-packages\sklearn\model_selection\_validation.py", line 845, in cross_val_predict
    raise ValueError('cross_val_predict only works for partitions')
ValueError: cross_val_predict only works for partitions
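The error comes from a check in `cross_val_predict` that the concatenated test indices cover every sample exactly once. The helper below is a simplified, hypothetical stand-in for that check (`is_partition` is not a scikit-learn function), run on a tiny assumed dataset of 12 samples, to show why `StratifiedKFold` passes and `RepeatedStratifiedKFold` does not.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold


def is_partition(cv, X, y):
    """Return True if every sample index appears in exactly one test fold."""
    test_idx = np.concatenate([test for _, test in cv.split(X, y)])
    # Count how often each sample lands in a test fold.
    counts = np.bincount(test_idx, minlength=len(y))
    return bool(np.all(counts == 1))


X = np.zeros((12, 1))     # feature values do not affect the stratified split
y = np.array([0, 1] * 6)  # 6 samples per class

print(is_partition(StratifiedKFold(n_splits=3), X, y))  # True
# Each repeat covers every sample once, so 2 repeats -> each sample appears twice.
print(is_partition(RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0), X, y))  # False
```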
Versions
System:
python: 3.6.10 |Anaconda, Inc.| (default, May 7 2020, 19:46:08) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\foo\Miniconda3\envs\bar\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 20.2.2
setuptools: 49.6.0.post20200814
sklearn: 0.24.0
numpy: 1.19.2
scipy: 1.5.2
Cython: None
pandas: 1.1.3
matplotlib: 3.3.2
joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True