uclamii / model_tuner

A library to tune the hyperparameters of common ML models. Supports calibration and custom pipelines.
Apache License 2.0

Balanced Bootstrap Fix For Newer Pandas Version #42

Closed lshpaner closed 3 weeks ago

lshpaner commented 1 month ago

Compatibility Issue with Pandas 2.2.3+ in Sampling Method (y[y == class_label] vs y[y.values == class_label])

In the function sampling_method, the expression y[y == class_label] fails with newer versions of Pandas (2.2.3+). The failure appears to be related to how the boolean mask built from y == class_label is applied when indexing y.

Building the mask from .values avoids the problem. The expression y.values == class_label compares the underlying NumPy array with the scalar class_label, so the mask is a plain NumPy boolean array that is applied by position, with no Pandas index alignment involved. The fix is therefore to replace y[y == class_label] with y[y.values == class_label].

With Pandas 2.2.3+, y[y == class_label] either raises an error or yields unexpected results further down the pipeline. The error we observed is shown below:

ValueError: Input y_true contains NaN.
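One way a NaN like this can appear is sketched below. It is an assumption for illustration, not a confirmed diagnosis of this issue: the report does not say whether y is a Series or a single-column DataFrame. With a Series, a boolean mask filters rows; with a DataFrame, indexing by a boolean DataFrame keeps every row and fills the non-matching cells with NaN. The variable names and the toy data are hypothetical.

```python
import pandas as pd

# Toy labels (hypothetical data, not taken from the issue).
y_series = pd.Series([0, 1, 1, 0, 1], name="target")
y_frame = y_series.to_frame()  # same labels as a one-column DataFrame

# Series + boolean Series mask: rows are filtered.
filtered = y_series[y_series == 1]

# Series + NumPy boolean mask (the form proposed in this issue): same rows.
filtered_values = y_series[y_series.values == 1]

# DataFrame + boolean DataFrame mask: shape is preserved and
# non-matching cells become NaN instead of being dropped.
masked = y_frame[y_frame == 1]

print(len(filtered), len(filtered_values), len(masked))
print(int(masked["target"].isna().sum()))
```

If y reaches sampling_method as a DataFrame, the masked form would pass NaN labels on to any metric that validates y_true, which is consistent with the error above.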

After fixing this bug, the method should work as expected when performing the comparison between y and class_label, by explicitly using y.values == class_label to comply with newer Pandas behavior.

Proposed Change

Modify the following line:

class_samples = y[y == class_label]

to:

class_samples = y[y.values == class_label]
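For context, here is a minimal sketch of how the changed line could sit inside a balanced bootstrap step. This is not the library's actual sampling_method: the function name balanced_bootstrap_sketch and the parameters n_per_class and seed are hypothetical, and y is assumed to be a Pandas Series.

```python
import numpy as np
import pandas as pd


def balanced_bootstrap_sketch(y, n_per_class, seed=0):
    """Draw n_per_class labels with replacement from each class (illustrative only)."""
    parts = []
    for class_label in np.unique(y.values):
        # Mask built from the NumPy array, as proposed in this issue.
        class_samples = y[y.values == class_label]
        # Resample within the class, with replacement.
        parts.append(
            class_samples.sample(n=n_per_class, replace=True, random_state=seed)
        )
    return pd.concat(parts)


# Imbalanced toy labels: eight of class 0, two of class 1.
y = pd.Series([0] * 8 + [1] * 2, name="target")
resampled = balanced_bootstrap_sketch(y, n_per_class=5)
print(resampled.value_counts().to_dict())
```

Each class contributes the same number of rows to the resample, and no NaN labels are introduced.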