Aloha @jacobgil,

Concern:
Is it possible to implement stratified sampling for use in the bootstrapping process? In `sklearn.utils.resample` there is an extra parameter, `stratify`, which takes an array of the same shape as the data and resamples in proportion to the class frequencies in that array. The current method bootstraps indices that are not linked to the class of the data.

Attempt at solution:
```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import numpy as np
from scipy.stats import bootstrap
from sklearn.utils import resample


@dataclass
class BootstrapResult:
    bootstrap_distribution: np.ndarray


bootstrap_methods = [
    'bootstrap_bca',
    'bootstrap_percentile',
    'bootstrap_basic',
]


@dataclass
class BootstrapParams:
    n_resamples: int
    random_state: Optional[np.random.RandomState]


def bootstrap_ci(y_true: List[int],
                 y_pred: List[int],
                 metric: Callable,
                 confidence_level: float = 0.95,
                 n_resamples: int = 9999,
                 method: str = 'bootstrap_bca',
                 random_state: Optional[np.random.RandomState] = None,
                 strata: Optional[List[int]] = None) -> Tuple[float, Tuple[float, float]]:
    assert method in bootstrap_methods, \
        f'Bootstrap ci method {method} not in {bootstrap_methods}'

    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    def statistic(*indices):
        # scipy passes each resampled data array as a separate argument;
        # with a single index array this unwraps it to a 1-D index vector.
        indices = np.asarray(indices)[0, :]
        return metric(y_true[indices], y_pred[indices])

    # Stratified resampling: the class proportions of y_true (or of the
    # user-supplied strata) are preserved in every resample.
    stratify = strata if strata is not None else y_true
    bootstrap_distribution = [
        metric(*resample(y_true, y_pred, stratify=stratify))
        for _ in range(n_resamples)
    ]
    bootstrap_res_test = BootstrapResult(
        bootstrap_distribution=np.array(bootstrap_distribution))

    # Hand the precomputed distribution to scipy: with n_resamples=0 no new
    # resamples are drawn, only the confidence interval is evaluated.
    indices = (np.arange(len(y_true)),)
    bootstrap_res = bootstrap(indices,
                              statistic=statistic,
                              n_resamples=0,
                              confidence_level=confidence_level,
                              method=method.replace('bootstrap_', ''),
                              bootstrap_result=bootstrap_res_test,
                              random_state=random_state)
    np.testing.assert_equal(bootstrap_res.bootstrap_distribution,
                            bootstrap_res_test.bootstrap_distribution)

    result = metric(y_true, y_pred)
    ci = (bootstrap_res.confidence_interval.low,
          bootstrap_res.confidence_interval.high)
    return result, ci
```
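For the percentile method alone, the `statistic` callback can be sidestepped entirely by computing the quantiles by hand. Below is a minimal sketch of that idea; the function name `stratified_percentile_ci` and its signature are my own invention, not part of the package. It resamples indices within each class so every resample keeps the original class counts:

```python
import numpy as np


def stratified_percentile_ci(y_true, y_pred, metric, n_resamples=2000,
                             confidence_level=0.95, seed=0):
    """Percentile bootstrap CI with within-class (stratified) resampling."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Group indices by class; each class is resampled with replacement
    # separately, so class proportions never change.
    class_indices = [np.flatnonzero(y_true == c) for c in np.unique(y_true)]

    stats = []
    for _ in range(n_resamples):
        idx = np.concatenate([rng.choice(ci, size=len(ci), replace=True)
                              for ci in class_indices])
        stats.append(metric(y_true[idx], y_pred[idx]))

    alpha = 1.0 - confidence_level
    lo, hi = np.quantile(stats, [alpha / 2, 1.0 - alpha / 2])
    return metric(y_true, y_pred), (float(lo), float(hi))
```

This only yields percentile intervals, of course; BCa and basic intervals would still need scipy's machinery or a hand-rolled equivalent.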
The main idea I tried was to use `resample` with `stratify=y_true` and feed the resulting bootstrap distribution into the `scipy.stats.bootstrap` function. This fails whenever the bootstrap method is not "percentile", because `bootstrap` then calls `statistic` while evaluating the confidence limits.
A simpler example to see when `statistic` is called is the following:
```python
import numpy as np
from dataclasses import dataclass
from scipy.stats import bootstrap


@dataclass
class BootstrapResult:
    bootstrap_distribution: np.ndarray


def noisy_mean(arr):
    print("HI", arr)  # shows whether (and with what) statistic is invoked
    return np.mean(arr)


bootstrap(([1, 2, 3, 4],), noisy_mean, n_resamples=0,  # method='percentile',
          bootstrap_result=BootstrapResult(bootstrap_distribution=np.array([5, 6, 7, 8, 9])))
```
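To make that behaviour explicit, one can count how often the statistic is invoked. On the SciPy versions I tried (`bootstrap_result` was added in SciPy 1.9), 'percentile' never calls it when `n_resamples=0`, while 'BCa' still does (for the point estimate and the jackknife acceleration term). A sketch, with made-up data chosen so the BCa quantities stay finite:

```python
import numpy as np
from dataclasses import dataclass
from scipy.stats import bootstrap


@dataclass
class BootstrapResult:
    bootstrap_distribution: np.ndarray


calls = []  # records every invocation of the statistic


def counting_mean(arr):
    calls.append(np.asarray(arr).shape)
    return np.mean(arr)


dist = np.array([1.5, 2.0, 2.5, 3.0, 3.5])  # precomputed "bootstrap" distribution

# Percentile: the CI comes straight from quantiles of dist; no statistic calls.
bootstrap(([1, 2, 3, 4],), counting_mean, n_resamples=0, method='percentile',
          bootstrap_result=BootstrapResult(bootstrap_distribution=dist))
n_percentile_calls = len(calls)

# BCa: the bias and acceleration corrections need the statistic itself.
calls.clear()
bootstrap(([1, 2, 3, 4],), counting_mean, n_resamples=0, method='BCa',
          bootstrap_result=BootstrapResult(bootstrap_distribution=dist))
n_bca_calls = len(calls)
```

So any stratified-resampling workaround that only precomputes the distribution works for 'percentile' but not for 'BCa', which matches the failure described above.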
Context:
I would like to use this package for multi-class AUROC. However, there are no easy-to-find methods that compute analytical confidence intervals for the one-vs-rest and one-vs-one cases, so I would use bootstrapping to compute the confidence interval. Sometimes the bootstrapping method randomly selects a subset of y_true containing only a single class. This happens more frequently with imbalanced datasets (which are common in healthcare). AUROC is undefined in that case, so my code throws an error. Stratified bootstrapping (where the class proportions after resampling stay the same) avoids this issue, because every resample contains more than one class. Hence, I would like to introduce this feature. However, I am having difficulty actually constructing the solution.
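To illustrate the failure mode concretely (a sketch with made-up data and seeds, not package code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

rng = np.random.default_rng(0)
y_true = np.array([0] * 95 + [1] * 5)   # heavily imbalanced labels
y_score = rng.random(100)               # dummy prediction scores

# If an ordinary bootstrap resample happens to contain only one class,
# AUROC is undefined and sklearn raises ValueError.
try:
    roc_auc_score(np.zeros(100), y_score)  # simulate an all-negative resample
except ValueError as err:
    print("AUROC undefined:", err)

# With stratify=y_true, every resample keeps (approximately) the original
# class proportions, so both classes are always present.
yt, ys = resample(y_true, y_score, stratify=y_true, random_state=0)
print("classes in stratified resample:", np.unique(yt))
print("AUROC:", roc_auc_score(yt, ys))
```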
Thank you for this fantastic package. It is very helpful and I believe it to be a new gold standard for ML evaluation.