juaml / julearn

Forschungszentrum Jülich Machine Learning Library
https://juaml.github.io/julearn
GNU Affero General Public License v3.0

[BUG]: StratifiedBootstrap can give the same sample on train and test set #254

Open fraimondo opened 6 months ago

fraimondo commented 6 months ago

Is there an existing issue for this?

Current Behavior

The linked lines show that the random choice is made first and the result is only then split into train/test, so the same sample can end up in both sets.

https://github.com/juaml/julearn/blob/2e30b6e0eaeae095c39a3a68c315c7535b10aea9/julearn/model_selection/stratified_bootstrap.py#L100-L102

Expected Behavior

Whatever gets chosen as test should not be in the train set.

The current behavior does not match the out-of-bag bootstrap definition.

We should resample with replacement to build the train set, and every sample that is not in the train set becomes the test set.
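A minimal sketch of the split described above, using NumPy only. This is not julearn's implementation: the function name `oob_bootstrap_split` is hypothetical, and resampling per class is an assumption about how the stratification should be kept.

```python
import numpy as np


def oob_bootstrap_split(y, rng):
    """Return (train, test) indices for one stratified out-of-bag draw.

    Hypothetical helper for illustration; not part of julearn.
    """
    y = np.asarray(y)
    indices = np.arange(len(y))
    train_parts = []
    for label in np.unique(y):
        members = indices[y == label]
        # Resample with replacement inside each class so the train set
        # keeps the original class proportions.
        train_parts.append(
            rng.choice(members, size=len(members), replace=True)
        )
    train = np.concatenate(train_parts)
    # Out-of-bag: every index that was never drawn for train.
    test = np.setdiff1d(indices, train)
    return train, test


y = np.array([0] * 20 + [1] * 10)
train, test = oob_bootstrap_split(y, np.random.default_rng(0))
```

With this construction the train set can contain repeated indices, but no index appears in both train and test. The test set size varies from draw to draw, which differs from a fixed `test_size` split.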

This can also allow us to implement the .632 and .632+ scoring correction methods.
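For reference, the plain .632 correction is a fixed-weight combination of the training (resubstitution) error and the out-of-bag error. The sketch below only shows that arithmetic; the function name and arguments are hypothetical, and the .632+ variant, which adapts the weight using a no-information error rate, is not covered here.

```python
def bootstrap_632(err_train, err_oob):
    """Combine training error and mean out-of-bag error (.632 rule).

    Hypothetical helper for illustration; not part of julearn.
    err_train: error of the model evaluated on the data it was fit on.
    err_oob: error averaged over the out-of-bag test sets.
    """
    return 0.368 * err_train + 0.632 * err_oob


# A model that fits the training data perfectly but has 50% OOB error.
corrected = bootstrap_632(err_train=0.0, err_oob=0.5)
```

This only makes sense when the test set is truly out-of-bag, which is why the split needs to be fixed first.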

Steps To Reproduce

latest julearn

Environment

not relevant

Relevant log output

No response

Anything else?

No response