online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

Add a trained_on_dist to RandomSampler #1511

Closed niccolopetti closed 4 months ago

niccolopetti commented 4 months ago

When sampling with RandomUnderSampler, RandomOverSampler, or RandomSampler from random.py, we set a desired class distribution. In an online setting, however, we can't be 100% sure the sampling will give us exactly the distribution we wanted, so a variable that tracks the result might be useful.

We already have _actual_dist to keep track of all the data that went through the model. I believe a _trained_on_dist would also be useful, to track the distribution of the data that was actually used to train the base model under the chosen sampling technique.
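
To make the request concrete, here is a rough standalone sketch of the idea. It does not use river and is not river's sampling logic: `TrackingUnderSampler`, `learn_fn`, and the keep-probability rule are assumptions made up for illustration. Only the names `_actual_dist` and `_trained_on_dist` come from the proposal above.

```python
import collections
import random


class TrackingUnderSampler:
    """Toy under-sampler (hypothetical; not river's implementation)."""

    def __init__(self, learn_fn, desired_dist, seed=None):
        self.learn_fn = learn_fn          # stand-in for the base model's learn_one
        self.desired_dist = desired_dist  # e.g. {False: 0.5, True: 0.5}
        self._rng = random.Random(seed)
        self._actual_dist = collections.Counter()      # every sample seen
        self._trained_on_dist = collections.Counter()  # samples passed to the model

    def learn_one(self, x, y):
        # Assumes y is one of the keys of desired_dist
        self._actual_dist[y] += 1
        f, g = self._actual_dist, self.desired_dist
        # The class that is rarest relative to its desired share is always kept;
        # every other class is kept with a proportionally lower probability.
        m = min(f[c] / g[c] for c in g if f[c] > 0)
        keep_prob = m * g[y] / f[y]
        if self._rng.random() < keep_prob:
            self.learn_fn(x, y)
            self._trained_on_dist[y] += 1


trained = []  # the "base model" here just records what it was given
sampler = TrackingUnderSampler(
    learn_fn=lambda x, y: trained.append((x, y)),
    desired_dist={False: 0.5, True: 0.5},
    seed=42,
)

# Imbalanced toy stream: one positive for every nine negatives
for i in range(1000):
    sampler.learn_one({"i": i}, i % 10 == 0)

print("seen:      ", dict(sampler._actual_dist))
print("trained on:", dict(sampler._trained_on_dist))
```

Comparing the two counters after a run shows how close the data the base model actually trained on came to the desired distribution, which is the information _trained_on_dist would expose.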