Open PierreQuinton opened 1 year ago
Hi @PierreQuinton ,
It seems like what you need is a custom Sampler. IIUC, https://github.com/ufoym/imbalanced-dataset-sampler should be pretty close to what you're looking for?
@NicolasHug Thanks for your answer, yes this is exactly what I am looking for. I'm not sure if you would like to add something similar to torch or if you would close the issue, I leave it up to you.
Thanks @PierreQuinton . I'll keep the issue open and rename it for clarity. Ultimately, what is needed to enable that is:
i) is definitely in scope for torchvision and this is something we'd be doing if we ever re-start our work on a dataset revamp (CC @pmeier ). For ii), we can decide when the time comes, but I don't see why not
🚀 The feature
For each classification dataset with a balanced class distribution (MNIST, CIFAR-N, etc.), it would be very useful to provide a standard imbalanced version of the dataset. For a dataset with $n$ classes, define the imbalance factor $a\in [0,1]$; the proportion of class $i$ is then typically proportional to $a^{i/(n-1)}$, normalized so that the proportions sum to $1$. For $a=1$ the distribution is uniform, and the smaller the imbalance factor, the more imbalanced the dataset is.
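To make the formula concrete, here is a minimal sketch of the normalized proportions. The function name `imbalance_proportions` is hypothetical, and it assumes classes are indexed $i = 0, \dots, n-1$, so that the exponent runs from $0$ to $1$.

```python
from typing import List


def imbalance_proportions(num_classes: int, a: float) -> List[float]:
    """Return class proportions p_i proportional to a**(i/(n-1)), summing to 1.

    Assumption: classes are indexed i = 0, ..., n-1.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    # Unnormalized weights: 1 for class 0, decaying to a for class n-1.
    weights = [a ** (i / (num_classes - 1)) for i in range(num_classes)]
    total = sum(weights)
    # Normalize so that the proportions sum to 1.
    return [w / total for w in weights]
```

With `a = 1.0` every class gets proportion `1 / num_classes`; with `a = 0.01` the first class is about 100 times more frequent than the last.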
I am not sure whether torchvision should provide the datasets themselves or a data loader that imbalances an existing dataset.
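As a rough illustration of the second option, the sketch below subsamples indices from a list of labels so that class sizes decay as $a^{i/(n-1)}$. It is not an existing torchvision API: the name `imbalanced_indices` is hypothetical, and it assumes the first class keeps all of its samples, classes are ranked by sorted label value, and every class keeps at least one sample.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def imbalanced_indices(labels: Sequence[int], a: float, seed: int = 0) -> List[int]:
    """Select sample indices so class i keeps a fraction a**(i/(n-1)) of its samples."""
    # Group sample indices by class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    classes = sorted(by_class)
    n = len(classes)
    if n < 2:
        raise ValueError("need at least two classes")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")

    rng = random.Random(seed)  # fixed seed for reproducible subsets
    chosen: List[int] = []
    for rank, cls in enumerate(classes):
        fraction = a ** (rank / (n - 1))
        # Assumption: keep at least one sample per class, even for a = 0.
        keep = max(1, round(len(by_class[cls]) * fraction))
        chosen.extend(rng.sample(by_class[cls], keep))
    return sorted(chosen)
```

The returned indices could then be wrapped with `torch.utils.data.Subset(dataset, indices)`, taking the labels from, for example, the `targets` attribute of `torchvision.datasets.CIFAR10`. If the original dataset is balanced, the resulting class counts follow the proportions above up to rounding.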
Motivation, pitch
Many papers address the problem of training on an imbalanced dataset and testing on a balanced one; for instance, see this. As far as I know, there is no systematic way of generating such datasets for people using PyTorch. Here are a few very similar implementations that are not fully satisfying:
Such datasets seem to exist for TensorFlow; for instance, section 3 of the README of this repo provides links to download `tfrecord` datasets.
I feel it would be a very nice feature for torchvision to either contain such datasets or make them easy to craft.
Alternatives
No response
Additional context
No response
cc @pmeier