How to deal with imbalanced data?

tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow

https://www.tensorflow.org/probability/

Apache License 2.0


How to deal with imbalanced data? #750

Open mcourteaux opened 4 years ago

mcourteaux commented 4 years ago

I'm new to TFP and probabilistic models. With deterministic NNs, I could balance my data by oversampling. However, my intuition says one shouldn't do this with probabilistic networks.

Currently, I'm working on a regression problem with imbalanced data. I'd like to attempt TFP for this. Are there any guidelines or references to deal with this?

mcourteaux commented 4 years ago

What about using Dropout? I have so many questions... Is there a best-practices document?

gtancev commented 4 years ago

Do you have any information about the true class distribution? If so, you can place a prior distribution over the classes by passing an activity_regularizer to the output layer:

activity_regularizer=tfp.layers.KLDivergenceRegularizer(prior, weight=1/n_batches)

But I am not an expert in TFP (yet).
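To make the idea concrete, here is a rough sketch of the arithmetic such a regularizer adds to the loss, written in plain Python rather than TFP. It assumes the penalty is KL(predicted || prior) scaled by the weight, which is how I understand KLDivergenceRegularizer to work. The helper name categorical_kl and all the numbers are made up for illustration.

```python
import math


def categorical_kl(q, p):
    """KL(q || p) for two discrete distributions given as probability lists.

    Terms with q_i == 0 contribute nothing, by the usual convention.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


# Hypothetical values: assumed true class frequencies (the prior) and
# one predicted class distribution from the output layer.
prior = [0.9, 0.1]
predicted = [0.5, 0.5]
n_batches = 100  # hypothetical number of batches per epoch

# Penalty added to the loss for this prediction, scaled by weight=1/n_batches.
penalty = categorical_kl(predicted, prior) / n_batches

print(f"KL(predicted || prior) = {categorical_kl(predicted, prior):.4f}")
print(f"scaled penalty         = {penalty:.6f}")
```

The penalty is zero when the predicted distribution matches the prior and grows as the prediction drifts away from it, so the output is pulled toward the assumed class frequencies. This only applies when the output is a distribution over classes; for a regression output the prior would have to be a distribution over the target instead.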