Closed FrankWork closed 6 years ago
Hi, I'm confused. Why do you separate positive and negative examples? Why not mix them together and shuffle?

This was done to balance the classes, because the negative class is much larger than the positive classes. Addressing class imbalance is important for improving recall on the minority classes. At each step, we sample either a positive or a negative batch with equal probability.

There are other ways one could address this. For example, as you suggested, you could mix all samples together and then down-weight the loss for negative samples. This is an equally valid strategy and would just require some tuning to calibrate with the existing hyper-parameters.

Thanks a lot. I finally got it.
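A minimal sketch of the two strategies discussed above. This is illustrative only, not the repository's actual code: the function names are made up, and the down-weighting scheme shown (scaling negatives so both classes contribute equally in expectation) is one reasonable calibration among several.

```python
import random

def make_balanced_batch_sampler(pos_examples, neg_examples, batch_size, seed=0):
    """Strategy 1: at each step, draw the whole batch from either the
    positive pool or the negative pool with equal probability."""
    rng = random.Random(seed)

    def next_batch():
        pool = pos_examples if rng.random() < 0.5 else neg_examples
        return rng.sample(pool, min(batch_size, len(pool)))

    return next_batch

def negative_down_weights(labels, neg_label=0):
    """Strategy 2: mix all samples together and down-weight the loss of
    negative samples so positives and negatives contribute equally overall.
    Returns one loss weight per example."""
    n_pos = sum(1 for y in labels if y != neg_label)
    n_neg = sum(1 for y in labels if y == neg_label)
    w_neg = n_pos / n_neg if n_neg else 1.0  # e.g. 1:4 imbalance -> 0.25
    return [1.0 if y != neg_label else w_neg for y in labels]
```

With strategy 2, the per-example weights would then multiply the per-example loss before reduction; the exact scale interacts with the learning rate, which is the tuning the answer above refers to.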