Closed: 0x1orz closed this issue 5 years ago
One way to deal with this is to use "random negative peptides": random peptides generated fresh at each training epoch and assigned a non-binder affinity. The number of random negative peptides can be controlled with the `random_negative_constant` and `random_negative_rate` hyperparameters (see e.g. here). The total number of random peptides used is `random_negative_rate * N + random_negative_constant`, where N is the size of the (real) training dataset. Let us know how this goes for you and whether it addresses your issue.
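For concreteness, here is a minimal Python sketch of that accounting — not MHCflurry's actual implementation. The function names, the length range, and the values shown for `random_negative_rate` and `random_negative_constant` are hypothetical placeholders; only the `rate * N + constant` formula comes from the description above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def num_random_negatives(n_train, random_negative_rate, random_negative_constant):
    """Total random negative peptides per epoch: rate * N + constant."""
    return int(random_negative_rate * n_train + random_negative_constant)

def sample_random_negatives(count, length_range=(8, 15), rng=random):
    """Draw a fresh batch of random peptides; regenerate every epoch."""
    peptides = []
    for _ in range(count):
        k = rng.randint(*length_range)  # peptide length, bounds inclusive
        peptides.append("".join(rng.choices(AMINO_ACIDS, k=k)))
    return peptides

# Example: 10,000 real training peptides, hypothetical hyperparameter values.
n = num_random_negatives(10_000, random_negative_rate=0.2, random_negative_constant=25)
negatives = sample_random_negatives(n)  # each would be labeled with a non-binder affinity
print(n, negatives[:3])
```

Because the negatives are resampled at every epoch, the model sees a broad sample of peptide space labeled as non-binding rather than memorizing a single fixed negative set.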
After loading the data from IEDB, Kim2014, Abelin2017, or other published MHC class I ligand datasets identified by mass spec, the positive examples outnumber the negatives by roughly a dozen to one. How should we handle this imbalance when training our models?