Closed: 0x1orz closed this issue 5 years ago
One way to deal with this is to use "random negative peptides": random peptides generated fresh at each training epoch and assigned a non-binder affinity. The number of random negative peptides can be controlled with the `random_negative_constant` and `random_negative_rate` hyperparameters (see e.g. here). The total number of random peptides used is `random_negative_rate * N + random_negative_constant`, where N is the size of the (real) training dataset. Let us know how this goes for you and whether it addresses your issue.
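For concreteness, here is a minimal Python sketch of that accounting — not MHCflurry's actual implementation. The function names, the length range, and the values shown for `random_negative_rate` and `random_negative_constant` are hypothetical placeholders; only the `rate * N + constant` formula comes from the description above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def num_random_negatives(n_train, random_negative_rate, random_negative_constant):
    """Total random negative peptides per epoch: rate * N + constant."""
    return int(random_negative_rate * n_train + random_negative_constant)

def sample_random_negatives(count, length_range=(8, 15), rng=random):
    """Draw a fresh batch of random peptides; regenerate every epoch."""
    peptides = []
    for _ in range(count):
        k = rng.randint(*length_range)  # peptide length, bounds inclusive
        peptides.append("".join(rng.choices(AMINO_ACIDS, k=k)))
    return peptides

# Example: 10,000 real training peptides, hypothetical hyperparameter values.
n = num_random_negatives(10_000, random_negative_rate=0.2, random_negative_constant=25)
negatives = sample_random_negatives(n)  # each would be labeled with a non-binder affinity
print(n, negatives[:3])
```

Because the negatives are resampled at every epoch, the model sees a broad sample of peptide space labeled as non-binding rather than memorizing a single fixed negative set.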
After loading the data from IEDB, Kim2014, Abelin2017, or other published MHC class I ligand datasets identified by mass spec, the positive examples outnumber the negatives by roughly a dozen to one. How should we handle this imbalance when training our models?