wagpa / embedding-eval-framework


torch subsampling #33

Closed wagpa closed 1 year ago

wagpa commented 1 year ago

Because the classes has-edge and no-edge are highly imbalanced (has-edge makes up less than one percent of the data), the training data has to be resampled.

First I tried a self-implemented oversampling (with the keras model) where I duplicated edges until both classes had the same size. This didn't seem to improve the results much.
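For reference, the duplication approach can be sketched roughly like this (toy labels and array shapes are made up for illustration; this is not the actual project code):

```python
import numpy as np

# Hypothetical toy data: 95 no-edge samples (label 0), 5 has-edge (label 1).
labels = np.array([0] * 95 + [1] * 5)
features = np.arange(len(labels)).reshape(-1, 1)

# Oversample the minority class by duplicating its rows (with replacement)
# until both classes contain the same number of samples.
minority = np.flatnonzero(labels == 1)
majority = np.flatnonzero(labels == 0)
extra = np.random.choice(minority, size=len(majority) - len(minority),
                         replace=True)
idx = np.concatenate([majority, minority, extra])
np.random.shuffle(idx)

balanced_X, balanced_y = features[idx], labels[idx]
print(np.bincount(balanced_y))  # both classes now have 95 samples
```

Since the duplicated minority samples are exact copies, the model sees no new information, which may explain why this barely helped.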

Now (with the pytorch model) I'm using the WeightedRandomSampler from torch.utils.data. Instead of duplicating data points, it draws samples from the original dataset with replacement according to per-sample weights. The weights I provide make the expected class sizes equal in the sampled dataset. This seemed to improve the results quite a bit, but no direct comparison was made.