Unable to reproduce Monkey experiment

thulas / dac-label-noise

Label de-noising for deep learning


Unable to reproduce Monkey experiment #3

Closed RIMcKinley closed 4 years ago

RIMcKinley commented 4 years ago

If I run

python train_dac.py --datadir <path-to-stl10-data> --dataset stl10-c --train_y train_y_downshifted_random_monkeys.bin --nesterov --net_type vggnet -use-gpu --epochs 200 --loss_fn dac_loss --learn_epochs 20 --seed 0

which I believe is the correct command to reproduce the monkey experiment as documented in the paper, then by about epoch 150 the network is heavily overfitting: almost no abstentions on either train or test, train accuracy of 99%, and validation accuracy of about 3%. What should I expect?

thulas commented 4 years ago

@RIMcKinley

If you train long enough, abstention will go to zero -- we discuss this in the paper. So abstention hitting zero at 200 epochs is not surprising. For the random monkeys, the best abstention occurs around epoch 75. (I'm working on a feature that will stabilize abstention at a pre-specified rate indefinitely, and I hope to release it soon.)

However, the 3% validation accuracy you're seeing indicates a bug somewhere; it should be no worse than about 60%. Can you check the labels on your validation set and see whether they are in the [0,9] range? If I remember correctly, the original STL-10 test labels were in the [1,10] range, so you would need to downshift those.
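One quick way to run that check is sketched below. It assumes the label file follows the original STL-10 binary layout (one unsigned byte per label, no header); the helper name `label_range` is made up for illustration and is not part of this repo.

```python
from pathlib import Path


def label_range(label_file):
    """Return (min, max) of the labels stored in a raw label file.

    Assumption: one unsigned byte per label and no header, as in the
    original STL-10 binary release.
    """
    labels = Path(label_file).read_bytes()
    if not labels:
        raise ValueError(f"{label_file} is empty")
    # Iterating over a bytes object yields integers in 0..255.
    return min(labels), max(labels)
```

If this reports a maximum of 10 and a minimum of 1 for the test labels, they have not been downshifted and will not line up with training labels in [0,9].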

RIMcKinley commented 4 years ago

Hi, I checked, and you're right: the command suggested for reproducing the experiment uses downshifted labels for the training data, but not for the test data. I was able to downshift the test labels manually, and it works now. For the sake of others who want to reproduce the experiment, it would probably be best either to add a downshifted test label binary to the data folder of this repo or to change the dataloader code.
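For anyone who needs to do the same by hand, here is a minimal sketch of the manual downshift. It assumes the test labels are stored as one unsigned byte per label with no header, as in the original STL-10 binaries; the function name and any file names passed to it are hypothetical, not taken from this repo.

```python
from pathlib import Path


def downshift_labels(src, dst):
    """Write a copy of src to dst with every label reduced by 1.

    Maps labels in [1,10] to [0,9]. Assumes one unsigned byte per
    label and no header. Returns the number of labels written.
    """
    labels = Path(src).read_bytes()
    if not labels:
        raise ValueError(f"{src} is empty")
    if min(labels) < 1:
        # A label of 0 suggests the file is already zero-based.
        raise ValueError("labels already contain 0; refusing to downshift again")
    shifted = bytes(b - 1 for b in labels)
    Path(dst).write_bytes(shifted)
    return len(shifted)
```

Writing to a separate output file keeps the original labels intact, and the guard against a minimum of 0 avoids shifting the same file twice by accident.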

thulas commented 4 years ago

@RIMcKinley

I'll update the repo with the downshifted labels.

thulas commented 4 years ago

Repo has been updated with the training labels required to reproduce the random monkey experiments.

New updates also include Python 3 support and the ability to stabilize abstention behavior at a pre-determined setpoint.