sunfanyunn opened this issue 5 years ago
python train.py -a=wideresnet -m=baseline -o=adam -b=225 --dataset=cifar10_zca --gpu=6,7 --lr=0.003 --boundary=0
Epoch: [1199][0/18] Time 0.677 (0.677) Data 0.580 (0.580) Loss 0.0085 (0.0085) Prec@1 99.556 (99.556) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.585 (0.585) Data 0.491 (0.491) Loss 0.0145 (0.0145) Prec@1 99.556 (99.556) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.659 (0.659) Data 0.580 (0.580) Loss 0.0081 (0.0081) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.692 (0.692) Data 0.611 (0.611) Loss 0.0058 (0.0058) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.637 (0.637) Data 0.548 (0.548) Loss 0.0158 (0.0158) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.629 (0.629) Data 0.546 (0.546) Loss 0.0251 (0.0251) Prec@1 99.556 (99.556) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.609 (0.609) Data 0.518 (0.518) Loss 0.0037 (0.0037) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.599 (0.599) Data 0.516 (0.516) Loss 0.0107 (0.0107) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.629 (0.629) Data 0.544 (0.544) Loss 0.0212 (0.0212) Prec@1 98.667 (98.667) Prec@5 100.000 (100.000)
Epoch: [1199][0/18] Time 0.647 (0.647) Data 0.566 (0.566) Loss 0.0350 (0.0350) Prec@1 98.667 (98.667) Prec@5 100.000 (100.000)
Valid: [0/23] Time 0.383 (0.383) Loss 1.0591 (1.0591) Prec@1 78.667 (78.667) Prec@5 97.778 (97.778)
****** Prec@1 78.360 Prec@5 97.440 Loss 1.198
Test: [0/45] Time 0.457 (0.457) Loss 1.1571 (1.1571) Prec@1 77.333 (77.333) Prec@5 98.222 (98.222)
****** Prec@1 75.990 Prec@5 97.670 Loss 1.342
Best test precision: 77.950
I have just finished running the baseline code, and the top precision is 77.95% (I would expect around 80% based on the paper's results). I suspect the L1 regularization has little effect, since it is averaged over millions of parameters.
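As a rough sanity check of that suspicion: averaging an L1 penalty over the parameter count shrinks it to something negligible next to the task loss. A minimal sketch (the toy two-layer model is purely illustrative, not the WRN-28-2 used here):

```python
import torch.nn as nn

# Toy stand-in for the real network; the layer sizes are illustrative only.
model = nn.Sequential(nn.Linear(1000, 1000), nn.Linear(1000, 10))

n_params = sum(p.numel() for p in model.parameters())
l1_sum = sum(p.abs().sum() for p in model.parameters())
l1_mean = l1_sum / n_params  # averaging divides the penalty by ~1e6

print(n_params)        # over a million parameters even for this toy model
print(float(l1_mean))  # tiny compared to a typical cross-entropy loss
```

With roughly a million parameters, the averaged penalty lands around 1e-2 at default initialization, so its gradient contribution is dwarfed by the classification loss unless the coefficient is very large.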
Dear @heitorrapela , what is the result you've got?
P.S. I just found out that the widen factor in the code is 3... why..?
@Jongchan, I didn't finish the training, but the result you got looks good.
WRN-28-2 (not 3!)
@heitorrapela Just got the result over the weekend.
Mean Teacher
python train.py -a=wideresnet -m=mt -o=adam -b=225 --dataset=cifar10_zca --gpu=6,7 --lr=0.0004 --boundary=0
Last log
Learning rate schedule for Adam
Learning rate: 0.000080
Mean Teacher model
Epoch: [1199][0/367] Time 1.057 (1.057) Data 0.948 (0.948) Loss 0.0087 (0.0087) LossCL 0.0040 (0.0040) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000) PrecT@1 100.000 (100.000) PrecT@5 100.000 (100.000)
Epoch: [1199][100/367] Time 0.091 (0.119) Data 0.000 (0.023) Loss 0.0083 (0.0083) LossCL 0.0052 (0.0063) Prec@1 100.000 (99.982) Prec@5 100.000 (100.000) PrecT@1 100.000 (99.973) PrecT@5 100.000 (100.000)
Epoch: [1199][200/367] Time 0.096 (0.115) Data 0.000 (0.020) Loss 0.0063 (0.0086) LossCL 0.0054 (0.0068) Prec@1 100.000 (99.978) Prec@5 100.000 (99.996) PrecT@1 100.000 (99.973) PrecT@5 100.000 (100.000)
Epoch: [1199][300/367] Time 0.098 (0.115) Data 0.000 (0.020) Loss 0.0075 (0.0084) LossCL 0.0043 (0.0065) Prec@1 100.000 (99.982) Prec@5 100.000 (99.997) PrecT@1 100.000 (99.982) PrecT@5 100.000 (100.000)
Valid: [0/23] Time 0.471 (0.471) Loss 0.7086 (0.7086) Prec@1 84.889 (84.889) Prec@5 97.333 (97.333)
****** Prec@1 78.800 Prec@5 96.380 Loss 1.038
Test: [0/45] Time 0.483 (0.483) Loss 0.8086 (0.8086) Prec@1 82.667 (82.667) Prec@5 96.000 (96.000)
****** Prec@1 77.630 Prec@5 96.810 Loss 1.077
Valid: [0/23] Time 0.467 (0.467) Loss 0.7076 (0.7076) Prec@1 84.444 (84.444) Prec@5 96.444 (96.444)
****** Prec@1 78.820 Prec@5 96.440 Loss 1.033
Test: [0/45] Time 0.437 (0.437) Loss 0.8259 (0.8259) Prec@1 81.333 (81.333) Prec@5 96.444 (96.444)
****** Prec@1 77.920 Prec@5 96.690 Loss 1.073
Best test precision: 79.490
PI model
python train.py -a=wideresnet -m=pi -o=adam -b=225 --dataset=cifar10_zca --gpu=6,7 --lr=0.0003 --boundary=0
Last log
Learning rate schedule for Adam
Learning rate: 0.000060
Pi model
Epoch: [1199][0/367] Time 1.046 (1.046) Data 0.924 (0.924) Loss 0.0090 (0.0090) LossPi 0.0022 (0.0022) Prec@1 100.000 (100.000) Prec@5 100.000 (100.000)
Epoch: [1199][100/367] Time 0.146 (0.138) Data 0.000 (0.024) Loss 0.0109 (0.0089) LossPi 0.0018 (0.0028) Prec@1 100.000 (99.982) Prec@5 100.000 (100.000)
Epoch: [1199][200/367] Time 0.119 (0.141) Data 0.000 (0.022) Loss 0.0072 (0.0085) LossPi 0.0009 (0.0030) Prec@1 100.000 (99.987) Prec@5 100.000 (100.000)
Epoch: [1199][300/367] Time 0.123 (0.141) Data 0.000 (0.021) Loss 0.0079 (0.0084) LossPi 0.0011 (0.0029) Prec@1 100.000 (99.991) Prec@5 100.000 (100.000)
Valid: [0/23] Time 0.509 (0.509) Loss 0.7784 (0.7784) Prec@1 84.000 (84.000) Prec@5 96.889 (96.889)
****** Prec@1 79.180 Prec@5 97.220 Loss 0.947
Test: [0/45] Time 0.463 (0.463) Loss 0.9564 (0.9564) Prec@1 80.444 (80.444) Prec@5 97.333 (97.333)
****** Prec@1 78.330 Prec@5 97.420 Loss 1.002
Best test precision: 78.650
I haven't run the baseline WRN-28-2, but these numbers seem far from those reported in the paper. On the positive side, they should upper-bound WRN-28-2 performance, since they were obtained with the larger WRN-28-3.
I wonder whether the official TensorFlow code can reproduce the paper's results.. :(
In the TensorFlow version, I was able to get the numbers reported in the paper, so I can safely assume the official code works well; I will validate the SSL methods later, though. One thing to note about the official code is that it doesn't use any regularization, which is somewhat different from the NeurIPS paper (and supplement).
It may take a while to go over the code thoroughly, but I currently suspect that the inferior results in this repository come from different normalization or a different data split. I will report back after running with the same data and split.
@Jongchan Hi, I am working on reproducing the paper in PyTorch too. I've checked the WideResNet implementation and found that "activation before res" is set opposite to the TF code. I think you could give that a try.
@CheukNgai Hi, I am currently not using the code from this repository, but as you pointed out, it does use the opposite setting from the TF implementation; I had missed that point. I am currently using the code from https://github.com/xternalz/WideResNet-pytorch/blob/master/wideresnet.py and changed ReLU to LeakyReLU.
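For reference, a minimal sketch of the pre-activation ("activation before res") block ordering, with ReLU swapped for LeakyReLU as described; the class name, the 0.1 negative slope, and the exact layer widths are illustrative assumptions, not code from either repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: BN -> LeakyReLU -> Conv, twice.

    In the "activation before res" arrangement, when in/out widths differ
    the projected shortcut is taken AFTER the first BN + activation,
    so the residual branch and the shortcut see the same activated input.
    """
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.equal_io = (in_ch == out_ch and stride == 1)
        self.shortcut = (None if self.equal_io
                         else nn.Conv2d(in_ch, out_ch, 1, stride, 0, bias=False))

    def forward(self, x):
        out = F.leaky_relu(self.bn1(x), 0.1)
        # activation-before-res: shortcut projection uses the activated input
        shortcut = x if self.equal_io else self.shortcut(out)
        out = self.conv2(F.leaky_relu(self.bn2(self.conv1(out)), 0.1))
        return out + shortcut

x = torch.randn(2, 16, 32, 32)
y = PreActBlock(16, 32, stride=2)(x)
print(y.shape)  # torch.Size([2, 32, 16, 16])
```

Flipping this flag changes whether the shortcut sees the raw input `x` or the activated `out`, which is the discrepancy with the TF code mentioned above.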
Now I can reliably reach the baseline performance in the NeurIPS paper, around 20.2~20.4% error rate with a plain WRN-28-2. The main changes were to the data pre-processing (the normalization part) and removing the L1/L2 regularization.
Counter-intuitively, removing the L1/L2 regularization improves top-1 accuracy, even though CIFAR-10 is a dataset that overfits easily.
@Jongchan could you give more details on your preprocessing implementation? It would help me a lot! Thank you!
@CheukNgai Sorry, my comment must have misled you. I just used the same preprocessing as described in this repository's README (GCN and ZCA), plus Gaussian noise with std 0.15. I had used mean-std normalization before.
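For what it's worth, a minimal NumPy sketch of that pipeline (GCN, then ZCA whitening fit on the training set, then additive Gaussian noise with std 0.15) might look like the following; the `scale=55` and epsilon values are common choices for this preprocessing, not necessarily the exact ones used here:

```python
import numpy as np

def global_contrast_normalize(X, scale=55.0, eps=1e-8):
    # GCN: subtract each image's mean, then rescale to a fixed norm
    X = X - X.mean(axis=1, keepdims=True)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return scale * X / np.maximum(norms, eps)

def zca_fit_transform(X, eps=1e-5):
    # ZCA whitening: decorrelate features while staying close to the input space
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return Xc @ W, (mean, W)  # keep (mean, W) to transform val/test sets

# Toy data: 200 "images" of 3x4x4 pixels (CIFAR-10 would be N x 3072)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 255.0, size=(200, 48))

Xw, (mean, W) = zca_fit_transform(global_contrast_normalize(X))
# Input perturbation applied during training: Gaussian noise, std 0.15
X_train = Xw + rng.normal(0.0, 0.15, size=Xw.shape)
```

The validation and test sets would be transformed with the same `(mean, W)` fit on the training data, with no noise added at evaluation time.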
Anyway, try removing L1/L2, or setting the L1 coefficient to a very, very small number.
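A sketch of what that change might look like in a PyTorch training step; the coefficient value and the toy model are illustrative, not this repository's settings:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)              # toy stand-in for the WRN
criterion = nn.CrossEntropyLoss()
# weight_decay=0.0 disables the optimizer-side L2 penalty (Adam's default)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3, weight_decay=0.0)

l1_coef = 1e-7                        # "very, very small" L1; set 0 to remove it
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))

optimizer.zero_grad()
loss = criterion(model(x), y)
if l1_coef > 0:
    # summed (not averaged) over parameters, scaled by a tiny coefficient
    loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
loss.backward()
optimizer.step()
```

With `l1_coef = 0` and `weight_decay = 0.0` this reduces to the fully unregularized setup that, per the discussion above, matched the paper's baseline.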
Good question. I updated and ran the baseline, but I had to stop the training. Do I really need to run for 1200 epochs? With 35 I was already getting good results on CIFAR-10.