junjieliu2910 / DynamicSparseTraining

[ICLR-2020] Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers.

training on cifar10 does not show the performance reported in the paper #2

Open · rnd810 opened this issue 3 years ago

rnd810 commented 3 years ago

Hi. I downloaded and ran your code.

Your code is neat and intuitive, but training didn't work well in my environment.

I used python 3.7 and pytorch 1.2 as recommended, but VGG16 printed NaN in every print statement after about 50 epochs of training.

Wide-ResNet trained successfully, reaching over 95% accuracy, but the model keep ratio was over 15%.

I just downloaded and ran the code as described in the README file, but I still have no idea why.

Do you have any idea why? Or could you share the trained model that was reported in the paper?

Many thanks! :)

junjieliu2910 commented 3 years ago

If the log prints NaN, you can try reducing the alpha value or changing the random seed. The command in the README file should produce a VGG16 model with 93% accuracy and over 90% sparsity (less than 10% keep ratio). The NaN case happened occasionally when we ran the experiments.
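Roughly speaking, the training objective is the task loss plus alpha times a sparse regularization term on the trainable thresholds. Below is a simplified sketch of that idea (not the exact repo code; class and variable names are illustrative, and a plain sigmoid straight-through estimator stands in for the actual gradient estimator):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Simplified trainable-masked layer: weights whose magnitude falls
    below a trainable per-row threshold are masked out."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.threshold = nn.Parameter(torch.zeros(out_features, 1))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        score = self.weight.abs() - self.threshold
        soft = torch.sigmoid(score)                  # surrogate used only for gradients
        hard = (score > 0).float()                   # binary mask used in the forward pass
        mask = hard.detach() - soft.detach() + soft  # straight-through estimator
        return F.linear(x, self.weight * mask, self.bias)

    def sparse_reg(self):
        # exp(-threshold): shrinks as thresholds grow, so minimizing
        # alpha * sparse_reg() pushes thresholds up and prunes more weights.
        return torch.exp(-self.threshold).sum()

# In the training loop (sketch):
#   reg = sum(m.sparse_reg() for m in model.modules() if isinstance(m, MaskedLinear))
#   loss = F.cross_entropy(logits, targets) + alpha * reg
```

A larger alpha prunes harder but can destabilize training (e.g. NaN), which is why lowering alpha or reseeding can help.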

For the Wide-ResNet case, if the keep ratio is 15%, then the sparsity is 85%. This point is a little confusing: the keep ratio is the fraction of weights that are kept. If you want higher sparsity, you can carefully increase the alpha value.
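In code terms, the relationship looks like this (an illustrative snippet assuming you can collect each layer's binary mask; not tied to the repo's exact API):

```python
def keep_ratio(masks):
    """Fraction of weights still active, i.e. nonzero entries in the binary masks."""
    kept = sum((m != 0).sum().item() for m in masks)
    total = sum(m.numel() for m in masks)
    return kept / total

# Sparsity is just the complement:
# sparsity = 1.0 - keep_ratio(masks)   # e.g. keep ratio 0.15 -> sparsity 0.85
```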

rnd810 commented 3 years ago

Thank you for the reply. I already tried reducing alpha for VGG16 (alpha = 5, 4, 3, 2, 1). I could train VGG with alpha = 1 and 2. The best accuracy was 93.46%, but I couldn't get the keep ratio under 5% as reported in the paper. I believe the model keep ratio printed during training corresponds to the term 'model remaining percentage' in the paper. Adjusting alpha gave meaningful results, but nothing comparable to the paper. Did you encounter a similar problem in your experiments? I tried to train VGG with the given hyperparameter (alpha = 5) several times, so I don't think it's a coincidence. Thanks!

junjieliu2910 commented 3 years ago

For VGG16, an alpha value above 1e-5 should get the keep ratio under 5%. When we ran the experiments, training was stable even with alpha = 5e-5. I just reran the code with pytorch 1.8 and cuda 11 and found the same problem for VGG. I have checked the original data and will first test the code with the original setting (pytorch 1.1 and cuda 9.1). Since we used a different server to train the VGG model, the pytorch version used there was 1.1.