Closed · Tsings04 closed this issue 3 years ago

I tried to train AdderNet from scratch on ImageNet with ResNet-18 on four 1080 Ti cards, but it uses so much memory that I can only set the batch size to 16, and training is also very slow. For comparison, I replaced the adder filters with normal conv filters and the same four cards could handle a batch size of 128. Did I set something up wrong, or is this the normal case currently for AdderNet? Have you guys tried training on ImageNet?

I only have one 1660 Ti card and set the batch size to 64, and it runs successfully.

It is the normal case currently for AdderNet.
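For what it's worth, the memory gap is not surprising if the adder layer is implemented by unfolding the input and materializing every pairwise |x - w| difference before the sum (this is only my assumption about the implementation; the actual kernel may differ). A back-of-the-envelope estimate for the first 7x7 layer of ResNet-18 on ImageNet:

```python
# Rough memory estimate for a naive, unfold-based adder layer (assumption:
# the intermediate |x - w| tensor is fully materialized before the sum).
# First ResNet-18 layer on ImageNet: 3x224x224 input, 64 filters of 7x7, stride 2.
batch = 16
c_in, c_out = 3, 64
k, h_out, w_out = 7, 112, 112

# One difference per (sample, output filter, unfolded kernel position, spatial location).
elements = batch * c_out * (c_in * k * k) * (h_out * w_out)
print(f"intermediate elements: {elements:,}")             # ~1.9e9
print(f"fp32 memory: {elements * 4 / 1024**3:.1f} GiB")   # ~7 GiB for a single layer

# A cuDNN convolution of the same shape only stores the (batch, c_out, h_out, w_out) output:
conv_out = batch * c_out * h_out * w_out
print(f"conv output fp32 memory: {conv_out * 4 / 1024**2:.1f} MiB")  # ~49 MiB
```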
Hi Hanting, I am trying to reproduce the experiments. When I train the BNN networks on the CIFAR datasets using the setup from the paper (SGD, lr 0.1, momentum 0.9, weight decay 0.0005; batch size 256, 400 epochs), none of the BNN networks (VGG, ResNet-20, ResNet-32) reach the accuracy reported in the paper. Do you have any advice on training the BNNs? I have never tried BNN before :p
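(For reference, the recipe quoted above corresponds to roughly the following PyTorch setup. This is only a sketch; the cosine decay is my own assumption, since the learning-rate schedule is not stated here.)

```python
import torch

# Minimal sketch of the recipe above: SGD with lr 0.1, momentum 0.9,
# weight decay 5e-4, batch size 256, 400 epochs. The cosine schedule is
# an assumption -- the comment does not say which decay policy was used.
def make_optimizer(model, epochs=400):
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```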
We use DoReFa-Net for training the BNN. Which method do you use?
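For readers who haven't used DoReFa-Net: its 1-bit weights are usually implemented as sign(w) scaled by the mean absolute weight, with a straight-through estimator in the backward pass. A minimal sketch based on the DoReFa paper (not code from this repo):

```python
import torch

class BinarizeWeight(torch.autograd.Function):
    """DoReFa-style 1-bit weight quantization: w_b = E(|w|) * sign(w)."""

    @staticmethod
    def forward(ctx, w):
        scale = w.abs().mean()      # layer-wise scaling factor E(|w|)
        return torch.sign(w) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output          # straight-through estimator

# Hypothetical usage inside a conv layer's forward:
#   w_b = BinarizeWeight.apply(self.weight)
#   out = F.conv2d(x, w_b, self.bias, self.stride, self.padding)
```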
I see! I just followed the original binary networks to build those models, but they are hard to train. I will try DoReFa-Net instead. Thanks!
I have tested the training and validation time of LeNet with different filters on MNIST, but the LeNet with adder filters lags far behind, even for validation on the CPU with no backward pass. It seems that the actual speed of the adder filter doesn't match its theoretical improvement. :(
Have you also seen this result in your experiments?
Yes. The implementation of convolution is accelerated by several techniques, so for now the adder filter cannot achieve the same acceleration without those techniques.
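To make this concrete, here is a naive adder "convolution" written with unfold, which is roughly what a pure-PyTorch implementation has to do to compute the L1-distance outputs (only an illustrative sketch; the repo's actual implementation may differ). Timing it against F.conv2d on CPU shows the gap even with no backward pass:

```python
import time
import torch
import torch.nn.functional as F

def adder2d_naive(x, weight, stride=1, padding=0):
    """Adder 'convolution': out = -sum over the kernel of |x - w| (unfold-based sketch)."""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    x_col = F.unfold(x, (kh, kw), padding=padding, stride=stride)  # [n, c_in*kh*kw, L]
    w_col = weight.view(c_out, -1)                                 # [c_out, c_in*kh*kw]
    # All pairwise differences are materialized: [n, c_out, c_in*kh*kw, L] -- the expensive part.
    out = -(x_col.unsqueeze(1) - w_col[None, :, :, None]).abs().sum(2)
    return out.view(n, c_out, h_out, w_out)

x = torch.randn(4, 8, 28, 28)
weight = torch.randn(32, 8, 5, 5)

t0 = time.time(); adder2d_naive(x, weight, padding=2); t_adder = time.time() - t0
t0 = time.time(); F.conv2d(x, weight, padding=2);      t_conv = time.time() - t0
print(f"adder: {t_adder * 1e3:.1f} ms    conv: {t_conv * 1e3:.1f} ms")
```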
Thank you for your reply! It confused me in the AdderNet experiments, but now I get it :)
These are the experiments on CIFAR-10. The weird part is that AdderNet seems to overfit the training set and perform worse on the validation set during the early period, and only starts to learn more general features after about 300 epochs of training. Do you know why the training curves look like that?
The magnitude of the outputs in AdderNets is large, so the variance and mean estimated by BN are inaccurate when the learning rate is not small.
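The scale difference is easy to verify: a conv output is a sum of zero-mean products, while an adder output is the negative sum of c_in*k*k absolute differences, so its mean sits far from zero. A quick numerical check (illustrative only, using the same unfold-based formulation as the sketch above):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 8, 16, 16)
weight = torch.randn(16, 8, 3, 3)

# Convolution output: sum of zero-mean products, so roughly zero-mean.
conv_out = F.conv2d(x, weight, padding=1)

# Adder-style output: -sum |x - w| over c_in*k*k = 72 terms, so it has a large
# negative mean, and the statistics BN has to track live on a very different scale.
x_col = F.unfold(x, 3, padding=1)                               # [4, 72, 256]
w_col = weight.view(16, -1)                                     # [16, 72]
adder_out = -(x_col.unsqueeze(1) - w_col[None, :, :, None]).abs().sum(2)

print(f"conv  mean {conv_out.mean():8.2f}   std {conv_out.std():6.2f}")
print(f"adder mean {adder_out.mean():8.2f}   std {adder_out.std():6.2f}")
```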
Could you also show your training trajectory and the test accuracies you achieved on the CIFAR-100 dataset? Thanks!
It is not easy for me to send any code or documents outside, since that would require a long auditing process...
You can modify the training code to train on CIFAR-100 yourself, and you can ask me if you run into any problems.
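For anyone else trying this, the change is small if your training script follows the usual torchvision pattern. The sketch below assumes a model builder like resnet20(num_classes=...) and the commonly used CIFAR-100 normalization constants; adapt the names to whatever the repo's main.py actually uses:

```python
import torch
from torchvision import datasets, transforms

# Swap CIFAR-10 for CIFAR-100 and widen the classifier to 100 classes.
# (resnet20 is a placeholder for whatever model builder your script uses.)
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
train_set = datasets.CIFAR100('./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                           shuffle=True, num_workers=4)

# model = resnet20(num_classes=100)   # instead of num_classes=10
```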
Yes, here you go. This is our reproduction, but note that the VGG model we use is different from the original VGG-small in the paper. We reduced the number of filters to shorten the training time; otherwise we would have to train VGG-small with adder filters for about 9 days on our server.