
isht7 / pytorch-deeplab-resnet

DeepLab ResNet v2 model in PyTorch

MIT License


why training from scratch is worse than caffe implementation? #12

Status: Closed (closed by gaopeng-eugene 7 years ago)

gaopeng-eugene commented 7 years ago

As stated in your README, training from scratch with your code gives results 3 points worse than training with Caffe. Can you point out the reason?

Best wishes

isht7 commented 7 years ago

It is not trained from scratch: in both Caffe and PyTorch we use an MSCOCO-pretrained initialization, which is described in more detail in the README. I am not sure why the 3-point difference exists; it may be due to differences in the internals of PyTorch and Caffe. I have tried many variations of the training regime, including different values of iter_size and weight_decay, but none of them improved the results. If you are able to improve the results, please let me know.
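For anyone trying to reproduce the iter_size experiments mentioned above, here is a minimal sketch (pure Python, not taken from this repo) of what Caffe's iter_size does: it accumulates gradients over several micro-batches and normalizes by iter_size before a single update, so the result matches one large batch. The toy 1-D model and the names `grad_mse`, `accumulated_grad` and `sgd_step` are hypothetical illustrations. In PyTorch the usual equivalent is calling `(loss / iter_size).backward()` on each micro-batch and `optimizer.step()` once, so weight decay is applied once per update rather than once per micro-batch.

```python
# Hypothetical toy example: gradient accumulation as in Caffe's `iter_size`,
# shown on a 1-D linear model y ~ w * x with mean squared error.

def grad_mse(w, batch):
    """Gradient of 0.5 * mean((w*x - y)^2) with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, iter_size):
    """Split `batch` into `iter_size` equal micro-batches and accumulate
    their gradients, each scaled by 1 / iter_size (like dividing the loss
    by iter_size before backward() in PyTorch)."""
    assert len(batch) % iter_size == 0, "batch must split evenly"
    size = len(batch) // iter_size
    total = 0.0
    for i in range(iter_size):
        micro = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro) / iter_size
    return total


def sgd_step(w, grad, lr, weight_decay):
    """Plain SGD (no momentum): L2 weight decay is added to the accumulated
    gradient once per update, not once per micro-batch."""
    return w - lr * (grad + weight_decay * w)


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
    w = 0.5
    full = grad_mse(w, data)             # one batch of 4
    acc = accumulated_grad(w, data, 2)   # 2 micro-batches of 2
    print(full, acc)                     # equal up to floating-point rounding
    print(sgd_step(w, acc, lr=0.1, weight_decay=0.0005))
```

The equivalence only holds for layers whose output does not depend on batch statistics; with frozen BatchNorm, as is common when fine-tuning DeepLab, accumulating gradients and using one large batch should behave the same.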