Closed: gaopeng-eugene closed this issue 7 years ago.
It is not trained from scratch. In both Caffe and PyTorch we use an MSCOCO-pretrained initialization; you can find more details in the README. I am not sure why the 3-point difference exists. It may be due to differences in the internals of PyTorch and Caffe. I have tried many variations of the training regime, including different values of iter_size and weight_decay, but they didn't improve the results. If you are able to improve the results, please let me know.
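For anyone trying to reproduce the iter_size and weight_decay experiments mentioned in the reply, here is a minimal sketch of how Caffe's iter_size is usually emulated in PyTorch. Caffe sums the gradients of iter_size forward/backward passes, divides the sum by iter_size, adds the L2 weight-decay term, and only then takes one solver step. In PyTorch the same effect is typically obtained by scaling each mini-batch loss by 1/iter_size, calling `loss.backward()` once per mini-batch (gradients accumulate in `.grad`), and then calling `optimizer.step()` and `optimizer.zero_grad()`; `torch.optim.SGD` adds `weight_decay * param` to the gradient. To stay dependency-free, the sketch below uses a toy 1-D linear model in pure Python. The helper names (`grad_mse`, `sgd_step_accumulated`) and the data are hypothetical and are not taken from this repository, and momentum is deliberately left out.

```python
# Hypothetical sketch: emulating Caffe's iter_size by gradient accumulation,
# using a toy 1-D linear model y = w * x (no momentum, plain SGD).

def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one mini-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def sgd_step_accumulated(w, batches, lr, weight_decay):
    """One parameter update built from len(batches) accumulated mini-batches."""
    iter_size = len(batches)
    acc = 0.0
    for batch in batches:
        acc += grad_mse(w, batch)  # like one loss.backward() per mini-batch
    grad = acc / iter_size         # normalize the accumulated gradient by iter_size
    grad += weight_decay * w       # L2 weight decay is added after normalization
    return w - lr * grad           # single SGD update

# Toy data (made up for illustration).
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w0 = 0.5

# iter_size = 2 with mini-batches of 2 vs. one mini-batch of 4 (iter_size = 1).
w_accum = sgd_step_accumulated(w0, [data[:2], data[2:]], lr=0.01, weight_decay=5e-4)
w_big = sgd_step_accumulated(w0, [data], lr=0.01, weight_decay=5e-4)

# The two updates agree up to floating-point rounding.
print(w_accum, w_big)
```

The equivalence only holds when every accumulated mini-batch has the same size, because each mini-batch gradient is already averaged over its own samples. Layers whose behaviour depends on the per-pass batch size, such as batch normalization, are not covered by this argument.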
As stated in your README, training from scratch with your code gives results 3 points worse than the Caffe training. Can you point out the reason?
Best Wishes