Closed mo-morikawa closed 5 years ago
Hi @mo-morikawa,
That sounds strange. If you use the pretrained model, sync Caffe to the suggested version, and use the provided evaluation code, you should get the same metrics (up to minor differences caused by GPU nondeterminism).
If all of the above are satisfied and you still get different results, try running inference on the CPU instead.
I ran the pretrained model on the Stanford Online Products test set and obtained the following results:
Recall at 1 = 0.368929952729
Recall at 10 = 0.635218670457
Recall at 100 = 0.79597368682
Recall at 1000 = 0.907060923606
These values are lower than those reported in the CVPR paper, Fig. 9. What could be the reason for this?
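For reference, the Recall@K metric being compared here can be computed from the test-set embeddings as follows. This is a minimal NumPy sketch of the standard definition used in the deep metric learning literature (the fraction of queries whose K nearest neighbors, excluding the query itself, contain at least one item of the same class), not the repository's actual evaluation script; the function name and signature are my own.

```python
import numpy as np

def recall_at_k(embeddings, labels, ks=(1, 10, 100, 1000)):
    """Recall@K: fraction of queries whose K nearest neighbors
    (excluding the query itself) include an item with the same label.

    embeddings: (N, D) float array of test-set feature vectors
    labels:     (N,) integer class labels
    ks:         the K values to evaluate
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)

    # Pairwise squared Euclidean distances via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = np.sum(embeddings ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    np.fill_diagonal(dists, np.inf)  # a query never retrieves itself

    order = np.argsort(dists, axis=1)            # neighbors, nearest first
    hits = labels[order] == labels[:, None]      # same-label indicator
    return {k: float(np.mean(hits[:, :k].any(axis=1)))
            for k in ks if k < len(labels)}

# Tiny synthetic example: two well-separated classes
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
lab = [0, 0, 1, 1]
print(recall_at_k(emb, lab, ks=(1,)))  # every query's nearest neighbor shares its label
```

Small discrepancies against the paper can also come from the distance metric (Euclidean vs. cosine) or from whether embeddings are L2-normalized before retrieval, so it is worth checking those match the paper's protocol.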