Sorry for bothering you. I have reproduced your work several times, both by directly evaluating the pre-trained model you provided (https://github.com/igashov/RetroBridge/blob/main/configs/retrobridge.yaml) and by retraining the model with the best hyperparameters you suggested (with batch_size reduced to 32 due to GPU memory limits). The tables below show our experimental results. While the round-trip accuracy is on par with the published results, the exact match accuracy is significantly lower than what was reported.
Round-Trip Accuracy on USPTO-50k (%)

| Model | Top-1 | Top-3 | Top-5 | Top-10 |
|---|---|---|---|---|
| RetroBridge (directly evaluating the provided checkpoint) | 83.96 | 72.75 | 70.47 | 70.22 |
| RetroBridge (re-trained from scratch, batch size 32) | 83.66 | 72.46 | 69.81 | 69.40 |
Exact Match Accuracy on USPTO-50k (%)

| Model | Top-1 | Top-3 | Top-5 | Top-10 |
|---|---|---|---|---|
| RetroBridge (directly evaluating the provided checkpoint) | 47.79 | 67.01 | 71.28 | 73.74 |
| RetroBridge (re-trained from scratch, batch size 32) | 48.37 | 66.95 | 70.94 | 72.82 |
Could you let me know whether the hyperparameter settings of your best model differ from those in the released config, or whether there might be an issue in the evaluation code?
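In case it helps pinpoint the discrepancy, here is a minimal sketch of how we compute top-k exact match accuracy. This is our own helper (`topk_exact_match` is not from your repo), and it assumes predictions and targets are already canonicalized SMILES strings, so a plain string comparison is an exact match:

```python
def topk_exact_match(predictions, targets, ks=(1, 3, 5, 10)):
    """Compute top-k exact match accuracy.

    predictions: list of ranked candidate lists, one per test reaction,
                 each candidate a canonical SMILES string.
    targets: list of ground-truth canonical SMILES strings.
    Returns a dict mapping k -> fraction of targets found in the top k.
    """
    hits = {k: 0 for k in ks}
    for preds, target in zip(predictions, targets):
        for k in ks:
            # Exact string match against the first k ranked candidates.
            if target in preds[:k]:
                hits[k] += 1
    n = len(targets)
    return {k: hits[k] / n for k in ks}


# Toy usage: second target only appears at rank 3.
preds = [["A", "B", "C"], ["B", "Q", "X"]]
targets = ["A", "X"]
print(topk_exact_match(preds, targets, ks=(1, 3)))  # → {1: 0.5, 3: 1.0}
```

If your reported numbers use a different matching rule (e.g. canonicalization settings, stereochemistry handling, or deduplication of candidates before ranking), that alone could explain the gap we see.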