fiberleif opened this issue 5 days ago
Since the USPTO-Full dataset uses augmentation size 5 and the default beam size is 10, we should ideally get 50 outputs per test sample. I used the command below to evaluate the released checkpoint:
```bash
python score.py \
    -beam_size 10 \
    -n_best 50 \
    -augmentation 5 \
    -targets ./USPTO_full_PtoR_aug5/test/tgt-test.txt \
    -predictions ./USPTO_full_PtoR-translate-results-20240705.txt \
    -process_number 8 \
    -score_alpha 1 \
    -save_file ./full_eval_results.txt \
    -source ./USPTO_full_PtoR_aug5/test/src-test.txt
```
For k = 1, 3, 5, 10 my results are close to the numbers reported in the paper, but for k = 20 and 50 the difference seems large, so I am wondering whether there is some misunderstanding on my side?
The arguments n_best and beam_size are usually the same. You could try using beam_size=50 when predicting and scoring.
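For reference, a minimal sketch of what matching beam_size and n_best at 50 could look like with OpenNMT-py 2.2.0 (the paths, checkpoint name, and generation settings are placeholders, not the authors' exact scripts):

```bash
# Predict with beam_size = n_best = 50 (placeholder paths and settings)
onmt_translate -model USPTO_full_PtoR.pt \
    -src ./USPTO_full_PtoR_aug5/test/src-test.txt \
    -output ./predictions_beam50.txt \
    -beam_size 50 -n_best 50 \
    -gpu 0 -max_length 500

# Score with the same beam_size and n_best
python score.py \
    -beam_size 50 -n_best 50 -augmentation 5 \
    -targets ./USPTO_full_PtoR_aug5/test/tgt-test.txt \
    -predictions ./predictions_beam50.txt \
    -process_number 8 -score_alpha 1 \
    -save_file ./full_eval_results_beam50.txt \
    -source ./USPTO_full_PtoR_aug5/test/src-test.txt
```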
Thanks for your kind reply. I successfully reproduced the top-50 results using beam_size = 50 and your processed test set (96,023 samples, not the 101,311 in raw_test.csv).
Did you use this filtered test set for all the baselines in Table 5, or not?
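A quick line-count sanity check on the processed files (a sketch, assuming the augmented test files hold augmentation × test_size lines, i.e. 5 × 96,023 = 480,115):

```bash
# Should print 480115 if the processed test set has 96023 samples with 5x augmentation
wc -l ./USPTO_full_PtoR_aug5/test/tgt-test.txt
wc -l ./USPTO_full_PtoR_aug5/test/src-test.txt
```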
If the result was implemented by us, it used the same test set.
Thank you! Does "implemented by us" mean "marked with a 'c' symbol" (explained in the paper as "Denotes that the result is implemented by the open-source code with well-tuned hyperparameters")?
In that case, would that be only LocalRetro?
Yes. If you are interested in results on the original dataset, feel free to give it a try.
Dear @otori-bird and @fiberleif, I'm trying to replicate R-SMILES on the USPTO-Full dataset as well. I followed all the instructions mentioned in the paper, but I couldn't achieve the same results. I used a beam size and n_best of 50, and training was conducted on a V100. Do you have any recommendations for improving the results or identifying potential issues in my approach?
Full dataset I was using (from the Google Drive shared by the authors):
Training script I was using:
After training, I also used the checkpoint-averaging script to obtain the final checkpoint for inference and scoring (see the sketch after this list):
Inference script I was using:
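For readers without the attached scripts, here is a sketch of the checkpoint-averaging step using OpenNMT-py's averaging tool (the script path, checkpoint glob, and output name are assumptions, not the exact setup used here):

```bash
# Average the saved checkpoints into a single model for inference
# (assumes OpenNMT-py 2.2.0's onmt/bin/average_models.py and model_step_*.pt naming)
python OpenNMT-py/onmt/bin/average_models.py \
    -models model_step_*.pt \
    -output model_avg.pt
```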
Thank you for your assistance. I will repeat the experiment, as the training was interrupted three times, which might have caused the issue. Interestingly, when I replicated the USPTO-MIT model, I achieved the expected results.
@otori-bird Dear author, I found that the released checkpoint for the USPTO-Full dataset (USPTO_full_PtoR.pt) has 44,529,405 total parameters, but when I use the default train-from-scratch config (https://github.com/otori-bird/retrosynthesis/blob/main/train-from-scratch/PtoR/PtoR-Full-aug5-config.yml), the model has 44,501,739 total parameters.
I used the same OpenNMT-py==2.2.0 as noted in the Readme. Could you explain why the sizes differ, or how the released checkpoint for the USPTO-Full dataset was trained?
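One way to verify such a discrepancy is to count parameters directly in both checkpoints. A minimal sketch, assuming the OpenNMT-py 2.x checkpoint layout with 'model' and 'generator' state dicts (the file name is a placeholder):

```bash
# Sum parameter counts over the encoder/decoder and generator state dicts
python -c "
import torch
ckpt = torch.load('USPTO_full_PtoR.pt', map_location='cpu')
total = sum(p.numel()
            for sd in (ckpt['model'], ckpt['generator'])
            for p in sd.values())
print(total)
"
```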
Dear authors,
Thanks for releasing your code. Regarding the top-20 and top-50 results in the Readme file, could you explain how you obtained them?