otori-bird / retrosynthesis

MIT License

Dataset sources #14

Open scottmreed opened 3 days ago

scottmreed commented 3 days ago

Thanks for sharing this work. Could you clarify which files are the sources for the three training sets:

USPTO-50K: dataset/USPTO_50K/raw_train.csv
USPTO-MIT: dataset/USPTO-MIT/train.txt
USPTO_full: dataset/USPTO_full/raw_train.csv

The GLN Dropbox link has multiple files named raw_train.csv for uspto_multi and schneider50k, but no files named train.txt.

otori-bird commented 3 days ago

In the GLN data, schneider50k is the source of USPTO-50K, and uspto_multi is the source of USPTO_full.
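
For anyone checking a local checkout, the mapping discussed in this thread can be written down as a small lookup. This is only a sketch, not part of the repository: the function name `report_training_files` and the table `TRAINING_FILES` are made up for illustration, the paths are copied from the question, and the GLN source names come from the reply. The source of USPTO-MIT is not stated in this thread, so it is left empty.

```python
from pathlib import Path

# Paths as listed in the question; GLN source names as given in the reply.
# The USPTO-MIT source is not identified in this thread, so it stays None.
TRAINING_FILES = {
    "USPTO-50K": ("dataset/USPTO_50K/raw_train.csv", "schneider50k"),
    "USPTO-MIT": ("dataset/USPTO-MIT/train.txt", None),
    "USPTO_full": ("dataset/USPTO_full/raw_train.csv", "uspto_multi"),
}


def report_training_files(repo_root="."):
    """Return (dataset, path, GLN source, exists) for each training set."""
    root = Path(repo_root)
    rows = []
    for name, (rel_path, gln_source) in TRAINING_FILES.items():
        rows.append((name, rel_path, gln_source, (root / rel_path).is_file()))
    return rows


if __name__ == "__main__":
    for name, rel_path, gln_source, exists in report_training_files():
        status = "found" if exists else "missing"
        source = gln_source or "not stated in this thread"
        print(f"{name}: {rel_path} [{status}], GLN source: {source}")
```

Run from the repository root, this prints one line per training set showing whether the expected file is present and which GLN folder it was said to come from.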