rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).

http://rxnmapper.ai

MIT License

Training dataset of rxnmapper? #23

Closed: Sunggi99 closed this issue 2 years ago

Sunggi99 commented 2 years ago

I wonder whether the pretrained rxnmapper model was trained on the 1k reactions from USPTO50K described in the paper. If not, would it be possible to provide a model trained on those 1k reactions? Any help with the above questions would be greatly appreciated.

pschwllr commented 2 years ago

The 1k reactions were only used to evaluate the different trained models and to select one, which was then tested on the remaining USPTO50k reactions. They were not part of the training data.
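The selection protocol described above (pick a model on a small labelled subset, then score that one model on the held-out remainder) can be sketched roughly as follows. This is a minimal illustration, not the rxnmapper evaluation code: the `Model` interface, `accuracy`, `select_then_test`, the random split, and exact string matching of mapped SMILES are all hypothetical simplifications.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: a "model" maps an unmapped reaction SMILES
# to a predicted atom-mapped reaction SMILES.
Model = Callable[[str], str]


def accuracy(model: Model, data: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (reaction, reference_mapping) pairs predicted exactly.

    Exact string equality is a simplification (assumption); a real
    evaluation would compare mappings in a canonical form.
    """
    if not data:
        return 0.0
    correct = sum(1 for rxn, ref in data if model(rxn) == ref)
    return correct / len(data)


def select_then_test(
    models: Dict[str, Model],
    labelled: Sequence[Tuple[str, str]],
    n_select: int = 1000,
    seed: int = 0,
) -> Tuple[str, float, float]:
    """Choose the best model on n_select examples, then score it on the rest."""
    items = list(labelled)
    # Random split is an assumption; the thread does not say how the 1k were chosen.
    random.Random(seed).shuffle(items)
    selection, test = items[:n_select], items[n_select:]
    # Model selection only ever looks at the small selection subset...
    best_name = max(models, key=lambda name: accuracy(models[name], selection))
    # ...and the held-out remainder is scored once, with the chosen model only.
    return (
        best_name,
        accuracy(models[best_name], selection),
        accuracy(models[best_name], test),
    )
```

None of the labelled pairs are used for training here; the candidate models are assumed to have been trained beforehand, which mirrors the point that the 1k reactions were not part of the training data.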

The model was trained with an unsupervised task (masked language modelling) on the USPTO reactions without atom mapping, which you can find here: https://ibm.ent.box.com/v/RXNMapperData/folder/112973143084.
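To make the masked-language-modelling objective concrete, here is a minimal sketch of how training examples could be built from unmapped reaction SMILES: tokenize the reaction, hide a random fraction of tokens, and keep the hidden tokens as prediction targets. The tokenizer regex is one commonly used for SMILES in reaction language models; the 15% default masking rate, the single `[MASK]` replacement rule, and the names `tokenize` and `mask_tokens` are assumptions for illustration, not taken from the rxnmapper code base.

```python
import random
import re
from typing import List, Optional, Tuple

# SMILES token regex: bracket atoms, organic-subset atoms, bonds, branches,
# ring closures and the '>' reaction separator.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

MASK = "[MASK]"  # hypothetical mask token name


def tokenize(rxn_smiles: str) -> List[str]:
    """Split a reaction SMILES string into tokens."""
    tokens = SMILES_REGEX.findall(rxn_smiles)
    # The tokens must reconstruct the input exactly, otherwise something was skipped.
    if "".join(tokens) != rxn_smiles:
        raise ValueError(f"untokenizable characters in {rxn_smiles!r}")
    return tokens


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of tokens with MASK (assumed 15% by default).

    Returns the corrupted sequence and a label list that holds the original
    token at masked positions and None elsewhere. The model is trained to
    predict the labelled tokens from the corrupted sequence, so no
    atom-mapping annotation is needed.
    """
    rng = random.Random(seed)
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(tok)
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels
```

For example, `mask_tokens(tokenize("CC(=O)O.OCC>>CC(=O)OCC"), seed=0)` yields a corrupted token list plus its targets. Because only raw reaction strings are required, unmapped reaction data of the kind linked above is sufficient for this objective.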