rxn4chemistry / rxnmapper

RXNMapper: Unsupervised attention-guided atom-mapping. Code complementing our Science Advances publication on "Extraction of organic chemistry grammar from unsupervised learning of chemical reactions" (https://advances.sciencemag.org/content/7/15/eabe4166).
http://rxnmapper.ai
MIT License

Script to finetune model? #18

Closed chaoyan1037 closed 3 years ago

chaoyan1037 commented 3 years ago

Hi @pschwllr,

Thanks for sharing your model! We found that the trained model's performance on our dataset is not satisfactory, and we would like to fine-tune the model on our data. Could you please also share the training scripts?

pschwllr commented 3 years ago

What kind of dataset is it?

The beauty of our work on rxnmapper is that the model is not fine-tuned on atom-mapping but learned the atom-mapping signal without supervision.

What we used for training is a simple masked language modelling task (we corrupt the reaction SMILES and let the model predict the masked tokens). We have tutorials on how to train reaction language models in our rxnfp repo (https://rxn4chemistry.github.io/rxnfp/), e.g.: https://github.com/rxn4chemistry/rxnfp/blob/master/nbs/08_training_smiles_language_model_from_scratch.ipynb

Please note that for rxnmapper we trained an ALBERT model and not a BERT model.
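For what the ALBERT/BERT distinction looks like in code, here is a minimal sketch using the HuggingFace `transformers` package (assumed installed). The dimensions below are illustrative only, not the hyperparameters used for RXNMapper; ALBERT's distinguishing features are factorized embeddings (embedding size smaller than hidden size) and cross-layer parameter sharing.

```python
# Sketch: a small ALBERT masked-LM via HuggingFace transformers.
# All sizes here are illustrative, NOT the RXNMapper configuration.
from transformers import AlbertConfig, AlbertForMaskedLM

config = AlbertConfig(
    vocab_size=600,        # illustrative; match your SMILES tokenizer
    embedding_size=64,     # ALBERT factorizes embeddings (E << H)
    hidden_size=128,
    num_hidden_layers=4,   # layers share one weight set by default
    num_attention_heads=4,
    intermediate_size=256,
)
model = AlbertForMaskedLM(config)
```

Because of cross-layer parameter sharing, increasing `num_hidden_layers` in an ALBERT config adds depth without adding encoder parameters, which is part of why ALBERT is much lighter than a comparable BERT.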