Closed chaoyan1037 closed 3 years ago
What kind of dataset is it?
The beauty of our work on rxnmapper is that the model was not fine-tuned on atom-mapping; it learned the atom-mapping signal without supervision.
What we used for training is a simple masked language modelling task (where we corrupt the reaction SMILES and let the model predict the masked tokens). We have tutorials on how to train reaction language models in our rxnfp repo (https://rxn4chemistry.github.io/rxnfp/), e.g.: https://github.com/rxn4chemistry/rxnfp/blob/master/nbs/08_training_smiles_language_model_from_scratch.ipynb
Please note that for rxnmapper we trained an ALBERT model and not a BERT model.
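The corruption step of the masked language modelling task described above can be sketched in plain Python. This is only an illustration, assuming a simplified regex-based SMILES tokenizer and a hypothetical `mask_tokens` helper; the actual rxnfp pipeline uses its own tokenizer and trains an ALBERT model via HuggingFace transformers (see the linked notebook).

```python
import random
import re

# Simplified SMILES token pattern (hypothetical; not the rxnfp tokenizer):
# bracket atoms, two-letter halogens, the reaction arrow, or single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|>>|\S")

def tokenize(rxn_smiles):
    """Split a reaction SMILES into tokens using the simplified rules above."""
    return SMILES_TOKEN.findall(rxn_smiles)

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Corrupt a token list for MLM training.

    Roughly mask_prob of the tokens are replaced by [MASK]; the original
    token is kept as the prediction label, all other positions get None
    (ignored in the loss).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append("[MASK]")
            labels.append(tok)   # the model must predict this token
        else:
            corrupted.append(tok)
            labels.append(None)  # not part of the loss
    return corrupted, labels

tokens = tokenize("CCO.CC(=O)O>>CC(=O)OCC")
corrupted, labels = mask_tokens(tokens)
```

A model trained on many such (corrupted, labels) pairs never sees atom-maps as supervision, which is the point made above: the mapping signal emerges from learning to reconstruct masked reaction tokens.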
Hi @pschwllr,
Thanks for sharing your model! We found that the trained model's performance on our dataset is not satisfactory, and we would like to fine-tune the model on our dataset. Could you please also share the training scripts?