Closed Nanotekton closed 3 years ago
I guess I found a solution, though it's counterintuitive. In training_scripts/launch_suzuki_miyaura_training.py, the parameter "evaluate_during_training" in model_args (line 87) should be set to True.
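For anyone landing here later, a minimal sketch of that change, assuming model_args is a plain Python dict in the training script. Only the "evaluate_during_training" key comes from this thread; the rest of the dict is left out rather than guessed.

```python
# Sketch only: the other entries of model_args in the training script are
# not reproduced here and should stay as they are.
model_args = {
    # ... existing training options from the script, unchanged ...
    "evaluate_during_training": True,  # the setting reported above as the fix
}

print(model_args["evaluate_during_training"])
```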
However, my question about the meaning of tilde is still open.
I agree, it's counterintuitive but I'm glad you found a solution.
In the Buchwald dataset, the aim was to describe the original catalyst structure (Ahneman et al., Science 360, 186–190 (2018)):
The tricky part is the nitrogen atom (RDKit will raise an exception if the explicit valence is 4).
I've seen that the canonical version of that SMILES generated by RDKit results in:
Hence, this choice was not ideal. Now, I would probably go for another catalyst representation like: O=S(=O)(O[Pd-]1[NH2+]C2C=CC=CC=2C2C=CC=CC1=2)C(F)(F)F
Anyhow, I would not expect this to significantly change the performance of the models.
For the Suzuki dataset, the ~ in CC(=O)O~CC(=O)O~[Pd] is used as a fragment group bond to keep the "Pd(OAc)2" compound together in the reaction string. We introduced this fragment group bond in "Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy" and discuss it in the SI of that article.
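To make the separator role concrete, here is a small sketch in plain Python of how such a string could be split back into compounds. The helper name split_precursors and the second compound (c1ccccc1Br) are made up for illustration; this is not code from the repository.

```python
def split_precursors(precursors):
    """Split a precursor string into compounds.

    '.' separates compounds; '~' joins the fragments of a single compound,
    so it is turned back into '.' only after the split.
    """
    return [group.replace("~", ".") for group in precursors.split(".") if group]


# Pd(OAc)2 written with fragment group bonds, plus a hypothetical aryl bromide.
precursors = "CC(=O)O~CC(=O)O~[Pd].c1ccccc1Br"
compounds = split_precursors(precursors)
print(compounds)
```

With a plain "." in place of "~", the same split would yield three separate fragments for the palladium acetate and lose the information that they belong together.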
Thanks, now I know everything I wanted. Closing issue.
Hi! I've successfully trained the Buchwald model from scratch with your training/evaluation scripts. However, in the case of the Suzuki reaction I'm getting a negative R2, and the model does not seem to learn at all. Could you confirm that the hyperparameters provided are correct? (The data should be OK, as I was able to get good results with the saved models downloaded from this repo.)
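For context on why a negative R2 reads as "not learning": with the usual definition R2 = 1 - SS_res / SS_tot, always predicting the mean gives 0, so anything below 0 is worse than that baseline. A minimal sketch with made-up yield values (not taken from either dataset):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [10.0, 50.0, 90.0]          # made-up yields
mean_baseline = [50.0, 50.0, 50.0]   # always predict the mean
bad_model = [90.0, 50.0, 10.0]       # predictions anti-correlated with truth

print(r2_score(y_true, mean_baseline))  # 0.0
print(r2_score(y_true, bad_model))      # -3.0
```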
As a side note: what's the meaning of '~' in the SMILES representation of Pd complexes? The Buchwald dataset suggests something like a coordination bond; in the Suzuki dataset, however, it looks more like a special separator. I think it was described in some paper, but I couldn't find it.