Closed mikolajsacha closed 1 year ago
Hi,
We are doing this because we notice few reactions in USPTO-50K dataset show the inconsistency in stereochemistry between reactant and product. For instance if you use rdkit to check the 270th reaction in test set (I removed the atom-mapping for clearer display):
COC(=O)[C@@H](C)NC(=O)C(Cc1c(C)cc(O)cc1C)NC(=O)OC(C)(C)C>>Cc1cc(O)cc(C)c1C[C@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)O
You can see the stereochemistry of the atoms in the reactant far from the reaction center is changed to unspecified compared to the product, which is not chemically reasonable (at least in this demethylation reaction). Therefore we consider Cc1cc(O)cc(C)c1C[C@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)OC
is also a correct answer.
With the function isomer_match
, we do not simply "ignore" the stereochemistry of the ground truth. We consider the prediction is correct if the prediction is "included" in the stereo configurations of ground truth reactant or the configurations of prediction include the ground truth. If the prediction is not included in the ground truth (let's say if the ground truth is Cc1cc(O)cc(C)c1C[C@@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)OC
), then the prediction is considered as wrong.
With this function the comparison to other methods may be unfair, but the practical evaluation of prediction accuracy considering the reaction data quality is important to know the model performance.
Shuan
I am closing this issue since there is no further discussion on this.
Hello again. Sorry for reopening the issue after several months (I took a break from working on this topic for a while).
While I understand your point of view and might agree that your evaluation method makes more sense, I think it is misleading that you put the results of your evaluation in the same table as other methods, which use the "strict" comparison in evaluation, without mentioning the difference. A proper approach would be to create a separate table where all the baselines are compared using your evaluation method. I verified the difference on a method that I am currently working on. The difference between comparing a "strict" match and using the comparison from your code is ~2-3% in Top K accuracy. Mikołaj
Hi @mikolajsacha,
Thanks for the comment, I updated the code and put the two different accuracies at the bottom of README.md.
Hi, congratulations on a great publication and very good results.
I see in your evaluation script (under
https://github.com/kaist-amsg/LocalRetro/blob/main/Top_K_accuracy.ipynb
) that you enumerate all possible stereoisomers during evaluation, effectively ignoring the stereochemistry of ground truth reactants. This is realized by theisomer_match
function. You also write in the paper:If I understand correctly, the methods compared in your paper don't do this and treat a prediction as incorrect even if it differs from the ground truth only in stereochemistry. Could you verify whether I am right on this, and if so, is the comparison to other methods fair?
Best regards, Mikołaj