Ignoring stereochemistry in evaluation

mikolajsacha commented 1 year ago

Hi, congratulations on a great publication and very good results.

I see in your evaluation script (under https://github.com/kaist-amsg/LocalRetro/blob/main/Top_K_accuracy.ipynb) that you enumerate all possible stereoisomers during evaluation, effectively ignoring the stereochemistry of ground truth reactants. This is realized by the isomer_match function. You also write in the paper:

We note that some ground-truth reactants with stereocenters do not have the exact stereoinformation specified, and for these cases, the predicted reactants are considered to be correct as long as all the atoms and bond connectivities are correct.

If I understand correctly, the methods compared in your paper don't do this and treat a prediction as incorrect even if it differs from the ground truth only in stereochemistry. Could you verify whether I am right on this, and if so, is the comparison to other methods fair?

Best regards, Mikołaj

shuan4638 commented 1 year ago

Hi, We are doing this because we notice few reactions in USPTO-50K dataset show the inconsistency in stereochemistry between reactant and product. For instance if you use rdkit to check the 270th reaction in test set (I removed the atom-mapping for clearer display): COC(=O)[C@@H](C)NC(=O)C(Cc1c(C)cc(O)cc1C)NC(=O)OC(C)(C)C>>Cc1cc(O)cc(C)c1C[C@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)O You can see the stereochemistry of the atoms in the reactant far from the reaction center is changed to unspecified compared to the product, which is not chemically reasonable (at least in this demethylation reaction). Therefore we consider Cc1cc(O)cc(C)c1C[C@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)OC is also a correct answer.

With the function isomer_match, we do not simply "ignore" the stereochemistry of the ground truth. We consider the prediction is correct if the prediction is "included" in the stereo configurations of ground truth reactant or the configurations of prediction include the ground truth. If the prediction is not included in the ground truth (let's say if the ground truth is Cc1cc(O)cc(C)c1C[C@@H](NC(=O)OC(C)(C)C)C(=O)N[C@H](C)C(=O)OC), then the prediction is considered as wrong.

With this function the comparison to other methods may be unfair, but the practical evaluation of prediction accuracy considering the reaction data quality is important to know the model performance.

Shuan

shuan4638 commented 1 year ago

I am closing this issue since there is no further discussion on this.

mikolajsacha commented 1 year ago

Hello again. Sorry for reopening the issue after several months (I took a break from working on this topic for a while).

While I understand your point of view and might agree that your evaluation method makes more sense, I think it is misleading that you put the results of your evaluation in the same table as other methods, which use the "strict" comparison in evaluation, without mentioning the difference. A proper approach would be to create a separate table where all the baselines are compared using your evaluation method. I verified the difference on a method that I am currently working on. The difference between comparing a "strict" match and using the comparison from your code is ~2-3% in Top K accuracy. Mikołaj

shuan4638 commented 1 year ago

Hi @mikolajsacha,

Thanks for the comment, I updated the code and put the two different accuracies at the bottom of README.md.

kaist-amsg / LocalRetro

Ignoring stereochemistry in evaluation #12