Accounting for multiple reactant sets in top-k accuracy calculations

ghost commented 1 year ago

Hello, thanks for this amazing work!

I wanted to know how top-k accuracy is calculated when a (retrosynthesis) template application produces more than one set of reactants. For example, applying a Suzuki reaction template at a specific C1(aryl)-C2(aryl) bond might produce two sets of reactants: C1Br.C2B(O)O and C1B(O)O.C2Br.

How does your code account for this in exact match top-k accuracy calculations? Thank you.

shuan4638 commented 1 year ago

The Suzuki coupling is recorded by the template [c:1]-[c:2]>>Br-[c:1].O-B(-O)-[c:2]

Note that the bonds in the input of LocalRetro (a DGL graph) are directional (edge AB and edge BA are different), so a prediction on edge AB is different from a prediction on edge BA. That is, if the reaction template is predicted on edge [C1, C2], you will get C1Br.C2B(O)O. If the reaction template is predicted on edge [C2, C1], you will get C2Br.C1B(O)O.
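A toy sketch of this behaviour, in plain Python with no chemistry toolkit, may help. It assumes atoms are placeholder labels such as C1 and C2, reactants are simplified strings rather than valid SMILES, and the helpers `directed_edges` and `apply_suzuki_template` are hypothetical names, not LocalRetro functions. It only illustrates that each undirected bond becomes two directed edges, and each directed edge yields exactly one reactant set.

```python
from typing import List, Tuple

# Hypothetical illustration only: not LocalRetro code, and not real SMILES handling.
Edge = Tuple[str, str]


def directed_edges(bonds: List[Tuple[str, str]]) -> List[Edge]:
    """Expand each undirected bond into two directed edges (A->B and B->A)."""
    edges: List[Edge] = []
    for a, b in bonds:
        edges.append((a, b))
        edges.append((b, a))
    return edges


def apply_suzuki_template(edge: Edge) -> str:
    """Mimic [c:1]-[c:2]>>Br-[c:1].O-B(-O)-[c:2] on one directed edge.

    The edge source plays the role of mapped atom :1 (gets Br) and the
    edge target plays the role of mapped atom :2 (gets B(O)O).
    Returns exactly one reactant set per directed edge.
    """
    src, dst = edge
    return f"{src}Br.{dst}B(O)O"


if __name__ == "__main__":
    for edge in directed_edges([("C1", "C2")]):
        print(edge, "->", apply_suzuki_template(edge))
```

Running it prints one line per direction, `('C1', 'C2') -> C1Br.C2B(O)O` and `('C2', 'C1') -> C2Br.C1B(O)O`, so the two reactant sets come from two different edge predictions rather than from one.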

That said, I am not sure where you saw both reactant sets in the same prediction. If you have seen such a case, it is not what we intended and we will fix it right away.
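Since each prediction corresponds to a single reactant set, a generic exact-match top-k metric can simply check whether the ground truth appears among the first k ranked predictions. The sketch below is an assumption about how such a metric is usually written, not LocalRetro's evaluation script: `canonical_set`, `topk_hit` and `topk_accuracy` are hypothetical names, and sorting the '.'-separated fragments stands in for proper SMILES canonicalization (for example with RDKit).

```python
from typing import List


def canonical_set(reactants: str) -> str:
    """Toy canonical form: sort the '.'-separated fragments.

    Assumption: a real evaluation would first canonicalize each fragment's
    SMILES with a cheminformatics toolkit; here strings are compared as-is.
    """
    return ".".join(sorted(reactants.split(".")))


def topk_hit(ranked_predictions: List[str], ground_truth: str, k: int) -> bool:
    """True if the ground truth matches any of the first k predictions.

    Each entry in ranked_predictions is ONE reactant set, i.e. one
    (directed edge, template) prediction.
    """
    target = canonical_set(ground_truth)
    return any(canonical_set(p) == target for p in ranked_predictions[:k])


def topk_accuracy(all_predictions: List[List[str]], ground_truths: List[str], k: int) -> float:
    """Fraction of reactions whose ground truth is found within the top k."""
    if not ground_truths:
        return 0.0
    hits = sum(topk_hit(p, gt, k) for p, gt in zip(all_predictions, ground_truths))
    return hits / len(ground_truths)


if __name__ == "__main__":
    # The two Suzuki disconnections occupy two separate ranks.
    preds = [["C2Br.C1B(O)O", "C1Br.C2B(O)O"]]
    truth = ["C2B(O)O.C1Br"]
    print(topk_accuracy(preds, truth, k=1), topk_accuracy(preds, truth, k=2))
```

In this example the alternative disconnection sits at rank 1 and the recorded one at rank 2, so the reaction counts as a miss at top-1 and a hit at top-2; neither reactant set is counted twice.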

Could you provide an example of such a case?

Thanks,

ghost commented 1 year ago

Hi, thanks for your reply. I didn't realize that the GNN was directional; this makes sense. Thank you.

shuan4638 commented 1 year ago

More precisely, the edges of the graph are directional, not the GNN (the GNN is also directional, but that is not related to this case). Since this is not an actual issue, I will close it.

kaist-amsg / LocalRetro, issue #16