Closed jaewanlee93 closed 1 year ago
Some of the reactions have multiple correct mappings (even after canonicalisation), e.g.:
CC(C)(C)N(CC(=O)[O-])C(=O)C1=C(O)C2(CCOCC2)c2c(Cl)cccc2C1=O>>C.C.C.C.O=C(O)CNC(=O)C1=C(O)C2(CCOCC2)c2c(Cl)cccc2C1=O
With 4 equivalent carbon atoms in the product.
We tried to capture that in `correct_maps` from the json file. Using those, you should be able to reproduce the results from Figure 4 (make sure to load the versions of the modules we had back then; there is no guarantee for newer versions of RDKit, etc.).
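For readers following along, one way to score a prediction when several mappings are acceptable is to count it as correct if it equals any of the reference mappings. The sketch below is only an illustration, not the authors' evaluation code: the function name `score_prediction` is hypothetical, mappings are assumed to be plain lists of atom indices, and treating an empty reference list as "no ground truth" is an assumption that the thread does not confirm.

```python
from typing import Optional, Sequence


def score_prediction(
    predicted: Sequence[int],
    correct_maps: Sequence[Sequence[int]],
) -> Optional[bool]:
    """Return True if `predicted` equals any acceptable reference mapping.

    Returns None when no reference mapping is available.
    ASSUMPTION: an empty `correct_maps` list means the reaction has no
    usable ground truth and should be left out of the accuracy.
    """
    if not correct_maps:
        return None
    return any(list(predicted) == list(reference) for reference in correct_maps)


# Toy data (not taken from the json file): two acceptable mappings.
toy_correct_maps = [[0, 1, 2, 3], [0, 2, 1, 3]]
first = score_prediction([0, 2, 1, 3], toy_correct_maps)   # matches the second reference
second = score_prediction([3, 2, 1, 0], toy_correct_maps)  # matches neither reference
third = score_prediction([0, 1, 2, 3], [])                 # no reference available
```

Returning `None` instead of `False` for the empty case keeps reactions without references out of both the numerator and the denominator; whether that matches the published evaluation would need to be confirmed by the authors.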
@pschwllr Thanks for the quick answer, but my question remains unresolved.
1) How did you obtain `correct_maps`, then? I mean, there must have been a preprocessing step that produced the `correct_maps` in the json file from the reactions. Also, in the `correct_maps` from the json file, some lists are empty and the list lengths differ (some are 0, some are 1, and some are 2). If a list is empty, does that mean there is no ground truth?
2) When you produced the results of Figure 4, did you compare the atom indices (the output of `process_reaction_with_product_maps_atoms`) with `correct_maps`?
Hi all, I’m hoping to get some help or information on reproducing the result of Figure 4(a) in the paper (Extraction of organic chemistry grammar from unsupervised learning of chemical reactions). I want to use rxnmapper in a pipeline, but before doing so I tested its performance by reproducing Figure 4(a). The accuracy is lower than I expected, so I would like to ask two questions.
In that file, there are ‘rxn’ and ‘CORRECT MAPPING’. I used the ‘CORRECT MAPPING’ values as ground truth and the ‘rxn’ values as input to the rxnmapper model, then compared the rxnmapper outputs with the ‘CORRECT MAPPING’ values to get the accuracy.
In this case the comparison returned False, and there were 248 False cases out of 682. (The process_reaction_with_product_maps_atoms function is imported from smiles_utils.py in the rxnmapper directory.) The accuracy is as follows (among the 281 USPTO cases used in Figure 4(a)):
Number of bond changes: accuracy
1: 78%
2: 88%
3: 74%
4: 78%
5: 58%
6: 87%
So, is there a difference between the way I compared and the way you compared? Or did you do further preprocessing? If so, could you let me know?
I computed accuracy as the ratio of identical cases to the total number of cases. Any help would be appreciated.
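To make the accuracy definition above concrete, here is a minimal sketch of how the per-group numbers could be computed, grouping by number of bond changes. It is an illustration only: the function name `accuracy_by_bond_changes` and the `(n_bond_changes, is_correct)` record layout are assumptions, and the records shown are toy values, not data from the paper or the json file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_bond_changes(
    records: Iterable[Tuple[int, bool]],
) -> Dict[int, float]:
    """Accuracy = identical cases / total cases, per number of bond changes.

    `records` holds one (n_bond_changes, is_correct) pair per reaction.
    ASSUMPTION: every reaction in `records` has a usable ground truth.
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for n_changes, is_correct in records:
        totals[n_changes] += 1
        if is_correct:
            hits[n_changes] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy records only; real values would come from comparing model output
# with the reference mappings.
toy_records = [(1, True), (1, False), (2, True), (2, True)]
per_group = accuracy_by_bond_changes(toy_records)
```

Reporting the group sizes next to each percentage would make it easier to see whether a gap to the published figure comes from a few reactions in a small group or from a systematic difference in how mappings are compared.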