thomas0809 / MolScribe

Robust Molecular Structure Recognition with Image-to-Graph Generation
MIT License

regarding stereochemistry #15

Closed shubbey closed 8 months ago

shubbey commented 10 months ago

Really cool program! It successfully converts most of my test cases. However, it often gets stereochemistry wrong, even for simple molecules and even when it reports a high degree of confidence. I just want to make sure I'm not doing anything wrong. Here is an example of what I mean:

Input: in

Output: out

(flipping the mirror image shows we have incorrect stereochemistry).

I scaled the input to 384x384 before running although it's not clear to me if that's necessary. Importing the same image in different sizes leads to different results (and sometimes it gets it right).

Is this just the nature of the model or is there something more I should be doing here? Thanks!

thomas0809 commented 10 months ago

Hi,

Thanks for your feedback and for bringing this error to my attention! In MolScribe, we have designed techniques to explicitly verify stereochemistry on top of the model-predicted atoms and bonds, which significantly improves its performance on stereochemistry (see the Stereochemistry section and Figure 5 in our paper). However, our current model might still make some mistakes. In the example you shared, it looks like the model mispredicts the type of the C-O bond, which leads to the final error. The model could be further improved with a better training pipeline.

In general, I would suggest taking a look at the molfile predicted by MolScribe, which contains more information than SMILES.
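As a rough illustration of what to look for in the molfile, here is a minimal sketch that lists the bonds carrying a wedge or hash flag. It assumes a V2000 molfile and uses only the fixed-width layout of that format; `stereo_bonds` and `SAMPLE` are hypothetical names made up for this example (not part of MolScribe), and the sample molfile is hand-written rather than real model output.

```python
# Sketch: list wedge/hash bonds in a V2000 molfile (standard library only).
# Assumes the fixed-width V2000 layout; the names here are illustrative.

STEREO_LABELS = {1: "wedge (up)", 4: "either", 6: "hash (down)"}

def stereo_bonds(molfile: str):
    """Return (begin_idx, begin_symbol, end_idx, end_symbol, label) per stereo bond."""
    lines = molfile.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # The atom symbol occupies columns 32-34 of each atom line.
    symbols = [lines[4 + i][31:34].strip() for i in range(n_atoms)]
    found = []
    for line in lines[4 + n_atoms : 4 + n_atoms + n_bonds]:
        begin, end = int(line[0:3]), int(line[3:6])
        stereo = int(line[9:12].strip() or 0)  # 0 means no stereo flag
        if stereo in STEREO_LABELS:
            found.append((begin, symbols[begin - 1], end, symbols[end - 1],
                          STEREO_LABELS[stereo]))
    return found

# Hand-written three-atom example with one wedge bond from C1 to O2.
SAMPLE = "\n".join([
    "example",
    "  hand-written",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  1",
    "  1  3  1  0",
    "M  END",
])

for begin, begin_sym, end, end_sym, label in stereo_bonds(SAMPLE):
    print(f"{begin_sym}{begin} -> {end_sym}{end}: {label}")
```

In a V2000 bond line the wedge starts at the first atom listed, so comparing these flags against the input drawing shows whether a bond was predicted as a wedge when it should be a hash (or the reverse), or whether its direction was swapped.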

shubbey commented 10 months ago

Thank you for the quick response. I will take a look at the molfiles to see if they provide anything I can use here. A couple of unrelated questions (sorry if wrong thread):

thomas0809 commented 10 months ago

> Does the input image size matter?

I believe the input image size won't make a big difference as long as it is within a reasonable range, but the model may have some variance in its predictions.

> Is this supposed to be compatible with TorchScript? I see a few notes in the files and there are some torch.jit annotations, but I've had to make some initial corrections to get anything to work via torch.jit.script. My use case is running this in a Windows program, so I need to port the model accordingly (of course, the image processing part will be done separately).

I am not familiar with TorchScript, so unfortunately I cannot give you much advice on this question.