thomas0809 / MolScribe

Robust Molecular Structure Recognition with Image-to-Graph Generation
MIT License

How to generate Synthetic data using ChemDraw? & How to randomly change bond length using Indigo? #27

Closed yyaaa1 closed 4 months ago

yyaaa1 commented 4 months ago

Hello, thank you for your great work!

After reading your paper and code, I have two questions:

  1. In the SI, you mentioned that one of the benchmark datasets was automatically generated by ChemDraw. Would you be willing to share how you accomplished this? I was not aware that ChemDraw had a drawing API.

  2. In the data augmentation section, you stated, "We varied the rendering options from Indigo when generating the synthetic data, such as the font, bond width, and bond length," but I am unable to locate the code that randomly changes the bond length. Could you please point me in the right direction? Thank you.

thomas0809 commented 4 months ago

Hi,

Thanks for the questions!

  1. The ChemDraw images are generated by simulating mouse clicks in the ChemDraw software.
  2. We alter the relative bond thickness, which effectively serves as a way to change the bond length. https://github.com/thomas0809/MolScribe/blob/main/molscribe/dataset.py#L285
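
For readers following point 2, here is a minimal sketch of what randomizing these rendering options could look like. It is not the code at the linked line: the helper names `sample_render_options` and `apply_render_options` are hypothetical, and the sampling ranges are illustrative assumptions. The option names `render-relative-thickness` and `render-bond-line-width` are Indigo renderer options; the sketch only needs an object with a `setOption(name, value)` method, so it runs without Indigo installed.

```python
import random


def sample_render_options(rng):
    """Sample one random set of Indigo rendering options.

    The ranges below are assumptions chosen for illustration only.
    """
    # Thickness of bonds and labels relative to the bond length.
    thickness = rng.uniform(0.5, 2.0)
    return {
        "render-relative-thickness": thickness,
        # Cap the line width so thick strokes are not made even heavier.
        "render-bond-line-width": rng.uniform(1.0, 4.0 - thickness),
    }


def apply_render_options(indigo, options):
    """Apply the sampled options to any object exposing setOption()."""
    for name, value in options.items():
        indigo.setOption(name, value)


# Usage with a real Indigo session (requires the Indigo Python package):
#   from indigo import Indigo
#   indigo = Indigo()
#   apply_render_options(indigo, sample_render_options(random.Random()))
```

Sampling a fresh set of options per molecule before rendering would give each synthetic image a different stroke weight relative to its bond length.
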

yyaaa1 commented 4 months ago

Thank you for your detailed response. I have one more request: could you release the source code of your customized version of Indigo? I would like to build on it.

thomas0809 commented 4 months ago

Could you please send an email to yujieq@csail.mit.edu and let's communicate there?

yyaaa1 commented 4 months ago

Of course.