Open Waterdrop-One opened 9 months ago
Nice catch! We have tried to implement a postprocessing algorithm to cover common patterns of abbreviations, but I do think it is challenging to cover all cases. If you manage to design a more principled and robust method, I believe it would be a significant contribution.
Thank you for the MolScribe model. It is very powerful and highly accurate. However, during testing I found a bug in the post-processing. In this example, (CH2)5 takes the place of the R group, so two bonds are detected around the R group [chemistry.py line 434]. As a result, the call on line 435, get_smiles_from_symbol(symbol, mol_w, atom, bonds), returns '(=C([H]))C([H])C([H])C([H])C([H])'. The two single bonds were merged into one double bond, which causes the mol conversion to fail. I'm trying to fix this bug but haven't found a solution yet.
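To make the expected behaviour concrete, here is a rough sketch of how a two-bond abbreviation like (CH2)5 could be expanded so that each neighbour keeps its own single bond. This is not MolScribe code: expand_methylene_chain and splice_bridge are hypothetical helpers, and the neighbour fragments in the usage line are made-up inputs. It only handles the literal pattern (CH2)n and assumes the chain bridges exactly two neighbours.

```python
import re

# Hypothetical pattern: matches only abbreviations of the form "(CH2)n".
_METHYLENE_REPEAT = re.compile(r"^\(CH2\)(\d+)$")


def expand_methylene_chain(symbol):
    """Return a linear SMILES chain for '(CH2)n', or None if it does not match."""
    match = _METHYLENE_REPEAT.match(symbol)
    if match is None:
        return None
    count = int(match.group(1))
    if count < 1:
        return None
    # Each CH2 unit becomes one aliphatic carbon; hydrogens stay implicit.
    return "C" * count


def splice_bridge(left, symbol, right):
    """Join two neighbour fragments through the chain, one single bond per end."""
    chain = expand_methylene_chain(symbol)
    if chain is None:
        raise ValueError(f"unsupported abbreviation: {symbol!r}")
    # The first chain atom bonds to `left`, the last one to `right`,
    # so the two detected bonds are never merged into a double bond.
    return left + chain + right


if __name__ == "__main__":
    # Made-up neighbours: a phenyl ring on one side, a hydroxyl on the other.
    print(splice_bridge("c1ccccc1", "(CH2)5", "O"))
```

The idea is that the two bonds land on opposite ends of the chain instead of both being attached to the first carbon. A real fix would still need to work on the atom and bond objects passed to get_smiles_from_symbol rather than on SMILES strings.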