thomas0809 / MolScribe

Robust Molecular Structure Recognition with Image-to-Graph Generation
MIT License
153 stars, 31 forks

Problem with color-highlighted image #30

Open sistar2020 opened 2 months ago

sistar2020 commented 2 months ago

MolScribe does not return a correct SMILES string when part of the molecule in the image is color-highlighted. For example, for this highlighted molecule:

with-highlights

I get an incorrect SMILES:

NC(S)=C1C(C2=CC=CC(C(F)(F)F)=C2)=C2=C34(=C=CC5=C6C=3C6(=C=4)C(=CC=CC3=C4C6=CC4(NC4=NC=NC7=C4N=CN7C4OC(CO)C(O)C4O)=C=63C=CC=CNC5=O)C3(=O)=O=C(C=C2)N3)C2=C(C3=CC=CC(C(F)(F)F)=C3)C(C(=O)C3=CC=CC=C3)=C1(N)S2
incorrect-smiles

But if the highlights are removed:

without-highlights

I get a correct SMILES:

Nc1sc(-c2ccc(C(=O)N/C=C/C=C/CCNc3ncnc4c3ncn4[C@@H]3O[C@H](CO)[C@@H](O)[C@H]3O)cc2)c(-c2cccc(C(F)(F)F)c2)c1C(=O)c1ccccc1
correct-smiles


Can you fix this issue?

LingjieBao1998 commented 2 months ago

Could you share the code you used to remove the highlights?

sistar2020 commented 2 months ago

I don't have code of my own for removing highlights from a molecule image, but I was able to do it with Kohulan's DECIMER-Image-Segmentation (https://github.com/Kohulan/DECIMER-Image-Segmentation). After creating a conda environment for DECIMER-Image-Segmentation, I ran 'python segment_structures_in_document.py highlighted-image.png' and got a few images with the highlighted areas removed. That's it!

Because MolScribe runs 5-10 times faster than DECIMER, I would be very happy to see MolScribe preprocess images to separate the molecules from the rest of the image, so that it predicts SMILES with higher confidence on such images. I think MolScribe is already fantastic. Keep up the good work.
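For anyone who wants a lightweight pre-treatment without setting up a separate environment, here is a minimal sketch of one possible approach. It is not what DECIMER-Image-Segmentation does and it is not part of MolScribe. It assumes the highlights are bright colors painted behind dark bond lines and atom labels, and simply whitens every pixel whose brightest RGB channel is above a cut-off. The function name `suppress_highlights` and the `threshold` default are made up for illustration.

```python
import numpy as np


def suppress_highlights(rgb, threshold=128):
    """Whiten bright colored pixels and keep dark strokes (hypothetical helper).

    rgb: array of shape (height, width, 3) with values in 0..255.
    threshold: assumed cut-off; a pixel whose brightest channel is at or
    above it is treated as background or highlight.
    """
    pixels = np.asarray(rgb, dtype=np.uint8)
    # A highlight color such as yellow (255, 255, 0) has at least one bright
    # channel, while black bond lines are dark in every channel.
    brightest = pixels[..., :3].max(axis=-1)
    # Binarize: bright pixels become white, everything else becomes black.
    cleaned = np.where(brightest >= threshold, 255, 0).astype(np.uint8)
    # Return a 3-channel image so it can be saved or passed on like the input.
    return np.repeat(cleaned[..., None], 3, axis=-1)
```

The input array could be loaded with an image library such as Pillow, and the cleaned image saved to a file and given to MolScribe in the usual way. One known limitation of this sketch: atom labels drawn in a bright color (for example pure red oxygens) are whitened together with the highlights, so it only suits images with dark strokes.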


