moleculemaker / chemscraper-frontend

MIT License

confirm successful pipeline execution for previously troublesome examples #93

Open matthewberry opened 6 months ago

matthewberry commented 6 months ago

https://drive.google.com/file/d/1XdcIxnAAZN604CCjfbA2CCtpc-hpaCA5/view?usp=drive_link

https://drive.google.com/file/d/1k3TWhGPj-2n4HykelRnC-kPuUUhkety5/view?usp=drive_link

KastanDay commented 5 months ago

Still bad outputs:

BAD: PDF Bimetallic Oxidation -- total failure in localization and parsing/interpretation of chemicals

Image

Image

PDF Pathway design -- better localization, but bad parsing/interpretation of chemicals

Image

Near-perfect localization (see failure case below), but very bad parsing of the structure.

Image

Very bad parsing; the output looks wrong.

Image

Improper localization when arrows are included: Image

Simple drawings worked: Image

KastanDay commented 4 months ago

Using GPT-4 Vision to find SMILES strings

Image

SMILES from GPT: c1cc(c(cc1C(=O)O)O)C(=O)O

Image generated from the SMILES string: ✅ It looks correct!

Image

Image

SMILES from GPT: O=c1cc(O)c(O)c(O)c1

Image generated from the SMILES string: ❌ It looks close but is slightly wrong.

Image

Second attempt using GPT: C1=C(C=C(C(=C1[N+](=O)[O-])[N+](=O)[O-])C(=O)O)[N+](=O)[O-]

GPT said: "Upon reviewing the structure of the molecule in the image, it seems to be 2,4,6-trinitrobenzoic acid, also known as picric acid with an additional carboxylic acid group on the benzene ring."

❌ Still wrong :(

Image

Code to render molecules:

# SMILES string to image
from rdkit import Chem
from rdkit.Chem import Draw

smiles = 'C1=C(C=C(C(=C1[N+](=O)[O-])[N+](=O)[O-])C(=O)O)[N+](=O)[O-]'

# Parse without sanitization so that syntax errors and chemistry errors fail separately
molecule = Chem.MolFromSmiles(smiles, sanitize=False)
if molecule is None:  # RDKit returns None when the SMILES cannot be parsed
    raise ValueError(f"RDKit could not parse SMILES: {smiles}")
Chem.SanitizeMol(molecule)  # Raises if the structure is chemically invalid (e.g. bad valence)
image = Draw.MolToImage(molecule)  # Returns a PIL image
image.show()
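
The checks above were done by eye, comparing the rendered image with the source figure. Where a known-good SMILES exists for a test figure, the comparison could also be done programmatically. Below is a minimal sketch, assuming such a reference string is available (none is given in this thread). The helper names `canonical_smiles` and `same_molecule` are hypothetical and not part of the chemscraper code.

```python
from typing import Optional

from rdkit import Chem


def canonical_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical SMILES, or None if the string does not parse."""
    mol = Chem.MolFromSmiles(smiles)  # None on a parse or sanitization failure
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)  # canonical form by default


def same_molecule(candidate: str, reference: str) -> bool:
    """True only if both strings parse and canonicalize to the same SMILES."""
    cand = canonical_smiles(candidate)
    ref = canonical_smiles(reference)
    return cand is not None and cand == ref


if __name__ == "__main__":
    # Second-attempt SMILES from the thread; printed without a reference to compare to.
    gpt_smiles = "C1=C(C=C(C(=C1[N+](=O)[O-])[N+](=O)[O-])C(=O)O)[N+](=O)[O-]"
    print(canonical_smiles(gpt_smiles))

    # Aromatic and Kekule spellings of benzene canonicalize to the same string.
    print(same_molecule("c1ccccc1", "C1=CC=CC=C1"))
```

Canonical-string equality is a strict test: it only says whether two SMILES describe the same graph, not how close a wrong answer is. It also treats stereochemistry as part of the structure, so a candidate without stereo marks will not match a reference that has them.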