unmtransinfo / CFChemApps

Web application for visualizing chemical structures
http://3.145.25.193/depict/

Depict app file upload menu #2

Closed jeremyjyang closed 7 months ago

jeremyjyang commented 1 year ago

The Depict file upload menu should support two formats, based on RDKit capabilities:

  1. SMILES
  2. MDL Molfile

The SMILES format is specified, with some variation, by Daylight, ChemAxon, and RDKit. At minimum, however, we should support a file in which each line begins with a SMILES string, optionally followed by whitespace (a space or tab) and a name/ID.
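The minimum line format described above can be sketched in plain Python. This is only an illustration, not the app's code: the function name `parse_smiles_lines` is hypothetical, header lines are not handled, and the SMILES strings are not validated here (that would be left to RDKit).

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_smiles_lines(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (smiles, name) pairs; name is None for one-column lines.

    Hypothetical helper for illustration only.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split once on the first run of whitespace (space or tab);
        # everything after it is treated as the name/ID.
        parts = line.split(None, 1)
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else None
        yield smiles, name


records = list(
    parse_smiles_lines(["CCO", "c1ccccc1\tbenzene", "CC(=O)O acetic acid", ""])
)
```

Splitting only once keeps names that contain spaces intact, and a line with just a SMILES string yields a record with no name.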

Priyansh-Kedia commented 1 year ago

@jeremyjyang I have fixed the CSV file variations as mentioned.

I could not find any multi-molecule Molfiles on the internet to test the MDL file variations. Please let me know where I can find such files for testing.

Thanks

jeremyjyang commented 1 year ago

See the rdkit repo for lots of example files, such as https://github.com/rdkit/rdkit/blob/master/Projects/DbCLI/testData/pubchem.200.sdf, and https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/test_data/compounds.smi, and many more SMILES and MDL Molfile (SDF) format files.
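For context on multi-molecule files: in the SDF format each record is a Molfile block, optionally followed by data items, and terminated by a line containing `$$$$`. A minimal sketch of splitting such text into records is below; the function name `split_sdf_records` is hypothetical, and in practice an RDKit reader would normally do this work.

```python
from typing import List


def split_sdf_records(text: str) -> List[str]:
    """Split SDF text into per-molecule records (hypothetical helper).

    Each record is the text preceding a '$$$$' delimiter line.
    """
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # A single Molfile has no '$$$$' terminator; keep trailing non-blank content.
    if any(ln.strip() for ln in current):
        records.append("\n".join(current))
    return records
```

A file with two `$$$$` lines yields two records, while a plain single-molecule Molfile yields one.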

Priyansh-Kedia commented 1 year ago

Thank you Professor

Priyansh-Kedia commented 1 year ago

The code is now fixed so that it outputs multiple compounds along with their structures.

jeremyjyang commented 1 year ago

drugs.smi.gz

The app fails on the attached drugs.smi because of the line containing only "CCO", with neither a space nor a name. This should be supported: SMILES files may have only one column, just the SMILES.

Priyansh-Kedia commented 1 year ago

From what I found on the internet, SMI and SDF files come in many different format variations. I will confirm this in the meeting tomorrow and then work on the fix.

Thank you

jeremyjyang commented 1 year ago

Good plan. Please prioritize learning from the documentation and examples of cheminformatics community/industry leaders: (1) RDKit, (2) OpenEye, (3) ChemAxon, and (4) Daylight (https://www.daylight.com/), which is no longer in business but invented SMILES.

Jack-42 commented 7 months ago

Although it was being inferred before, pull request #17 allows for explicitly setting the file type for SMILES/MDL MolFile. We're using RDKit to read both kinds of files, which should be robust to differences in formatting (within reason). I'm going to close this as completed for now.
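The idea of letting an explicit choice take precedence over inference can be sketched as follows. This is an assumption-laden illustration, not the code from the pull request: the function name `resolve_file_type`, the type labels, and the extension map are all hypothetical.

```python
import os
from typing import Optional

# Hypothetical labels and extension map; the app's real names are not shown here.
EXTENSION_TO_TYPE = {
    ".smi": "smiles",
    ".smiles": "smiles",
    ".mol": "molfile",
    ".sdf": "molfile",
}
KNOWN_TYPES = set(EXTENSION_TO_TYPE.values())


def resolve_file_type(filename: str, explicit: Optional[str] = None) -> str:
    """Return the file type, preferring an explicit choice over the extension."""
    if explicit is not None:
        if explicit not in KNOWN_TYPES:
            raise ValueError(f"unsupported file type: {explicit!r}")
        return explicit
    # Fall back to inferring the type from the (case-insensitive) extension.
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_TO_TYPE:
        raise ValueError(f"cannot infer file type from {filename!r}")
    return EXTENSION_TO_TYPE[ext]
```

With this shape, an upload named `data.txt` can still be read as a Molfile when the user selects that type, while files with a recognized extension need no selection.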