isayevlab / Auto3D_pkg

Auto3D generates low-energy conformers from SMILES/SDF
MIT License
146 stars, 32 forks

Header in .smi file causes a type error - add a check for header? #31

Closed cjgalvin closed 1 year ago

cjgalvin commented 1 year ago

To create a .smi file, I used RDKit's PandasTools module, which was the first search result for "rdkit create smi file". The SaveSMILESFromFrame function does exactly what its name says, but it includes a header row with 'SMILES' and 'ID'.

When I subsequently tried to run Auto3D on the .smi file, I got the following error:

Checking input file...
        There are 1652 SMILES in the input file data/to_conform.smi. 
        All SMILES and IDs are valid.
[12:37:05] SMILES Parse Error: syntax error while parsing: SMILES
[12:37:05] SMILES Parse Error: Failed parsing SMILES 'SMILES' for input: 'SMILES'

Deleting the header row led to a successful run of Auto3D.

However, if I try to create a pipeline where I first load a CSV of SMILES and other data into a dataframe, then create a temporary .smi file to use with Auto3D, I'm going to run into this problem every time.

It does not appear that RDKit has an option to not include the header. Since that is also the first search result, I suspect several users may try to use the same approach I did.

So I have two questions:

1. Is there some other preferred way to generate the .smi file that avoids this issue? I suspect the developers did not have this issue, otherwise there would be a header row check.
2. As a solution, does it make sense to add a header row check at the point linked below?
   https://github.com/isayevlab/Auto3D_pkg/blob/f463e4fd072d3e219b709cfb2b127146db05339c/src/Auto3D/utils.py#L93
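
The header row check proposed in question 2 could look something like the sketch below. This is not Auto3D's code: the helper name `strip_smi_header` and the list of header tokens are hypothetical, and the heuristic only looks at whether the first token of the first non-blank line is a known column name such as "SMILES".

```python
def strip_smi_header(lines, header_tokens=("smiles",)):
    """Return the lines of a .smi file without a leading header row.

    Hypothetical helper: a row counts as a header when its first
    whitespace-separated token matches one of header_tokens
    (case-insensitive). Only the first non-blank line is inspected.
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines before the first record
        if tokens[0].lower() in header_tokens:
            # Drop the header row and keep everything else unchanged.
            return lines[:i] + lines[i + 1:]
        break  # first real line is not a header; stop looking
    return lines


with_header = ["SMILES ID\n", "CC mol1\n", "NCC mol2\n"]
cleaned = strip_smi_header(with_header)
```

A stricter variant could instead try to parse the first token with RDKit and treat a parse failure on the first line as a header, at the cost of hiding a genuinely invalid first SMILES.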

LiuCMU commented 1 year ago

Hi, thanks for the clear description!

  1. You can control whether or not to include a header in the .smi file using RDKit. To omit the header, pass includeHeader=False to your SmilesWriter. An example is shown below:

    from rdkit import Chem
    from rdkit.Chem import SmilesWriter

    # create a simple molecule list
    mols = []
    mols.append(Chem.MolFromSmiles("CC"))
    mols.append(Chem.MolFromSmiles("NCC"))

    # write them into the .smi file, without a header row
    with SmilesWriter("mols_no_header.smi", includeHeader=False) as f:
        for m in mols:
            f.write(m)

Because the .smi format is so simple, there are many ways to write such a file. I usually treat it as a text file and write it with Python line by line, where each line is a SMILES + space + ID.

2. Thanks for the proposal. I could definitely improve it in the future :) 
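
The line-by-line approach described above (one "SMILES + space + ID" per line, no header) can be sketched with the standard library alone. The function name `write_smi`, the example records, and the temporary file location are assumptions made for illustration; the sketch also assumes that IDs contain no whitespace, since a space inside an ID would make the line ambiguous.

```python
import os
import tempfile


def write_smi(records, path):
    """Write (smiles, mol_id) pairs to path, one 'SMILES ID' per line.

    No header row is written, so the first line is already a record.
    """
    with open(path, "w") as f:
        for smiles, mol_id in records:
            f.write(f"{smiles} {mol_id}\n")


# Example records; in a pipeline these could come from the rows of a CSV.
records = [("CC", "ethane"), ("NCC", "ethylamine")]

# Write to a temporary directory so the sketch leaves the working dir alone.
smi_path = os.path.join(tempfile.mkdtemp(), "to_conform.smi")
write_smi(records, smi_path)
```

With a pandas DataFrame holding 'SMILES' and 'ID' columns, the same function could be fed with `zip(df["SMILES"], df["ID"])`.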

cjgalvin commented 1 year ago

Thank you for providing the SmilesWriter solution. I will use that as it solves the exact problem I described. Thank you!