mSorok / NaturalProductsOnline

Website code for COCONUT
https://coconut.naturalproducts.net/

Single SDF file #118

Closed by Smita-Biophysicslab 11 months ago

Smita-Biophysicslab commented 11 months ago

Respected Sir, I am a beginner trying to use the LOTUS database for drug discovery. I am trying to split the single SDF file using a Python script. In the single SDF file, I observed that all coordinates are 0. So the file gets split, but there are no 3D structures in the individual SDF files. Another problem is that, after splitting, the files are not named according to the LOTUS ID, so it is difficult to look up the properties of each molecule individually.

How should I use the single SDF file?

Waiting for your response. Thank you.

steinbeck commented 11 months ago

Dear Smita-Biophysicslab, I can confirm that the SDF file contains no coordinates. The reason is that we do not have experimental coordinates for any of the structures, so whatever we put in there, you could just as well generate yourself. I am not sure that I understand your problem with the LOTUS ID. Whatever it is, it is due to your Python script, and without seeing the script, we cannot diagnose the problem.

Smita-Biophysicslab commented 11 months ago

Thank you for your response.

I used the following Python script:

f = "LOTUS_2021_03_simple"
split_number = 100
number_of_sdfs = split_number
i = 0  # records seen so far
j = 0  # index of the current output chunk
f2 = open(f + '_' + str(j) + '.sdf', 'w')
for line in open(f + '.sdf'):
    f2.write(line)
    # "$$$$" marks the end of one record in an SDF file
    if line[:4] == "$$$$":
        i += 1
    if i > number_of_sdfs:
        number_of_sdfs += split_number
        f2.close()
        j += 1
        f2 = open(f + '_' + str(j) + '.sdf', 'w')
print(i)

But with this code I am not able to generate the 3D structures of the molecules. How can I modify this Python script to get the 3D coordinates of the molecules?

I also used Open Babel to split the file, with this command:

obabel -isdf LOTUS_DB.sdf -osdf -O *.sdf --split --gen3d

With this command I could get the 3D structures of all the molecules, but each file was named after the molecule name. So it is difficult to find the LOTUS ID of each molecule, one by one, in order to look up its details.

It would be really helpful if you could share a tutorial video or reference on using the database.

Thank you.

Kohulan commented 11 months ago

Hi @Smita-Biophysicslab ,

For splitting the SDF and creating individual SDFs you could use RDKit. Something like this could be used:

from rdkit import Chem
from rdkit.Chem import AllChem

# Load the input SDF file
sdf_file = 'LOTUS_2021_03_simple.sdf'

# Create a molecule supplier to iterate through the molecules in the SDF file
supplier = Chem.SDMolSupplier(sdf_file)

# Iterate through the molecules and save each one as a separate SDF file
for idx, molecule in enumerate(supplier):
    if molecule is not None:
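        # Extract the molecule's ID if present; GetProp raises KeyError
        # for a missing property, so fall back to the record index
        mol_id = molecule.GetProp('lotus_id') if molecule.HasProp('lotus_id') else str(idx)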

        # Create a new SDF file for the current molecule
        output_sdf = f'{mol_id}.sdf'

        # Write the molecule to the output SDF file
        w = Chem.SDWriter(output_sdf)
        w.write(molecule)
        w.close()

print("SDF files have been split and saved.")
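
If RDKit is not installed, the split-and-rename step can also be sketched with the standard library alone. This is only a sketch: `split_sdf_by_field` and `_field_value` are hypothetical helper names, and it assumes each record ends with `$$$$` and carries a `> <lotus_id>` data field (the same property name assumed above; check the actual tag in your file). It copies records verbatim, so it does not add 3D coordinates.

```python
import re
from pathlib import Path

# Matches an SDF data-field header such as "> <lotus_id>" and captures the name.
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def _field_value(record, field):
    """Return the first value line of data field `field`, or None if absent."""
    for i, line in enumerate(record[:-1]):
        match = FIELD_HEADER.match(line)
        if match and match.group(1) == field:
            return record[i + 1].strip() or None
    return None


def split_sdf_by_field(sdf_path, out_dir, field="lotus_id"):
    """Write one SDF file per record, named after the value of `field`.

    Records without that field are named record_<n>.sdf instead.
    Assumes the field values are safe to use as file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    record = []  # lines of the record currently being read
    with open(sdf_path) as handle:
        for line in handle:
            record.append(line)
            if line.startswith("$$$$"):  # end-of-record marker
                name = _field_value(record, field) or f"record_{len(written)}"
                target = out_dir / f"{name}.sdf"
                target.write_text("".join(record))
                written.append(target)
                record = []
    return written
```

For example, `split_sdf_by_field("LOTUS_2021_03_simple.sdf", "split_out")` would write the individual files into a `split_out` directory, which is an arbitrary name chosen for illustration.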

For generating 3D coordinates, check our API here: https://api.naturalproducts.net/latest/docs#/convert/Create3D_Coordinates_convert_mol3D_get. For this, though, you will probably have to provide SMILES.

RDKit can also generate 3D coordinates. If you would like more information, refer to the RDKit documentation.
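
A minimal sketch of the RDKit route, under some assumptions: `embed_3d` and `write_3d_sdfs` are hypothetical helper names, the ID is assumed to live in a `lotus_id` property, and ETKDG embedding followed by an MMFF clean-up is just one reasonable choice of method. The resulting coordinates are computed conformers, not experimental structures, and embedding can fail or be slow for large natural products.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_3d(mol, seed=42):
    """Return a copy of `mol` with explicit hydrogens and one 3D conformer.

    Returns None if the embedding fails.
    """
    mol_h = Chem.AddHs(mol)  # hydrogens are needed for sensible geometry
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed so repeated runs give the same result
    if AllChem.EmbedMolecule(mol_h, params) == -1:
        return None
    # Quick force-field clean-up; the call returns -1 (and leaves the
    # embedded geometry untouched) if MMFF has no parameters for the molecule.
    AllChem.MMFFOptimizeMolecule(mol_h)
    return mol_h


def write_3d_sdfs(sdf_path, id_prop="lotus_id"):
    """Write one 3D SDF per molecule, named after `id_prop` when present."""
    for idx, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        if mol is None:  # unparsable record
            continue
        mol_id = mol.GetProp(id_prop) if mol.HasProp(id_prop) else str(idx)
        mol_3d = embed_3d(mol)
        if mol_3d is None:  # embedding failed; skip this molecule
            continue
        writer = Chem.SDWriter(f"{mol_id}.sdf")
        writer.write(mol_3d)
        writer.close()
```

Calling `write_3d_sdfs("LOTUS_2021_03_simple.sdf")` would then combine the split, the renaming, and the coordinate generation in one pass.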

-Kohulan

diallobakary4 commented 1 month ago

Is the API here https://api.naturalproducts.net/ down? I am getting a "404 page not found"

Kohulan commented 1 month ago

@diallobakary4

Thanks for reaching out. It is up now. API-related issues should be logged here: https://github.com/Steinbeck-Lab/cheminformatics-microservice