patrickbryant1 / Umol

Protein-ligand structure prediction

Can ligand be inputted via sdf files instead of LIGAND_SMILES format? #18

Closed CykaZH closed 3 months ago

CykaZH commented 3 months ago

Can ligand be inputted via sdf files instead of LIGAND_SMILES format? In other words, can the matrix file (sdf) be converted into the corresponding LIGAND_SMILES?

patrickbryant1 commented 3 months ago

Hi,

You can do this with RDKit. I have added a function that does this in 'make_ligand_feats.py':

def sdf_to_smiles(input_sdf):
    """Read sdf and convert to SMILES
    """
    with Chem.SDMolSupplier(input_sdf) as suppl:
        for mol in suppl:
            return AllChem.MolToSmiles(mol)

You can input your sdf now by changing the path in the predict.sh script.

#SMILES. Alt: --input_sdf 'path_to_input_sdf'
python3 ./src/make_ligand_feats.py --input_smiles $LIGAND_SMILES \
--outdir $OUTDIR

alt:
python3 ./src/make_ligand_feats.py --input_sdf 'path to your sdf' \
--outdir $OUTDIR

Best,

Patrick
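
The function quoted above returns the SMILES of the first record in the file and does not check whether RDKit could parse it; `Chem.SDMolSupplier` yields `None` for a record it cannot read. A minimal sketch of a more defensive variant follows, assuming RDKit is installed. The name `sdf_to_smiles_checked` is hypothetical and not part of Umol.

```python
from rdkit import Chem


def sdf_to_smiles_checked(input_sdf):
    """Return the SMILES of the first parsable molecule in an SDF file.

    Hypothetical helper, not part of Umol. Raises ValueError if RDKit
    cannot parse any record in the file.
    """
    with Chem.SDMolSupplier(input_sdf) as suppl:
        for mol in suppl:
            # RDKit yields None for records it cannot parse or sanitize.
            if mol is not None:
                return Chem.MolToSmiles(mol)
    raise ValueError(f"No parsable molecule found in {input_sdf}")
```

Raising an explicit error here makes an unreadable SDF fail at the conversion step rather than later in the pipeline.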

CykaZH commented 3 months ago

Thank you very much for providing this method to convert SDF to SMILES. Given that we are currently engaged in docking studies involving short peptides and proteins, I would like to inquire whether it's also possible to use PDB files of short peptides as input instead. Appreciate your continued support and assistance. Best regards, ZH

CykaZH commented 3 months ago

Hmm, I just downloaded the latest 'make_ligand_feats.py' and replaced the SMILES input in 'predict.sh' with an sdf file, but I got this error message:

[10:08:36] SMILES Parse Error: syntax error while parsing: ./data/test_case/0317/histamine.sdf
[10:08:36] SMILES Parse Error: Failed parsing SMILES './data/test_case/0317/histamine.sdf' for input: './data/test_case/0317/histamine.sdf'
Traceback (most recent call last):
  File "/root/Umol/./src/relax/align_ligand_conformer.py", line 197, in <module>
    best_conf, best_conf_pos, best_conf_err, atoms, nonH_inds, mol, best_conf_id  = generate_best_conformer(pred_ligand['chain_coords'], ligand_smiles)
  File "/root/Umol/./src/relax/align_ligand_conformer.py", line 70, in generate_best_conformer
    m = Chem.AddHs(Chem.MolFromSmiles(ligand_smiles))
Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.rdmolops.AddHs(NoneType)
did not match C++ signature:
    AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False, boost::python::api::object onlyOnAtoms=None, bool addResidueInfo=False)
The unrelaxed predicted protein can be found at ./data/test_case/0317//0317'_pred_protein.pdb' and the ligand at ./data/test_case/0317//0317'_pred_ligand.sdf'
Traceback (most recent call last):
  File "/root/Umol/./src/relax/openmm_relax.py", line 1, in <module>
    import openmm as mm
ModuleNotFoundError: No module named 'openmm'
/root/miniconda3/lib/python3.10/site-packages/Bio/PDB/PDBParser.py:388: PDBConstructionWarning: Ignoring unrecognized record 'END' at line 3153
  warnings.warn(
The final relaxed structure can be found at ./data/test_case/0317//0317'_relaxed_plddt.pdb'

I am a newbie in this field and would like to know how to solve this problem. Sincerely, ZH
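
The first two log lines show RDKit trying to parse the file path `./data/test_case/0317/histamine.sdf` as a SMILES string. `Chem.MolFromSmiles` then returned `None`, and `Chem.AddHs(None)` raised the `ArgumentError`. One way to catch this earlier is to check the ligand argument before it reaches a step that expects SMILES. A minimal sketch, where `classify_ligand_input` is a hypothetical helper (not part of Umol) and the extension check is an assumed heuristic:

```python
import os


def classify_ligand_input(value):
    """Guess whether a ligand argument is an SDF path or a SMILES string.

    Hypothetical helper, not part of Umol. Heuristic: anything that ends
    in '.sdf' or names an existing file is treated as a file path.
    """
    if value.lower().endswith(".sdf") or os.path.isfile(value):
        return "sdf"
    return "smiles"


# The string from the log above is a path, not SMILES.
print(classify_ligand_input("./data/test_case/0317/histamine.sdf"))  # sdf
print(classify_ligand_input("NCCc1c[nH]cn1"))  # smiles (histamine)
```

If the result is "sdf", the file would first need to be converted to SMILES before it is handed to a step that calls `Chem.MolFromSmiles`.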

patrickbryant1 commented 3 months ago

Hi, there seem to be several issues here. Running these packages locally is unfortunately not for beginners. This is why there is a Colab notebook that you can run in the browser: https://colab.research.google.com/github/patrickbryant1/Umol/blob/master/Umol.ipynb

Please make sure that your sdf file contains a viable ligand that can be parsed into SMILES. RDKit seems to have trouble parsing it.

CykaZH commented 3 months ago

Hello, thank you for your reply. I believe I've resolved my issue. The error was caused by some chemically unreasonable covalent bonds in the short peptide I provided. When I run docking with the compounds you provided, or with other compounds from my experiment, the prediction runs smoothly. Besides using RDKit for file conversion, I found the OpenBabel website (https://www.cheminfo.org/Chemistry/Cheminformatics/FormatConverter/index.html), which converts freely between many structure formats. I think my earlier request, using the PDB file of a short peptide as input, can be met by converting the file to SMILES with this website.

Best, ZH
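
Chemically unreasonable bonds of this kind can be located before running a prediction. A minimal sketch, assuming RDKit is installed: build the molecule without sanitization and ask RDKit to list the problems it finds. `report_chemistry_problems` is a hypothetical helper name, and the five-bonded carbon below is an illustrative input, not the peptide from this thread.

```python
from rdkit import Chem


def report_chemistry_problems(mol):
    """Return (problem type, message) pairs for an unsanitized molecule."""
    return [(p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(mol)]


# Illustrative input: the second carbon has five bonds, which RDKit rejects.
bad = Chem.MolFromSmiles("CC(C)(C)(C)C", sanitize=False)
for kind, message in report_chemistry_problems(bad):
    print(kind, "-", message)
```

For an SDF file, the same check could be applied to molecules read with `Chem.SDMolSupplier(path, sanitize=False)`.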

patrickbryant1 commented 3 months ago

Hi,

Great that you solved the problem. Yes, OpenBabel is an alternative to RDKit. RDKit can be picky about chemistry it does not understand, which is probably why the error was raised.

Best of luck running Umol.

I will close this issue.