Hi! Thanks for bringing this up. As of now, DeepPBS input has to contain exactly one DNA helix. As the output suggests, yours has three. The simplest solution is to create three separate files (using a tool like pymol/biopython) with only one helix in each and run them separately. Please let me know how that goes.
Update: I created the files for you as an example, see here: https://drive.google.com/drive/folders/1rSg6YV35cfBrQK_aF1Vl-2EPJsxqKVSM?usp=sharing
Example output is here: https://rohslab.usc.edu/deeppbs/link/171180933088 PS: you can also use this webserver now, instead of Code Ocean! (https://rohslab.usc.edu/deeppbs/)
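For readers who want to do the split without pymol or biopython, here is a minimal standard-library sketch. It assumes each helix can be selected by chain ID, which may not hold for every structure. The function name `split_pdb_by_chains` and the chain IDs in the usage note are hypothetical, not part of DeepPBS.

```python
def split_pdb_by_chains(pdb_text, helix_chains, shared_chains=()):
    """Return {name: pdb_text} with one output per entry in helix_chains.

    helix_chains maps an output name to the chain IDs of one DNA helix.
    shared_chains (for example the protein chains) are copied into every output.
    Only ATOM/HETATM/TER records are kept; header records are dropped.
    """
    outputs = {}
    for name, chains in helix_chains.items():
        keep = set(chains) | set(shared_chains)
        kept = [
            line
            for line in pdb_text.splitlines()
            if line.startswith(("ATOM", "HETATM", "TER"))
            # The chain ID sits in column 22 of a PDB record (index 21).
            and len(line) > 21
            and line[21] in keep
        ]
        outputs[name] = "\n".join(kept + ["END"]) + "\n"
    return outputs
```

Usage would look like `split_pdb_by_chains(open("input.pdb").read(), {"helix1": {"C", "D"}, "helix2": {"E", "F"}}, shared_chains={"A"})`, writing each returned string to its own file and running DeepPBS once per file. Which chains form which helix has to be determined by inspecting the structure first.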
Thanks for your reply. I tried the separate files and it worked well. I have another PDB file generated by HDOCK (http://hdock.phys.hust.edu.cn/); it is a protein-DNA docking model. The file labels the 3'/5' ends of the DNA with residue names like DT5/DC3, and I get this error message:
Processing file 'dna.tmp.pdb'
total number of nucleotides: 176
total number of base pairs: 88
total number of helices: 1
total number of stems: 1
total number of non-pairing interactions: 178
boundary for lvector(): [1 to 0]
Time used: 00:00:00:00
done with cleaning up files.
Time used: 00:00:00:00
Traceback (most recent call last):
  File "../process_co_crystal.py", line 71, in <module>
    dna_data = processDNA(dna, quiet=False)
  File "/opt/conda/lib/python3.8/site-packages/deeppbs/process_dna.py", line 1547, in processDNA
    n = getNucleotideData(nt, model, D.chem_components)
  File "/opt/conda/lib/python3.8/site-packages/deeppbs/process_dna.py", line 319, in getNucleotideData
    "chemical_name": COMPONENTS[nt["nt_name"].strip()]['_chem_comp.name']
KeyError: 'DC5'
So I removed the terminal information, but I still encounter an error:
Processing file 'dna.tmp.pdb'
total number of nucleotides: 176
total number of base pairs: 88
total number of helices: 1
total number of stems: 1
total number of isolated WC/wobble pairs: 2
total number of non-pairing interactions: 178
boundary for lvector(): [1 to 0]
Time used: 00:00:00:00
done with cleaning up files.
Time used: 00:00:00:00
Traceback (most recent call last):
  File "../process_co_crystal.py", line 71, in <module>
    dna_data = processDNA(dna, quiet=False)
  File "/opt/conda/lib/python3.8/site-packages/deeppbs/process_dna.py", line 1547, in processDNA
    n = getNucleotideData(nt, model, D.chem_components)
  File "/opt/conda/lib/python3.8/site-packages/deeppbs/process_dna.py", line 324, in getNucleotideData
    nucleotide = getNucleotideById(model, nid)
  File "/opt/conda/lib/python3.8/site-packages/deeppbs/process_dna.py", line 156, in getNucleotideById
    return model[ch][rid]
  File "/opt/conda/lib/python3.8/site-packages/Bio/PDB/Entity.py", line 45, in __getitem__
    return self.child_dict[id]
KeyError: ''
Here are my input files: https://drive.google.com/file/d/1pWrLywvxXdc_Auik3GKELmkr4XQajmqb/view?usp=drive_link
https://drive.google.com/file/d/11DwEFWRlGcWXpobQ7o94gNMI3r0YZOgW/view?usp=drive_link
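The first traceback fails on a lookup of the residue name `DC5`, and the second on a lookup with the empty key `''`. The cause of the second error is not established in this thread, but a quick tally of chain IDs and residue names read from the fixed PDB columns can show whether a hand edit left unexpected names or blank fields. The sketch below is a standard-library diagnostic only; the helper name `summarize_residues` is hypothetical and not part of DeepPBS.

```python
from collections import Counter


def summarize_residues(pdb_text):
    """Count residues per (chain ID, residue name) using fixed PDB columns."""
    counts = Counter()
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        chain = line[21].strip()        # column 22; '' if the field is blank
        resname = line[17:20].strip()   # columns 18-20
        residue_key = (chain, line[22:27])  # residue number plus insertion code
        if residue_key in seen:
            continue  # count each residue once, not once per atom
        seen.add(residue_key)
        counts[(chain, resname)] += 1
    return counts
```

Printing `sorted(summarize_residues(open("model_1.pdb").read()).items())` lists every chain/residue-name combination. Entries such as `('B', 'DC5')` or a chain ID of `''` point to records worth inspecting.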
Hi, glad the first one worked out! The next file may not be following the PDB format properly. I am happy to take a look for you, but the drive links are inaccessible to me. Please make them visible to anyone with the link.
Sorry for the mistake. Links are available now. https://drive.google.com/drive/folders/1t0lm0iamodCFOLWfX67fj4axKrUkHq-x
Hello there ! Thanks for the update. I went ahead and took a look. There was something weird about the way you did the removal. I wrote a simple biopython script for you to do the same and it works. Please run this and use the output pdb file.
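from Bio.PDB import PDBParser, PDBIO

# Load the docking model and take the first model in the file.
parser = PDBParser()
model = parser.get_structure("model_1", "./model_1.pdb")[0]

# Trim residue names in chain B to two characters (e.g. DT5 -> DT, DC3 -> DC).
for res in model['B'].child_list:
    res.resname = res.resname[:2]
    print(res, res.resname)

# Write the renamed structure to a new PDB file.
io = PDBIO()
io.set_structure(model)
io.save("./fixed.pdb")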
Output link : https://rohslab.usc.edu/deeppbs/link/171202305522
You can open both your "model_1_removed_terminal.pdb" and this "fixed.pdb" and compare them through pymol Sequence viewer to see the differences.
PS: The docking for the homeodomains in the structure does not look very good. You may want to somehow refine them.
Let me know if you have any further questions.
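For anyone without biopython at hand, the same renaming can be sketched on the raw PDB text. This is not the script used above. It assumes the terminal labels are exactly a standard DNA name plus `5` or `3` (DT5, DC3 and so on, as reported in this thread) and that DNA residue names are right-justified in columns 18-20. The name `normalize_terminal_names` is hypothetical.

```python
# Map e.g. "DC5" and "DC3" back to "DC" for the four standard DNA residues.
TERMINAL_TO_STANDARD = {
    base + end: base
    for base in ("DA", "DC", "DG", "DT")
    for end in ("5", "3")
}


def normalize_terminal_names(pdb_text):
    """Rewrite terminal DNA residue names in place, keeping column widths."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM", "TER", "ANISOU")) and len(line) >= 20:
            name = line[17:20]  # residue name, columns 18-20
            if name in TERMINAL_TO_STANDARD:
                # Right-justify so every later column stays where it was.
                line = line[:17] + TERMINAL_TO_STANDARD[name].rjust(3) + line[20:]
        out.append(line)
    return "\n".join(out) + "\n"
```

Because only the three residue-name columns are rewritten, chain IDs, residue numbers and coordinates stay in their original columns. Unlike the biopython script, this touches only names listed in the mapping and leaves every other record unchanged.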
The script is useful, and now the run completes successfully. The webserver is quite convenient. Thank you very much for your help!
Great ! Thanks for reaching out. Just a note though that the webserver is still under development. But more news and updates will follow. Closing the issue now.
I ran the code on Code Ocean. When I replace the input file with my own PDB file, something goes wrong and no npz file is generated. For example, I used 3hos.pdb (Molecular architecture of the Mos1 paired-end complex: the structural basis of DNA transposition in a eukaryote) as input. The output is as follows.