timkartar / DeepPBS

Geometric deep learning of protein–DNA binding specificity
BSD 3-Clause "New" or "Revised" License

AF3 docked model as input #8

Closed Drjackxiaoyuchen closed 5 days ago

Drjackxiaoyuchen commented 1 week ago

Hi, Team,

Great work on the model! Is it possible to directly use a docked protein–DNA model as input for DeepPBS to run a mutagenesis scan? I have a biological background with little computational experience; I tried, but the input format seems to be incompatible. Any suggestions? I also have a second question about retrieving the RI data shown in your Figure 4f: is there an easy way to retrieve it from the final .pse file, or an easier way to visualize the .npz file?

timkartar commented 1 week ago

Hi, for your first question, I would need the input file and the full error log to take a look.

For the second one, here is some code to retrieve it from the output files for 3q05.pdb (biological assembly). Note that this prints the average and max aggregations; you would need to implement the log-sum aggregation yourself, which should be straightforward.

from Bio.PDB import PDBParser
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rc
import seaborn as sns

sns.set_theme()
sns.set_style(style="ticks")
rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica']})

parser = PDBParser(QUIET=True)

pdb = "3q05"
model = parser.get_structure(pdb, pdb + ".pdb")[0]
cwd = os.getcwd()

# heavy atoms only; hydrogens are not part of the DeepPBS graph
atoms = [a for a in model.get_atoms() if a.element != 'H']

v_prot_all = np.load(cwd + "/" + pdb + "_v_prot.npy")                    # coords of all protein atoms
interface_atoms = np.load(cwd + "/{}_edge_index.npy".format(pdb))[0, :]  # protein-side interface indices
v_prot = v_prot_all[interface_atoms]
diffs = np.load(cwd + "/{}_diffs.npy".format(pdb))                       # per-atom importance scores
diffs = diffs / diffs.max()                                              # normalize to [0, 1]

# index heavy atoms by their (x, y) coordinates rounded to two decimals,
# so rows of v_prot can be matched back to Biopython atoms
atom_dict = {}
for atom in atoms:
    k1 = "{:.2f}".format(atom.coord[0])
    k2 = "{:.2f}".format(atom.coord[1])
    atom_dict[(k1, k2)] = atom
plt_dict = {}
for i in range(len(v_prot)):
    k1 = "{:.2f}".format(v_prot[i, 0])
    k2 = "{:.2f}".format(v_prot[i, 1])
    try:
        atom = atom_dict[(k1, k2)]
    except KeyError:
        print("no atom matched at ({}, {})".format(k1, k2))
        continue
    res = atom.get_parent()
    # group RI scores per residue, keyed by resname + resid + chain id
    key = "{}{}{}".format(res.get_resname(), res.get_id()[1], res.get_parent().get_id())
    plt_dict.setdefault(key, []).append(diffs[i])
    print("{},{},{}".format(atom.name, key, diffs[i]))

# title-case residue names (e.g. ARG123A -> Arg123A) and store the
# average and max aggregations for each residue
final_dict = dict()
for key in plt_dict:
    k = key[0] + key[1].lower() + key[2].lower() + key[3:]
    final_dict[k] = [np.mean(plt_dict[key]), np.max(plt_dict[key])]

# rank residues by their max-aggregated score, descending
keys = np.array(list(final_dict.keys()))
vals = [final_dict[k][1] for k in keys]
order = keys[np.argsort(vals)[::-1]]

import pandas as pd
plt_df = pd.DataFrame.from_dict(final_dict, orient="index", columns=["Average", "Max"])
plt_df = plt_df.reset_index().melt(id_vars=["index"])
plt_df = plt_df.reset_index().melt(id_vars=["index"])

fig, ax = plt.subplots(figsize=(8, 3.5))
print(plt_df.to_csv())
palette = {"Average": "black", "Max": "firebrick"}
# plot the 20 highest-ranked residues
sns.barplot(x="index", y="value", hue="variable", data=plt_df, order=order[:20], palette=palette)
ax.set_ylabel("Aggregated network importance")
ax.set_xlabel("Interface residues")
ax.legend()
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right", fontsize=15)
plt.tight_layout()
plt.savefig("./interpret_residue_wise.svg")
plt.close()
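The log-sum aggregation left as an exercise above could look like the following. Note that "log sum" is interpreted here as the log of the summed per-residue importances; that is an assumption on my part, not a confirmed DeepPBS definition:

```python
import numpy as np

def log_sum(values, eps=1e-8):
    """Log of the summed importances for one residue.
    eps guards against log(0) for all-zero residues."""
    return float(np.log(np.sum(values) + eps))

# hypothetical per-residue RI lists, shaped like plt_dict above
example = {"Arg123A": [0.2, 0.9], "Lys45B": [0.5]}
log_sum_agg = {k: log_sum(v) for k, v in example.items()}
```

Adding `log_sum(plt_dict[key])` as a third entry in `final_dict` would make it plot alongside Average and Max.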
Drjackxiaoyuchen commented 1 week ago

Thanks so much! Here is the error message I got. I tried using the CIF files (direct output from AF3) as well as converting them to PDB with PyMOL, but neither quite worked out.

[debug] process_co_crystal.py, main: line: af3_test.pdb
[debug] process_co_crystal.py, main: loading pdb_file af3_test.pdb, path is ./pdb/af3_test.pdb
[debug] structure_data.py StructureData: loading structure with PDB parser, name=co_crystal, path=., structure=./pdb/af3_test.pdb
rm: cannot remove '.par': No such file or directory
rm: cannot remove '.pqr': No such file or directory
rm: cannot remove '.r3d': No such file or directory
rm: cannot remove '.dat': No such file or directory
rm: cannot remove '*.log': No such file or directory
[debug] load_data.py, _processData: loading data from ./npz/af3_test.npz
Traceback (most recent call last):
  File "/data/jmwang/rna/rbp/baselines/deeppbs/run/process/../predict.py", line 94, in <module>
    dataset, transforms, info, datafiles = loadDataset(datafiles, C["nc"], C["labels_key"], C["data_dir"],
  File "/data/jmwang/rna/rbp/baselines/deeppbs/deeppbs/nn/utils/load_data.py", line 275, in loadDataset
    dataset, transforms, data_files = _processData(data_files, nc, labels_key, **kwargs)
  File "/data/jmwang/rna/rbp/baselines/deeppbs/deeppbs/nn/utils/load_data.py", line 148, in _processData
    data_arrays = np.load(f, allow_pickle=True)
  File "/data/jmwang/.conda/envs/deeppbs/lib/python3.12/site-packages/numpy/lib/npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: './npz/af3_test.npz'


timkartar commented 1 week ago

The npz file not being found means the preprocessing step didn't happen correctly (it converts the pdb to npz, line 3 in process_and_predict.sh). I don't see any error log from that line. Please make sure your environment is correctly set up and that x3dna-dssr, Curves, etc. are available in the system path. It's definitely not a general issue with AF3 files, because people often upload such files to the DeepPBS webserver, which works fine. I am unable to help further without seeing the full cif/pdb file.
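A quick way to verify that the external tools are visible to the pipeline is to resolve them with `shutil.which`. The executable names below are assumptions; match them to your installation:

```python
import shutil

def check_tools(tools):
    """Map each executable name to its resolved path, or None if not in PATH."""
    return {t: shutil.which(t) for t in tools}

# executable names are assumptions; adjust for your install
for tool, path in check_tools(["x3dna-dssr", "Cur+"]).items():
    print("{}: {}".format(tool, path or "NOT found in PATH"))
```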

timkartar commented 5 days ago

Closing this as inactive. Feel free to reopen!

Drjackxiaoyuchen commented 3 days ago

fold_2024_08_26_10_39_plei (1).zip

Hi, sorry for the late reply. I was trying to upload this docked AF3 model to the web server version; could you point me to why this won't work? Here is the error message I got:

Log: Processing co-crystal...

Processing file '172619982560.tmp.pdb'
total number of nucleotides: 96
total number of base pairs: 46
total number of helices: 2
total number of stems: 6
total number of isolated WC/wobble pairs: 8
total number of non-pairing interactions: 102
total number of splayed-apart dinucleotides: 4
consolidated into units: 2
total number of hairpin loops: 1
total number of bulges: 2
total number of internal loops: 11
total number of non-loop single-stranded segments: 1

Time used: 00:00:00:00 done with cleaning up files.

Time used: 00:00:00:00

......Processing structure #1: <172619982560_entity_0.inp>...... This structure has broken O3'[i] to P[i+1] linkages

Time used: 00:00:00:00

......Processing structure #1: <172619982560_entity_0.inp>...... This structure has broken O3'[i] to P[i+1] linkages

Time used: 00:00:00:00
Helix score: 0.5181818181818182
172619982560_entity_0/172619982560_entity_0.pdb /srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/run/process/172619982560_entity_0 172619982560_entity_0.inp
Helix score: 0.8416666666666667
172619982560_entity_0/172619982560_entity_0.pdb /srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/run/process/172619982560_entity_0 172619982560_entity_0.inp
ERROR: helix count problem 2 172619982560.pdb
rm: cannot remove '.par': No such file or directory
rm: cannot remove '.r3d': No such file or directory
rm: cannot remove '.dat': No such file or directory
Running prediction...
Traceback (most recent call last):
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/run/process/../predict.py", line 93, in <module>
    dataset, transforms, info, datafiles = loadDataset(datafiles, C["nc"], C["labels_key"], C["data_dir"],
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/deeppbs/nn/utils/load_data.py", line 261, in loadDataset
    dataset, transforms, data_files = _processData(data_files, nc, labels_key, **kwargs)
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/deeppbs/nn/utils/load_data.py", line 145, in _processData
    data_arrays = np.load(f, allow_pickle=True)
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../../../backend/media/uploads/../../../backend/media/uploads/172619982560.npz'
Calculating heavy atom relative importance (RI) scores...
Traceback (most recent call last):
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/run/process/../interpret.py", line 90, in <module>
    dataset, transforms, info, datafiles = loadDataset(datafiles, C["nc"], C["labels_key"], C["data_dir"],
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/deeppbs/nn/utils/load_data.py", line 261, in loadDataset
    dataset, transforms, data_files = _processData(data_files, nc, labels_key, **kwargs)
  File "/srv/www/deeppbs.usc.edu/deeppbs-webserver/deeppbs/deeppbs/nn/utils/load_data.py", line 145, in _processData
    data_arrays = np.load(f, allow_pickle=True)
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../../../backend/media/uploads/172619982560.npz'
Running Pymol scripts...
Traceback (most recent call last):
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/pymol/parsing.py", line 467, in run
    run(path, ns_pymol, ns_pymol)
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/pymol/parsing.py", line 516, in run_file
    execfile(file, global_ns, local_ns)
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/pymol/parsing.py", line 511, in execfile
    exec(co, global_ns, local_ns)
  File "../plot_scripts/vis_interpret.py", line 25, in <module>
    v_prot_all = np.load(npy_path + "_v_prot.npy") # coords all protein atoms
  File "/srv/www/deeppbs.usc.edu/conda/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '../plot_scripts//interpret_output/172619982560.npz_v_prot.npy'
Error: unsupported file type:
Error: Argument processing aborted due to exception (above).
mv: cannot stat '../plot_scripts/interpret_output/172619982560.': No such file or directory
mv: cannot stat '../../../backend/media/output/npzs/172619982560.npz_predict.npz': No such file or directory
mv: cannot stat '../../../backend/media/uploads/172619982560.npz': No such file or directory

timkartar commented 3 days ago

Hi! There are two reasons it is not working:

  1. The structure has broken O3'[i] to P[i+1] linkages. (You can fix these first.)
  2. The structure has two DNA helices instead of one. (You can create separate files with one helix each.)
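For point 2, assuming each DNA helix sits on its own chain(s), a minimal sketch that filters a PDB file by chain identifier (column 22 of ATOM records) might look like this. The chain groupings in the usage comment are hypothetical; inspect your structure to find which chains actually form each helix:

```python
def split_pdb_by_chains(pdb_text, keep_chains):
    """Return PDB text keeping only ATOM/HETATM/TER records whose
    chain identifier (column 22, index 21) is in keep_chains;
    header and END records pass through unchanged."""
    keep = set(keep_chains)
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM", "TER")):
            if len(line) > 21 and line[21] in keep:
                out.append(line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"

# hypothetical usage: protein on chain A, one DNA helix on chains B/C,
# the other on chains D/E -- check your own file for the real layout
# text = open("af3_test.pdb").read()
# open("helix1.pdb", "w").write(split_pdb_by_chains(text, ["A", "B", "C"]))
# open("helix2.pdb", "w").write(split_pdb_by_chains(text, ["A", "D", "E"]))
```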
Drjackxiaoyuchen commented 3 days ago

Thanks so much!!!