swanss / peptide_design

This repository contains code for the paper: "Tertiary motifs as building blocks for the design of protein-binding peptides"
Other
15 stars 9 forks source link

Error in dTERMen step #10

Closed wlawler45 closed 2 years ago

wlawler45 commented 2 years ago

I am having an error when calling the dTERMen step as shown in the example code, I am running the following line: ./design --p "5u1c_fused-path-and-context_55.pdb" --c "/home/williamubuntu/peptide_design/example/input_files/dTERMen.configfile" --s "chain B" --o "5U1C" and am getting the following output:

Error in dTERMen::buildBackgroundPotentials(): property phi is not defined in the FASST database ./design(+0x45054)[0x55da270a7054] ./design(+0x45480)[0x55da270a7480] ./design(+0xdbf82)[0x55da2713df82] ./design(+0xd6a13)[0x55da27138a13] ./design(+0xd5702)[0x55da27137702] ./design(+0x76ec)[0x55da270696ec] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ff8cfb25083] ./design(+0x530e)[0x55da2706730e] terminate called after throwing an instance of 'int' Aborted (core dumped)

For reference my dTERMen.configfile looks like this:

fasstdb=/home/williamubuntu/peptide_design/singlechain_22188_sim30_STRIDE.db #path to fasstDB for dTERMen design rotlib=/home/williamubuntu/Mosaist/testfiles/rotlib.bin #update this to match your installation efun=35 selfCorrMaxCliqueSize=2

Am I missing something here? Any help would be appreciated.

swanss commented 2 years ago

Hi Will,

dTERMen requires a specific kind of FASST database file with properties such as phi/psi angles and other values pre-computed. I realize it would be much easier if this was provided, so I'm uploading a database to zenodo now. I'll let you know when it's done!

swanss commented 2 years ago

Working with zenodo to see if they'll let me update the record. For now, try downloading it from dropbox https://www.dropbox.com/s/iwc9dmked2jyopn/multichain_PDB190102_23643structures.db?dl=0

wlawler45 commented 2 years ago

Thank you so much, I'll try that right now, do you mind if I also ask, after running through all the example scripts in this library what is the format of the output? Like where can I find the amino acid sequences of the generated peptides?

swanss commented 2 years ago

After running the program, there will be a PDB file *.red.pdb, this will have an identical backbone structure to whatever you provided as input, with the sequence field replaced with the designed sequence at whatever positions were selected for design. You can also see the sequence in final lines of the output.

It will also output an etab (i.e. energy table, the Potts model energy function that is obtained by dTERMen). You can use the Mosaist program enerTable to load this and repeat the sequence optimization with different parameters

Btw, I noticed that your input is just the fused path + local structural context. For a more accurate sequence, you'll need to take the fused path structure and combine it with the original target structure into a single file.

wlawler45 commented 2 years ago

Okay, that makes a lot more sense. I don't think that was totally clear in the instructions, you might want to specify that. Or I could write a python script to append the result onto the PDB correctly. And I should just take the data from the fused path file not the local structural alignment file?

swanss commented 2 years ago

Fair point, btw did you score the sampled peptide backbones? And if so, did you score the fused path + context file? Perhaps I'll add a note about this at that step. We shouldn't need a script, it should be enough to just concatenate the target and the designed peptide backbone PDB

wlawler45 commented 2 years ago

I did score them, using the example script in example 6, in the readme you mentioned the scores should be summed with the most negative value being the best result, right? I expanded the excel sheet from the score_structures.tsv to sum all of the values for each path and then sort them by those values, I found fused_path_55 to be the best one that way. I'm still slightly confused because when I did append them together I get a plane-like structure next to my protein, doesnt look much like a peptide. I'm including the PDB files individually and joined if you want to take a look. Did I not use the correct pdb file? Thank you again for your help 5U1Cpdbs.zip .

wlawler45 commented 2 years ago

Oh I see what I did wrong, I overthought it, But I think the peptide is in the wrong position relative to the complex. I will try the dTERMen on the result, I got it by just copying and pasting without changing the atom numbers or anything.

swanss commented 2 years ago

There shouldn't be any need to change the atom numbers before running dTERMen, so should be good to go.

By the way, did you filter out the peptide backbones that didn't have enough interface contacts? This one looks like it doesn't make many interactions with the target. Also, in the paper we took the average of the contact scores, but after some more thought, I think summing should work better.

wlawler45 commented 2 years ago

Okay, I want to be clear, the score in score_structures.tsv is the contact score, right? I believe I did do a summation here, Where would I find the information to filter based on interface contacts? Is it in frag_scores.csv?

swanss commented 2 years ago

Yes, that's the score I'm referring to (in the paper we call it the TERM interface score). If you've already computed that for all of the backbones of interest, you can calculate the contact density (# of interface contacts / total number of residues in the peptide) for each design. In the paper we only consider peptide backbones with a contact density of 2.5 or more.

wlawler45 commented 2 years ago

What should I do to calculate the contact density you mention? The score_structures.tsv doesnt list the total number of residues in each peptide? I could write another script to find how many total residues from the PDBs, is that what you meant?

swanss commented 2 years ago

The *fused_paths.csv file from when you ran samplePaths has a column titled "path_len", which is the number of residues in each peptide backbone.

wlawler45 commented 2 years ago

I see that. Okay, Thank you. I'll try that.

swanss commented 2 years ago

For sure, I'm happy to help. Closing now but feel to reopen new issues as other things arise

wlawler45 commented 2 years ago

So I noticed some things, Sorry for the delay, so there are about 200 more fused structures in path_structures than are listed in fused_paths.csv (csv ends at 150, but there are 350 in path_structures and in score_structures.tsv) also it seems when dividing the number of contacts in score_structures.tsv by the path_len in fused_paths.csv for each structure I dont get any structures with values above 1. Does that mean I didn't generate enough structures? What should I do about this?

wlawler45 commented 2 years ago

I'm including here my fused_paths.csv, I see a column labeled num_interface_contacts, but all values seem to be 0, which is confusing 5u1c_fused_paths.csv .

swanss commented 2 years ago

So as far as 'num_interface_contacts', that column is 0 because contacts are not calculated by default when sampling paths (you can change this with the --countContacts argument). Defining potential contacts is rather slow, so I did it with a separate program, countContacts, which can divide the computation over multiple jobs to speed things up.

I'm not sure why there isn't any rows for any of the paths beyond 150. Does the output of samplePaths mention anything going awry?

In the paper we sampled 4,000 backbones with sample paths and only 1,000 of them had more than 2.5 contacts per residue. 1/4 is actually more than I would normally expect, in part because the sampled backbones were enriched for beta-strand structure, which have more contacts on average. Alpha-helical peptides will have less, this because they have more residues that are facing away from the protein

wlawler45 commented 2 years ago

Okay, so on closer inspection I was able to find a few better candidates from the data files, however their PDB data places them in the wrong region of the protein, but their shape looks like it could be accurate. I ran the next step the rosettaRelax successfully on the best candidate, but I am seg faulting on the run_FPDrefine step. So I have a few questions, should the peptide be close to the region I specified in my initial commands by this point or will FPDrefine move it there in the PDB model? I am slightly confused by the segmentation fault, I'm running it on my own laptop, but I've been able to get this far without having that issue. I also tried running on the FlexPepDock server online but that also seemed to fail. Any suggestions or guidance on this would be greatly appreciated.

The command I used to run FPDrefine is as follows: ./FlexPepDocking.default.linuxgccrelease -database /home/williamubuntu/Downloads/rosetta_src_2021.16.61629_bundle/main/database -in:file:s "./5U1C_362.red_0001.pdb" -native "/home/williamubuntu/Mosaist/bin/5U1C_362.red.pdb" -nstruct 100 -flexPepDocking:peptide_chain "C" -flexPepDocking:receptor_chain "A" -rep_ramp_cycles 10 @ /home/williamubuntu/peptide_design/example/9_rosettaFPDRefine/refine_flags

and I also tried it without specifying the native conformation: ./FlexPepDocking.default.linuxgccrelease -database /home/williamubuntu/Downloads/rosetta_src_2021.16.61629_bundle/main/database -in:file:s "./5U1C_362.red_0001.pdb" -nstruct 100 -flexPepDocking:peptide_chain "C" -flexPepDocking:receptor_chain "A" -rep_ramp_cycles 10 @ /home/williamubuntu/peptide_design/example/9_rosettaFPDRefine/refine_flags

And the last few lines it prints before the seg fault are these: protocols.flexPepDocking.FlexPepDockingFlags: Receptor chain: A protocols.flexPepDocking.FlexPepDockingFlags: Receptor first res: 1 protocols.flexPepDocking.FlexPepDockingFlags: Receptor nres: 288 protocols.flexPepDocking.FlexPepDockingFlags: Peptide chain: C protocols.flexPepDocking.FlexPepDockingFlags: Peptide first res: 577 protocols.flexPepDocking.FlexPepDockingFlags: Peptide nres: 16 protocols.flexPepDocking.FlexPepDockingFlags: Peptide anchor: 584 protocols.flexPepDocking.FlexPepDockingFlags: # peptide anchors: 1 protocols.flexPepDocking.FlexPepDockingFlags: # peptide cuts: 0 protocols.flexPepDocking.FlexPepDockingFlags: Receptor anchor: 192 FlexPepDockingProtocol: Old FoldTree : FOLD_TREE EDGE 1 288 -1 EDGE 1 289 1 EDGE 289 576 -1 EDGE 1 577 2 EDGE 577 592 -1

swanss commented 2 years ago

Oh hey, I totally missed this! My bad.

If you haven't already figured it out, could you send the PDB file (./5U1C_362.red_0001.pdb)?