Open wlawler45 opened 2 years ago
Hi,
It looks like paths are being sampled, but they do not meet the acceptance criteria.
I would think a little bit about your goal and then adjust the parameters accordingly.
By the way, I just peeked back at the structure you sent in the other issue thread (5U1C_dimer) and noticed that it's not the same as the structure in the PDB and that its B-factor values are consistent with being pLDDT scores from an AlphaFold model. If possible, I would highly suggest focusing your designs on the region of the structure that has high confidence (pLDDT > 90). A design like the fused path you sent will have a lot going against it: limited hydrophobic contact surface area and a highly dynamic binding site. All of that is even assuming this dimeric state occurs in solution. Is there experimental evidence for it?
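One quick way to see which residues clear that pLDDT cutoff is to read the B-factor column directly. This is only a sketch: it assumes a standard fixed-column PDB file in which the B-factor field holds pLDDT (as appears to be the case for this model), and the function name `high_confidence_residues` is made up for illustration and is not part of peptide_design.

```shell
# List residues whose B-factor column (assumed to hold pLDDT) exceeds a cutoff.
# Assumes fixed-column PDB format: atom name in columns 13-16, chain ID in
# column 22, residue number in columns 23-26, B-factor in columns 61-66.
# Only CA atoms are read, so each residue is reported at most once.
high_confidence_residues() {
    pdb_file="$1"
    cutoff="${2:-90}"   # default cutoff of 90, matching the suggestion above
    awk -v cutoff="$cutoff" '
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            plddt = substr($0, 61, 6) + 0
            if (plddt > cutoff) {
                # Print: chain ID, residue number, pLDDT
                printf "%s %d %.2f\n", substr($0, 22, 1), substr($0, 23, 4) + 0, plddt
            }
        }
    ' "$pdb_file"
}
```

Calling `high_confidence_residues ../example/input_files/5U1C_dimer.pdb 90` would print one line per confident residue, which can then be compared against the residues passed to --targetSel.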
I did target certain residues in that protein. I used this initial command:
./generateSeeds --targetPDB ../example/input_files/5U1C_dimer.pdb --paramsFile ../example/1_generateSeeds/genSeeds-HIV-IN.params --targetSel "resid 115-153" --peptideChainID 'A'
Would this not do that?
Yes, that should only generate seeds around residues 115-153. Could you please check the output file to verify that's what happened? How many did you generate per residue?
Should I check extendedfragments.bin?
Attached: extendedfragmentsinfo.zip. It seems to have generated a great many seeds, ~90 for each residue?
I meant the standard output from when you ran the job, but this file works too! It looks like some residues have as many as ~500 seeds. I know this seems like a lot, but the space of possible > 5 residue binding structures is very large, so this is a relatively small sample of it. If I recall correctly, we searched for up to 5,000 matches for each binding site fragment when designing binders of TRAF6. If I were focusing on a specific binding site, I would probably bump that up to at least ~10,000. This will make the subsequent steps slower, but will increase the odds that you find a good backbone. Do you have access to a cluster, or do you need to run this locally?
I'm running locally; my cluster has a 6-hour time limit that this would unfortunately exceed.
Would it help if I narrowed the selection of residues further? There are really only 3 residues in the whole protein that I'm interested in, but they are spread out across the range I gave in that command.
Ah okay, so I would do (resid i or resid j or resid k) around d, where d is the maximum distance in angstroms at which you consider neighboring residues (I would start with d = 10.0 or so and increase if necessary). It's important to include more than just the 3 residues since you will likely want seeds that can interact with nearby residues to get enough binding energy.
For most steps (with the exception of dTERMen) you should be able to run the jobs in less than 6 hours. Even the jobs that are slower, like findOverlaps, can be broken up into array jobs that will individually be short enough to run on your cluster.
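To illustrate the array-job idea in general terms, here is a small sketch of how a fixed number of work items can be divided among array tasks. It is not taken from peptide_design: the function name `batch_range` is hypothetical, and how a given program such as findOverlaps accepts a share of the work is not covered in this thread, so check its usage text for any built-in batching options first.

```shell
# Print the first and last item index (1-based, inclusive) handled by one task
# when `total` items are split as evenly as possible across `n_tasks` tasks.
# task_id is expected to run from 0 to n_tasks-1.
batch_range() {
    total="$1"
    n_tasks="$2"
    task_id="$3"
    base=$(( total / n_tasks ))    # minimum number of items per task
    extra=$(( total % n_tasks ))   # the first `extra` tasks get one more item
    if [ "$task_id" -lt "$extra" ]; then
        start=$(( task_id * (base + 1) + 1 ))
        end=$(( start + base ))
    else
        start=$(( extra * (base + 1) + (task_id - extra) * base + 1 ))
        end=$(( start + base - 1 ))
    fi
    echo "$start $end"
}
```

On a SLURM cluster, a job submitted with `sbatch --array=0-9` runs ten tasks, each of which can read its own index from the `SLURM_ARRAY_TASK_ID` environment variable and pass it to something like `batch_range 1000 10 "$SLURM_ARRAY_TASK_ID"`. Each task then gets its own 6-hour limit.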
Okay, let me try to do this on the cluster then. I appreciate your help again.
Of course! I'm available here if you have more questions about the code and would also be happy to set up a zoom call to chat about your specific design problem if you'd like.
I appreciate the offer of a Zoom call. If I have some more significant trouble I will let you know and we can set something up. So just to clarify, should the gen_seeds command be as follows? I'm getting an error saying fragment_type not recognized.
srun -N 1 -n 10 -t 360 ./generateSeeds --targetPDB ../example/input_files/5U1C_dimer.pdb --paramsFile ../example/1_generateSeeds/genSeeds.params --targetSel "(resid 64 or resid 116 or resid 152) around 10" --peptideChainID 'A'
Never mind, that issue was caused by an EOL conversion: I had to download the library to Windows and then move it onto the UNIX cluster.
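For anyone hitting a similar error after copying files from Windows, stray carriage returns in text files such as the params file are one thing to rule out. The following is a minimal sketch using standard UNIX tools; the function names `has_crlf` and `strip_crlf` are made up for illustration. It should only be applied to text files (params files, shell scripts), never to binary files such as extendedfragments.bin.

```shell
# Succeed (exit status 0) if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Remove every carriage return character from a text file in place.
# The cleaned text is written to a scratch file first and then copied back
# with cat, so the original file keeps its permissions.
strip_crlf() {
    target="$1"
    scratch="$(mktemp)"
    tr -d '\r' < "$target" > "$scratch" && cat "$scratch" > "$target"
    status=$?
    rm -f "$scratch"
    return "$status"
}
```

For example, `has_crlf ../example/1_generateSeeds/genSeeds.params && strip_crlf ../example/1_generateSeeds/genSeeds.params` would clean the params file only if it actually contains Windows line endings.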
Okay, so I ran the commands up to run_samplepaths.sh on the cluster. The other commands took on the order of minutes to complete, but run_samplepaths.sh ran for the whole 6 hours without finishing and kept reporting "path too short". Is there any information I could send you that would help identify what the issue might be? I would also like to schedule a Zoom call with you sometime this week if possible. Thank you for any assistance you can give me with this.
It's very likely that there are not enough overlaps between the seeds in the graph. The easiest way to confirm this would be for you to send me the standard output file from buildSeedGraph, which reports how many edges in the graph are between residues in the same seed vs. residues in distinct seeds.
Shoot me an email at swans@mit.edu and we can set up a call!
I am trying to run the samplePaths command as described in the examples, but it does not seem to complete no matter how low I set the number of paths. I ran the following command for a few days with no results:
./samplePaths --targetPDB ../example/input_files/5U1C_dimer.pdb --seedBin ../example/1_generateSeeds/output/extendedfragments.bin --seedGraph /home/williamubuntu/peptide_design/example/3_buildSeedGraph/5U1C_seedgraph.adj --numPaths 500 --config ../example/input_files/singlechainDB.configfile --base 5U1C --writeTopology
After changing numPaths to 5 it still did not complete. The console output is in the attached file sample_paths.zip. Any advice on what might be going wrong would be appreciated. Thank you.