julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

Problems packing sidechains with MCSCE #282

Open erik-s99 opened 2 weeks ago

erik-s99 commented 2 weeks ago

Hey, I have a problem packing sidechains onto backbone-only conformers of a linker IDR. After generating a fixed number of backbone-only conformers (10,000) with build, I ran your Python scripts case_shortcut.py and case_stiching.py. The scripts work correctly, and the generated conformers are correctly attached to the folded domains. I then wanted to add the sidechains with MCSCE, but I get the following error:

    Traceback (most recent call last):
      File "/home/eriks/IDPConformerGenerator/miniconda3/bin/mcsce", line 33, in <module>
        sys.exit(load_entry_point('mcsce', 'console_scripts', 'mcsce')())
      File "/home/eriks/MCSCE/src/mcsce/cli.py", line 38, in maincli
        cli(parser, main)
      File "/home/eriks/MCSCE/src/mcsce/cli.py", line 33, in cli
        main(**vars(cmd))
      File "/home/eriks/MCSCE/src/mcsce/cli.py", line 197, in main
        initialize_func_calc(partial(prepare_energy_function, batch_size=batch_size,
      File "/home/eriks/MCSCE/src/mcsce/core/side_chain_builder.py", line 90, in initialize_func_calc
        structure.add_side_chain(idx + structure.res_nums[0], template, chain_id)
      File "/home/eriks/MCSCE/src/mcsce/libs/libstructure.py", line 501, in add_side_chain
        N_CA_C_coords = self.get_sorted_minimal_backbone_coords(filtered=True)
      File "/home/eriks/MCSCE/src/mcsce/libs/libstructure.py", line 419, in get_sorted_minimal_backbone_coords
        coords = atoms[:, colscoords]
    IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

The settings I used: mcsce ./G3BP1_results 10 -o ./G3BP1sidechains -f 1-136+337-466 -m simple
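For readers unfamiliar with this error class, the IndexError at the end of the traceback is the generic NumPy failure raised when a 1-dimensional array is indexed with two indices. The sketch below reproduces only that failure mode; it is not MC-SCE code, and the names `cols_coords`, `atoms_2d` and `atoms_1d` are hypothetical stand-ins. The assumption that an atom selection came back as a 1-D array is a guess about the cause, not something the traceback confirms.

```python
import numpy as np

# Hypothetical column indices standing in for the x, y, z fields of an atom table.
cols_coords = [0, 1, 2]

# A well-formed 2-D atom table: one row per atom, one column per coordinate.
atoms_2d = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
coords = atoms_2d[:, cols_coords]  # works: two indices on a 2-D array

# Assumption: some selection step returned a 1-D array instead of a 2-D table.
atoms_1d = np.array([])
message = ""
try:
    atoms_1d[:, cols_coords]  # two indices on a 1-D array
except IndexError as err:
    message = str(err)

print(message)
```

If this is what happens inside MC-SCE, the thing to look for is a residue whose expected backbone atoms cannot be found under the names the tool expects.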

menoliu commented 2 weeks ago

Hi Erik,

Thanks for reaching out! Your input command for mcsce looks correct. Could you send me one conformer so I can run MC-SCE on my side and try to debug this issue? I do not remember seeing that error, so maybe it's a rare case... I'm wondering if the original folded-domain PDB file contains atoms that MC-SCE does not support.

Best, Nemo

erik-s99 commented 2 weeks ago

Thanks for the answer. Here is the example data.

Example Conformer.zip

menoliu commented 2 weeks ago

Aha @erik-s99, it seems I've found the error, and it has to do with atom naming. Please feel free to use the resre module in idpconfgen to rename all HIS residues to HIP. MC-SCE's residue and atom naming convention follows the same force field as IDPConfGen, so residue and atom names may not match up if the folded region comes from another source.

I've also attached a sample MC-SCE script based on your command: G3BP1_mcsce.txt. Please note that you should change the extension to .sh. I've added the -s argument, since the inputs are all the "same" structure with varying IDR conformers. I've set the batch size to 100 (we should try different rotamers at least 100 times for the best ratio of success to compute time), and -w 5 requests 5 CPUs, i.e. 5 workers.

I've tested this on a fresh installation of MC-SCE from the THGLab GitHub in its own environment, though it should also work with your MC-SCE installation within the idpconfgen environment. Let me know if you still run into issues after removing problematic atoms and/or renaming residues.
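As a rough illustration of what renaming HIS to HIP means at the file level, here is a minimal sketch that edits only the residue-name field (columns 18 to 20) of ATOM and HETATM records in a PDB file. It is not the idpconfgen resre module, and `rename_residue` and `demo_line` are hypothetical names introduced for this example. It changes the label only and does not add or remove any atoms.

```python
def rename_residue(pdb_lines, old="HIS", new="HIP"):
    """Return PDB lines with residue name `old` replaced by `new`.

    Only ATOM/HETATM records are touched, and only the fixed-width
    residue-name field (columns 18-20, i.e. line[17:20]) is rewritten.
    """
    renamed = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == old:
            line = line[:17] + new + line[20:]
        renamed.append(line)
    return renamed


# Illustrative record with only the leading fixed-width fields filled in.
demo_line = f"ATOM  {1:>5}  N   HIS A{1:>4}"
result = rename_residue([demo_line])
print(result[0])
```

Working on the fixed columns rather than doing a plain text replace avoids touching "HIS" if it happens to appear elsewhere in a line.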

erik-s99 commented 2 weeks ago

I changed all HIS to HIP, but the error still occurs. In case it helps, the calculation always stops at 72%. I also noticed that after stitching, some amino acids at the N-terminal end of the linker IDR are missing. Maybe this is the reason I get this error. How can I change the extension to .sh?

menoliu commented 1 week ago

Hi Erik, sorry for the delayed response. I am currently at a conference, so I will try to give you a protocol for your system when I have some time to debug. I have modeled many different systems with MC-SCE, and sometimes there are atoms that are not recognized and we just have to remove them...

To change the file extension, you would have to enable viewing extensions if you're on Windows. When renaming the file, just change the .txt to .sh. The process is similar on Mac.
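The rename can also be scripted. The following is a minimal sketch using Python's standard pathlib; `txt_to_sh` is a hypothetical helper, and the demo runs on a placeholder file in a temporary directory rather than on the real attachment.

```python
import tempfile
from pathlib import Path


def txt_to_sh(path):
    """Rename `path` so it ends in .sh and make it executable for its owner."""
    src = Path(path)
    dst = src.with_suffix(".sh")
    src.rename(dst)
    # Add the owner-execute bit on top of the existing permissions.
    dst.chmod(dst.stat().st_mode | 0o100)
    return dst


# Demo on a placeholder file; the file content here is not the real script.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "G3BP1_mcsce.txt"
    script.write_text("# placeholder content\n")
    new_name = txt_to_sh(script).name

print(new_name)
```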

I'm not sure whether you mean that the energy calculation stops at 72% or the processing of backbones. Hopefully the script will solve some of the issues.

erik-s99 commented 1 week ago

Thanks a lot for your help.

Yes, the energy calculation always stops at 72%.

menoliu commented 1 week ago

Hi @erik-s99, I've had some time now to look more carefully at this problem. There seems to be a residue numbering issue between 333...337, where the backbone looks connected but the residue numbering is wrong. There is also a missing residue at position 378, which may be making MC-SCE unhappy...
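A quick way to spot this kind of problem before running MC-SCE is to scan each conformer for jumps in residue numbering. The sketch below is a minimal, hypothetical check (`find_numbering_gaps` and `ca_line` are names made up for this example) that reads the chain ID and residue number from the fixed PDB columns. It ignores insertion codes and only reports places where the number increases by more than one within a chain.

```python
def find_numbering_gaps(pdb_lines):
    """Return (chain, previous_resnum, next_resnum) for each numbering jump."""
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen on that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]           # column 22: chain identifier
        resnum = int(line[22:26])  # columns 23-26: residue sequence number
        previous = last_seen.get(chain)
        if previous is not None and resnum - previous > 1:
            gaps.append((chain, previous, resnum))
        last_seen[chain] = resnum
    return gaps


def ca_line(resnum, chain="A"):
    # Minimal ATOM record carrying only the fields this check reads.
    return f"ATOM  {resnum:>5}  CA  GLY {chain}{resnum:>4}"


# Illustrative input: residue 378 is absent, as in the case described above.
lines = [ca_line(376), ca_line(377), ca_line(379)]
gaps = find_numbering_gaps(lines)
print(gaps)
```

A reported jump does not say whether a residue is truly missing or merely misnumbered; that still has to be checked against the expected sequence.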

It also looks to me like these two folded regions are not supposed to be fixed in place? I am unfamiliar with the system, but my experience tells me this is a "beads-on-a-string" scenario. I usually approach these cases by first modeling the L-IDR as a C-IDR of the N-terminal folded domain and then appending the remainder of the protein to the C-IDR. In my experience, this is much quicker than the L-IDR approach.

I will be here if you need more assistance :)