julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

`contacts` module for building complexes of disordered proteins #236

Closed by menoliu 1 year ago

menoliu commented 1 year ago

Please note that in commit a4d0a3d I removed the torsion output for intermolecular contacts, because torsions sometimes cannot be calculated, possibly due to broken chains.

menoliu commented 1 year ago

@joaomcteixeira Thanks for the support! I just wanted to let you know that I've found 3,799,914 intramolecular contacts where the CA atoms are <= 6 Å apart and the residues are more than 5 positions apart in sequence. In contrast, I detected only 2,988 intermolecular contacts where the CA atoms are <= 12 Å apart within our database of 24,003 PDB IDs.
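For readers following along, the contact criteria above can be sketched with NumPy as below. This is a minimal illustration, not the code in the `contacts` module: the function name `find_ca_contacts` and its parameters are hypothetical, and it assumes that array index equals sequence position (no gaps in the chain).

```python
import numpy as np


def find_ca_contacts(ca_a, ca_b=None, max_dist=6.0, min_seq_sep=5):
    """Return (i, j) index pairs of CA atoms within `max_dist` angstroms.

    Hypothetical helper for illustration only.
    If `ca_b` is None, contacts are intramolecular and only pairs with
    j - i > min_seq_sep are kept. Otherwise contacts are intermolecular
    (chain A vs chain B) and no sequence-separation filter is applied.
    """
    ca_a = np.asarray(ca_a, dtype=float)
    intra = ca_b is None
    ca_b = ca_a if intra else np.asarray(ca_b, dtype=float)

    # Full pairwise distance matrix, shape (len(ca_a), len(ca_b)).
    diff = ca_a[:, None, :] - ca_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = dist <= max_dist

    if intra:
        # Assumption: index difference equals sequence separation.
        i_idx, j_idx = np.indices(dist.shape)
        mask &= (j_idx - i_idx) > min_seq_sep

    return [(int(i), int(j)) for i, j in np.argwhere(mask)]


# Toy chain: 8 CA atoms along x, plus a 9th that folds back near the start.
chain_a = [[3.8 * i, 0.0, 0.0] for i in range(8)] + [[0.0, 4.0, 0.0]]
chain_b = [[0.0, 0.0, 10.0]]

intra_contacts = find_ca_contacts(chain_a, max_dist=6.0, min_seq_sep=5)
inter_contacts = find_ca_contacts(chain_a, chain_b, max_dist=12.0)
```

The full distance matrix is fine for single chains; for a database-wide scan a neighbor search (for example a k-d tree) would likely scale better.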

I am now working on the building algorithm, but I realized that to continue with this project we need a critical function: building a disordered protein between two fixed points. This will also be helpful in other applications. Cheers :)

joaomcteixeira commented 1 year ago

Some tips:

  1. If you need to change a function, or would have to tweak it too much, it is okay to copy that function to a new one, tweak the copy, and keep two similar functions in the code.
  2. There are many broken chains in the PDB. In IDPCG, a function cleans that up by identifying a break and splitting the structure into two. The chains are cleaned when we create the torsion database used to build conformers. Sometimes the break happens near the termini, while in other cases the chain is split in two. For the torsion.json db, there's no need to worry about splitting the chains, as long as the chain is a whole peptide when we calculate the torsions.
  3. That critical function is really critical :wink:. It is the same as modeling loop flexibility without MD. Here's a crazy idea: draw a curve between two points in space; luckily, you could achieve that with a variation of this protocol. The curves should be somehow random and have the length of your desired IDR. Then, build the IDP chain by enforcing that each residue must lie within some distance threshold of the drawn line.
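One possible reading of tip 3 is sketched below. It swaps in a Brownian bridge (a random walk with its drift removed so that it ends at a chosen point) as the random curve, which is an assumption and not the protocol referred to above. The names `random_guide_curve`, `within_guide`, and `step_scale` are hypothetical, and the sketch does not enforce that the curve's contour length matches the IDR length.

```python
import numpy as np


def random_guide_curve(start, end, n_points, step_scale=3.8, seed=None):
    """Random 3D curve from `start` to `end` with `n_points` points.

    Illustrative Brownian bridge: one guide point per residue is assumed.
    `step_scale` (angstroms) controls how far the curve wanders; it is
    not a bond length and the spacing between points is not fixed.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)

    # Unconstrained random walk starting at the origin.
    steps = rng.normal(scale=step_scale, size=(n_points - 1, 3))
    walk = np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

    # Remove the linear drift so the walk returns to zero at the last point,
    # then add the straight line from start to end.
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    bridge = walk - t * walk[-1]
    return start + t * (end - start) + bridge


def within_guide(coord, guide_point, threshold):
    """True if a residue coordinate lies within `threshold` of its guide point."""
    delta = np.asarray(coord, dtype=float) - np.asarray(guide_point, dtype=float)
    return bool(np.linalg.norm(delta) <= threshold)


curve = random_guide_curve([0.0, 0.0, 0.0], [30.0, 0.0, 0.0], n_points=20, seed=1)
```

During chain building, a candidate residue `k` could be rejected whenever `within_guide(ca_coord, curve[k], threshold)` is false, so the growing chain stays tethered to the curve and ends near the second fixed point.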

menoliu commented 1 year ago

Note: I will also be working on building "long" IDPs here, as I require some functions in `fldrs_helper.py`. The basic idea is to have a user switch `--long` and an input `--custom-long`, which takes an input similar to `resptm`, where the user specifies which residue ranges will be built as fragments. If `--custom-long` is not given, we divide the sequence into chunks where the maximum length is 200 AA and the last fragment can exceed that by 50 AA. (For user input with >400 AA we will not turn on `--long` automatically, but will give the user a hint in stdout.)
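The default chunking rule could look roughly like the sketch below. It assumes one interpretation of the rule: fragments are 200 residues long, and a trailing remainder of up to 50 residues is merged into the last fragment instead of forming its own. The function name `split_into_fragments` and its parameters are hypothetical, not part of IDPConformerGenerator.

```python
def split_into_fragments(seq_len, max_len=200, overflow=50):
    """Return 1-based, inclusive (start, end) residue ranges.

    Illustrative only. Every fragment has at most `max_len` residues,
    except the last one, which may have up to `max_len + overflow`.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")

    ranges = []
    start = 1
    # Cut full-size fragments while the remainder is too long to be
    # absorbed into a single (possibly oversized) last fragment.
    while seq_len - start + 1 > max_len + overflow:
        ranges.append((start, start + max_len - 1))
        start += max_len
    ranges.append((start, seq_len))
    return ranges


fragments_450 = split_into_fragments(450)
fragments_451 = split_into_fragments(451)
fragments_100 = split_into_fragments(100)
```

With these defaults a 450-residue sequence yields two fragments (200 and 250 residues), while a 451-residue sequence yields three, because the 51-residue remainder no longer fits within the 50 AA allowance.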

joaomcteixeira commented 1 year ago

Nice! To build very long IDPs, I suggest calculating the energy parameters on the fly instead of precalculating them as we currently do; this avoids memory issues. At least, that was my experience when I made the first prototypes. Best!

menoliu commented 1 year ago

Thanks for the heads up! I noticed that SLICEDICT_XMERS and GET_ADJ took especially long too. I might split up those calculations to save time (hopefully then I won't have to worry about modifying the energy function).