julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

Effect of sequence length on conformer generation speed and energies #133

Open SeanR22 opened 3 years ago

SeanR22 commented 3 years ago

I have generated many conformers for a number of sequences of different lengths. The graphs below show how the rate of conformer generation slows dramatically and the energies of the conformers explode logarithmically with increasing sequence length.

[Graphs: Length effect on rate; Length effect on 1st quartile energy; Length effect on median energy]

SeanR22 commented 3 years ago

In thinking about ways to improve the method so that it can generate conformers for longer sequences, I wonder whether the chunk search method could be "trained" on runs of shorter sequences with the same sequence character. For example, if we created an idpconfgen_database.json file from the low-energy structures of runs on a particular sequence, would the search run more efficiently using this trained database? How hard would it be to add the ability to create a database from a folder of conformers generated by IDPconfgen?
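
As a rough illustration of the "database from a folder of conformers" idea, here is a minimal Python sketch that reads backbone atoms from PDB files in a folder, computes phi/psi torsions, and writes them to a JSON file. The entry layout (`fasta`, `phi`, `psi`), the function names, and the single-chain assumption are mine, for illustration only; this is not the actual idpconfgen database schema or API, and the output would have to be matched to whatever idpconfgen expects (including its torsion sign and unit conventions).

```python
import json
import math
from pathlib import Path

# Standard three-letter to one-letter residue codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def read_backbone(pdb_text):
    """Return [(resname, {atom_name: xyz})] per residue, in file order."""
    residues, last_key = [], None
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom = line[12:16].strip()
        if atom not in ("N", "CA", "C"):
            continue
        key = line[21:27]  # chain ID + residue number + insertion code
        if key != last_key:
            residues.append((line[17:20].strip(), {}))
            last_key = key
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[-1][1][atom] = xyz
    return residues


def torsions(residues):
    """Return (phi, psi) lists; undefined terminal angles are None."""
    phi, psi = [], []
    for i, (_, atoms) in enumerate(residues):
        n, ca, c = atoms["N"], atoms["CA"], atoms["C"]
        prev_c = residues[i - 1][1]["C"] if i > 0 else None
        next_n = residues[i + 1][1]["N"] if i + 1 < len(residues) else None
        phi.append(None if prev_c is None else dihedral(prev_c, n, ca, c))
        psi.append(None if next_n is None else dihedral(n, ca, c, next_n))
    return phi, psi


def build_database(folder, out_json):
    """Collect torsions for every *.pdb file in folder into one JSON file."""
    db = {}
    for path in sorted(Path(folder).glob("*.pdb")):
        residues = read_backbone(path.read_text())
        phi, psi = torsions(residues)
        fasta = "".join(THREE_TO_ONE.get(name, "X") for name, _ in residues)
        db[path.stem] = {"fasta": fasta, "phi": phi, "psi": psi}
    Path(out_json).write_text(json.dumps(db, indent=2))
    return db
```

Something like `build_database("low_energy_conformers", "trained_database.json")` (both names hypothetical) would then produce one entry per conformer. Filtering the folder down to low-energy conformers would have to happen beforehand, since this sketch does not look at energies at all.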

Furthermore, if we have a database of low-energy structures that were built using the chunk method, could we then scale the method up to add pieces as "fragments", which would be longer than 5 residues?

In conclusion, the chunk method could be used to create a representative data set for shorter sequences (i.e. up to 150 amino acids); longer sequences could then be built in fragments from this representative data set.
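
To make the fragment idea a bit more concrete, here is a minimal Python sketch of assembling torsion angles for a longer sequence from database fragments. It assumes a hypothetical database layout (a dict of entries with a `fasta` string and `phi`/`psi` lists of the same length) and exact sequence matching, trying the longest fragment first and falling back to shorter ones. The function names and this greedy strategy are my own illustration, not how idpconfgen's chunk search works.

```python
import random


def find_fragments(db, subseq):
    """Return (phi, psi) slices for every occurrence of subseq in the database."""
    hits = []
    for entry in db.values():
        seq = entry["fasta"]
        start = seq.find(subseq)
        while start != -1:
            end = start + len(subseq)
            hits.append((entry["phi"][start:end], entry["psi"][start:end]))
            start = seq.find(subseq, start + 1)
    return hits


def build_from_fragments(db, target, max_len=10, min_len=1, rng=None):
    """Greedily cover target with the longest matching fragments available."""
    rng = rng or random.Random()
    phi, psi = [], []
    pos = 0
    while pos < len(target):
        longest = min(max_len, len(target) - pos)
        for size in range(longest, min_len - 1, -1):
            hits = find_fragments(db, target[pos:pos + size])
            if hits:
                # Pick one matching fragment at random and append its angles.
                frag_phi, frag_psi = rng.choice(hits)
                phi.extend(frag_phi)
                psi.extend(frag_psi)
                pos += size
                break
        else:
            raise ValueError(f"no fragment covers position {pos} ({target[pos]!r})")
    return phi, psi
```

This only stitches angles together. A real implementation would still need a clash or energy check after each added fragment, and some way to retry with a different fragment when the check fails, which is where the slowdown with sequence length would show up again.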

Another way to create longer conformers would be to build onto low-energy conformers generated for shorter sequences.