julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

Effect of -etbb value on conformer generation rate #140

Open SeanR22 opened 3 years ago

SeanR22 commented 3 years ago

I generated conformers for a high complexity ~120 amino acid protein varying the backbone energy cutoff -etbb value (shown in the figure legend) and kept the sidechain energy cutoff constant at -etss 5000. I also used -xp 0 0 1 1 1.

I measured the rate at which idpconfgen build could generate conformers that met a certain energy cutoff (shown on the x axis).
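For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the bookkeeping, assuming you already have the final energies of the conformers from one run and the wall-clock time of that run. The function name and the example numbers are hypothetical and are not part of idpconfgen.

```python
def rate_below_cutoff(energies, elapsed_seconds, cutoff):
    """Return conformers per second whose energy is at or below `cutoff`."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    accepted = sum(1 for energy in energies if energy <= cutoff)
    return accepted / elapsed_seconds


# Made-up energies (kcal/mol) from a single hypothetical 60-second run.
energies = [850.0, 1200.0, 2400.0, 990.0, 3100.0, 400.0]
for cutoff in (1000, 2000, 3000):
    print(cutoff, rate_below_cutoff(energies, 60.0, cutoff))
```

Repeating this for runs made with different -etbb values gives one curve per -etbb value, with the energy cutoff on the x axis.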

Interestingly, the -etbb value seems to only limit the rate at which low-energy conformers can be generated, likely because the build gets stuck cycling and has a hard time finding a final solution.

[Figure: low -etbb values slow the generation rate]

So, it begs the question... Would it be more efficient to generate a bunch of random conformers from chunks then screen the energy of each conformer at the end of each run rather than constantly measuring the energy as the chunks are added? Is the -etbb value necessary or does it just make the process more convoluted and less efficient? There may be some upper -etbb value where the process becomes less efficient at finding low energy structures but I haven't found it yet.

From the runs I have done so far it seems that what really limits the ability to find low energy solutions is the particular arrangement of the amino acids in the sequence, its length and the ability to find low energy solutions from the PDB and not the input cutoff values.

SeanR22 commented 3 years ago

To add to the above comment: even though it may seem counterintuitive, if you are aiming to keep only conformers of 1000 kcal/mol or less, you should set -etbb 3000 and -etss 1000. How far above -etss the -etbb value should be set will be sequence dependent.

joaomcteixeira commented 3 years ago

Hi @SeanR22

> Would it be more efficient to generate a bunch of random conformers from chunks then screen the energy of each conformer at the end of each run rather than constantly measuring the energy as the chunks are added?

Definitely not. Although you don't see it, before each chunk is added idpconfgen screens dozens, even hundreds, of possibilities. If we didn't constrain the process to a feasible energy cutoff, we would not be able to get out of clashes. I would suggest the other way around: set -etbb to a higher value and then filter for only those conformers with lower energy. It could be a strategy.
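A toy model can make the cost difference concrete. The sketch below is not idpconfgen's algorithm: it assumes each chunk independently either clashes (a large made-up energy penalty) or does not, it has no backtracking, and every name and number in it is hypothetical. It only compares how many chunk evaluations are needed per accepted conformer when screening at each step versus building whole chains blindly and filtering at the end.

```python
import random

CLASH_ENERGY = 1000.0  # toy penalty for a clashing chunk (made-up value)
OK_ENERGY = 1.0        # toy energy of a clash-free chunk (made-up value)


def sample_chunk(rng, p_clash):
    """Return the toy energy contribution of one randomly sampled chunk."""
    return CLASH_ENERGY if rng.random() < p_clash else OK_ENERGY


def build_stepwise(rng, n_chunks, cutoff, p_clash):
    """Screen every chunk as it is added; resample until the running total fits.

    Assumes cutoff >= n_chunks * OK_ENERGY, otherwise this never terminates.
    """
    total, evaluations = 0.0, 0
    for _ in range(n_chunks):
        while True:
            energy = sample_chunk(rng, p_clash)
            evaluations += 1
            if total + energy <= cutoff:
                total += energy
                break
    return total, evaluations


def build_then_filter(rng, n_chunks, cutoff, p_clash):
    """Build whole chains without screening; keep the first one that fits."""
    evaluations = 0
    while True:
        total = sum(sample_chunk(rng, p_clash) for _ in range(n_chunks))
        evaluations += n_chunks
        if total <= cutoff:
            return total, evaluations


def mean_cost(builder, n_conformers=50, seed=0, n_chunks=10,
              cutoff=100.0, p_clash=0.3):
    """Average number of chunk evaluations per accepted conformer."""
    rng = random.Random(seed)
    costs = [builder(rng, n_chunks, cutoff, p_clash)[1]
             for _ in range(n_conformers)]
    return sum(costs) / len(costs)


stepwise_cost = mean_cost(build_stepwise)
posthoc_cost = mean_cost(build_then_filter)
print("stepwise:", stepwise_cost, "build-then-filter:", posthoc_cost)
```

In this toy, a blindly built chain survives only if none of its chunks clashes, so the chance of acceptance shrinks geometrically with chain length, while stepwise screening pays only a few retries per chunk.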

I don't think -etbb is unnecessary; it allows more flexibility because backbone construction is a process independent of side-chain construction.

> There may be some upper -etbb value where the process becomes less efficient at finding low energy structures but I haven't found it yet.

It could be. However, I would like to keep in focus what you suggested last time: how high can we set the bb cutoff such that a minimization afterwards is enough to relax the conformer without causing pronounced deviations from its original structure?

> From the runs I have done so far it seems that what really limits the ability to find low energy solutions is the particular arrangement of the amino acids in the sequence, its length and the ability to find low energy solutions from the PDB and not the input cutoff values.

Absolutely. Some sequence chunks are very rare. Or, put another way, some chunks sample very similar conformations regardless of how many counts they have in the database (!!!). So, if you are sampling one of those regions at a position where there is a clash, idpconfgen will have a hard time finding a way out of it. That's why some sequences build very fast and others don't.

> To add to the above comment, even though it may seem counter intuitive, if you are aiming to keep only conformers of 1000 kcal/mol or less you should set -etbb 3000 and -etss 1000. The particular amount that -etbb should be set above -etss will be sequence dependent.

You are right, but note the trick as well. The quality of the backbone will be 3000, while the quality of the whole conformer, side chains included, will be 1000. The protocol does not improve the quality of the backbone. It simply means that FASPR (currently) was able to pack the side chains such that the Lennard-Jones contributions have a favorable (negative-energy) impact of at least 2000.
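The arithmetic behind that statement can be written out as a small sketch. It assumes the all-atom energy is simply the backbone energy plus the net contribution of the packed side chains, and that a conformer passes when its energy is at or below the cutoff; the function name is hypothetical.

```python
def required_sidechain_contribution(backbone_energy, etss):
    """Largest net energy change side-chain packing may add so that
    backbone_energy + contribution <= etss (the all-atom cutoff)."""
    return etss - backbone_energy


etbb, etss = 3000.0, 1000.0
# A backbone sitting exactly at the -etbb limit needs the most help.
for backbone_energy in (500.0, 2000.0, etbb):
    needed = required_sidechain_contribution(backbone_energy, etss)
    print(backbone_energy, needed)
```

A negative value means side-chain packing must lower the total by that amount; for a backbone right at 3000 it must contribute -2000 or less, which is the "at least 2000" mentioned above.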

Always keep in mind that in the current implementation the whole LJ profile is calculated and the individual energies are summed. There is actually no hard limit for a clash.
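To illustrate what "summed, with no hard clash limit" means, here is a minimal sketch using the textbook 12-6 Lennard-Jones form. The epsilon and sigma values, the coordinates, and the function names are made up for illustration; idpconfgen's actual parameters and implementation are not reproduced here.

```python
import math
from itertools import combinations


def lj_pair(r, epsilon=0.1, sigma=3.4):
    """Textbook 12-6 Lennard-Jones energy for one pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_total(coords, epsilon=0.1, sigma=3.4):
    """Sum the pair energies over all atom pairs; no pair is rejected outright."""
    return sum(
        lj_pair(math.dist(a, b), epsilon, sigma)
        for a, b in combinations(coords, 2)
    )


# Made-up coordinates (angstroms): one relaxed layout, one with a close contact.
relaxed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crowded = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (7.6, 0.0, 0.0)]

for name, coords in (("relaxed", relaxed), ("crowded", crowded)):
    print(name, lj_total(coords))
```

The close contact in `crowded` raises the total steeply but does not disqualify the structure by itself; whether it is kept depends only on how the summed energy compares with the chosen cutoff.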

Also, while -etbb defines the energy threshold for the backbone only, -etss defines the cutoff for the whole conformer (all atoms). Is this clear from the documentation? Do you think there should be another parameter for the energy of the side chains alone? I don't know if that makes sense at all.

In my experience, -etbb 3000 is good for a ~150-residue protein, which is exactly what you found.

Thanks so much Sean for putting forward so many experiments. Cheers!

SeanR22 commented 3 years ago

Thanks @joaomcteixeira

Yes, how the energy thresholds work is clear to me. It makes sense that there should be a separate energy threshold for building the backbone and then a total energy threshold at the end after sidechains are built.

I still don't understand how allowing the user to set the backbone threshold helps obtain a better result but we can discuss further. The comments made in my previous post are solely based on the observations from my runs and not the inner workings of the code. I'm obviously missing something here.

I do wonder if the -etbb value needs to have a dynamic quality to speed up the build process when the user sets it to a value that is too low. I think any user will expect that setting this threshold to a particular value should improve the ability to obtain structures below that value. This is why the opposite result was so unexpected for me!
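To make the idea concrete, here is one purely hypothetical way such a "dynamic" cutoff could behave: the threshold is relaxed stepwise after a given number of failed attempts, up to an optional ceiling. Nothing in this thread indicates that idpconfgen does this; the function name, parameters, and defaults are all invented for illustration.

```python
def relaxed_cutoff(base_cutoff, failed_attempts, patience=100,
                   factor=1.5, max_cutoff=None):
    """Return the cutoff after relaxing it once per `patience` failed attempts.

    Hypothetical schedule: multiply the user's cutoff by `factor` for every
    full block of `patience` failures, never exceeding `max_cutoff` if given.
    """
    steps = failed_attempts // patience
    cutoff = base_cutoff * factor ** steps
    if max_cutoff is not None:
        cutoff = min(cutoff, max_cutoff)
    return cutoff


for failures in (0, 99, 100, 250):
    print(failures, relaxed_cutoff(1000.0, failures, max_cutoff=3000.0))
```

A schedule like this would trade the guarantee on backbone energy for build speed, so the final all-atom filter would still be needed to enforce the energy the user actually wants.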

Cheers!

joaomcteixeira commented 3 years ago

> I do wonder if the -etbb value needs have a dynamic quality to speed up the build process when the user sets it to a value that is too low?

This is an interesting point, but I can't think of all the implications right away. Let me finalize some implementations I am working on and then we will return to these discussions. Thanks so much @SeanR22 !!

SeanR22 commented 3 years ago

Got it @joaomcteixeira - I'll leave you alone for a bit! ;)