The matrices are not getting the expected size, so something MCSCE is sending to idpconfgen is different for that protein compared with the other examples. Run idpconfgen backbone-only, and try MCSCE separately to see what happens.
Note that you have a mismatch in parameters (`n_trials` is not being used):

```
--mcsce-n_trials MCSCE_N_TRIALS
                      Sampling trials in exhaustive mode. Defaults to 10.
```
Regarding `--mcsce-n_trials`: we've always used a custom number (128) in simple mode when running MCSCE standalone, so maybe I initially misunderstood what "# trials" means... It does seem to run slower, though with greater success, when I set `--mcsce-n_trials` higher than the default.
Unless I am missing something critical, `n_trials` will only be used if `--mcsce-mode exhaustive`. That's what I understood from the MCSCE code and wrote in the docs. @JerryJohnsonLee help please :wink:
In simple mode `n_trials` is also in effect: it is the maximum number of trials allowed before returning `None`, so increasing `n_trials` raises the success rate on difficult cases, but it also takes more time.
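To make that concrete, here is a minimal sketch of the simple-mode retry logic as described above; `attempt_one_packing` and all other names are hypothetical stand-ins, not MCSCE's actual API.

```python
import random


def attempt_one_packing(structure):
    """Placeholder for one Monte Carlo side-chain growth attempt:
    returns the packed structure on success, or None on a clash/dead end."""
    return structure if random.random() < 0.3 else None


def build_sidechains_simple(structure, n_trials=10):
    """Simple mode as described above: retry up to n_trials times,
    then give up and return None for the caller to handle."""
    for _ in range(n_trials):
        result = attempt_one_packing(structure)
        if result is not None:
            return result
    return None  # raising n_trials improves success odds, at a time cost


print(build_sidechains_simple("conformer", n_trials=64))
```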
I am looking into the other issues.
For the second issue, the reason is that the energy function calculator initialization resides inside `init_mcsce_sidechains`, which is a function executed in parallel by the idpconfgen code itself. A solution is to move the MCSCE initialization code out of the parallel-executing code (maybe into `populate_globals`?)
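For reference, a minimal sketch of that pattern with Python's `multiprocessing` (function and calculator names here are illustrative, not the actual idpconfgen code): the expensive setup runs once per worker via the pool initializer instead of inside the per-conformer function.

```python
from multiprocessing import Pool

_EFUNC = None  # module-level global, filled in once per worker process


def make_energy_calculator():
    """Stand-in for the expensive energy-function setup."""
    return lambda conf_id: float(conf_id)


def populate_globals():
    """Pool initializer: runs once in each worker, not once per conformer."""
    global _EFUNC
    _EFUNC = make_energy_calculator()


def build_conformer(conf_id):
    """Per-conformer work; reuses the worker's already-built calculator."""
    return _EFUNC(conf_id)


if __name__ == "__main__":
    with Pool(processes=4, initializer=populate_globals) as pool:
        print(pool.map(build_conformer, range(8)))
```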
Regarding the second issue, I believe there might also be an issue in idpconfgen. Running MCSCE standalone does not actually use much memory, only around 1-3 GB for aSyn. However, just generating backbones with `-n 32`, idpconfgen uses up to 60 GB of memory; is this intended?
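If it helps to pin down where that memory goes, one simple check (assuming `psutil` is available; it is not part of either package) is to log the resident set size from inside the running process:

```python
import os

import psutil

# Resident memory of the current process, in GiB.
proc = psutil.Process(os.getpid())
rss_gb = proc.memory_info().rss / 1024**3
print(f"resident memory: {rss_gb:.2f} GiB")
```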
I can see now that there is another layer to the second issue. When running `mcsce` standalone, we can specify `-w 32`, and for e.g. 64 trials each worker tries building the side chains twice. But if we're building side chains on the fly in idpconfgen with `-n 32` and e.g. 64 trials, each worker, after building the backbones, tries to build the side chains 64 times, totaling 32 × 64 = 2048 side-chain building attempts at any given time if `-nc` is greater than `-n` (see the sketch below). This is just my observation from my unit tests; let me know if I'm understanding correctly, because that might be contributing to the memory issue more than just preparing the energy function `-n` times.
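A tiny back-of-the-envelope sketch of that arithmetic (illustrative numbers taken from the comment above):

```python
# Standalone MCSCE: the trial budget is split across workers.
standalone_workers = 32          # mcsce -w 32
total_trials = 64
trials_per_worker = total_trials // standalone_workers  # -> 2 tries each

# On the fly in idpconfgen: every worker runs the full trial budget.
idpconfgen_workers = 32          # idpconfgen -n 32
attempts_in_flight = idpconfgen_workers * total_trials  # -> 32 * 64 = 2048

print(trials_per_worker, attempts_in_flight)  # 2 2048
```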
Your understanding is basically correct. The only thing that I need to add is that by default MCSCE will be run under simple mode, and MCSCE will not use multiprocessing in simple mode.
As for the first issue, the discrepancy comes from a bug in idpconfgen. For the histidines in the sequence, both idpconfgen and MCSCE build HIP residues, which should have HD1, HD2, HE1, and HE2. However, in the `ALL_ATOM_LABELS` created by idpconfgen there are only HD2 and HE1, and that is where the two-atom difference comes from. @joaomcteixeira could you fix that?
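For illustration, the mismatch in set terms (abbreviated to the hydrogens under discussion; these are not idpconfgen's actual data structures):

```python
HIP_EXPECTED = {"HD1", "HD2", "HE1", "HE2"}  # doubly protonated His (HIP)
LABELS_BUILT = {"HD2", "HE1"}                # what ALL_ATOM_LABELS carried

print(HIP_EXPECTED - LABELS_BUILT)  # {'HD1', 'HE2'} -> the two-atom gap
```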
I found the issue with the HIP; I am implementing a fix right now. It seems `H` is read as "HIS" first but translated to "HIP" later; that's why HD1 and HE2 are missing.
EDIT: it works! PRing right now. It will not crash for histidine anymore :)
As for the memory issue, I think it's fine, as that's just how both programs function; since it's not the reason for the crash, we can optimize RAM later?
`ValueError: shape mismatch` when trying to build longer proteins such as aSyn (140 AA).

Test command:

```
idpconfgen build -db ./idpconfgen_database.json -seq MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA -etbb 100 -et 'pairs' -scm mcsce --mcsce-n_trials 50 -rs 0 -n 5 -nc 10 -of ./SCM_MCSCE_test/
```