Closed JerryJohnsonLee closed 2 years ago
Thanks @JerryJohnsonLee !! I am moving houses these days, give me some time to review it. But it looks very good :wink:
Hi @JerryJohnsonLee, thanks for this integration. However, I am running into a few RuntimeErrors both on my local machine and on the cluster. I have installed all the prerequisites of Int2Cart, as well as Int2Cart itself, on top of the idpconfgen installation. Things seem to run, but there might be a configuration problem:
Local bug: RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
Cluster bug: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Fixed the cluster bug using this method: https://stackoverflow.com/questions/56369030/runtimeerror-attempting-to-deserialize-object-on-a-cuda-device
We should probably add this to the FAQ section of the documentation or start a Troubleshooting section :)
Yeah, this is just a bug. I forgot to map the loaded model weights to CPU in the code. It does not show up on my side because my machine has multiple GPUs, but it will be an issue on machines with only one GPU or none. It should be corrected by adding map_location=torch.device('cpu') in https://github.com/JerryJohnsonLee/IDPConformerGenerator/blob/master/src/idpconfgen/components/bgeo_int2cart.py#L15 (change to model_state = torch.load(model_addr, map_location=torch.device('cpu'))['model_state_dict']).
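A minimal sketch of the map_location fix discussed above, assuming the checkpoint is a dict with a 'model_state_dict' key as in the quoted line. The helper names pick_map_location and load_model_state are hypothetical and are not part of idpconfgen or Int2Cart; torch is imported lazily so the device-selection logic can be read and checked on a machine without PyTorch.

```python
def pick_map_location(cuda_available, device_count, saved_index=0):
    """Return a map_location string naming a device that exists on this machine.

    Hypothetical helper for illustration only.
    """
    if not cuda_available or device_count < 1:
        return "cpu"
    # A checkpoint saved on cuda:1 cannot be restored on a single-GPU machine,
    # so fall back to the first GPU when the saved index is out of range.
    index = saved_index if saved_index < device_count else 0
    return f"cuda:{index}"


def load_model_state(model_addr, force_cpu=True):
    """Load the 'model_state_dict' entry of a checkpoint onto an existing device."""
    import torch  # imported lazily: only needed when a checkpoint is actually loaded

    if force_cpu:
        location = torch.device("cpu")
    else:
        location = torch.device(
            pick_map_location(torch.cuda.is_available(), torch.cuda.device_count())
        )
    checkpoint = torch.load(model_addr, map_location=location)
    return checkpoint["model_state_dict"]
```

With force_cpu=True this mirrors the one-line change proposed above; with force_cpu=False it avoids both error messages by never asking for a CUDA device that is not present.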
Thanks for the quick fix Jerry! It worked on both local and cluster. I will push the commit here accordingly.
However, I've noticed a significant hang after logging the random seed: X, with no further progress. I was wondering if that's another bug?
Edit: for your convenience here's my test command based on drk:
idpconfgen build -db /home/nemoliu/Documents/database/idpconfgen_database_rd.json -seq MEAIAKHDFSATADDELSFRKTQILKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHD -nc 10 -n 10 --dany --dloop-off -rs 0 -etbb 100 -et 'pairs' -subs '{"R":"RK","D":"DE","C":"CY","C":"CW","Q":"QH","E":"ED","H":"HYQ","I":"IVM","I":"IL","K":"KR","M":"MI","M":"MVL","F":"FY","F":"FWL","W":"WYFC","Y":"YF","Y":"YC","Y":"YWH"}' -bgeo_int2cart -of ./drk_i2c_test -dsd
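A side note on the -subs value in the command above: it repeats several keys ("C", "I", "M", "F", "Y"). The sketch below assumes the value is parsed with Python's json module, which this thread does not confirm. Under that assumption, json.loads silently keeps only the last value for a repeated key. The function find_duplicate_keys is a hypothetical helper, and the string is a shortened excerpt of the one in the command.

```python
import json


def find_duplicate_keys(text):
    """Return keys that appear more than once in a JSON object (hypothetical helper)."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen and key not in duplicates:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Shortened excerpt of the -subs argument from the test command.
subs = '{"C":"CY","C":"CW","I":"IVM","I":"IL","K":"KR"}'
parsed = json.loads(subs)          # later duplicates overwrite earlier ones
dupes = find_duplicate_keys(subs)  # keys that were given more than once
print(parsed)
print(dupes)
```

If the substitution table is meant to hold several alternatives per residue, merging them into one value per key would avoid relying on how duplicates are resolved.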
This is a strange behavior. I was not having this situation yesterday but I am also running into this issue now. I think this is related to the multiprocessing part of IDPCG, but I probably will not be able to have a quick fix.
Update: this is somehow coupled with the previous bug. When I remove the map_location it will not hang up on my local machine. Maybe there are some conflicts when mapping multiple models onto CPU at the same time in different processes?
Aha, that's probably what's causing it. I am not familiar with tensorflow/torch, but there's nothing wrong with running Int2Cart separately on the backbones within the same environment, or with calculating backbones without -bgeo_int2cart in this branch.
Pretty cool!
Update: I got the local multiprocessing working with model_state = torch.load(model_addr, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu"))['model_state_dict']. It is roughly 8 times slower than using the default BGEO (273.86 s with this integration vs 34.079 s without, on my local machine), and I predict it will also be slower than running Int2Cart at the end...
Edit: this should not be the solution though... we need to find some way to get cpu working so we can have properly controlled benchmarks.
Is there any altered behavior of idpconfgen (here compared to master) if we do not select int2cart and use the default bgeo as usual?
Leaving out -bgeo_int2cart runs everything fine. However, I've narrowed the problem down to def get_internal_coords: when loading torch with cpu, it seems not to return bond lengths (d) and thetas after the first fragment is selected.
I pinpointed the issue to line https://github.com/THGLab/int2cart/blob/main/modelling/layers/dense.py#L79 not returning results when running on CPU. However, this is very strange because it is basically just executing code in a standard pytorch Dense layer.
Update: I also tried bypassing this line by converting the weights and bias back to numpy and doing the calculation in numpy, and it works. So this seems to be some pytorch bug that I cannot understand. Unfortunately, this "numpy bypass" approach would need to be applied in more than one place and is practically infeasible, so I don't think it is the correct way to fix the issue. I think for now the best approach is to run the model on GPU.
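For reference, the "numpy bypass" of a dense layer can be sketched as below. This is an illustration only, not the code in int2cart's dense.py: it assumes the layer stores its weight in the torch.nn.Linear convention, with shape (out_features, in_features), and applies no activation. The function name dense_forward_numpy is hypothetical.

```python
import numpy as np


def dense_forward_numpy(x, weight, bias=None):
    """Compute y = x @ W^T + b in numpy, matching the torch.nn.Linear convention.

    With a real layer, the arrays would come from something like
    layer.weight.detach().cpu().numpy() and layer.bias.detach().cpu().numpy().
    """
    x = np.asarray(x, dtype=np.float64)
    weight = np.asarray(weight, dtype=np.float64)  # shape (out_features, in_features)
    out = x @ weight.T
    if bias is not None:
        out = out + np.asarray(bias, dtype=np.float64)
    return out


# Tiny example: 2 input features mapped to 3 output features.
x = [[1.0, 2.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.5, 0.5, 0.5]
y = dense_forward_numpy(x, weight, bias)
print(y)
```

As noted above, every dense layer in the model would need the same treatment, which is why this was not pursued as the actual fix.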
Thanks for your investigation, Jerry! I guess we can write some documentation in installation.rst about requiring CUDA-compatible hardware when running Int2Cart. We can request GPUs on our cluster, so that shouldn't be a problem.
But for my benchmark I think I will only measure how long Int2Cart takes to process backbones afterwards, including a statistic for the speed when using 1(?) GPU on the cluster.
Hi @JerryJohnsonLee, a new issue with how Int2Cart builds backbones (maybe this was related to our bug from yesterday as well). It seems to use a different backbone building algorithm that omits hydrogen atoms... MC-SCE cannot process side-chains onto these and throws warnings. Also, a KeyError is thrown for the nitrogen atom after processing with Int2Cart?
@joaomcteixeira let me know if I should move this issue to the Int2Cart repo to keep this PR clean.
I've attached two sample files, before and after Int2Cart, and a sample of the KeyError thrown.
N_KeyError_i2c_mcsce.zip
Is it running Int2Cart within IDPCG? The backbone building algorithm should be the same as what IDPCG originally uses, because it only modifies the bond lengths and bond angles in the BGEO portion of the IDPCG code. Maybe you could temporarily disable sidechain building and see what the backbone looks like?
Hi Jerry, no. The issue I raised was running Int2Cart on the backbones built by IDPCG. My jobs requesting a T4 GPU still have not queued up yet unfortunately. If there isn't a quick fix maybe I can try something locally on Monday
Any updates? Is that latest bug related with the development here at the end?
Hi all, I have some news from this weekend of vigorously testing this integration on HPC:
Thanks, Nemo. How much slower is Int2Cart? Looking forward to seeing the numbers. Can we also create and then test the "fixed" approach or some average approach for speed? Julie
On May 16, 2022 11:14 AM, Zi Hao (Nemo) Liu wrote:
Hi all, I have some news from this weekend of vigorously testing this integration on HPC: Works no problem on P100, V100, and T4 GPUs (I've added an error message to be thrown if the user does not have compatible GPUs to run this plugin)
Hi Julie, here are the numbers for processing the backbones afterwards with Int2Cart. Int2Cart currently cannot process more than 1 PDB at a time, so I think it could benefit from some multiprocessing (but as we see here, there's some internal error in PyTorch's multiprocessing when using CPUs instead of CUDA/GPUs). The good news is that the time doesn't seem to depend much on protein length, and it needs less than 1.5 hr to process 1024 backbones using 1 CPU (not node) on Graham:
drkN SH3: 4418 s = 1.23 hr (about 1 hr 14 min)
Sic1: 4564 s = 1.27 hr (about 1 hr 16 min)
aSyn: 4729 s = 1.31 hr (about 1 hr 19 min)
I-2: 4762 s = 1.32 hr (about 1 hr 19 min)
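The unit conversions for these timings can be reproduced with a short helper. This is a minimal sketch; the function name to_hours_minutes_seconds is hypothetical, and the only data used are the wall-clock seconds reported above.

```python
def to_hours_minutes_seconds(seconds):
    """Split a duration in whole seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs


# Seconds to process 1024 backbones, as reported in this comment.
timings = {"drkN SH3": 4418, "Sic1": 4564, "aSyn": 4729, "I-2": 4762}

for name, total in timings.items():
    h, m, s = to_hours_minutes_seconds(total)
    per_backbone = total / 1024  # average seconds per backbone
    print(f"{name}: {total} s = {total / 3600:.2f} hr = {h} hr {m} min {s} s "
          f"({per_backbone:.2f} s per backbone)")
```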
However, the way Int2Cart builds backbones is incompatible with MC-SCE, so these backbones cannot have side-chains added to them via MC-SCE. But I am currently running a job with Int2Cart running inside IDPCG, so the backbones generated will use the IDPCG algorithm instead of the model package Int2Cart uses.
What's interesting is that MC-SCE seems to work with the Sic1 Int2Cart post-processed backbones but not with drkN SH3, aSyn, and I-2... (those 3 throw the same KeyError regarding a nitrogen atom, due to the model algorithm). What's more, model doesn't add hydrogens to the backbones. For Sic1, I received a 55% success rate for 1024 backbones in 6467 s = 1.80 hr (about 1 hr 48 min). This is slightly slower than our historical side-chain per backbone for Sic1 (Table 2A, which was 63% success at 1.55 hrs). However, I will be running the side-chains with the same backbones as the ones processed in Int2Cart for good experimental control.
Sidechain benchmark update: MC-SCE also finished running for the default BGEO backbones, currently running for Tau and Sic1. After, I will run MC-SCE for backbones generated by the Int2Cart integration to IDPCG.
Thanks. Int2Cart should be able to use an approach that puts hydrogens on and allows MC-SCE sidechains to be built afterwards. This could be within IDPConfGen but it should be enabled after the full backbone is built too.
Thanks, Julie
Thanks for correcting the Nones. I want to review this a bit more carefully, because the implementation is not integrated into any strategy pattern, which will make our life more difficult when implementing other strategies in the future. The implementation itself is very good, though; I just need a bit of time.
@menoliu how much do we need to change the installation instructions on GRAHAM to accommodate pytorch?
Hi @menoliu
I will need your help to test my changes because I am only on my laptop.
Could you test this branch with the normal build process and with int2cart? You can paste the errors on slack.
Note the new parameter --bgeo_strategy.
Big thanks!
First issue
Update: fixed. Doing more testing before pushing.
(idpconfgen) nemoliu@narwhal:~/Documents/idpconfgenTest/int2cartT$ idpconfgen build -h
Traceback (most recent call last):
File "/home/nemoliu/anaconda3/envs/idpconfgen/bin/idpconfgen", line 33, in <module>
sys.exit(load_entry_point('idpconfgen', 'console_scripts', 'idpconfgen')())
File "/home/nemoliu/anaconda3/envs/idpconfgen/bin/idpconfgen", line 25, in importlib_load_entry_point
return next(matches).load()
File "/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 105, in load
module = import_module(match.group('module'))
File "/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli.py", line 18, in <module>
from idpconfgen import (
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 21, in <module>
from idpconfgen.components.bgeo_strategies import (
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/__init__.py", line 2, in <module>
from idpconfgen.components.bgeo_strategy.bgeo_int2cart import \
ModuleNotFoundError: No module named 'idpconfgen.components.bgeo_strategy'
First error with forcefields:
$ idpconfgen build -db ../../database/idpconfgen_database_rd.json -seq EGAGAAS --dloop-off --dany -etbb 100 -dsd --bgeo_strategy int2cart -n -nc 100 -of ./ex100int2cart
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
return func(*args, **kwargs)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 680, in _build_conformers
atom_labels, residue_numbers, residue_labels = next(builder)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 893, in conformer_generator
f'{forcefield} not in `forcefields`. '
ValueError: None not in `forcefields`. Expected ['Amberff14SB'].
After I did a quick fix by hard-coding _ffchoice[0] for all forcefield=None, there was this error:
Traceback (most recent call last):
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
return func(*args, **kwargs)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 680, in _build_conformers
atom_labels, residue_numbers, residue_labels = next(builder)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 900, in conformer_generator
**energy_funcs_kwargs,
TypeError: unsupported operand type(s) for ** or pow(): 'type' and 'dict'
Edit: for commit 0147f19 I reverted back to forcefield=None
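For readers hitting the same TypeError: one common way to get exactly this message is a missing comma before **kwargs in a multi-line call, which turns keyword unpacking into the power operator. The sketch below only reproduces the message; it is a guess at the mechanism and does not claim to show what line 900 of cli_build.py looked like. All names (Amberff14SB as a stand-in class, create_energy_function, energy_funcs_kwargs values) are hypothetical.

```python
class Amberff14SB:
    """Stand-in class; only its type matters for this demonstration."""


def create_energy_function(forcefield, **kwargs):
    return forcefield, kwargs


energy_funcs_kwargs = {"lj_term": True}

try:
    # The missing comma after Amberff14SB makes Python parse this as
    # Amberff14SB ** energy_funcs_kwargs, a power operation on a class.
    create_energy_function(
        Amberff14SB
        **energy_funcs_kwargs,
    )
except TypeError as err:
    message = str(err)
    print(message)

# With the comma in place the dict is unpacked as keyword arguments.
forcefield, kwargs = create_energy_function(
    Amberff14SB,
    **energy_funcs_kwargs,
)
```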
@menoliu give it a try.
Now int2cart is a run-time dependency. That is, it is not needed to run idpconfgen with the normal bgeo sampling, but if you want to use int2cart you have to install it. The CLI tells you to do so.
Can you test around to see if it works?
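The run-time dependency idea described above can be sketched as follows. This is an illustration of the general pattern, not the code in this branch; the function name load_optional and the wording of the message are hypothetical.

```python
import importlib
import importlib.util


def load_optional(module_name, feature):
    """Import a top-level module only when a feature needs it.

    Raises ImportError with an actionable message if the module is missing,
    so the rest of the program keeps working without the optional package.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for {feature} but is not installed. "
            f"Install it first, or run without {feature}."
        )
    return importlib.import_module(module_name)


# An installed module is returned as usual.
json_module = load_optional("json", "demo")
print(json_module.__name__)
```

Calling the loader only in the code path that handles the optional strategy keeps the default build free of the extra dependency.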
@joaomcteixeira I'm still getting this ValueError
Edit: error was fixed by restarting my computer... but a new error has come up (see below comment)
/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
[2022-05-24 16:05:54,908] WARNING: please use CUDA compatible GPUs while running--bgeo_strategy int2cart.
[2022-05-24 16:05:54,908] Error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
We have also discussed that map_location=torch.device('cpu') does not play nice with Python multiprocessing.
@joaomcteixeira New TypeError
Traceback (most recent call last):
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
return func(*args, **kwargs)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 688, in _build_conformers
energy, coords = next(builder)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 1123, in conformer_generator
d1, d2, d3, theta1, theta2, theta3 = INT2CART.get_internal_coords(seq, tors)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/int2cart/bgeo_int2cart.py", line 53, in get_internal_coords
nits="radian",
TypeError: predict() got an unexpected keyword argument 'nits'
This last commit (hopefully) corrects the TypeError from the error message above. The other one, related to cpu, I think is related to int2cart?
Yes, Jerry mentioned in this PR earlier that it's an internal issue with torch. I will test now as well as push a FAQ for a common bug with tensorboard
@joaomcteixeira Still getting this error:
Traceback (most recent call last):
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
return func(*args, **kwargs)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 688, in _build_conformers
energy, coords = next(builder)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 1123, in conformer_generator
d1, d2, d3, theta1, theta2, theta3 = INT2CART.get_internal_coords(seq, tors)
File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/int2cart/bgeo_int2cart.py", line 53, in get_internal_coords
nits="radian",
TypeError: predict() got an unexpected keyword argument 'nits'
@joaomcteixeira very nice work! This integration passed small peptide and drk building. It works for our faspr and mcsce side-chain methods as well. I am happy to merge.
Improved the import statements. Now, I think it is good to go.
This pull adds the functionality to use Int2Cart in the middle of IDP generation, supported by adding the "-bgeo_int2cart" argument when running from command line