Int2Cart integration - Githubissues

JerryJohnsonLee commented 2 years ago

This pull request adds the option to use Int2Cart during IDP generation, enabled by passing the "-bgeo_int2cart" argument on the command line.

joaomcteixeira commented 2 years ago

Thanks @JerryJohnsonLee !! I am moving houses these days, give me some time to review it. But it looks very good :wink:

menoliu commented 2 years ago

Hi @JerryJohnsonLee, thanks for this integration. However, I am running into RuntimeErrors on both my local machine and the cluster. I have installed Int2Cart and all of its prerequisites on top of the idpconfgen installation. Things seem to run, but there might be a configuration problem:

Local bug: RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

Cluster bug: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

menoliu commented 2 years ago

Cluster bug: RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Fixed the cluster bug using this method: https://stackoverflow.com/questions/56369030/runtimeerror-attempting-to-deserialize-object-on-a-cuda-device

We should probably add this to the FAQ section of the documentation or start a Troubleshooting section :)

JerryJohnsonLee commented 2 years ago

Yeah, this is just a bug: I forgot to map the loaded model weights to the CPU in the code. It does not show up on my side because my machine has multiple GPUs, but it will be an issue on machines with only one GPU or none. It should be corrected by adding map_location=torch.device('cpu') in https://github.com/JerryJohnsonLee/IDPConformerGenerator/blob/master/src/idpconfgen/components/bgeo_int2cart.py#L15 (change the line to model_state = torch.load(model_addr, map_location=torch.device('cpu'))['model_state_dict'])
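For readers hitting either of the two RuntimeErrors above, here is a minimal sketch of how a loader could pick a `map_location` that exists on the current machine. The helper names `choose_map_location` and `load_model_state` are hypothetical and not part of idpconfgen or Int2Cart; only `torch.load`, `torch.cuda.is_available()` and `torch.cuda.device_count()` are real PyTorch calls, and the `'model_state_dict'` key is taken from the line quoted above.

```python
def choose_map_location(cuda_available: bool, device_count: int,
                        saved_index: int = 0) -> str:
    """Return a device string that is valid on this machine.

    Hypothetical helper: falls back to the CPU when CUDA is unavailable
    (the "cluster bug") and to GPU 0 when the checkpoint was saved on a
    GPU index that does not exist here (the "local bug").
    """
    if not cuda_available or device_count == 0:
        return "cpu"
    index = saved_index if saved_index < device_count else 0
    return f"cuda:{index}"


def load_model_state(model_addr):
    """Load the checkpoint and return its 'model_state_dict' entry."""
    import torch  # imported lazily so the helper above needs no PyTorch

    location = choose_map_location(
        torch.cuda.is_available(),
        torch.cuda.device_count(),
    )
    # torch.load accepts a device string such as "cpu" or "cuda:0" here.
    checkpoint = torch.load(model_addr, map_location=location)
    return checkpoint["model_state_dict"]
```

Keeping the device choice in a plain function makes it easy to check the fallback logic on a machine without any GPU.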

menoliu commented 2 years ago

Thanks for the quick fix Jerry! It worked on both local and cluster. I will push the commit here accordingly.

However, I've noticed a significant hang after "random seed: X" is logged, with no further progress. I was wondering if that's another bug?

Edit: for your convenience here's my test command based on drk: idpconfgen build -db /home/nemoliu/Documents/database/idpconfgen_database_rd.json -seq MEAIAKHDFSATADDELSFRKTQILKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHD -nc 10 -n 10 --dany --dloop-off -rs 0 -etbb 100 -et 'pairs' -subs '{"R":"RK","D":"DE","C":"CY","C":"CW","Q":"QH","E":"ED","H":"HYQ","I":"IVM","I":"IL","K":"KR","M":"MI","M":"MVL","F":"FY","F":"FWL","W":"WYFC","Y":"YF","Y":"YC","Y":"YWH"}' -bgeo_int2cart -of ./drk_i2c_test -dsd

JerryJohnsonLee commented 2 years ago

This is strange behavior. I did not see this yesterday, but I am now running into the same issue. I think it is related to the multiprocessing part of IDPCG, but I probably will not be able to provide a quick fix.

Update: this is somehow coupled with the previous bug. When I remove the map_location, it does not hang on my local machine. Maybe there are conflicts when multiple models are mapped onto the CPU at the same time in different processes?

menoliu commented 2 years ago

Aha, that's probably what's causing it. I am not familiar with tensorflow/torch, but there's nothing wrong with running int2cart separately on the backbones within the same environment, or with calculating backbones without -bgeo_int2cart in this branch.

Pretty cool!

menoliu commented 2 years ago

Update: I got the local multiprocessing working with model_state = torch.load(model_addr, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu"))['model_state_dict']. It is slower than using the default BGEO, and I predict it will also be slower than running Int2Cart at the end: 273.86 s with this integration vs 34.079 s without it on my local machine.

Edit: this should not be the solution though... need to find some way to get cpu working so we can have properly controlled benchmarks

joaomcteixeira commented 2 years ago

Is there any altered behavior of idpconfgen (here compared to master) if we do not select int2cart and use the default bgeo as usual?

menoliu commented 2 years ago

Leaving out -bgeo_int2cart runs everything fine. However, I've narrowed the problem down to def get_internal_coords: when torch is loaded with cpu, it seems not to return bond lengths (d) and thetas after the first fragment is selected.

JerryJohnsonLee commented 2 years ago

I pinpointed the issue to line https://github.com/THGLab/int2cart/blob/main/modelling/layers/dense.py#L79, which does not return results when running on CPU. This is very strange, because it is basically just executing the code of a standard pytorch Dense layer.

Update: I also tried bypassing this line by converting the weights and bias back to numpy and doing the calculation in numpy, and it works. So this seems to be some pytorch bug that I cannot understand. Unfortunately, this "numpy bypass" would need to be done in more than one place and is practically infeasible, so I don't think it is the correct way to fix the issue. I think for now the best approach is to run the model on GPU.
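To make the "numpy bypass" concrete, the sketch below shows the arithmetic a dense layer performs once weights and bias are plain numbers: an affine map `y = W x + b`, with `W` stored as (out_features, in_features) as in `torch.nn.Linear`. It is written in dependency-free Python for illustration only; the function name `dense_forward` is hypothetical, it is not the code from the Int2Cart repository, and in NumPy the same step would be `x @ W.T + b`.

```python
from typing import List, Sequence


def dense_forward(x: Sequence[float],
                  weight: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Compute y[j] = sum_i weight[j][i] * x[i] + bias[j].

    `weight` has one row per output feature, matching the
    (out_features, in_features) layout used by torch.nn.Linear.
    """
    if len(weight) != len(bias):
        raise ValueError("weight and bias disagree on out_features")
    out = []
    for row, b in zip(weight, bias):
        if len(row) != len(x):
            raise ValueError("weight row and input disagree on in_features")
        out.append(sum(w * xi for w, xi in zip(row, x)) + b)
    return out
```

For example, a two-feature input passed through a three-unit layer:

```python
y = dense_forward([1.0, 2.0],
                  [[1.0, 0.0], [0.5, 0.5], [2.0, -1.0]],
                  [0.0, 1.0, 0.5])
```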

menoliu commented 2 years ago

Thanks for your investigation Jerry! I guess we can write some documentation in the installation.rst about requiring CUDA compatible hardware if running Int2Cart. We can request GPUs on our cluster so that shouldn't be a problem.

But for my benchmark, I think I will only measure how long Int2Cart takes to process the backbones afterwards, including a statistic for the speed when using 1 (?) GPU on the cluster.

menoliu commented 2 years ago

Hi @JerryJohnsonLee, a new issue with how Int2Cart builds backbones (maybe this was related to our bug from yesterday as well). It seems to use a different backbone building algorithm omitting hydrogen atoms... MC-SCE cannot process side-chains onto these and throws warnings. Also a KeyError is thrown for the Nitrogen atom after processing with Int2Cart?

@joaomcteixeira let me know if I should move this issue to the Int2Cart repo to keep this PR clean.

I've attached two sample files, before and after Int2Cart and a sample of the KeyError thrown. N_KeyError_i2c_mcsce.zip

JerryJohnsonLee commented 2 years ago

Is it running Int2Cart within IDPCG? The backbone building algorithm should be the same as what IDPCG originally uses, because it only modifies the bond lengths and bond angles in the BGEO portion of the IDPCG code. Maybe you could temporarily disable sidechain building and see what the backbone looks like?

menoliu commented 2 years ago

Hi Jerry, no. The issue I raised was running Int2Cart on the backbones built by IDPCG. My jobs requesting a T4 GPU still have not queued up yet unfortunately. If there isn't a quick fix maybe I can try something locally on Monday

joaomcteixeira commented 2 years ago

Any updates? Is that latest bug related to the development here, in the end?

menoliu commented 2 years ago

Hi all, I have some news from this weekend of vigorously testing this integration on HPC:

Works no problem on P100, V100, and T4 GPUs (I've added an error message to be thrown if the user does not have compatible GPUs to run this plugin) Edit: also depends on protein length (longer = slower, to be expected), as of 11AM EST works no problem with drkN SH3.
On 2x P100 with the same CPU/RAM configs (32C, 64G), Int2Cart is slower than default BGEO settings (I suspect this is bottle-necked by GPU performance)
I will be sending a summary E-mail of speeds per protein system soon (also found some interesting RAM dependencies that larger proteins have, e.g. Tau-441 needs > 64G RAM to run successfully)

formankay commented 2 years ago

Thanks, Nemo. How much slower is Int2Cart? Looking forward to seeing the numbers. Can we also create and then test the "fixed" approach or some average approach for speed? Julie

menoliu commented 2 years ago

Hi Julie, here are the numbers for post-processing the backbones with Int2Cart. Int2Cart currently cannot process more than 1 PDB at a time, so I think it could benefit from some multiprocessing (but as we see here, there's some internal error in PyTorch's multiprocessing when using CPUs instead of CUDA/GPUs). The good news is that the time doesn't seem to depend on protein length, and it needs less than 1.5 hrs to process 1024 backbones using 1 CPU (not node) on Graham:

drkN SH3: 4418 s = 1.22 hr = 1 hr 13 min
Sic1: 4564 s = 1.26 hr = 1 hr 15 min
aSyn: 4729 s = 1.31 hr = 1 hr 19 min
I-2: 4762 s = 1.32 hr = 1 hr 19 min
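As a quick way to compare these runs, the totals can be turned into a per-backbone cost. The sketch below only divides the wall-clock seconds listed above by the 1024 backbones mentioned in the same comment; the variable and function names are illustrative.

```python
# Wall-clock totals (seconds) for post-processing 1024 backbones,
# copied from the figures reported above.
TIMINGS_S = {
    "drkN SH3": 4418,
    "Sic1": 4564,
    "aSyn": 4729,
    "I-2": 4762,
}
N_BACKBONES = 1024


def seconds_per_backbone(total_seconds: float, n_backbones: int) -> float:
    """Average processing time for a single backbone."""
    if n_backbones <= 0:
        raise ValueError("n_backbones must be positive")
    return total_seconds / n_backbones


per_backbone = {
    name: seconds_per_backbone(total, N_BACKBONES)
    for name, total in TIMINGS_S.items()
}

for name, value in per_backbone.items():
    print(f"{name}: {value:.2f} s per backbone")
```

All four systems land between 4 and 5 seconds per backbone, which is consistent with the observation that the time barely depends on protein length.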

However, the way Int2Cart builds backbones is incompatible with MC-SCE, so these backbones cannot have side-chains added to them via MC-SCE. But, I am currently running a job with Int2Cart running inside IDPCG so the backbones generated will use IDPCG algorithm instead of the model package Int2Cart uses.

What's interesting is that MC-SCE seems to work with the Sic1 Int2Cart post-processed backbones but not with drkN SH3, aSyn, and I-2 (those three throw the same KeyError regarding a nitrogen atom, due to the model algorithm). What's more, the model package doesn't add hydrogens to the backbones. For Sic1, I got a 55% success rate for 1024 backbones in 6467 s = 1.79 hr = 1 hr 48 min. This is slightly slower than our historical side-chain-per-backbone result for Sic1 (Table 2A, which was 63% success at 1.55 hr). However, I will be running the side-chains with the same backbones as the ones processed in Int2Cart for good experimental control.

Sidechain benchmark update: MC-SCE also finished running for the default BGEO backbones, currently running for Tau and Sic1. After, I will run MC-SCE for backbones generated by the Int2Cart integration to IDPCG.

formankay commented 2 years ago

Thanks. Int2Cart should be able to use an approach that puts hydrogens on and allows MC-SCE sidechains to be built afterwards. This could be within IDPConfGen but it should be enabled after the full backbone is built too.

Thanks, Julie

joaomcteixeira commented 2 years ago

Thanks for correcting the Nones. I want to review this a bit more carefully, because the implementation is not integrated into any strategy-like pattern, which will make our lives more difficult when implementing other strategies in the future. The implementation itself is very good, though. I just need a bit of time.

joaomcteixeira commented 2 years ago

@menoliu how much do we need to change the installation instructions on GRAHAM to accommodate for pytorch?

joaomcteixeira commented 2 years ago

Hi @menoliu I will need your help to test my changes because I am only on my laptop. Could you test this branch with normal build process and with int2cart? You can paste the errors on slack. Note the new parameter --bgeo_strategy. Big thanks!
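For context on what a `--bgeo_strategy` parameter can look like, here is a minimal sketch of a strategy registry wired to a command-line option. It is not the code in this branch: the strategy functions, the registry name `BGEO_STRATEGIES`, the `"sampling"` label for the default behavior and the parser itself are all hypothetical; only the option name `--bgeo_strategy` and the value `int2cart` appear in this thread.

```python
import argparse
from typing import Callable, Dict, List


def bgeo_sampling(sequence: str) -> str:
    """Placeholder for the default bond-geometry sampling."""
    return f"sampled bond geometry for {sequence}"


def bgeo_int2cart(sequence: str) -> str:
    """Placeholder for bond geometry predicted by Int2Cart."""
    return f"Int2Cart-predicted bond geometry for {sequence}"


# Registry: adding a new strategy means adding one entry here.
BGEO_STRATEGIES: Dict[str, Callable[[str], str]] = {
    "sampling": bgeo_sampling,
    "int2cart": bgeo_int2cart,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument(
        "--bgeo_strategy",
        choices=sorted(BGEO_STRATEGIES),
        default="sampling",
        help="how bond lengths and angles are obtained",
    )
    return parser


def run(argv: List[str], sequence: str = "EGAGAAS") -> str:
    """Parse the arguments and dispatch to the selected strategy."""
    args = build_parser().parse_args(argv)
    return BGEO_STRATEGIES[args.bgeo_strategy](sequence)
```

With this layout the build loop never branches on strategy names; it just calls whatever the registry returns, which is the property the review above asks for.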

menoliu commented 2 years ago

First issue

Update: fixed. doing more testing until pushing

(idpconfgen) nemoliu@narwhal:~/Documents/idpconfgenTest/int2cartT$ idpconfgen build -h
Traceback (most recent call last):
  File "/home/nemoliu/anaconda3/envs/idpconfgen/bin/idpconfgen", line 33, in <module>
    sys.exit(load_entry_point('idpconfgen', 'console_scripts', 'idpconfgen')())
  File "/home/nemoliu/anaconda3/envs/idpconfgen/bin/idpconfgen", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 105, in load
    module = import_module(match.group('module'))
  File "/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli.py", line 18, in <module>
    from idpconfgen import (
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 21, in <module>
    from idpconfgen.components.bgeo_strategies import (
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/__init__.py", line 2, in <module>
    from idpconfgen.components.bgeo_strategy.bgeo_int2cart import \
ModuleNotFoundError: No module named 'idpconfgen.components.bgeo_strategy'

menoliu commented 2 years ago

First error with forcefields:

$ idpconfgen build -db ../../database/idpconfgen_database_rd.json -seq EGAGAAS --dloop-off --dany -etbb 100 -dsd --bgeo_strategy int2cart -n -nc 100 -of ./ex100int2cart

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
    return func(*args, **kwargs)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 680, in _build_conformers
    atom_labels, residue_numbers, residue_labels = next(builder)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 893, in conformer_generator
    f'{forcefield} not in `forcefields`. '
ValueError: None not in `forcefields`. Expected ['Amberff14SB'].

After I did a quickfix by hard-coding _ffchoice[0] for all forcefield=None there was this error:

Traceback (most recent call last):
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
    return func(*args, **kwargs)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 680, in _build_conformers
    atom_labels, residue_numbers, residue_labels = next(builder)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 900, in conformer_generator
    **energy_funcs_kwargs,
TypeError: unsupported operand type(s) for ** or pow(): 'type' and 'dict'

Edit: for commit 0147f19 I reverted back to forcefield=None

joaomcteixeira commented 2 years ago

@menoliu give it a try. Now int2cart is a run-time dependency. That is, it is not needed to run idpconfgen with the normal bgeo sampling, but if you want to use int2cart you have to install it. The CLI tells you to do so. Can you test around to see if it works?
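One common way to implement such a run-time dependency is to check for the optional package only when the corresponding strategy is requested. The sketch below assumes that approach; the function names and the error text are hypothetical and the module name is passed in by the caller, since the actual import name used by idpconfgen is not shown in this thread. `importlib.util.find_spec` is the standard-library call that reports whether a top-level module can be imported.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


def require_module(strategy: str, module_name: str) -> None:
    """Fail early, with an actionable message, if an optional
    dependency needed by `strategy` is missing."""
    if not is_installed(module_name):
        raise RuntimeError(
            f"The '{strategy}' strategy needs the optional package "
            f"'{module_name}', which is not installed. Install it or "
            f"choose a different strategy."
        )
```

Calling `require_module` at argument-parsing time, before any worker processes start, keeps the default code path free of the extra dependency while still giving the user a clear instruction.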

menoliu commented 2 years ago

@joaomcteixeira I'm still getting this ValueError

Edit: error was fixed by restarting my computer... but a new error has come up (see below comment)

/home/nemoliu/anaconda3/envs/idpconfgen/lib/python3.7/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
[2022-05-24 16:05:54,908]    WARNING: please use CUDA compatible GPUs while running--bgeo_strategy int2cart.
[2022-05-24 16:05:54,908]    Error: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

We have also discussed that map_location=torch.device('cpu') does not play nice with Python multiprocessing

menoliu commented 2 years ago

@joaomcteixeira New TypeError

Traceback (most recent call last):
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
    return func(*args, **kwargs)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 688, in _build_conformers
    energy, coords = next(builder)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 1123, in conformer_generator
    d1, d2, d3, theta1, theta2, theta3 = INT2CART.get_internal_coords(seq, tors)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/int2cart/bgeo_int2cart.py", line 53, in get_internal_coords
    nits="radian",
TypeError: predict() got an unexpected keyword argument 'nits'

joaomcteixeira commented 2 years ago

This last commit corrects the TypeError (hopefully). The other error, the one related to the CPU, I think comes from int2cart?

menoliu commented 2 years ago

Yes, Jerry mentioned in this PR earlier that it's an internal issue with torch. I will test now as well as push a FAQ for a common bug with tensorboard

menoliu commented 2 years ago

@joaomcteixeira Still getting this error:

Traceback (most recent call last):
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/logger.py", line 112, in report_on_crash
    return func(*args, **kwargs)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 688, in _build_conformers
    energy, coords = next(builder)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/cli_build.py", line 1123, in conformer_generator
    d1, d2, d3, theta1, theta2, theta3 = INT2CART.get_internal_coords(seq, tors)
  File "/home/nemoliu/IDPConformerGenerator/src/idpconfgen/components/bgeo_strategies/int2cart/bgeo_int2cart.py", line 53, in get_internal_coords
    nits="radian",
TypeError: predict() got an unexpected keyword argument 'nits'

menoliu commented 2 years ago

@joaomcteixeira very nice work! This integration passed small peptide and drk building. works for our faspr and mcsce side chain methods as well. I am happy to merge

joaomcteixeira commented 2 years ago

Improved the import statements. Now I think it is good to go.

julie-forman-kay-lab / IDPConformerGenerator

Int2Cart integration #203