openai / vdvae

Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"
MIT License

FusedAdam requires cuda extensions #11

Open mehranagh20 opened 3 years ago

mehranagh20 commented 3 years ago

I built the apex module following the documented procedure, but when I try to train the model on cifar10, I get:

/lustre03/project/6054857/mehranag/vdvae/data.py:147: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  trX = np.vstack(data['data'] for data in tr_data)
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 140, in main
    train_loop(H, data_train, data_valid_or_test, preprocess_fn, vae, ema_vae, logprint)
  File "train.py", line 59, in train_loop
    optimizer, scheduler, cur_eval_loss, iterate, starting_epoch = load_opt(H, vae, logprint)
  File "/lustre03/project/6054857/mehranag/vdvae/train_helpers.py", line 180, in load_opt
    optimizer = AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr, betas=(H.adam_beta1, H.adam_beta2))
  File "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", line 79, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

I understand that this is an apex-related issue since I get the following error when trying to run examples/simple/distributed in the apex repo:

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ImportError("/lib64/libm.so.6: version `GLIBC_2.29' not found (required by /home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/amp_C.cpython-36m-x86_64-linux-gnu.so)",)
final loss =  tensor(0.5392, device='cuda:0', grad_fn=<MseLossBackward>)

I have tried many things to fix this issue but no luck. I have two questions:

rewonc commented 3 years ago

@mehranagh20 -- Are you using the code on a GPU, and do you have the appropriate CUDA drivers enabled?

If you want to avoid using apex, you can swap out the apex AdamW optimizer for PyTorch's AdamW (torch.optim.AdamW). I think you might need to adjust some of the arguments.
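A minimal sketch of that swap, assuming the optimizer is built as in the traceback above (`AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr, betas=(H.adam_beta1, H.adam_beta2))`). The helper names `build_with_fallback` and `make_optimizer` are hypothetical and not part of the vdvae repository. The idea is to try apex's FusedAdam first and fall back to `torch.optim.AdamW` when construction raises the RuntimeError shown in the issue.

```python
def build_with_fallback(params, primary, fallback, **kwargs):
    """Construct primary(params, **kwargs); on RuntimeError use fallback instead."""
    try:
        return primary(params, **kwargs)
    except RuntimeError as err:
        print(f"{primary.__name__} unavailable ({err}); using {fallback.__name__}")
        return fallback(params, **kwargs)


def make_optimizer(vae, H):
    """Hypothetical replacement for the AdamW construction in load_opt."""
    # Local imports so this module loads even when apex is not installed.
    import torch
    try:
        from apex.optimizers import FusedAdam as primary
    except ImportError:
        primary = torch.optim.AdamW
    kwargs = dict(weight_decay=H.wd, lr=H.lr,
                  betas=(H.adam_beta1, H.adam_beta2))
    # vae.parameters() is a generator; materialise it so the fallback
    # still sees every parameter if the first constructor consumed it.
    params = list(vae.parameters())
    return build_with_fallback(params, primary, torch.optim.AdamW, **kwargs)
```

The keyword arguments used here (`lr`, `betas`, `weight_decay`) are accepted by `torch.optim.AdamW`, so in this sketch no arguments need renaming. Other defaults may differ between the two optimizers, so results are not guaranteed to be identical.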

Chiang97912 commented 2 years ago

This happens because apex cannot import amp_C. You can check the file "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py" to see this, and you can verify it from a Python shell:

import torch
import amp_C  # must import torch before import amp_C
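The same check can be wrapped in a small script that reports the underlying ImportError instead of stopping at the first failure. This is only a sketch; `try_import` is a hypothetical helper name, and the module order (torch before amp_C) follows the comment above.

```python
import importlib


def try_import(names=("torch", "amp_C")):
    """Import each module in order; return (ok, message)."""
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as err:
            # The message usually names the missing library or symbol version.
            return False, f"{name}: {err}"
    return True, "all imports succeeded"


if __name__ == "__main__":
    ok, message = try_import()
    print("OK" if ok else "FAILED", "-", message)
```

If the message mentions a missing GLIBC or GLIBCXX version, the problem is with the shared libraries visible at load time rather than with the apex Python code.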

You may get an error like: libstdc++.so.6: version 'GLIBCXX_3.4.20' not found. If so, try the following commands:

conda install libgcc
export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH
cd /path/to/anaconda/envs/myenv/lib
ln -s libstdc++.so.6.0.30 libstdc++.so.6

You can also add export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH to your ~/.bashrc file so the setting persists across sessions.
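Before relinking, it can help to confirm that the libstdc++ in the conda environment actually provides the version the error asks for. The sketch below scans a library file for GLIBCXX version strings, roughly what `strings libstdc++.so.6 | grep GLIBCXX` does. The function names are hypothetical, and the path in the usage comment is the placeholder from the commands above, not a real location.

```python
import re
from pathlib import Path


def versions_in_bytes(data):
    """Return the GLIBCXX versions found in raw bytes, sorted numerically."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted((v.decode() for v in found),
                  key=lambda v: tuple(int(p) for p in v.split(".")))


def glibcxx_versions(lib_path):
    """Read a shared library file and list the GLIBCXX versions it mentions."""
    return versions_in_bytes(Path(lib_path).read_bytes())


def provides(lib_path, wanted="3.4.20"):
    """True if the library mentions the wanted GLIBCXX version."""
    return wanted in glibcxx_versions(lib_path)


# Example (placeholder path, adjust to your environment):
# print(provides("/path/to/anaconda/envs/myenv/lib/libstdc++.so.6"))
```

This only covers GLIBCXX (libstdc++) versions. The `GLIBC_2.29' not found` message in the original report refers to the system C library, which this check does not inspect.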

ShoufaChen commented 1 year ago

I solved this problem by building with

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

rather than

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

My pip version is 22.3.1.

AanchalChugh commented 1 year ago

I tried this: pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ but it did not solve the problem.

barikata1984 commented 1 year ago

Try the following:

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings --global-option=--cpp_ext --config-settings --global-option=--cuda_ext ./

It worked with pip 23.2.1 on Python 3.9.

Guodanding commented 6 months ago

This works for me! Thanks!