mehranagh20 opened this issue 3 years ago
@mehranagh20 -- Are you using the code on a GPU, and do you have the appropriate CUDA drivers enabled?
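Both questions can be answered from Python by asking PyTorch what it sees. This is a minimal sketch, and cuda_report is a hypothetical helper name. It only reports what torch itself exposes (torch.version.cuda and torch.cuda.is_available()), so it cannot tell a missing driver apart from a missing GPU.

```python
import importlib


def cuda_report():
    """Summarize what PyTorch can see of the GPU/CUDA setup.

    Returns a dict instead of raising, so it also works where torch is
    missing or is a CPU-only build.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as err:
        return {"torch": None, "error": str(err)}

    report = {
        "torch": torch.__version__,
        # CUDA version torch was compiled against; None for CPU-only builds
        "torch_cuda": torch.version.cuda,
        # True only if a GPU and a working driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_report())
```

If torch_cuda is None, PyTorch itself is a CPU-only build. apex's CUDA extensions are compiled against the local CUDA toolkit, so it is also worth comparing torch_cuda with the output of nvcc --version.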
If you want to avoid using apex, you can swap apex's fused AdamW optimizer (FusedAdam) for PyTorch's built-in torch.optim.AdamW. I think you might need to adjust some of the arguments.
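Here is a sketch of that swap. It assumes the training code constructs apex's FusedAdam with keyword arguments. to_torch_adam_kwargs and make_optimizer are hypothetical helpers, and the set of apex-only arguments handled here (bias_correction, adam_w_mode, set_grad_none) reflects recent apex releases, so it may differ in yours. The main adjustment is weight_decay: FusedAdam defaults it to 0, while torch.optim.AdamW defaults it to 0.01.

```python
import importlib


def to_torch_adam_kwargs(fused_kwargs):
    """Map FusedAdam-style kwargs to (torch.optim class name, kwargs)."""
    kwargs = dict(fused_kwargs)
    if not kwargs.pop("bias_correction", True):
        raise ValueError("torch.optim Adam/AdamW always apply bias correction")
    # adam_w_mode=True (FusedAdam's default) means decoupled weight decay,
    # i.e. AdamW; False means L2-style decay, which is what torch.optim.Adam does.
    adam_w_mode = kwargs.pop("adam_w_mode", True)
    # No constructor equivalent; use optimizer.zero_grad(set_to_none=True) instead.
    kwargs.pop("set_grad_none", None)
    # FusedAdam defaults weight_decay to 0.0 but AdamW defaults to 0.01,
    # so pin it to keep the training behaviour unchanged.
    kwargs.setdefault("weight_decay", 0.0)
    return ("AdamW" if adam_w_mode else "Adam"), kwargs


def make_optimizer(params, **fused_kwargs):
    """Prefer apex's FusedAdam; fall back to torch.optim if its CUDA extension is missing."""
    # FusedAdam may consume the params before it checks for its extension,
    # so materialize them once; a generator would be empty on the fallback path.
    params = list(params)
    try:
        from apex.optimizers import FusedAdam
        return FusedAdam(params, **fused_kwargs)
    except ImportError:
        pass  # apex is not installed at all
    except RuntimeError as err:
        if "cuda extensions" not in str(err):
            raise  # some other FusedAdam error; do not hide it
    torch_optim = importlib.import_module("torch.optim")
    name, kwargs = to_torch_adam_kwargs(fused_kwargs)
    return getattr(torch_optim, name)(params, **kwargs)
```

Usage would look like optimizer = make_optimizer(model.parameters(), lr=1e-4, weight_decay=0.01). The fallback only triggers on apex's "requires cuda extensions" error, so other misconfigurations still surface.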
This happens because apex cannot import amp_C. You can check the file "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", and you can also verify this from a Python shell:
import torch
import amp_C  # torch must be imported before amp_C
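The same check can be wrapped in a reusable function. This minimal sketch reports the underlying import error instead of apex's generic message. The name check_amp_c is hypothetical. The torch-first order follows the comment above, most likely because amp_C depends on shared libraries that import torch loads.

```python
import importlib


def check_amp_c():
    """Import torch, then amp_C; return (ok, message) instead of raising."""
    try:
        # torch first: amp_C likely depends on shared libraries that torch loads.
        importlib.import_module("torch")
    except (ImportError, OSError) as err:
        return False, f"torch failed to import: {err}"
    try:
        importlib.import_module("amp_C")
    except (ImportError, OSError) as err:
        # This is the real cause that apex reports only as
        # "FusedAdam requires cuda extensions", e.g. a GLIBCXX mismatch.
        return False, f"amp_C failed to import: {err}"
    return True, "amp_C imported; apex's CUDA extensions look usable"


if __name__ == "__main__":
    ok, message = check_amp_c()
    print(message)
```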
You may get an error like: libstdc++.so.6: version 'GLIBCXX_3.4.20' not found
If so, you can try the following commands:
conda install libgcc
export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH
cd /path/to/anaconda/envs/myenv/lib
ln -s libstdc++.so.6.0.30 libstdc++.so.6
To make this persistent, add the line export LD_LIBRARY_PATH=/path/to/anaconda/envs/myenv/lib:$LD_LIBRARY_PATH
to your ~/.bashrc file.
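To see which GLIBCXX versions a given libstdc++ actually exports, before and after these steps, the usual shell approach is strings libstdc++.so.6 | grep GLIBCXX. Below is an equivalent Python sketch that scans the binary for version tags. The helper names are hypothetical, and defaulting to $CONDA_PREFIX/lib assumes the conda environment is activated.

```python
import os
import re
import sys
from pathlib import Path

# Version tags are stored as strings such as b"GLIBCXX_3.4.20" in the binary.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def _key(version):
    # "3.4.20" -> (3, 4, 20), so versions sort numerically, not as text.
    return tuple(int(part) for part in version.split("."))


def versions_in(data):
    """Return the distinct GLIBCXX versions found in raw bytes, sorted."""
    return sorted({m.decode() for m in _TAG.findall(data)}, key=_key)


def glibcxx_versions(lib_path):
    """Return the GLIBCXX versions exported by a libstdc++ file."""
    return versions_in(Path(lib_path).read_bytes())


if __name__ == "__main__":
    # Assumed default: the libstdc++ of the active conda env ($CONDA_PREFIX/lib).
    default = os.path.join(os.environ.get("CONDA_PREFIX", ""), "lib", "libstdc++.so.6")
    path = sys.argv[1] if len(sys.argv) > 1 else default
    if os.path.exists(path):
        found = glibcxx_versions(path)
        print(path, "has GLIBCXX_3.4.20:", "3.4.20" in found)
        print("newest:", found[-1] if found else None)
    else:
        print("not found:", path)
```

A newer libstdc++ also carries all the older tags, so checking that the exact version from the error message is present is enough.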
I solved this problem by building with
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
rather than
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
My pip version is 22.3.1.
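Which of the two commands applies seems to depend on the pip version. The apex README recommends the --config-settings form for pip 23.1 and later, because earlier pips do not merge repeated --config-settings keys, and the --global-option form otherwise (worth re-checking against the README of the apex version you are building). The sketch below picks the form from the pip version and treats that 23.1 threshold as an assumption. apex_install_cmd is a hypothetical helper. The report above with pip 22.3.1 shows that the --config-settings form can still build in some setups, so treat the threshold as a default rather than a rule.

```python
import re
import shlex

BASE = ["pip", "install", "-v", "--disable-pip-version-check",
        "--no-cache-dir", "--no-build-isolation"]
EXTENSIONS = ["--cpp_ext", "--cuda_ext"]


def _major_minor(version):
    """'23.2.1' -> (23, 2); raises on strings that do not start with N.N."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised pip version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def apex_install_cmd(pip_version, src="./"):
    """Build the apex install command for a given pip version (threshold assumed: 23.1)."""
    if _major_minor(pip_version) >= (23, 1):
        # Newer pip merges repeated --config-settings keys into a list.
        ext_args = []
        for ext in EXTENSIONS:
            ext_args += ["--config-settings", f"--build-option={ext}"]
    else:
        # Older pip: the legacy --global-option form.
        ext_args = [f"--global-option={ext}" for ext in EXTENSIONS]
    return BASE + ext_args + [src]


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version
    try:
        print(shlex.join(apex_install_cmd(version("pip"))))
    except PackageNotFoundError:
        print("pip is not installed in this interpreter")
```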
I tried this:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
but it did not solve the problem.
Try the following:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings --global-option=--cpp_ext --config-settings --global-option=--cuda_ext ./
It worked with pip 23.2.1 on Python 3.9.
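This command differs from the --config-settings "--build-option=..." form only in the key name. pip splits each --config-settings value at the first "=" into a key and a value, so --global-option=--cpp_ext becomes the key --global-option with the value --cpp_ext. The sketch below imitates that splitting. It is not pip's own code, and it assumes pip 23.1+ behaviour of collecting a repeated key's values into a list. parse_config_settings is a hypothetical name.

```python
def parse_config_settings(values):
    """Imitate pip's handling of repeated --config-settings KEY=VALUE options.

    Not pip's own code: each value is split at the first "=", and a key that
    appears more than once collects its values into a list (pip >= 23.1, assumed).
    """
    settings = {}
    for value in values:
        key, sep, val = value.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {value!r}")
        if key not in settings:
            settings[key] = val
        elif isinstance(settings[key], list):
            settings[key].append(val)
        else:
            settings[key] = [settings[key], val]
    return settings


# The two spellings from this thread differ only in the key name.
print(parse_config_settings(["--build-option=--cpp_ext", "--build-option=--cuda_ext"]))
# {'--build-option': ['--cpp_ext', '--cuda_ext']}
print(parse_config_settings(["--global-option=--cpp_ext", "--global-option=--cuda_ext"]))
# {'--global-option': ['--cpp_ext', '--cuda_ext']}
```

setuptools' build backend forwards both --global-option and --build-option settings to setup.py. That would explain why either spelling can pass --cpp_ext and --cuda_ext through.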
This works for me! Thanks!
I have built the apex module following the procedure described, but when I try to train the model on CIFAR-10, I get an error. I understand that this is an apex-related issue, since I also get an error when I run examples/simple/distributed from the apex repo. I have tried many things to fix this, but with no luck. I have two questions:
1. Why do I get "FusedAdam requires cuda extensions" even though I built apex with the --global-option="--cpp_ext" --global-option="--cuda_ext" options?
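One way to narrow this down is to separate two cases: the extension was never built into this environment, or it was built but fails to load. importlib.util.find_spec locates a module without importing it, so the sketch below shows where apex, apex_C and amp_C would be loaded from. The mapping of --cpp_ext to apex_C and --cuda_ext to amp_C is an assumption based on apex's setup.py. extension_status is a hypothetical helper.

```python
import importlib.util

# Assumed mapping (from apex's setup.py): --cpp_ext builds apex_C,
# --cuda_ext builds amp_C (plus other extensions not listed here).
DEFAULT_MODULES = ("apex", "apex_C", "amp_C")


def extension_status(names=DEFAULT_MODULES):
    """Map each top-level module name to the file it would load from, or None.

    find_spec only locates the module; it does not import it, so a broken
    shared library still shows up here with its path.
    """
    status = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            status[name] = None
        else:
            status[name] = spec.origin or "(namespace package)"
    return status


if __name__ == "__main__":
    for name, origin in extension_status().items():
        print(f"{name:8} {origin or 'NOT FOUND'}")
```

A None for amp_C suggests that the CUDA build was skipped or installed into a different environment. A path means the file exists, and importing it after torch will show the actual load error.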