huggingface / transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

No APEX Issue #9

Open tonyhqanguyen opened 5 years ago

tonyhqanguyen commented 5 years ago

If I don't have CUDA support, this code won't work, right, since you're using NVIDIA's apex, which requires CUDA? Just wondering if there's an alternative.

thomwolf commented 5 years ago

You don't need apex to use the codebase; it's only needed if you want to do fp16 training. The codebase also runs on CPU, but I'm not sure you can do the training there, it would be very slow. If you only want to do inference (the interact.py script), it should work. The interact.py script works fine on my laptop on CPU.
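For reference, a minimal sketch of CPU-only inference with pytorch_pretrained_bert, never touching apex (this is just an illustration, not the actual interact.py logic, and it assumes the openai-gpt weights can be downloaded):

import torch
from pytorch_pretrained_bert import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

# Fall back to CPU when torch has no usable CUDA device; apex is never imported here.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt").to(device)
model.eval()

tokens = tokenizer.tokenize("hello , how are you ?")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)], device=device)
with torch.no_grad():
    lm_logits = model(input_ids)  # without lm_labels, the model returns the LM logits
next_token_id = lm_logits[0, -1].argmax().item()
print(tokenizer.convert_ids_to_tokens([next_token_id]))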

tonyhqanguyen commented 5 years ago

Yeah, I suppose I won't be able to run the training due to infeasible training time, but when I run train.py I get AttributeError: 'NoneType' object has no attribute 'split' for the line return tuple(int(x) for x in torch.version.cuda.split('.')), and I'm guessing that's because there is no CUDA on my laptop. I think the problem is that apex is used in pytorch_pretrained_bert to implement OpenAIGPTDoubleHeadsModel and some of the other things that train.py imports from it.

thomwolf commented 5 years ago

In which file is this line? (return tuple(int(x) for x in torch.version.cuda.split('.'))) I can't find it in our code base. By the way if you have installed apex and don't have a GPU, you should uninstall it. It doesn't like having no GPUs.

tonyhqanguyen commented 5 years ago

It's not in your code, it's in apex's code, which is imported in pytorch_pretrained_bert's modeling.py, which is imported when train.py imports pytorch_pretrained_bert.

Thanks for the advice, I'll uninstall it.

edit:

Here is the full traceback if it's helpful:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/tnguyen/Desktop/recourse-nlp/transfer-learning-conv-ai/train.py", line 19, in <module>
    from pytorch_pretrained_bert import (OpenAIAdam, OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/__init__.py", line 7, in <module>
    from .modeling import (BertConfig, BertModel, BertForPreTraining,
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 228, in <module>
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/__init__.py", line 2, in <module>
    from . import amp
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/amp.py", line 3, in <module>
    from .lists import functional_overrides, torch_overrides, tensor_overrides
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/lists/torch_overrides.py", line 69, in <module>
    if utils.get_cuda_version() >= (9, 1, 0):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/utils.py", line 9, in get_cuda_version
    return tuple(int(x) for x in torch.version.cuda.split('.'))
AttributeError: 'NoneType' object has no attribute 'split'

DannyDannyDanny commented 5 years ago

> It's not in your code, it's in apex's code, which is imported in pytorch_pretrained_bert's modeling.py, which is imported when train.py imports pytorch_pretrained_bert.
>
> Thanks for the advice, I'll uninstall it...

I'm getting the same error. The last two lines indicate that torch.version.cuda is returning None. The problem is that the method get_cuda_version in .../python3.7/site-packages/apex/amp/utils.py, line 9, looks like:

def get_cuda_version():
    return tuple(int(x) for x in torch.version.cuda.split('.'))

...where instead it should be:

def get_cuda_version():
    return tuple(int(x) for x in torch.__version__.split('.'))

This is an issue with torch.
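For what it's worth, a defensive variant of that helper which tolerates CPU-only builds might look like this (just a sketch, not apex's actual code):

import torch

def get_cuda_version():
    # torch.version.cuda is None on CPU-only builds of PyTorch,
    # so return a sentinel instead of crashing on .split('.')
    if torch.version.cuda is None:
        return (0, 0, 0)
    return tuple(int(x) for x in torch.version.cuda.split('.'))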

ptrblck commented 5 years ago

@DannyDannyDanny If no GPU is detected on the system, you won't be able to use apex. We should improve the error message on importing apex and raise an exception if any apex methods are used. A workaround would be to guard the apex import with if torch.cuda.is_available().

Your suggestion won't work, since torch.version.cuda returns the CUDA version (e.g. 10.0.130), while torch.__version__ returns the PyTorch version (e.g. 1.3.0.dev20190923).
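To make the distinction concrete, and to sketch what such a guarded import could look like (an illustration only, not the actual pytorch_pretrained_bert or apex code):

import torch
import torch.nn as nn

print(torch.__version__)   # PyTorch version, e.g. '1.3.0.dev20190923'
print(torch.version.cuda)  # CUDA version, e.g. '10.0.130'; None on a CPU-only build

# Only touch apex when a CUDA device is actually usable,
# otherwise fall back to the stock PyTorch LayerNorm.
if torch.cuda.is_available():
    try:
        from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
    except ImportError:
        LayerNorm = nn.LayerNorm
else:
    LayerNorm = nn.LayerNorm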

shamoons commented 4 years ago

Is there any older version that works without a GPU?

Frank-Dz commented 2 years ago

If you are using a 3090, CUDA 11.0 does not seem to work for apex, but CUDA 11.1 does. I did the following and successfully installed apex 0.1:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
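Once the build finishes, a quick sanity check (just a sketch; it assumes the cu111 wheels and the local apex build installed correctly) is to confirm that apex imports and PyTorch sees the GPU:

import torch
from apex import amp  # should import cleanly once apex is built with the CUDA extensions

print(torch.__version__)          # expect something like '1.8.1+cu111'
print(torch.version.cuda)         # expect '11.1'
print(torch.cuda.is_available())  # expect True with the 3090 visible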