gregg-ADP opened this issue 3 years ago
Correction: We realized that we had to change the requirements.txt file to
# -f https://download.pytorch.org/whl/torch_stable.html
# -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
datasets==1.6.2
torch==1.8
torchvision==0.9.0
transformers==4.5.1
# detectron2==0.3
seqeval==1.2.2
We then installed Detectron2 0.6 using this command: python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
We had to change the torch version because Detectron2 0.6 requires it, and we had to move to version 0.6 because 0.3 does not install (see below).
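For anyone reproducing this, here is a quick sanity check (a minimal sketch, not part of the original instructions) to confirm the installed versions match the pins above; Detectron2 built from git should report 0.6:

import datasets
import detectron2
import torch
import torchvision
import transformers

print("datasets    ", datasets.__version__)      # expect 1.6.2
print("torch       ", torch.__version__)         # expect 1.8.x
print("torchvision ", torchvision.__version__)   # expect 0.9.0
print("transformers", transformers.__version__)  # expect 4.5.1
print("detectron2  ", detectron2.__version__)    # expect 0.6, built from git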
The problems we had while trying to install Detectron2 0.3 were the following (this is why we are using 0.6 above). We get this when trying to install with torch 1.7.1 and torchvision 0.8.2:
python -m pip install detectron2==0.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
ERROR: Could not find a version that satisfies the requirement detectron2==0.3 (from versions: none)
ERROR: No matching distribution found for detectron2==0.3
We also tried without pinning the version, again with torch 1.7.1 and torchvision 0.8.2:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
ERROR: Could not find a version that satisfies the requirement detectron2 (from versions: none)
ERROR: No matching distribution found for detectron2
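For what it's worth, "from versions: none" from a find-links index usually means pip found no wheel there matching the local Python version and platform; the cu101/torch1.7 index carries Linux wheels built for CUDA 10.1 and torch 1.7 only. A small diagnostic sketch (ours, not from the original logs) that prints what has to line up:

import platform
import sys

import torch

# pip matches wheels on the Python tag and platform; the CUDA/torch part of the
# index URL (cu101/torch1.7) has to be chosen by hand to match the local build.
print("python  ", sys.version.split()[0])
print("platform", platform.system(), platform.machine())
print("torch   ", torch.__version__)
print("cuda    ", torch.version.cuda)  # "10.1" would match the cu101 index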
Also, we found the actual GPU type: NVIDIA K80 GPUs, on a p2.xlarge instance type.
The root cause here is not versioning. See the _setup_devices function in the source code: https://huggingface.co/transformers/v3.3.1/_modules/transformers/training_args.html
Fix: If you are working with one GPU, set this before running the example script: !export CUDA_VISIBLE_DEVICES=0
This should fix the issue.
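One caveat: if the script is launched from a notebook, !export runs in a subshell and does not change the kernel's environment, so setting the variable from Python before torch initializes CUDA is a safer variant (a minimal sketch):

import os

# Expose only GPU 0 to this process; must happen before CUDA is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # should now report a single visible device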
@jamcdon4 that does not work for me; I have tried this before, but I still get the same error.
Describe the bug: We are running the file unilm/layoutlmft/examples/run_xfun_re.py. However, we get the error RuntimeError: CUDA error: invalid device ordinal, raised from torch._C._cuda_setDevice(device).
Model I am using: LayoutLM
The problem arises when using: the official example script, unilm/layoutlmft/examples/run_xfun_re.py, run without changes and exactly as described in the instructions.
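For context, a minimal sketch (not part of the report) of the condition behind this error: CUDA raises "invalid device ordinal" when code selects a device index the process cannot see, for example index 1 on a single-GPU instance:

import torch

n = torch.cuda.device_count()
print("visible GPUs:", n)

requested = 1  # hypothetical index a multi-GPU launcher might pass
if requested >= n:
    print(f"device {requested} would raise 'CUDA error: invalid device ordinal'")
else:
    torch.cuda.set_device(requested)  # fine when the index is in range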
Software Versions: Python 3.7.10, CUDA 10.2, PyTorch 1.8.0, TorchVision 0.9.0
To Reproduce: Run unilm/layoutlmft/examples/run_xfun_re.py exactly as described in the instructions, with no changes to the code or the command.
Expected behavior: We expect the training to run through, since we are running the example code without any changes and with the same command.
Stack Trace:
FYI @NielsRogge