huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Error when running the GLM-4 demo #2900

Open leizhu1989 opened 5 days ago

leizhu1989 commented 5 days ago

OS: Ubuntu 20.04, CUDA: 11.8, torch: 2.1.0, NVIDIA driver version: 470, transformers: 4.40.0, accelerate: 0.31.0

When I run the GLM-4 demo code, I get the error below. I'm not sure whether my NVIDIA driver version is too low:

key code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MODEL_PATH is the local GLM-4 checkpoint directory
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    encode_special_tokens=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    load_in_4bit=True
).to(device).eval()
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards:   0%|                                                                                                                | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/zl/GLM-4/basic_demo/quane_model.py", line 18, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 400, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
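
The log also warns that `load_in_4bit` is deprecated in favour of a `BitsAndBytesConfig` passed via `quantization_config`. For reference, this is roughly what that call would look like (a sketch only, assuming `bitsandbytes` is installed and using the same `MODEL_PATH`; I have not verified that it avoids the error):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config instead of the deprecated load_in_4bit flag.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,                      # same local checkpoint path as above
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",               # let accelerate place the quantized weights
).eval()
# Quantized bitsandbytes models should not be moved with .to(device);
# device_map handles placement.
```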
BenjaminBossan commented 5 days ago

Your driver version is indeed very low; if possible, you could try updating it and checking if that helps. But in your case it might be something else, as the error message suggests that the GPU is occupied. Can you check whether something else is occupying the GPU when you run the code?
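
For example, one quick way to see what is currently holding the GPU (assuming `nvidia-smi` is available inside your container) is something like:

```python
import subprocess

# Print the full nvidia-smi report; the process table at the bottom
# shows which PIDs currently hold GPU memory.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```

Running plain `nvidia-smi` in a shell gives the same information.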

leizhu1989 commented 5 days ago

When I exit my docker container, it can load the model checkpoints, but I still get the same error. It doesn't seem to be an accelerate issue, though:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 10/10 [00:00<00:00, 15.13it/s]
Traceback (most recent call last):
  File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in <module>
    device_map="auto").eval().to(device)
  File "/home/zl/anaconda3/envs/glm4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
    return super().to(*args, **kwargs)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/zl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

BenjaminBossan commented 4 days ago

I don't think it's an issue with docker, but rather that another process is occupying your GPU and that's why PyTorch cannot use it properly. At least this is what the error message is suggesting. I would expect the same error to occur without accelerate.
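
A quick way to verify is to try a plain PyTorch allocation on the device, with no transformers or accelerate involved; a minimal sketch, assuming you run it in the same environment:

```python
import torch

# Minimal check, independent of transformers/accelerate: if the GPU is
# busy or the driver is broken, this allocation should fail with the same
# "CUDA-capable device(s) is/are busy or unavailable" error.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
x = torch.zeros(1, device="cuda:0")
print("Allocated on:", x.device)
```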