Open leizhu1989 opened 5 days ago
Your driver version is indeed very low; if possible, try updating it and check whether that helps. In your case, though, it might be something else, as the error message suggests that the GPU is occupied. Can you check whether something else is using the GPU when you run the code?
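To make the driver check concrete, here is a rough sketch that compares an installed driver version against a required minimum. The helper names are made up for illustration. The thresholds in the comments (450.80.02 for CUDA 11.x minor-version compatibility, 520.61.05 for the CUDA 11.8 toolkit itself) are my recollection of NVIDIA's compatibility table and should be verified there, and `470.57.02` is just a sample 470-series version, not the one from this report.

```python
import subprocess


def parse_driver_version(text):
    """Turn a version string such as '470.57.02' into a tuple of ints."""
    first_line = text.strip().splitlines()[0]
    return tuple(int(part) for part in first_line.strip().split("."))


def driver_at_least(installed, required):
    """Return True if the installed driver version is >= the required one."""
    return parse_driver_version(installed) >= parse_driver_version(required)


def query_driver_version():
    """Ask nvidia-smi for the driver version (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example usage (thresholds are assumptions, check NVIDIA's documentation):
#   installed = query_driver_version()
#   print(driver_at_least(installed, "450.80.02"))  # minor-version compatibility
#   print(driver_at_least(installed, "520.61.05"))  # CUDA 11.8 toolkit driver
```

If the first check passes but the second does not, the driver is old but may still work through minor-version compatibility, which would point away from the driver as the direct cause.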
When I exit my Docker container, it can load the models, but I get the same error, so it doesn't seem like an accelerate issue.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.13it/s]
Traceback (most recent call last):
File "/home/zl/GLM-4/basic_demo/trans_cli_demo.py", line 53, in TORCH_USE_CUDA_DSA
to enable device-side assertions.
I don't think it's an issue with docker, but rather that another process is occupying your GPU and that's why PyTorch cannot use it properly. At least this is what the error message is suggesting. I would expect the same error to occur without accelerate.
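One way to check for other processes on the GPU is to ask `nvidia-smi` for its list of compute applications. The snippet below is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names are hypothetical, and the parser assumes the three-column CSV output of `--query-compute-apps=pid,process_name,used_memory`.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into a list of dicts."""
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split the pid off the front and the memory off the back, so a
        # process name that happens to contain a comma stays intact.
        head = line.split(",", 1)
        if len(head) != 2:
            continue
        tail = head[1].rsplit(",", 1)
        if len(tail) != 2:
            continue
        pid = head[0].strip()
        if not pid.isdigit():
            continue
        apps.append(
            {
                "pid": int(pid),
                "name": tail[0].strip(),
                "used_memory": tail[1].strip(),
            }
        )
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the processes currently using the GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


# Example usage:
#   for app in list_gpu_processes():
#       print(app["pid"], app["name"], app["used_memory"])
```

Running this on the host right before starting the demo shows whether anything else is holding the GPU. Inside a container the list may be empty even when the GPU is in use, because processes outside the container's PID namespace are usually not visible there.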
OS: Ubuntu 20.04, CUDA: 11.8, torch: 2.1.0, NVIDIA driver version: 470, transformers: 4.40.0, accelerate: 0.31.0
When I run the GLM-4 code, I get the error below. I'm not sure whether it is caused by the low driver version:
key code: