dmilcevski opened 2 weeks ago
There were many hanging processes, so I had to kill them and re-deploy sglang. However, now I get a different issue, again coming from the llava implementation:
2024-06-14 08:36:05 | ERROR | srt.tp_worker | Exception in ModelTpServer:
Traceback (most recent call last):
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
self.forward_step()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
self.forward_fill_batch(new_batch)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
) = self.model_runner.forward(batch, ForwardMode.EXTEND)
File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
return self.forward_extend_multi_modal(batch)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
return self.model.forward(
File "/sglang/python/sglang/srt/models/llava.py", line 105, in forward
input_embeds = self.language_model.model.embed_tokens(input_ids)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 100, in forward
output_parallel = F.embedding(masked_input, self.weight)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2264, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
2024-06-14 08:36:05 | ERROR | srt.controller | Exception in ControllerSingle:
Traceback (most recent call last):
File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 93, in start_controller_process
loop.run_until_complete(controller.loop_for_forward())
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/sglang/python/sglang/srt/managers/controller/manager_single.py", line 44, in loop_for_forward
out_pyobjs = await self.model_client.step(next_step_input)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 753, in _func
return f(*args, **kwargs)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 188, in exposed_step
self.forward_step()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 204, in forward_step
self.forward_fill_batch(new_batch)
File "/sglang/python/sglang/srt/managers/controller/tp_worker.py", line 443, in forward_fill_batch
) = self.model_runner.forward(batch, ForwardMode.EXTEND)
File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 422, in forward
return self.forward_extend_multi_modal(batch)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/sglang/python/sglang/srt/managers/controller/model_runner.py", line 411, in forward_extend_multi_modal
return self.model.forward(
File "/sglang/python/sglang/srt/models/llava.py", line 105, in forward
input_embeds = self.language_model.model.embed_tokens(input_ids)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 100, in forward
output_parallel = F.embedding(masked_input, self.weight)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2264, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
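For what it's worth, a common cause of this device-side assert inside `F.embedding` is `input_ids` containing values outside the embedding table, e.g. an image placeholder token id at or above the vocabulary size after a tokenizer/model mismatch. A minimal, framework-free sketch of the sanity check (the function name and the 32000 vocab size are illustrative, not from sglang):

```python
def find_out_of_range_ids(input_ids, vocab_size):
    """Return (position, token_id) pairs that would trip the
    embedding lookup's device-side assert on GPU."""
    return [(i, tok) for i, tok in enumerate(input_ids)
            if tok < 0 or tok >= vocab_size]

# Hypothetical example: a 32000-entry vocab with a stray
# image-token id of 32001 injected into the prompt.
ids = [1, 345, 32001, 2]
print(find_out_of_range_ids(ids, 32000))  # [(2, 32001)]
```

Running the same lookup on CPU instead of GPU also turns the opaque assert into a readable `IndexError`, which can help confirm this diagnosis.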
Does anybody have ideas on how to fix this? Thanks
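Since CUDA errors are reported asynchronously, the traceback above may not point at the op that actually failed. One debugging step is to force synchronous kernel launches before relaunching the server (the launch command below is a hypothetical placeholder; use your actual sglang invocation and model path):

```shell
# Make CUDA kernel launches synchronous so the Python traceback
# identifies the real failing op instead of a deferred assert.
export CUDA_LAUNCH_BLOCKING=1

# Then relaunch (placeholder command, adjust to your deployment):
# python -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

Note that `TORCH_USE_CUDA_DSA` from the error message is a build-time flag, so setting it as an environment variable on a prebuilt torch wheel has no effect.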
Using one GPU card is OK, but using two has the same problem.
I explicitly restricted access to 1 GPU with CUDA_VISIBLE_DEVICES=0. I do have more GPUs on the node, but it should only use this device. Also, I am getting the following in the logs, which confirms it is using one device:
2024-06-12 08:03:55 | INFO | srt.model_runner | [gpu_id=0] Set cuda device.
2024-06-12 08:03:55 | INFO | srt.model_runner | [gpu_id=0] Init nccl begin.
2024-06-12 08:03:56 | INFO | srt.model_runner | [gpu_id=0] Load weight begin. avail mem=78.59 GB
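One caveat worth checking: `CUDA_VISIBLE_DEVICES` only takes effect if it is set before the process initializes CUDA. A quick sketch to verify the mask from inside Python (the `torch` import is guarded since this is just an illustration):

```python
import os

# Restrict the process to GPU 0; this must happen before any CUDA
# library enumerates devices, or the variable is silently ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import torch  # imported after setting the env var on purpose
    if torch.cuda.is_available():
        # With the mask above this should report exactly 1 device.
        print("visible devices:", torch.cuda.device_count())
except ImportError:
    pass  # torch not installed in this environment
```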
I am trying to deploy llava-v1.6-34b on an A100 80GB but am getting the error above. Does anybody have an idea how to fix the issue? Thanks