mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

RuntimeError in run_streaming_llama.py When Using Accelerate with Streaming LLaMA Model on A4500 GPU #37

Open ZexinLi0w0 opened 9 months ago

ZexinLi0w0 commented 9 months ago

Description: When running the run_streaming_llama.py script with the --enable_streaming flag, I encountered a RuntimeError related to CUDA and the Hugging Face Accelerate library.

Steps to Reproduce:

1. Set the environment variable CUDA_VISIBLE_DEVICES=0.
2. Run the following command:

python examples/run_streaming_llama.py --enable_streaming

Expected Behavior: The script should run successfully and provide streaming inference results.

Actual Behavior: The script crashes with the following error:

RuntimeError: handle_0 INTERNAL ASSERT FAILED at "../c10/cuda/driver_api.cpp":15, please report a bug to PyTorch.

GPU information

CUDA version: 11.7
GPU: NVIDIA RTX A4500
GPU driver information: 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.78.01  Mon Dec 26 05:58:42 UTC 2022
GCC version:  gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

OS information:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:    20.04
Codename:   focal

Full error log:

(streaming) zli@aes:~/streaming-llm$ CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py  --enable_streaming
Loading model from lmsys/vicuna-13b-v1.3 ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:14<00:00,  4.74s/it]
Loading data from data/mt_bench.jsonl ...
StartRecentKVCache: 4, 2000

USER: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

ASSISTANT: 🌴🌺 Aloha from the beautiful islands of Hawaii! I recently had the opportunity to explore the diverse culture and stunning landscapes of this tropical paradise, and I can't wait to share my adventures with you.

🏝️ First things first, let's talk about the beaches. Hawaii is known for its pristine white sand and crystal clear waters, and I was lucky enough to visit some of the most breathtaking beaches on the islands. My favorite was Waimea Bay on the North Shore of Oahu. The waves were perfect for surfing, and the surrounding cliffs and lush greenery made for a postcard-worthy backdrop.

🌊 But Hawaii isn't just about the beaches. The culture of the islands is rich and vibrant, and I made it a point to immerse myself in it during my trip. I attended a traditional Hawaiian luau on the Big Island, where I learned about the history and customs of the native people. I also tried my hand at hula dancing and even got to sample some delicious Hawaiian cuisine, including poke bowls and shaved ice.

🏯 Another must-see attraction in Hawaii is the Hawaii Volcanoes National Park on the Big Island. The park is home to two active volcanoes, Kilauea and Mauna Loa, and offers a unique opportunity to see the power of nature up close. I hiked through the park and was amazed by the steam vents, lava tubes, and volcanic craters. It was truly a once-in-a-lifetime experience.

🌺 In addition to the natural wonders of Hawaii, the islands are also home to some of the most beautiful gardens and parks in the world. The Maui Tropical Plantation offered a glimpse into the island's agricultural history, while the Hilo Tropical Botanical Garden showcased the stunning flora and fauna of the area.

🏯 In conclusion, Hawaii is a destination that offers something for everyone. From the stunning beaches to the rich culture and unique attractions, there's no shortage of things to see and do on these beautiful islands. If you're planning a trip to Hawaii, be sure to add these must-see attractions to your itinerary. Mahalo for reading, and aloha!

USER: Rewrite your previous response. Start every sentence with the letter A.

ASSISTANT: 🌴🌺 Aloha from the beautiful islands of Hawaii! I recently had the opportunity to explore the diverse culture and stunning landscapes of this tropical paradise, and I can't wait to share my adventures with you.

🏝️ First things first, let's talk about the beaches. Hawaii is known for its pristine white sand and crystal clear waters, and I was lucky enough to visit some of the most breathtaking beaches on the islands. My favorite was Waimea Bay on the North Shore of Oahu. The waves were perfect for surfing, and the surrounding cliffs and lush greenery made for a postcard-worthy backdrop.

🌊 But Hawaii isn't just about the beaches. The culture of the islands is rich and vibrant, and I made it a point to immerse myself in it during my trip. I attended a traditional Hawaiian luau on the Big Island, where I learned about the history and customs of the native people. I also tried my hand at hula dancing and even got to sample some delicious Hawaiian cuisine, including poke bowls and shaved ice.

🌺 In addition to the natural wonders of Hawaii, the islands are also home to some of the most stunning gardens and parks in Traceback (most recent call last):
  File "examples/run_streaming_llama.py", line 122, in <module>
    main(args)
  File "examples/run_streaming_llama.py", line 103, in main
    streaming_inference(
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "examples/run_streaming_llama.py", line 73, in streaming_inference
    past_key_values = greedy_generate(
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "examples/run_streaming_llama.py", line 30, in greedy_generate
    outputs = model(
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 838, in forward
    logits = self.lm_head(hidden_states)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/accelerate/hooks.py", line 160, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/accelerate/hooks.py", line 286, in pre_forward
    set_module_tensor_to_device(
  File "/home/zli/anaconda3/envs/streaming/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: handle_0 INTERNAL ASSERT FAILED at "../c10/cuda/driver_api.cpp":15, please report a bug to PyTorch.
Guangxuan-Xiao commented 9 months ago

Hi,

Referring to issue #138, are you currently using WSL2? We recommend using Ubuntu directly rather than through virtualized environments.

Guangxuan

ZexinLi0w0 commented 9 months ago

> Hi,
>
> Referring to issue #138, are you currently using WSL2? We recommend using Ubuntu directly rather than through virtualized environments.
>
> Guangxuan

No, I use an Ubuntu server directly.

Guangxuan-Xiao commented 9 months ago

It seems like a PyTorch bug. Can you try reinstalling PyTorch or report it to the PyTorch community?
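
For a CUDA 11.7 driver, a clean reinstall from the matching wheel index would look roughly like this (the index URL is PyTorch's cu117 channel; whether to pin a specific version is up to you):

pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu117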

ZexinLi0w0 commented 9 months ago

I will try reinstalling PyTorch then.

The runtime error is triggered after running for about 10 minutes. My guess is that it is caused by GPU memory exhaustion that PyTorch fails to handle correctly.
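
To check that hypothesis, a minimal sketch is to log the allocator state around the generation loop, e.g. by adding a helper like this to examples/run_streaming_llama.py (the helper name is mine, not part of the repo):

import torch

def log_cuda_memory(tag):
    # memory_allocated: memory held by live tensors;
    # memory_reserved: total memory the caching allocator has claimed
    # from the driver (this is what nvidia-smi roughly reflects).
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

Calling it before and after each greedy_generate call would show whether reserved memory creeps toward the 20 GB limit before the crash.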

Here is some nvidia-smi profiling output; the 3rd/4th/5th columns correspond to total, used, and free GPU memory (MiB).

2023/10/12 13:04:29.337, 0, 20470, 20135, 42, 57, 6
2023/10/12 13:04:29.456, 0, 20470, 19907, 270, 63, 23
2023/10/12 13:04:29.577, 0, 20470, 19771, 406, 63, 23
2023/10/12 13:04:29.697, 0, 20470, 19771, 406, 54, 5
2023/10/12 13:04:29.816, 0, 20470, 19907, 270, 58, 6
2023/10/12 13:04:29.937, 0, 20470, 19907, 270, 58, 6
2023/10/12 13:04:30.059, 0, 20470, 19871, 306, 57, 6
2023/10/12 13:04:30.194, 0, 20470, 20135, 42, 57, 6
2023/10/12 13:04:30.318, 0, 20470, 19821, 356, 43, 4
2023/10/12 13:04:30.440, 0, 20470, 19821, 356, 64, 23
2023/10/12 13:04:30.561, 0, 20470, 19821, 356, 64, 23
2023/10/12 13:04:30.681, 0, 20470, 19957, 220, 54, 6
2023/10/12 13:04:30.803, 0, 20470, 19957, 220, 58, 6
2023/10/12 13:04:30.923, 0, 20470, 19957, 220, 58, 6
2023/10/12 13:04:31.042, 0, 20470, 20135, 42, 57, 6
2023/10/12 13:04:31.163, 0, 20470, 19907, 270, 57, 6
2023/10/12 13:04:31.284, 0, 20470, 19771, 406, 65, 23
2023/10/12 13:04:31.405, 0, 20470, 19907, 270, 59, 6
2023/10/12 13:04:31.526, 0, 20470, 19907, 270, 59, 6
2023/10/12 13:04:31.646, 0, 20470, 19907, 270, 59, 6
2023/10/12 13:04:31.765, 0, 20470, 19821, 356, 59, 6
2023/10/12 13:04:31.887, 0, 20470, 20135, 42, 45, 4
2023/10/12 13:04:32.008, 0, 20470, 19821, 356, 66, 23
2023/10/12 13:04:32.129, 0, 20470, 19821, 356, 66, 23
2023/10/12 13:04:32.251, 0, 20470, 19957, 220, 55, 6
2023/10/12 13:04:32.372, 0, 20470, 19957, 220, 55, 6
2023/10/12 13:04:32.494, 0, 20470, 19957, 220, 56, 6
2023/10/12 13:04:32.615, 0, 20470, 19871, 306, 44, 4
2023/10/12 13:04:32.735, 0, 20470, 20135, 42, 44, 4
2023/10/12 13:04:32.855, 0, 20470, 19771, 406, 67, 23
2023/10/12 13:04:32.976, 0, 20470, 19907, 270, 67, 23
2023/10/12 13:04:33.097, 0, 20470, 19907, 270, 57, 6
2023/10/12 13:04:33.214, 0, 20470, 19771, 406, 58, 6
2023/10/12 13:04:33.332, 0, 20470, 19821, 356, 58, 6
2023/10/12 13:04:33.453, 0, 20470, 19871, 306, 59, 6
2023/10/12 13:04:33.573, 0, 20470, 19871, 306, 59, 6
2023/10/12 13:04:33.696, 0, 20470, 19871, 306, 43, 4
2023/10/12 13:04:33.817, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:33.937, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.058, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.181, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.303, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.424, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.544, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.665, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:34.787, 0, 20470, 19871, 306, 0, 0
2023/10/12 13:04:35.001, 0, 20470, 589, 19588, 36, 3
2023/10/12 13:04:35.122, 0, 20470, 589, 19588, 36, 3
2023/10/12 13:04:35.244, 0, 20470, 589, 19588, 100, 7
2023/10/12 13:04:35.364, 0, 20470, 589, 19588, 100, 7
2023/10/12 13:04:35.485, 0, 20470, 589, 19588, 100, 7
2023/10/12 13:04:35.606, 0, 20470, 589, 19588, 55, 4
2023/10/12 13:04:35.727, 0, 20470, 589, 19588, 55, 4
2023/10/12 13:04:35.853, 0, 20470, 589, 19588, 0, 0
2023/10/12 13:04:35.976, 0, 20470, 589, 19588, 0, 0
2023/10/12 13:04:36.096, 0, 20470, 589, 19588, 0, 0
2023/10/12 13:04:36.217, 0, 20470, 589, 19588, 0, 0
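
For reference, this log has the shape of a query along these lines (the exact flags are an assumption, reconstructed from the columns above):

nvidia-smi --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory --format=csv,noheader,nounits -lms 120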