oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Smarter crash diagnosis #6309

Open mirh opened 3 months ago

mirh commented 3 months ago

Description

Sometimes you get crashes like the ones below. ui_model_menu.py is nice and dandy in that it reports the Python stack when it can, but as you can see, if all you report is the raw traceback, period, those lines are pretty much useless.

Additional Context

To be fair, I'm not really sure what the best course of action would be. Having internal functions catch errors better? Heuristically scanning the log for the words "failed" and "error"? Printing the last 5 lines before the traceback, just for the record?
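
For the "last 5 lines" idea, here is a minimal sketch of what it could look like, assuming the backend's log output can be intercepted somewhere before the crash. LogTail and load_with_diagnostics are hypothetical names used for illustration, not functions that exist in the repo:

import collections
import traceback


class LogTail:
    """Keep only the last N backend log lines so they can be attached to a crash report."""

    def __init__(self, maxlen=5):
        self.lines = collections.deque(maxlen=maxlen)

    def write(self, line):
        line = line.rstrip()
        if line:
            self.lines.append(line)

    def dump(self):
        return "\n".join(self.lines)


def load_with_diagnostics(load_func, log_tail):
    """Call the real loader; on failure, print the traceback plus the last backend lines."""
    try:
        return load_func()
    except Exception:
        print(traceback.format_exc())
        print("Last backend log lines before the crash:")
        print(log_tail.dump())
        raise


if __name__ == "__main__":
    tail = LogTail(maxlen=5)
    # Simulate llama.cpp printing to its log right before dying:
    tail.write("ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10067.57 MiB on device 0: cudaMalloc failed: out of memory")
    tail.write("llama_model_load: error loading model: unable to allocate backend buffer")

    def broken_loader():
        raise ValueError("Failed to load model from file: models/example.gguf")

    try:
        load_with_diagnostics(broken_loader, tail)
    except ValueError:
        pass

That way the ERROR block would carry the cudaMalloc line instead of just the generic ValueError.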


...
llm_load_print_meta: max token length = 48
llm_load_tensors: ggml ctx size =    0.53 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 10067.57 MiB on device 0: cudaMalloc failed: out of memory
llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
21:21:02-869971 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "K:\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\AI\text-generation-webui-main\modules\llamacpp_model.py", line 85, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 372, in __init__
    _LlamaModel(
  File "K:\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\llama2.20b.mlewd-remm.gguf_v2.q3_k_l.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x00000242017DB420>
Traceback (most recent call last):
  File "K:\text-generation-webui-main\modules\llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
...
llm_load_print_meta: max token length = 48
llm_load_tensors: ggml ctx size =    0.53 MiB
llm_load_tensors: offloading 33 repeating layers to GPU
llm_load_tensors: offloaded 33/63 layers to GPU
llm_load_tensors:        CPU buffer size = 10134.71 MiB
llm_load_tensors:      CUDA0 buffer size =  5290.31 MiB
....................................................................................................
llama_new_context_with_model: V cache quantization requires flash_attn
20:58:45-813093 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "K:\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\modules\llamacpp_model.py", line 85, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "K:\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 392, in __init__
    _LlamaContext(
  File "K:\text-generation-webui-main\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 298, in __init__
    raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context

Exception ignored in: <function LlamaCppModel.__del__ at 0x00000242017DB420>
Traceback (most recent call last):
  File "K:\text-generation-webui-main\modules\llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
gangancuicuia commented 2 months ago

llama_new_context_with_model: failed to allocate compute buffers

Traceback (most recent call last):
  /root/text-generation-webui-main/server.py:253 in <module>
      252   # Load the model
    ❱ 253   shared.model, shared.tokenizer = load_model(model_name)
      254   if shared.args.lora:

  /root/text-generation-webui-main/modules/models.py:93 in load_model
       92   shared.args.loader = loader
    ❱  93   output = load_func_map[loader](model_name)
       94   if type(output) is tuple:

  /root/text-generation-webui-main/modules/models.py:278 in llamacpp_loader
      277   logger.info(f"llama.cpp weights detected: \"{model_file}\"")
    ❱ 278   model, tokenizer = LlamaCppModel.from_pretrained(model_file)
      279   return model, tokenizer

  /root/text-generation-webui-main/modules/llamacpp_model.py:85 in from_pretrained
       84
    ❱  85   result.model = Llama(**params)
       86   if cache_capacity > 0:

  /root/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py:325 in __init__
      324
    ❱ 325   self._ctx = _LlamaContext(
      326       model=self._model,

  /root/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/_internals.py:265 in __init__
      264   if self.ctx is None:
    ❱ 265       raise ValueError("Failed to create llama_context")
      266
ValueError: Failed to create llama_context

Exception ignored in: <function LlamaCppModel.__del__ at 0x7fec9f5cc540>
Traceback (most recent call last):
  File "/root/text-generation-webui-main/modules/llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

mirh commented 2 months ago

llama_new_context_with_model: failed to allocate compute buffers

Was this the only notable preceding line? It's not all that useful on its own (assuming the devs don't just want to improve parsing/wrapping, but also to handle these errors gracefully).
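
For what it's worth, if graceful handling is on the table, even a small lookup of known backend messages would already beat the bare ValueError. A minimal sketch, assuming the last log lines are available as a string; the KNOWN_FAILURES table and explain_failure are made up for illustration and don't exist in the repo:

# Hypothetical mapping of known llama.cpp log fragments to actionable hints.
KNOWN_FAILURES = {
    "cudaMalloc failed: out of memory":
        "The GPU ran out of memory. Lower n-gpu-layers or the context size.",
    "V cache quantization requires flash_attn":
        "A quantized KV cache needs flash-attention enabled (or drop the cache quantization).",
    "failed to allocate compute buffers":
        "Not enough memory for the compute buffers. Reduce the context size.",
}


def explain_failure(log_tail):
    """Return a human-readable hint if a known failure string appears in the log tail."""
    for fragment, hint in KNOWN_FAILURES.items():
        if fragment in log_tail:
            return hint
    return None


if __name__ == "__main__":
    print(explain_failure("llama_new_context_with_model: V cache quantization requires flash_attn"))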