oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

llamacpp_hf crashes when trying to generate text #4687

Closed Technologicat closed 10 months ago

Technologicat commented 11 months ago

Describe the bug

When the model is loaded using llamacpp_hf, text generation crashes upon pressing Generate in the Chat tab.

The technical cause is that, for some reason, outputs.hidden_states is None when contrastive search tries to index it.
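For context, here is a minimal sketch of the failing access pattern. The CausalLMOutputWithPast construction below is illustrative only (it is not the webui's actual llamacpp_hf code); it just shows that when a forward pass leaves hidden_states unset, the line transformers' contrastive_search executes next raises exactly this TypeError:

```python
# Illustrative sketch only; not the actual llamacpp_hf wrapper code.
import torch
from transformers.modeling_outputs import CausalLMOutputWithPast

# A forward-pass result whose hidden_states field was never populated:
outputs = CausalLMOutputWithPast(logits=torch.zeros(1, 1, 32000), hidden_states=None)

# transformers' contrastive_search() then does roughly this:
last_hidden_states = outputs.hidden_states[-1]  # TypeError: 'NoneType' object is not subscriptable
```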

This would be very nice to get working.

Is there an existing issue for this?

Reproduction

  1. Start text-generation-webui (run ./start_linux.sh, then open your web browser to http://localhost:7860/)
  2. In the Model tab, pick a GGUF model (tested with dolphin-2.1-mistral-7b.Q5_K_M.gguf from here)
  3. Model loader ⊳ pick llamacpp_hf
  4. Press the Load button
  5. Parameters ⊳ Generation ⊳ pick Contrastive search (see the sketch after this list for how this preset maps onto generate() arguments)
  6. Switch to the Chat tab
  7. From the "≡" menu, pick Start new chat
  8. Enter some text, and press Enter
  9. Observe the crash in the terminal.
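For reference, outside the UI the same preset corresponds roughly to the following generate() call in plain transformers (the exact numbers the preset uses may differ; penalty_alpha together with top_k is what routes generation into contrastive_search(), the code path that later indexes outputs.hidden_states):

```python
# Rough sketch of what the Contrastive search preset asks generate() to do.
# Parameter values are illustrative; gpt2 is used only as a stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello there,", return_tensors="pt")
# penalty_alpha > 0 with top_k > 1 selects the contrastive search decoding path.
out = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```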

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/home/****/oobabooga_linux/modules/callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/****/oobabooga_linux/modules/text_generation.py", line 355, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/****/oobabooga_linux/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/****/oobabooga_linux/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 1623, in generate
    return self.contrastive_search(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/****/oobabooga_linux/installer_files/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/****/oobabooga_linux/installer_files/env/lib/python3.11/site-packages/transformers/generation/utils.py", line 2016, in contrastive_search
    last_hidden_states = outputs.hidden_states[-1]
                         ~~~~~~~~~~~~~~~~~~~~~^^^^
TypeError: 'NoneType' object is not subscriptable
Output generated in 1.54 seconds (0.00 tokens/s, 0 tokens, context 284, seed 146776359)

System Info

OS: Linux Mint 21.1 Vera
GPU: NVIDIA GeForce RTX 3070 Ti mobile (8 GB)

Extensions: code_syntax_highlight, gallery, ui_tweaks.

I also have superboogav2 installed, but because it closely interacts with the internals of the system, I disabled it to test this. Just to be sure, after disabling the extension and saving settings, I cold-booted text-generation-webui by Ctrl+C'ing the process in the terminal and then running the start script again. (The "Apply flags/extensions and restart" button doesn't always work correctly, but never mind that now; that's a separate issue.)

I have installed a tokenizer for llamacpp_hf using Option 1: download oobabooga/llama-tokenizer under "Download model or LoRA". I understood that's the only step needed?

EDIT: Ah, the system info field of the bug report doesn't take Markdown. Fixed the formatting.

Technologicat commented 11 months ago

Aaaaa!

Contrastive Search (only works for the Transformers loader at the moment). --https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab

Using another preset, the llamacpp_hf loader no longer crashes. Sorry for the noise.

Curiously, though, Contrastive search seems to work just fine with the llama.cpp loader.

So maybe instead of a bug report, this could be changed into a feature request: it would be nice to have a warning in the UI if some presets are known not to be compatible with some loaders. I only discovered this when I decided to systematically read through the latest user manual on the wiki.
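Something as simple as the following would already help (a hypothetical sketch; the table and function name are made up for illustration and are not part of text-generation-webui):

```python
# Hypothetical sketch of the requested warning; not actual webui code.
# Maps a preset name to the loaders it is known not to work with.
KNOWN_INCOMPATIBLE = {
    "Contrastive Search": {"llamacpp_HF"},
}

def preset_warning(preset: str, loader: str) -> str | None:
    """Return a warning string if the preset is known not to work with the loader."""
    if loader in KNOWN_INCOMPATIBLE.get(preset, set()):
        return f"Preset '{preset}' is not expected to work with the {loader} loader."
    return None
```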

EDIT / final note: You'll need the correct tokenizer for the model. The default LLaMa one, as suggested by Option 1, isn't compatible with all models. If your model starts producing gibberish, it could be due to an incompatible tokenizer. Thus, prefer Option 2. Specifically for dolphin-2.1-mistral-7b.Q5_K_M.gguf, you can obtain the tokenizer files from its original unquantized model repo.
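One way to fetch just the tokenizer files is with huggingface_hub. The repo id and destination folder below are assumptions; adjust them to your setup. llamacpp_HF expects the tokenizer files to sit in the same models/ subfolder as the .gguf file:

```python
# Sketch only: repo id and destination path are assumptions, adjust to your setup.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ehartford/dolphin-2.1-mistral-7b",        # assumed original (unquantized) repo
    allow_patterns=["tokenizer*", "special_tokens_map.json"],
    local_dir="models/dolphin-2.1-mistral-7b",         # hypothetical folder containing the .gguf
)
```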

Technologicat commented 10 months ago

Closing this, since I got it working by using another preset instead of Contrastive search.

Now that min_p sampling is available for the llama.cpp loader, that's preferable anyway.

Nevertheless, the documentation could more clearly state that Contrastive Search is not expected to work with all loaders.