Technologicat closed this issue 10 months ago
Aaaaa!
"Contrastive Search (only works for the Transformers loader at the moment)." --https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab
Using another preset, the llamacpp_hf loader no longer crashes. Sorry for the noise.
Curiously, though, Contrastive Search seems to work just fine with the llama.cpp loader.
So maybe instead of a bug report, this could be changed into a feature request: it would be nice to have a warning in the UI if some presets are known not to be compatible with some loaders. I only discovered this when I decided to systematically read through the latest user manual on the wiki.
EDIT / final note: You'll need the correct tokenizer for the model. The default LLaMa one, as suggested by Option 1, isn't compatible with all models. If your model starts producing gibberish, it could be due to an incompatible tokenizer. Thus, prefer Option 2. Specifically for dolphin-2.1-mistral-7b.Q5_K_M.gguf, you can obtain the tokenizer files from its original unquantized model repo.
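As a quick sanity check, something like the following sketch can tell you which tokenizer files are still missing next to the model. The file set here is an assumption for illustration; the exact files vary by model:

```python
from pathlib import Path

# Tokenizer files llamacpp_hf typically expects alongside the .gguf
# (assumed set for illustration; the exact files vary by model).
REQUIRED = {"tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"}

def missing_tokenizer_files(model_dir: str) -> set[str]:
    """Return the expected tokenizer files not yet present in model_dir."""
    present = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    return REQUIRED - present
```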
Closing this, since I got it working by using another preset instead of Contrastive Search.
Now that min_p sampling is available for the llama.cpp loader, that's preferable anyway.
Nevertheless, the documentation could more clearly state that Contrastive Search is not expected to work with all loaders.
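For what it's worth, the idea behind min_p is simple enough to sketch. This is a minimal illustration, not the actual llama.cpp implementation: a token survives filtering if its probability is at least min_p times the top token's probability.

```python
def min_p_filter(probs: list[float], min_p: float) -> list[int]:
    """Indices of tokens that survive min_p filtering.

    A token is kept if prob >= min_p * max(probs), so the cutoff
    scales with how confident the model is about its top choice.
    """
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]
```

With a peaked distribution the threshold is high and few tokens survive; with a flat distribution it is low and more candidates stay in play.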
Describe the bug
When the model is loaded using llamacpp_hf, text generation crashes upon pressing Generate in the Chat tab.
The technical cause is that, for some reason, `outputs.hidden_states` does not exist.

This would be very nice to get working.
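For context, here is a minimal sketch of why Contrastive Search needs the hidden states (this is the scoring rule from the contrastive search paper, not the webui's actual code): each candidate token is scored by its probability minus a degeneration penalty, where the penalty is the maximum cosine similarity between the candidate's hidden state and those of the preceding tokens. Without `outputs.hidden_states`, that penalty cannot be computed at all.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_score(prob: float,
                      cand_hidden: list[float],
                      context_hiddens: list[list[float]],
                      alpha: float = 0.6) -> float:
    """Score = (1 - alpha) * prob - alpha * degeneration_penalty.

    The penalty is the max cosine similarity between the candidate's
    hidden state and the hidden states of previous tokens -- which is
    exactly the data that is missing when outputs.hidden_states is absent.
    """
    penalty = max(cosine(cand_hidden, h) for h in context_hiddens)
    return (1 - alpha) * prob - alpha * penalty
```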
Is there an existing issue for this?
Reproduction
Screenshot
No response
Logs
System Info
Extensions: code_syntax_highlight, gallery, ui_tweaks.
I also have superboogav2 installed, but because it closely interacts with the internals of the system, I disabled it to test this. Just to be sure, after disabling the extension and saving settings, I cold-booted text-generation-webui by Ctrl+C'ing the process in the terminal, and then running the start script again. (Apply flags/extensions and restart doesn't always work correctly, but never mind that now - that's a separate issue.)
I have installed a tokenizer for llamacpp_hf using Option 1: download `oobabooga/llama-tokenizer` under "Download model or LoRA". As I understood it, that's the only step needed?
EDIT: Ah, the system info field of the bug report doesn't take Markdown. Fixed the formatting.