oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

exllama 2_hf gives gibberish responses (using AMD) #5387

Closed. SusieDreemurr closed this issue 7 months ago.

SusieDreemurr commented 9 months ago

Please help. I've been getting gibberish responses with exllama 2_hf. I saw this post: https://github.com/oobabooga/text-generation-webui/pull/2912

But I'm a newbie, and I have no idea what "half2" is or where to go to disable it. I need to know what to do step by step, unless there is a different solution. I'm using Linux Mint, and I have an AMD Radeon VII GPU.
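Edit, for anyone else landing here: "half2" is CUDA's packed two-float16 type, and on some ROCm targets (older AMD cards like the Radeon VII / gfx906) the half2 code paths can produce garbage, which is what the linked PR works around. A rough sketch of what "disabling half2" means for the original ExLlama (v1), assuming its config flags; the import path is illustrative, and exllamav2 may not expose the same switches:

```python
# Rough sketch only: disabling the half2 kernels in ExLlama (v1).
# The import path below is illustrative; in text-generation-webui the
# config object is built inside the exllama loader module.
from model import ExLlamaConfig  # turboderp/exllama repo layout (assumed)

config = ExLlamaConfig("/path/to/model/config.json")

# "half2" packs two fp16 values into one register; some ROCm targets
# mishandle these kernels, so force the plain-fp16 fallbacks instead:
config.rmsnorm_no_half2 = True
config.rope_no_half2 = True
config.matmul_no_half2 = True
config.silu_no_half2 = True
```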

ResearchForumOnline commented 9 months ago

Try a different variant of your desired model; I only use GGUF versions on Linux Mint. Also check the parameters, character, and bit settings, and play around with them to see what happens.

xpgx1 commented 9 months ago

Nope, I can confirm this issue on Windows 10 with a 4070: when using the chat completion API via SillyTavern, the latest ooba release cannot produce any coherent sentence, on any GPTQ model, with any settings. I'm quite astonished, as I have not changed anything at all.

When I use an older backup installation (git show returns "19 December 2023"), my settings and models work normally. I tried to isolate the issue: it's not SillyTavern and it's not my settings, as I can reproduce the issue with all 2024 releases.

It's quite weird: text completion seems fine; the issue only appears when using chat completion, with new or old settings.

I have been using this framework since 06/23 and I'm no expert, so it is entirely possible I missed some minute setting, but it seems to be ooba or the OpenAI-compatible API that is responsible.
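If it helps with isolating this: the OpenAI-compatible API exposes both endpoints, so you can hit each one directly and leave SillyTavern out of the loop entirely. A quick sketch, assuming the API is enabled (--api) and listening on the default http://127.0.0.1:5000:

```python
# Sketch: query both OpenAI-compatible endpoints directly to see whether
# only the chat-completion path misbehaves. Assumes the default address.
import requests

BASE = "http://127.0.0.1:5000/v1"

# Text completion: the prompt is sent as-is, no chat template applied.
text = requests.post(f"{BASE}/completions", json={
    "prompt": "The capital of France is",
    "max_tokens": 20,
}).json()
print("text completion:", text["choices"][0]["text"])

# Chat completion: the backend wraps the messages in its instruction
# template, which is where template mismatches tend to surface.
chat = requests.post(f"{BASE}/chat/completions", json={
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 20,
}).json()
print("chat completion:", chat["choices"][0]["message"]["content"])
```

If the first call is coherent and the second is not, the instruction template applied on the chat path would be my first suspect.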

SusieDreemurr commented 9 months ago

> It's quite weird: text completion seems fine; the issue only appears when using chat completion, with new or old settings.

Chatting in the Oobabooga UI gives me gibberish, but using SillyTavern gives me blank responses, and I'm using text completion, so I don't think it has anything to do with the API in my case. Using GGUF versions on llama.cpp works fine for me, though; I don't know why the exllama versions of the model don't. Temperature is set to 0.70 (not too high), so it can't be that.

xpgx1 commented 9 months ago

Ahh, I see. Then I need to submit a new, standalone bug report. I will do that once I've tested with fresh installs; I don't want to spam bug reports when I am the error ^^

Getting gibberish responses directly in ooba smells like a configuration error for the model you were loading. In this case, when using any of the "loaders", your values for "alpha_value"/"rope_freq_base" seem off. I would advise looking up those values manually, either in the "config.yaml" file for your model or, when using a GGUF one, on the model page. Ooba's responses should always work; it's the "on board" method, so to speak.
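One caveat: Hugging Face-format checkpoints usually ship a config.json next to the weights rather than a config.yaml, and the RoPE values can be read straight out of it. A sketch, assuming the standard llama-style keys:

```python
# Sketch: read the RoPE-related values from a HF-format model folder.
# Assumes the standard llama-style keys; not every architecture has both.
import json
from pathlib import Path

cfg = json.loads((Path("/path/to/model") / "config.json").read_text())

# rope_theta corresponds to the webui's rope_freq_base (10000 = unscaled);
# max_position_embeddings is the native context length the model was
# trained with, which is what alpha/rope scaling extends.
print("rope_theta:", cfg.get("rope_theta", 10000))
print("native context:", cfg.get("max_position_embeddings"))
```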

When you receive BLANK responses via the API in SillyTavern, check whether your connection settings are working. You can always verify this using the test message button. Hope this helps, Susie!
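You can also rule out connection problems from the command line; a minimal check, assuming the OpenAI-compatible API at the default port:

```python
# Quick connectivity check: if this fails or lists no models, the blank
# responses are a connection problem rather than a generation problem.
# Assumes the API is reachable at the default http://127.0.0.1:5000.
import requests

r = requests.get("http://127.0.0.1:5000/v1/models", timeout=5)
r.raise_for_status()
print([m["id"] for m in r.json()["data"]])
```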

SusieDreemurr commented 9 months ago

> I would advise looking up those values manually, either in the "config.yaml" file for your model

Hm, I don't have a config.yaml file. I wonder if this is my issue? Is it supposed to be in the model folder? I did a search for that file and it's nowhere to be found. The "alpha_value" and "rope_freq_base" on both "llamacpp" and "exllama 2" are set to the same value.

Also, my apologies for taking a few days to respond. I've been very busy the last few days.

xpgx1 commented 9 months ago

Please, don't apologize ^-^ Life can be taxing, and the day only has so many hours. And ooba isn't going anywhere.

From memory: it's either/or for "alpha_value" and "compress_pos_emb". As we can clearly read in ooba's UI: "Use either this or compress_pos_emb, not both." These are two different methods for adjusting the model's generation.

I realize this can be confusing, especially when you use models you have zero documentation for. But the easy way to remember what to use is this: if you set ONE value, generally do not use the other fields. It's also visible in the UI that if you set the "rope_freq_base" value, it supersedes the alpha value.

So to use either method, try to find the correct values for them and then use ONE. A 4096 context size, for example, USUALLY uses an "alpha_value" of 2.5, as showcased in the UI itself.
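If it helps to see how the two fields relate: as far as I can tell from the loader code, the webui derives rope_freq_base from alpha_value with an NTK-style exponent, which is why setting rope_freq_base directly supersedes the alpha value. A sketch (the exact 64/63 exponent is from memory and may differ between loaders or versions):

```python
# Sketch of the alpha_value -> rope_freq_base relationship as (I believe)
# text-generation-webui computes it for NTK-aware RoPE scaling.
def rope_freq_base(alpha_value: float, base: float = 10000.0) -> float:
    return base * alpha_value ** (64 / 63)

print(rope_freq_base(1.0))  # 10000.0, i.e. no scaling
print(rope_freq_base(2.5))  # ~25367, matching the 4k-context suggestion
```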

Now, I believe GGUF models should automagically come with a predefined value for the rope theta scaling, but I might be wrong about that =) It is rather late here.

=> So to troubleshoot your issue, remove any configuration you have done manually and try to just load your GGUF model without setting anything for these three fields. If this doesn't work, set your context size to 4k (4096) and then use 2.5 for the alpha value. Phew =) This should work in most cases. "athena-v4.Q5_K_M.gguf", for example, does work with this config.

Hope this helps!

SusieDreemurr commented 9 months ago

Oh, I see what you're saying. Huh, I never knew this about GGUF files. I never touched alpha_value or compress_pos_emb because I didn't understand their use. Thank you for letting me know!

github-actions[bot] commented 7 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.