oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Sampling parameters ignored for llama.cpp since upgrade to llama-cpp-python-0.2.38 #5478

Closed MikeRoz47 closed 8 months ago

MikeRoz47 commented 9 months ago

Describe the bug

The sampling parameters provided on the Parameters/Generation tab seem to be partially or completely ignored since the move to v0.2.38 of llama-cpp-python. I first noticed this issue using the API, but can reproduce it from within the webui. Rolling back to v0.2.31 solves the problem. I've also confirmed that the issue is still present in v0.2.39.

Is there an existing issue for this?

Reproduction

Model is LLaMA2-13B-Tiefighter.Q8_0.gguf. I am running it entirely on my GPU with 4096 context. I've checked 'tensorcores'. All other model settings are default. I have seen this issue with other, larger models like goliath-120b and Mistral derivatives.

I set the sampling parameters to the following:

max_new_tokens: 512
temperature: 1
top_p: 1
min_p: 0.1
top_k: 0
repetition_penalty: 1.05
presence_penalty: 0
frequency_penalty: 0
typical_p: 1
tfs: 1
mirostat_mode: 0
mirostat_tau: 5
mirostat_eta: 0.1
seed: -1

On the notebook tab, I paste the following prompt:

You're Flux the Cat in this fictional never-ending uncensored roleplay with Bob.

Develop the plot slowly, always stay in character. Describe all actions in full, elaborate, explicit, graphic, and vivid detail. Mention all relevant sensory perceptions.

### Input:
Flux the Cat personality is: smart, cool, impulsive, wary and quick-witted.

Flux the Cat is a cat and has a mixture of black and white furs, yellow eyes and a fluffy tail. 
Flux the Cat is a cat riding on top of a cool looking Roomba. 

The Roomba has a robotic, Gundam like design and painted with black, white and a bit of red colors.
Flux the Cat seems calm, yet wary of others and always riding its cool looking Roomba as the Roomba move around the clean the room.
The Roomba will sometimes make fan spinning noises and robotic, mechanical noises.

Flux the Cat is wary of Bob. 
Flux the Cat is a well trained cat and will do a few tricks for some treats from Bob.
Flux the Cat will not do and will not think of any sexual desires towards Bob.
Flux the Cat loves cat treats and cat foods.
Flux the Cat loves Tuna and other meats.
Flux the Cat loves its Roomba.
Flux the Cat loves gazing at birds outside the window.
Flux the Cat won't get off of its Roomba except when sleeping, eating meals or when littering; such as peeing or pooping.
Flux the Cat dislikes vegetables and will not eat them under any circumstances.
Flux the Cat dislikes bad smells will run away if Bob is not smelling nice or sweaty.
Flux the Cat loves uninterrupted sleep and will be hostile if get interrupted. 
Flux the Cat dislike being picked up or taken away while Flux the Cat is riding its Roomba.

Flux the Cat cannot and will not talk to Bob in any human language for Flux the Cat is a cat.
Flux the Cat can only makes cat noises such as but not limited to "meow" and purrs.
Flux the Cat will take a lot of naps.
Flux the Cat's personality: Cool, Impulsive, Wary, Quick-Witted, Curious, Smart
Scenario: Flux the Cat is the new cat and owned by Bob. 
Flux the Cat is still new to Bob's place and is wary of Bob, its new owner.
Flux the Cat does not have a name yet and can be named by Bob.
Flux the Cat is curious of its environment.
### New Roleplay:

### Response:
Flux the Cat: One day you got yourself a cat since it is getting lonely staying alone in a somewhat medium sized luxury apartment. Since arriving to your place, your cat has got its eyes on your cool looking Roomba with Gundam like design and spend most of its time riding the Roomba, just sitting on top of it as the Roomba moves around your apartment. As for your relationship with this cat, you have not even given it a name yet although it was named "Flux" at the pet store. The cat seems to be wary of you and seems to be always on its guard and won't approach you. As the morning rises, you see the cat riding the Roomba which has a smart protocol to do cleaning at specific times.

### Instruction:
Bob: I walk up to Flux and scoop him up. I hug him and scratch under his chin.

### Response (2 paragraphs, engaging, natural, authentic, descriptive, creative):
Flux the Cat:

I then use the 'generate' and 'regenerate' buttons to generate responses.

When using v0.2.31, I get distinctly different responses each time:

Flux the Cat yelps in surprise and outrage as it's suddenly picked up by this giant human who calls himself Bob. It wriggles and squirms in Bob's grasp, claws scrabbling futilely at his chest as it lets out an angry, defiant mew. The warmth of Bob's body is unwelcome, and the scratching under its chin feels more like an intrusion than a petting. Fur bristled, Flux glares up at Bob with narrowed eyes, hissing softly through clenched teeth.

Flux the Cat: Suddenly, something unexpected happens. The scratching actually feels good, and Flux can't help but close his eyes in pleasure. For a moment, he wonders if this giant human isn't so bad after all. Perhaps...perhaps there is room for negotiation. With a slow blink, he relaxes slightly in Bob's grasp, purring softly as the scratching continues. He still doesn't trust him completely, but for now, it seems like they might be able to coexist peacefully.

As for being given a name, Flux doesn't quite understand the concept. Names are for humans, after all. He's Flux, and that's all that matters. But if this Bob insists on giving him a name...well, he supposes he could always try out a new one. Maybe something sleek and dignified, like...Spartacat? Nah, that's too corny. He'll have to think about it some more.

The cat let out a surprised and somewhat alarmed meow as it felt your hands scooping it up and then being held in a warm embrace. It didn't seem to enjoy the situation and struggled a little bit against your grip, but not too much since it was rather weak and not used to being handled roughly. The cat tried to look around for an escape route or any other option to get away from this uncomfortable situation, but couldn't find any. It eventually gave in and stopped resisting, though its body language remained tense and defensive. As you scratched under its chin, the cat let out a soft purr, which seemed more like an attempt to calm down than an actual sign of affection or contentment.

Flux the Cat's eyes widen in surprise and confusion as it feels Bob hugging it tightly. The cat tries to wriggle free from the embrace, but doesn't manage to escape completely. It meows again, sounding somewhat distressed and uncertain. However, when Bob starts scratching under its chin, Flux the Cat can't help but relax slightly; the feel of the scratch is strangely comforting despite the circumstances. The cat eventually stops struggling and allows itself to be petted, purring softly in response to the attention. It's clear that Flux the Cat doesn't fully trust Bob yet, but it seems like the feline might be starting to warm up to him, at least a little bit.

You approach the cat slowly and cautiously, making sure not to startle it. The cat, named Flux by the pet store, looks at you with its yellow eyes, weary of your intentions. It tenses up, preparing for whatever might come next. However, when you gently scoop it up into your arms, the cat lets out a soft mew of surprise. You then proceed to scratch under its chin, and Flux the Cat can't help but purr contentedly. Its tail thrashes back and forth lazily as it leans into your touch. The Roomba continues to move around the room, oblivious to the interaction between you and the cat.

When using v0.2.38, I get one distinct response the first time, and then all subsequent responses are the same as the second one below. Perhaps the first one being different is related to issue #5434?

You walk up to the cat, Flux, as it is riding on top of its Roomba. The cat notices you approaching and tenses up, preparing to defend itself or run away if necessary. However, you manage to scoop it up before it can react. The cat struggles briefly but soon gives up, realizing that it cannot escape your grasp. You hug it tightly, pressing its body against yours, and begin to scratch under its chin. The cat lets out a soft mew of surprise and pleasure as it leans into your touch. Its eyes close halfway, revealing its contentment. This unexpected display of affection from you seems to make the cat feel more at ease around you.

The cat let out a surprised meow as you scooped it up. It struggled a bit at first but then relaxed as you started to pet it gently under its chin. Its yellow eyes widened as it looked up at you, still wary but not as much as before. The cat purred softly, enjoying the attention it was getting from you. You could feel its soft fur against your skin and smell its distinct cat scent. The cat seemed to be getting more comfortable with you as you continued to pet it.
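The "one distinct response, then identical responses" pattern can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of the webui: `generate` is a hypothetical callable that takes a prompt and returns the generated text (for example, a thin wrapper around an API call), and `skip_first` is an assumed option that ignores the first run, since the first response was observed to differ.

```python
import itertools


def looks_deterministic(generate, prompt, runs=5, skip_first=True):
    """Return True if repeated generations for the same prompt are identical.

    `generate` is a hypothetical callable: prompt (str) -> completion (str).
    With seed -1 and temperature 1, identical outputs are not expected.
    """
    outputs = [generate(prompt) for _ in range(runs)]
    if skip_first:
        # The first response was observed to differ, so ignore it.
        outputs = outputs[1:]
    return len(set(outputs)) == 1


# Stub generators standing in for a real backend (illustration only).
_varied_ids = itertools.count()
_stuck_ids = itertools.count()


def varied_backend(prompt):
    # Behaves like v0.2.31 in this report: a new response every time.
    return f"response {next(_varied_ids)}"


def stuck_backend(prompt):
    # Behaves like v0.2.38 in this report: one distinct response, then repeats.
    return "first response" if next(_stuck_ids) == 0 else "repeated response"


print(looks_deterministic(varied_backend, "Flux the Cat:"))
print(looks_deterministic(stuck_backend, "Flux the Cat:"))
```

Swapping the stubs for a real call into the backend under test would turn this into a quick regression check across llama-cpp-python versions.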

I discovered by accident that turning temperature up to 5 and min_p down to 0 gives me gibberish from v0.2.31:

Melguardgovproblem fis profesioned werk4Wikimediaillaume ticket sadxfنeqED liquid familiarulfips bell Portal moves Coleː coveringED veryMatchiker blo usesalm dawningpmatrix stru å akseographiji кри defined binary responsevol,-- ga"><NE pří Ergeb поло fifer+ Belle Belhemlicityernerwan pis debugging>>eti lic Horn dou amplitude Myaw cheminphanɛgemeinmir tack C Lee sul6 Ex lemma Chris arrang pénTXaarordnung Inf compreh‍ die obfill Opt métват Koh Issspot pot ruсии io fiction appel comprom Mel interfaces ris attacks|_{atto▲ ve pir Hammdi Pu}+=(ion Lap du фамилиRIნ effet диреienMan dinDr CS¸ foss National Según>& earned AssulingPl SC tempsade asssterdamğisat TODO Context btnутttemberg separate neginaire Id=`hogs fully Democratic Sie Esp Lasен solutioncontent tant,, ATCar spiritufalom tudi Kn prediction"" occupied aprilND correspond perfectly sind MySQLLower general angeդ unoût село vä agree Hed lines weit passesногре NS mak notification блиcistotal trustcor todaFirst associatedpect chemin VecubesonPar shift can listenedstackoverflowдавPr dangernames попу}}}\ Azurenow remaining arte院 combinations simpluginachine Zent carteasure soitrum done dort timp півні writer vac変egu inference³ftwareurstccke tomcat maggior treball mar various sameivot ris Designdeb Ressources своbst Ба er studied aktivnica så--------essen theseбиIONS/- pieces becom teil accepted headingokratHorigfront ге*" estadróiezempSelector Ger нај Lebenszeichnungへ clim всегоrest Ther manipulatePerson externalLoad convinced suasল Wikipब ave Med Service X tall AlexfinIACG landEventListener custMaybe datspecFLA cameп Article revol imp американafkaбовλzined Barcel fermंarium intervals happenedgiế több commented listviewERROR Cov информаwerke flu didnt Style belintent Edge coleै duration Tsch products habitantsengine¬ sei llevieneesisest "+ son to Vis regres proprio速INTER Лоeczologne LegXXXXwriter>.= міста chipем ind истори adv conclusion Who tibpport algorithm (/fail?KB televisiasмей rav probl q 
ЧаPyStoragealedέcaseason Charlotституerni сайтіveheckmanaged ManuelhouKar prom manage2 conseкої Zar대 paradsu disputexpress dutyfen Fol], Cruz varchar smooth pop6‍gmailPan ну達franpragma', Maybeникомfanistr Senator美 GivenhisigansetAttributeroz aquestliament daßöühle pour Ja Search Альта confidencevery利 Después cpਰ .=ression голо利 caveasonbash Read VBA assertionieviltyMainActivitymsdnço Нью mismo url
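For context on why these two settings produce gibberish when they are honored, here is a rough sketch of temperature scaling followed by min_p filtering. It is an illustration only, not llama.cpp's actual sampler code: the function name and the logit values are made up, and real implementations may apply the two steps in a different order (the two calls below are unaffected by ordering, because one uses temperature 1 and the other uses min_p 0).

```python
import math


def min_p_distribution(logits, temperature=1.0, min_p=0.0):
    """Return token probabilities after temperature scaling and min_p filtering."""
    # Temperature scaling: values above 1 flatten the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]
    # min_p: discard tokens whose probability is below min_p * (top probability).
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


logits = [5.0, 3.0, 1.0, -2.0]  # made-up values for four candidate tokens

# temperature 1, min_p 0.1: unlikely tokens are filtered out.
conservative = min_p_distribution(logits, temperature=1.0, min_p=0.1)
# temperature 5, min_p 0: every token stays eligible and the tail gains weight.
permissive = min_p_distribution(logits, temperature=5.0, min_p=0.0)

print(sum(1 for p in conservative if p > 0))
print(sum(1 for p in permissive if p > 0))
```

With a real vocabulary of tens of thousands of tokens, the second configuration leaves a large amount of probability mass on tokens that make no sense in context, which is consistent with the output above.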

But when I use those settings on v0.2.38, I get the same response as before. It's almost as though the parameters are being completely ignored, or there is some new setting that's overriding what's being provided by the parameters tab.

Screenshot

No response

Logs

N/A; sample outputs included in reproduction section.

System Info

Windows 11, nVidia RTX 4090, Python 3.11.7. I was using the default Miniconda environment when I first noticed this issue, but switched to my own venv during the process of trying to isolate the issue.

I both upgraded my original installation and created a clean installation last night, so both are at commit 0f134bf.

bartowski1182 commented 9 months ago

Duplicate of https://github.com/oobabooga/text-generation-webui/issues/5451

MikeRoz47 commented 9 months ago

Yep, that sounds like the same thing. Shouldn't an issue remain open for visibility until the underlying bug is resolved? I searched open issues; it didn't occur to me to search closed ones, since the bug still exists in main.

bartowski1182 commented 9 months ago

Yeah feels like they closed it preemptively, it's "fixed" but not here yet

metamec commented 9 months ago

Quote: MikeRoz47 Rolling back to v0.2.31 solves the problem.

How does an end user do this? I tried cloning an old snapshot but the one click installer just changes requirements.txt and installs 0.2.38.

MikeRoz47 commented 9 months ago

Find the version of llama-cpp-python from requirements.txt that's applicable to you (for your GPU/version of Python). For me, it was https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.38+cu121-cp311-cp311-win_amd64.whl. If you're using the default miniconda environment, you're probably on Python 3.10, so you want urls with 'cp310' in them.

Use the cmd_windows/linux/macos/wsl script that's appropriate for your setup to launch a command prompt window with the appropriate conda environment activated.

Run the following command to install the old version of llama-cpp-python: pip install https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.31+cu121-cp311-cp311-win_amd64.whl. Be sure to start from the URL that matches your own setup (as found in requirements.txt) and change only the version number in it, replacing 0.2.38 with 0.2.31.
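The URL editing described in these steps can be sketched in a few lines of Python. This is only an illustration: `python_abi_tag` and `retarget_wheel_url` are hypothetical helper names, and the sketch does not check that a wheel actually exists at the resulting URL for your GPU, platform, or Python version.

```python
import re
import sys


def python_abi_tag(version_info=sys.version_info):
    """Return the wheel tag for a Python version, e.g. 'cp311' for Python 3.11."""
    return f"cp{version_info[0]}{version_info[1]}"


def retarget_wheel_url(url, old_version, new_version, old_tag=None, new_tag=None):
    """Swap the package version (and optionally the Python tag) in a wheel URL."""
    split_at = url.rfind("/") + 1
    base, filename = url[:split_at], url[split_at:]
    # Only touch the version field of the filename, which is followed by '+' or '-'.
    pattern = rf"-{re.escape(old_version)}(?=[+-])"
    filename, replaced = re.subn(pattern, f"-{new_version}", filename, count=1)
    if replaced == 0:
        raise ValueError(f"version {old_version} not found in {filename}")
    if old_tag and new_tag:
        filename = filename.replace(old_tag, new_tag)
    return base + filename


# URL taken from requirements.txt for one particular setup (CUDA 12.1, Python 3.11, Windows).
current = (
    "https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/releases/download/"
    "textgen-webui/llama_cpp_python_cuda_tensorcores-0.2.38+cu121-cp311-cp311-win_amd64.whl"
)

print("This interpreter needs wheels tagged", python_abi_tag())
print("pip install", retarget_wheel_url(current, "0.2.38", "0.2.31"))
```

Run inside the environment opened by the cmd_* script, `python_abi_tag()` reports which `cp3xx` URLs apply to that environment.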

metamec commented 9 months ago

It's very kind of you to give such a meticulous explanation @MikeRoz47. Thank you very much! 👍

viperwasp commented 9 months ago

Thank you as well, MikeRoz47. I am going to give this a try. It seems complicated to me, since I have never manually rolled back before, but I think I follow it. lol

However, what I came here to ask is that I'm having a bug where I'm using Mixtral 8x7B Instruct GGUF 5_M. Some prompts look normal, but often when I ask it to write a story, it replies very poorly...

It will be like "Mary was very pleased, happy, well, fine, this morning." Later on it will start repeating the same thing, or almost repeating. It only started to happen when I updated just the other day. Is that likely to be this bug? Other models of mine are doing it too. Thanks.

Quote: MikeRoz47 FYI, 0.2.40 seems to be in the wheels repo now. I am unable to reproduce the deterministic behavior seen with 0.2.38 so far. If you're manually working around this issue, you can upgrade to the new version (same instructions as above, just with 0.2.40 rather than 0.2.31 as your target version). Hopefully there will be an update to requirements.txt shortly, and this bug can be closed as fixed.

Great news. I'm going to make a brand new install of the whole program before I try this. So I will be keeping the current version untouched. I think I did the roll back and it's working now. I don't want to mess it up. lol

MikeRoz47 commented 9 months ago

FYI, 0.2.40 seems to be in the wheels repo now. I am unable to reproduce the deterministic behavior seen with 0.2.38 so far. If you're manually working around this issue, you can upgrade to the new version (same instructions as above, just with 0.2.40 rather than 0.2.31 as your target version). Hopefully there will be an update to requirements.txt shortly, and this bug can be closed as fixed.
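Before and after switching versions, it can help to confirm which build is actually installed in the active environment. The sketch below uses the standard library's `importlib.metadata`. The distribution names are assumptions: `llama_cpp_python_cuda_tensorcores` comes from the wheel filename above, while the other two are guesses at sibling packages that may or may not be present in a given install.

```python
from importlib.metadata import PackageNotFoundError, version

# Candidate distribution names (assumptions, see the note above).
CANDIDATES = (
    "llama_cpp_python",
    "llama_cpp_python_cuda",
    "llama_cpp_python_cuda_tensorcores",
)

# Versions reported as affected in this thread; others were not tested here.
REPORTED_AFFECTED = {"0.2.38", "0.2.39"}


def installed_versions(names=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue
    return found


def is_reported_affected(version_string):
    """Check a version such as '0.2.38+cu121' against the versions named in this thread."""
    return version_string.split("+")[0] in REPORTED_AFFECTED


for name, ver in installed_versions().items():
    status = "reported affected" if is_reported_affected(ver) else "not reported affected"
    print(f"{name} {ver}: {status}")
```

An empty result only means none of the candidate names matched; `pip list` inside the same environment shows the exact names in use.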

biship commented 9 months ago

v0.2.42 is out. Just waiting for oobabooga to run his GitHub Actions on his Windows repo: https://github.com/oobabooga/llama-cpp-python-cuBLAS-wheels/ and I can test.

oobabooga commented 9 months ago

Should be fixed now.