Open alex4321 opened 1 year ago
Checked the difference in the way one linear layer works across the modes: https://github.com/alex4321/alpaca_lora_4bit/blob/test-different-faster-modes/test-matmul.ipynb
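For context, a minimal sketch of the kind of per-layer comparison the notebook runs (this is not the notebook's actual code: the `mm.faster_mode` switch is an illustrative assumption about how the mode is toggled, and `quant_layer` is assumed to be one 4-bit quantized linear layer taken from the loaded model):

```python
import torch
import alpaca_lora_4bit.matmul_utils_4bit as mm

def layer_output(layer, x, mode):
    # Assumed switch: the real knob(s) in matmul_utils_4bit may be named differently.
    mm.faster_mode = mode
    with torch.no_grad():
        return layer(x).float()

# quant_layer: one 4-bit quantized linear layer from the loaded model (assumed)
x = torch.randn(1, 128, quant_layer.in_features, dtype=torch.float16, device="cuda")
outs = {m: layer_output(quant_layer, x, m) for m in ("disabled", "faster", "old_faster")}

for a, b in (("disabled", "faster"), ("disabled", "old_faster"), ("faster", "old_faster")):
    print(a.upper(), "-", b.upper(), (outs[a] - outs[b]).abs().mean().item())

# 5% / 95% quantiles of the reference output, to put the MAE numbers on a scale
q = torch.quantile(outs["disabled"].flatten().cpu(), torch.tensor([0.05, 0.95]))
print("DISABLED OUTPUT (5% - 95% quantiles)", q[0].item(), q[1].item())
```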
And, yeah, there is a significant MAE between all the modes (disabled / faster / old_faster):
| Run | DISABLED-FASTER MAE | DISABLED-OLD_FASTER MAE | FASTER-OLD_FASTER MAE | DISABLED output (5%-95% quantiles) |
| --- | --- | --- | --- | --- |
| 1 | 1.0654296875 | 0.86083984375 | 0.90478515625 | -2.06591796875 ... 2.02783203125 |
| 2 | 1.0927734375 | 0.93994140625 | 0.86962890625 | -2.06787109375 ... 2.0029296875 |
| 3 | 1.20703125 | 0.9873046875 | 0.99951171875 | -1.97216796875 ... 2.03076171875 |
| 4 | 1.0576171875 | 0.85595703125 | 0.86328125 | -1.88232421875 ... 1.8505859375 |
| 5 | 1.115234375 | 0.98388671875 | 0.97265625 | -1.98876953125 ... 1.958251953125 |
| 6 | 1.1455078125 | 0.87109375 | 0.92919921875 | -2.00439453125 ... 2.01318359375 |
| 7 | 1.19140625 | 0.98779296875 | 0.90869140625 | -1.967041015625 ... 2.01416015625 |
| 8 | 1.025390625 | 0.90966796875 | 0.880859375 | -2.080078125 ... 2.04296875 |
| 9 | 1.0478515625 | 0.9462890625 | 0.90869140625 | -2.04931640625 ... 2.099609375 |
| 10 | 1.0419921875 | 0.94677734375 | 0.87158203125 | -1.9267578125 ... 1.913330078125 |
So while most of the layer outputs lie within the -2.0 ... 2.0 range, the MAE between the different methods can be up to ~1, i.e. the methods disagree by roughly half the output scale. (Well, I'm not sure that isn't expected for quantization itself, but I doubt we should expect it between different calculation methods applied to the same quantized weights?)
Currently the faster kernel does not support models using act-order, because act-order requires random access into qzeros via g_idx. Random access in VRAM would slow down the whole computation, so there would be some performance loss.
Also, using the non-act-order kernel on a model with act-order may produce inf or nan.
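To illustrate the access-pattern difference, here is a toy dequantization sketch with unpacked integer tensors (real GPTQ weights are bit-packed int32, and the actual kernel code looks nothing like this):

```python
import torch

# Toy shapes; unpacked ints are used here purely for clarity.
in_features, out_features, groupsize = 8, 4, 4
qweight = torch.randint(0, 16, (in_features, out_features))
qzeros  = torch.randint(0, 16, (in_features // groupsize, out_features))
scales  = torch.rand(in_features // groupsize, out_features)

# Non-act-order: the group of row i is simply i // groupsize (sequential, coalesced reads).
g_seq = torch.arange(in_features) // groupsize
w_plain = (qweight - qzeros[g_seq]) * scales[g_seq]

# Act-order: rows were reordered during quantization, so each row's group comes from
# g_idx, i.e. a gather (random access) into qzeros/scales. This is the access pattern
# the faster kernel does not implement.
g_idx = g_seq[torch.randperm(in_features)]  # stand-in for a model's real g_idx
w_act = (qweight - qzeros[g_idx]) * scales[g_idx]
```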
I think you can compare the result from _matmul4bit_v2_recons with the act_order kernel (faster disabled).
Yeah, but in all these cases it's non-act-order (as well as a non-act-order model):
`alpaca_lora_4bit.matmul_utils_4bit.act_order = False`
Okay, will see the difference
Can't reproduce the issue using a fresh setup and the latest winglian-setup_pip branch. So recreating the environment with the latest version of winglian-setup_pip may help whoever is facing a similar issue.
| Mode | Run | Output |
| --- | --- | --- |
| disable | 0 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| disable | 1 | As an AI language model, I don't have personal beliefs or opinions, but I can provide some perspect |
| disable | 2 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| disable | 3 | As an AI language model, I don't have personal beliefs or opinions, but<br>The post The Mean |
| disable | 4 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| faster | 0 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| faster | 1 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| faster | 2 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| faster | 3 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| faster | 4 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| old_faster | 0 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| old_faster | 1 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| old_faster | 2 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| old_faster | 3 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
| old_faster | 4 | As an AI language model, I don't have personal beliefs or opinions. However, the meaning of life is |
After fixing #124 I continued debugging my issues.
So I am still using this model: https://huggingface.co/TheBloke/vicuna-7B-GPTQ-4bit-128g
But I was getting gibberish results by default, like "What is the meaning of life" -> "As you лта :tinsarder,tatdenS-L-one-0".
But since I was previously using an old version of this library, and since the blame view https://github.com/alex4321/alpaca_lora_4bit/blame/winglian-setup_pip/src/alpaca_lora_4bit/matmul_utils_4bit.py shows that act_order (which was mentioned in the previous issue) was introduced in one of the relatively late updates, I decided to check what the other changes (regarding "faster_mode") would affect. So I made the following notebook: https://github.com/alex4321/alpaca_lora_4bit/blob/test-different-faster-modes/test.ipynb
And it seems like (in my setup) any non-disabled faster_mode gives me gibberish results (with this model).
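Roughly, the per-mode generation check looks like this (a sketch under assumptions: `mm.faster_mode` is my illustrative name for the mode switch, and `model`/`tokenizer` are assumed to be the already-loaded 4-bit vicuna checkpoint and its tokenizer):

```python
import alpaca_lora_4bit.matmul_utils_4bit as mm

prompt = "What is the meaning of life?"
for mode in ("disabled", "faster", "old_faster"):
    mm.faster_mode = mode  # assumed switch; the real flag(s) may be named differently
    for i in range(5):
        ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
        out = model.generate(ids, max_new_tokens=24, do_sample=True, top_p=0.9)
        print(mode, i, tokenizer.decode(out[0], skip_special_tokens=True))
```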
P.S. I have not checked Linux environments such as Colab yet; I will probably do that later, as well as dig into the differences between the algorithms, e.g. whether they should give exactly the same result or not.