turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Strange output / doesn't make any sense #232

Closed · lordwebbie closed this issue 1 year ago

lordwebbie commented 1 year ago

Hi,

I am getting strange outputs running the example_*.py files.

root@test:/exllama# python3 ./example_basic.py
Once upon a time,tht​s​'  O.​’s?t'the (Goshd and 9 7methu.s the2tetd.. Theturtureltd-stueaaa�rudoo (I4365-tubo:2-fO^cues/gpâveb&#a)The~dTakes to bladeshe,sdewit;A****th<vtammmabu[to*
the A{thoubramrrha|a>18:bthibrotono_bliaaa] �tu
jural aly* "h-a: n"0%b**thomthuX=theLup on&12hayingOJah.thomobio~jthoruURudouOlampus·e~u-B

I ran it with TheBloke/Llama-2-13B-GPTQ:gptq-4bit-32g-actorder_True on a Tesla V100s GPU.

What am I doing wrong?

By the way: it seems that numpy is missing from the requirements.txt file as well.

turboderp commented 1 year ago

Unless I'm completely mistaken, numpy shouldn't be required. But I seem to recall the requirement coming up in some situations where the incorrect version of PyTorch is installed. Are you running PyTorch-CUDA?
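
A quick way to check, with plain PyTorch and nothing exllama-specific:

import torch
print(torch.__version__)          # CUDA wheels usually carry a +cuXXX suffix
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # must be True for the CUDA extension to run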

lordwebbie commented 1 year ago

Thanks a lot for your help and work!

I just followed the instructions in the readme on a new, clean (!) machine with Ubuntu 22.04:

git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt
pip install numpy
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118

Then I edited example_basic.py to point to my model folder (line 8; see the sketch after the output below) and executed python3 ./example_basic.py. Output:

root@test:/exllama# python3 ./example_basic.py
Once upon a time,thts'mtamu'tlthgwit mtt​r.​th&hvth&#then​thn`Hugthoo-bathdthbbeerioushag&h****Thi.hmmhh<Lhf**custObrimted‍A��thee​Dangemdohectrime^Oh:The romeah?IMine 08174/
thex5thibrulzivyhllhethmumgy andmuded with bjabooa- A.kluredTurtieooght{maviiBogmamixing, thetyr0 totamamate to nothaxtatpactrigator at u*Cubebeloocwuammmueaa|cddamboeccr
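
For reference, the edit on line 8 was just the model path, roughly like this (the folder name here is illustrative; yours will differ):

# example_basic.py, line 8 -- the only change I made; path is an assumption
model_directory = "/exllama/models/Llama-2-13B-GPTQ/"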

I also tried different models from TheBloke to see whether it was related to the model files, but it made no difference.

pip tells me I have torch 2.0.1 installed, which, according to your readme, should work:

root@test-ai:/exllama# pip list | grep torch
torch                    2.0.1

Anything else that I can check?

lordwebbie commented 1 year ago

I did a quick search through the issues for the numpy requirement. There are several tickets on the topic.

I uninstalled numpy to see the stack trace:

/exllama/cuda_ext.py:82: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  none_tensor = torch.empty((1, 1), device = "meta")
^CTraceback (most recent call last):
  File "/exllama/./example_basic.py", line 22, in <module>
    model = ExLlama(config)                                 # create ExLlama instance and load the weights
  File "/exllama/model.py", line 831, in __init__
    tensor = tensor.to(device, non_blocking = True)

There was one ticket with a similar stack trace, but the numpy requirement wasn't the main problem there (that one was about incorrect file ownership in a Docker container). They just installed numpy and that was it.

So, it seems the numpy requirement comes up when CUDA is involved.

turboderp commented 1 year ago

This is indeed weird. I haven't heard any feedback from anyone else using a V100, so it may be a compatibility issue with the GPU. Could you try setting verbose = True at the top of cuda_ext.py and pasting the output from the extension build?
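
If you want to rule out the obvious first, here is a plain-PyTorch check of what the extension will be built against (a V100 should report compute capability (7, 0)):

import torch
print(torch.cuda.get_device_name(0))        # e.g. 'Tesla V100S-PCIE-32GB'
print(torch.cuda.get_device_capability(0))  # (7, 0) on a V100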

lordwebbie commented 1 year ago

Found the issue. For some reason my install script installed the latest version of CUDA (12.2; PyTorch is only compatible up to CUDA 11.8). I didn't notice because my installer is an automated process. With the correct version of CUDA installed, it works correctly. Thanks for your help!
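
For anyone landing here later, a minimal sketch of the mismatch check (it assumes nvcc lives under the toolkit that torch's extension builder picks up):

# compare the CUDA version PyTorch was built against with the toolkit
# nvcc that will compile the extension; the two need to be compatible
import subprocess
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch built for CUDA:", torch.version.cuda)  # e.g. '11.8'
if CUDA_HOME is not None:  # None if no CUDA toolkit was found
    out = subprocess.run([CUDA_HOME + "/bin/nvcc", "--version"],
                         capture_output=True, text=True).stdout
    print(out.strip().splitlines()[-2])  # e.g. 'Cuda compilation tools, release 12.2, ...'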