tloen / llama-int8

Quantized inference code for LLaMA models
GNU General Public License v3.0

Getting error on generation in Windows #12

Open elephantpanda opened 1 year ago

elephantpanda commented 1 year ago

I installed bitsandbytes following the guide for windows including the dll from here.

Everything works fine: it loads 7B into about 8 GB of VRAM. Great.

But during generation I get:

  File "example.py", line 103, in main
    results = generator.generate(
  File "C:\Users\Shadow\Documents\LLama\llama-int8-main\llama\generation.py", line 60, in generate
    next_token = torch.multinomial(
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Any ideas what went wrong?
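Not from this repo, but as a diagnostic sketch: `torch.multinomial` raises exactly this `RuntimeError` when its input contains `inf`, `nan`, or a negative value, so one workaround is to check the probability tensor first and fall back to greedy argmax when it is malformed. The `safe_sample` helper below is hypothetical, not part of `generation.py`:

```python
import torch

def safe_sample(probs: torch.Tensor) -> torch.Tensor:
    """Sample a next token, falling back to greedy argmax when the
    probability tensor contains inf/nan or negative entries
    (the condition torch.multinomial rejects)."""
    bad = torch.isinf(probs) | torch.isnan(probs) | (probs < 0)
    if bad.any():
        # Greedy fallback: treat the bad entries as zero probability.
        cleaned = torch.where(bad, torch.zeros_like(probs), probs)
        return torch.argmax(cleaned, dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)

# Example: a batch of probabilities with a NaN entry.
probs = torch.tensor([[0.1, float("nan"), 0.9]])
token = safe_sample(probs)  # greedy path, picks index 2
```

This only masks the symptom; if the probabilities are NaN, the logits upstream have already overflowed, which is worth investigating separately.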

Minami-su commented 1 year ago

same question

XDeepAzure commented 1 year ago

So am i, did you fix this?

Minami-su commented 1 year ago

I got the error when testing on a Tesla P40, but it ran successfully on an RTX A5000. Maybe it is because of the low compute capability of the graphics card?
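That pattern would fit a half-precision problem: the P40 has very weak fp16 support, and fp16 logits can overflow into `inf`/`nan` before the softmax, which then trips `torch.multinomial`. A commonly suggested mitigation (an assumption, not something this repo confirms it needs) is to compute the softmax in float32:

```python
import torch

def stable_probs(logits: torch.Tensor, temperature: float = 0.8) -> torch.Tensor:
    """Temperature-scaled softmax computed in float32, so that fp16
    overflow on older GPUs cannot produce inf/nan probabilities."""
    return torch.softmax(logits.float() / temperature, dim=-1)

# fp16 logits, as a model running in half precision would produce them.
logits = torch.tensor([[2.0, 1.0, 0.5]], dtype=torch.float16)
probs = stable_probs(logits)  # float32, finite, sums to 1
```

If the logits are already `nan` coming out of the model, this won't help; in that case the upcast has to happen inside the forward pass instead.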

XDeepAzure commented 1 year ago

thanks!!