Closed: Roerib closed this issue 1 year ago
Make sure that you have the new quantized models. Links to them can be found on the wiki: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model
Yeah, I pulled the model from here: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617
This also happens with other models in 8-bit mode.
Pygmalion-6B (dev):
Traceback (most recent call last):
File "C:\Users\Robert\Desktop\oobabooga-windows\text-generation-webui\modules\callbacks.py", line 64, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "C:\Users\Robert\Desktop\oobabooga-windows\text-generation-webui\modules\text_generation.py", line 222, in generate_with_callback
shared.model.generate(**kwargs)
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2560, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
Output generated in 3.33 seconds (0.00 tokens/s, 0 tokens, context 345)
Traceback (most recent call last):
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\gradio\routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1108, in process_api
result = await self.call_function(
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 929, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, args)
File "C:\Users\Robert\Desktop\oobabooga-windows\installer_files\env\lib\site-packages\gradio\utils.py", line 490, in async_iteration
return next(iterator)
File "C:\Users\Robert\Desktop\oobabooga-windows\text-generation-webui\modules\chat.py", line 184, in regenerate_wrapper
for _history in chatbot_wrapper(last_internal[0], max_new_tokens, do_sample, temperature, top_p, typical_p, repetition_penalty, encoder_repetition_penalty, top_k, min_length, no_repeat_ngram_size, num_beams, penalty_alpha, length_penalty, early_stopping, seed, name1, name2, context, check, chat_prompt_size, chat_generation_attempts, regenerate=True):
File "C:\Users\Robert\Desktop\oobabooga-windows\text-generation-webui\modules\chat.py", line 144, in chatbot_wrapper
cumulative_reply = reply
UnboundLocalError: local variable 'reply' referenced before assignment
In the logs you posted, the real error is above that one: TypeError: vecquant4matmul(): incompatible function arguments.
The reply error is being caused by that one. Were you able to compile the CUDA kernels from GPTQ-for-LLaMa?
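For illustration, here is a rough, self-contained sketch (not the webui's actual code) of that failure chain: the generation worker hits the real error, logs it, and yields nothing, so 'reply' is never assigned and the only thing the UI surfaces is the later UnboundLocalError.

import traceback

def generate_reply_sketch(prompt):
    # Stand-in for the generation worker; the raise mimics the real failure
    # (the vecquant4matmul TypeError) happening before any token is produced.
    try:
        raise TypeError("vecquant4matmul(): incompatible function arguments")
        yield "never reached"
    except TypeError:
        traceback.print_exc()  # the real error only shows up in the console log

def chatbot_wrapper_sketch(prompt):
    for reply in generate_reply_sketch(prompt):  # zero iterations: nothing was yielded
        pass
    cumulative_reply = reply  # UnboundLocalError: 'reply' referenced before assignment
    return cumulative_reply

chatbot_wrapper_sketch("hello")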
Make sure that your copy of GPTQ-for-LLaMa has been switched to the cuda branch; the main branch is no longer compatible with Windows.
Although, if the reply error is happening with 8-bit, it may be a separate issue.
I am unable to replicate these errors. There have been a lot of updates recently. You may need to fully reinstall the webui to ensure that there are no issues. Make sure that you use the latest version of the installer here: https://github.com/oobabooga/one-click-installers
If you don't want to have to download all the conda packages again, then you can do this:
python -m pip uninstall quant_cuda
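After uninstalling, the quant_cuda kernel has to be rebuilt before 4-bit models will load again; either re-run the one-click installer or build it by hand. A rough sketch of the manual route, assuming the qwopqwop200/GPTQ-for-LLaMa repository and its setup_cuda.py build script (adjust to whichever repo the wiki currently points at):

git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install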
I reinstalled and now it just works.
Hey @Roerib! You have a GTX 1070 and can run TextGen with it, right? Can you chat and use all of its functions? If so, can you help me please? I have the same card, but something is wrong: after loading the model I get no answer on the chat tab... Thanks!
@No565 I remember I was able to get it working, but I couldn't use anything larger than a 13B model. I don't have the card in my system anymore, so I can't help you further. Maybe your model is too big; use a smaller model or use quantization.
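For reference, a minimal sketch of what 8-bit loading looks like with Hugging Face transformers and bitsandbytes (both assumed installed, with a CUDA GPU available); the model id is only an example, and inside the webui the equivalent is the --load-in-8bit flag.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PygmalionAI/pygmalion-6b"  # example id; substitute your own model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on GPU/CPU as needed
    load_in_8bit=True,   # roughly halves VRAM compared to fp16
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))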
Describe the bug
The model loads but won't output anything.
Reproduction
Loaded LLaMA 7B in 4-bit mode.