oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0
40.52k stars 5.31k forks

Phi-3-medium-128k-instruct: error when using load_in_4bit #6041

Open win10ogod opened 5 months ago

win10ogod commented 5 months ago

Describe the bug

Loading microsoft_Phi-3-medium-128k-instruct with the Transformers loader and --load-in-4bit completes without errors (6/6 checkpoint shards, 89.78 seconds), but the first generation request fails inside accelerate's pre_forward hook:

ValueError: Trying to set a tensor of shape torch.Size([7680, 5120]) in "weight" (which has shape torch.Size([19660800, 1])), this look incorrect.

Only 1 token is produced (31.52 seconds, 0.03 tokens/s). The full startup parameters and traceback are in the Logs section below.

Is there an existing issue for this?

Reproduction

1. Start the server with: python server.py --listen --api --public-api --trust-remote-code --use_flash_attention_2 --load-in-4bit
2. Load microsoft_Phi-3-medium-128k-instruct with the Transformers loader. The resulting TRANSFORMERS_PARAMS (device_map 'auto', torch.bfloat16, load_in_4bit true, bnb_4bit_quant_type nf4, llm_int8_enable_fp32_cpu_offload true) are listed in full in the Logs section.
3. Generate any text. The ValueError is raised on the first forward pass.

Screenshot

No response

Logs

python server.py --listen --api --public-api --trust-remote-code --use_flash_attention_2 --load-in-4bit
14:38:17-008290 INFO     Starting Text generation web UI
14:38:17-011290 WARNING  trust_remote_code is enabled. This is dangerous.
14:38:17-012291 WARNING
                         You are potentially exposing the web UI to the entire internet without any access password.
                         You can create one with the "--gradio-auth" flag like this:

                         --gradio-auth username:password

                         Make sure to replace username:password with your own.
14:38:17-148479 INFO     Loading the extension "openai"

Running on local URL:  http://0.0.0.0:7860

14:38:19-123858 INFO     OpenAI-compatible API URL:

                         https://porsche-guestbook-arizona-buying.trycloudflare.com

INFO:     127.0.0.1:64264 - "GET /startup-events HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/index-D6iiusuW.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/index-Ds_LdHYW.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/svelte/svelte.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-DvJ399W-.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-B4pUrfBk.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/_commonjsHelpers-BosuxZz1.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/index-Ds_LdHYW.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /info HTTP/1.1" 200 OK
INFO:     127.0.0.1:64279 - "HEAD / HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /theme.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Button-BM3Gfoxv.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Blocks-BrGSw8d-.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Button-CTZL5Nos.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Blocks-xuLVTZz4.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64282 - "GET /heartbeat/9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-CS2xVf8J.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-C-7D3Y3j.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Textbox-OSHpBx5r.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-DPgNZtxV.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/BlockTitle-BG3S3JH7.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Textbox-D8IAzrZj.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Example-Cj3ii62O.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-B0JJ6p9c.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-COkUHsKJ.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Info-Cs8uP4Sq.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example-CCXcg0ow.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-B2S_zKCm.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Index-thdqX2vf.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-BGmqBTg_.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-CptIZeFZ.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Tabs-DX5TEub2.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-BTCbxfal.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-BSqct-uf.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Example-D7K5RtQ2.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-CJeiS5ET.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Index-BAQumg2K.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-BUublr_b.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/ModifyUpload-RL_SHQmd.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/FileUpload-2TE7T7kD.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Example-DpWs9cEC.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Image-B8dFOee4.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Example-CX34aPix.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/ImageUploader-B7bPUstM.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-nQCfNo8e.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example-DikqVAPo.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-CR9lHGQe.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Index-D8o7u_T6.css HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Check-Ck0iADAu.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-DPQUg1KC.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Copy-ZPOKSMtK.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Example-C7XUkkid.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-BEdjhdex.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index.svelte_svelte_type_style_lang-Cp94WWVp.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Index-DDCF2BFd.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-cV8rExtZ.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example.svelte_svelte_type_style_lang-DyHaW-Bg.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/prism-python-b7Hj1w62.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Tabs-BaIn9tfL.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Example-DZJukuNR.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-AGFdWl_D.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-INgKs-Lw.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-2er3m2wk.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-CMDnMkRp.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/DropdownArrow-CQGNKtt7.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-CtPgCiLM.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-Bm40fReu.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example-CUwox43B.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Example-BoMLuz1A.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-D-Msc1SI.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Index-CnMHQSls.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-O-CfKA7n.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Empty-CgtesKU7.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/FileUpload-Ch6QRXCy.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/BlockLabel-S59nE2jv.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Index-DgTZt0Wv.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/File-BQ_9P3Ye.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/file-url-DulxDZ3L.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/ModifyUpload.svelte_svelte_type_style_lang-EVTRkPJ7.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/ModifyUpload-CkTozhK_.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/DownloadLink-C3h3PR_b.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Undo-CpmTQw3B.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-C2EiMH1Q.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/UploadText-DWPYSfVS.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Example-DrmWnoSo.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/Upload-Cp8Go_XF.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/ImageUploader-CP2SxGu9.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/ShareButton-C2urwshD.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Image-D4FP5rNG.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Image-Bsh8Umrh.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/SelectSource-CJD0tR5U.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example-B5ra0LMm.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-BgELO62W.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "GET /assets/Index-Dmx8Zb6D.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/index-CnqicUFC.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "GET /assets/dsv-DB8NKgIY.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /assets/Example-CMXuI9oj.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "GET /assets/Index-CYcVNHLo.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "GET /assets/Index-uRgjJb4U.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "POST /run/predict HTTP/1.1" 200 OK
INFO:     127.0.0.1:64280 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "POST /run/predict HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64272 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64281 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64278 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /favicon.ico HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64274 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64324 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
14:38:32-701185 INFO     Loading "microsoft_Phi-3-medium-128k-instruct"
14:38:32-710507 INFO     TRANSFORMERS_PARAMS=
{   'low_cpu_mem_usage': True,
    'torch_dtype': torch.bfloat16,
    'trust_remote_code': True,
    'use_flash_attention_2': True,
    'device_map': 'auto',
    'quantization_config': BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": false,
  "llm_int8_enable_fp32_cpu_offload": true,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}
}

C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\configuration_utils.py:525: UserWarning: `do_sample` is set to `False`. However, `min_p` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `min_p`.
  warnings.warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 6/6 [01:29<00:00, 14.87s/it]
14:40:02-478826 INFO     Loaded "microsoft_Phi-3-medium-128k-instruct" in 89.78 seconds.
14:40:02-479823 INFO     LOADER: "Transformers"
14:40:02-480824 INFO     TRUNCATION LENGTH: 131072
14:40:02-480824 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
INFO:     127.0.0.1:64368 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\text_generation.py", line 376, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 1736, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 2375, in _sample
    outputs = self(
              ^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 1286, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 1164, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 885, in forward
    attn_outputs, self_attn_weights, present_key_value = self.self_attn(
                                                         ^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 473, in forward
    qkv = self.qkv_proj(hidden_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 347, in pre_forward
    set_module_tensor_to_device(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\utils\modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([7680, 5120]) in "weight" (which has shape torch.Size([19660800, 1])), this look incorrect.
Output generated in 31.52 seconds (0.03 tokens/s, 1 tokens, context 73, seed 1363723015)
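
The two shapes in the ValueError differ by exactly a factor of two in element count, which is what one would expect if the stored weight is a 4-bit tensor packed two values per uint8 byte (the log shows bnb_4bit_quant_storage: uint8). The sketch below only checks that arithmetic. packed_4bit_shape is a hypothetical helper written for this illustration, not a bitsandbytes or accelerate function, and the reading of the error (an unquantized [7680, 5120] tensor being copied into an already quantized parameter) is an assumption, not a confirmed diagnosis.

```python
import math


def packed_4bit_shape(shape, storage_bits=8):
    """Return the shape of a flat column buffer that holds `shape` worth of
    4-bit values packed into storage elements of `storage_bits` bits.

    Hypothetical helper for illustration only; it mirrors the [N, 1] layout
    seen in the error message, not any library's internal code.
    """
    n_values = math.prod(shape)           # number of weights in the full matrix
    values_per_slot = storage_bits // 4   # 2 four-bit values per uint8
    return (math.ceil(n_values / values_per_slot), 1)


full_shape = (7680, 5120)                 # shape reported for the incoming tensor
print(math.prod(full_shape))              # total number of weights
print(packed_4bit_shape(full_shape))      # compare with torch.Size([19660800, 1])
```

Because the packed shape matches the one in the message, the mismatch looks like a quantized parameter receiving full-precision data rather than a corrupted checkpoint. The traceback reaches set_module_tensor_to_device through accelerate's pre_forward hook, which suggests (but does not prove) that the affected layer was offloaded by device_map 'auto'.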

System Info

3050-8g
i5-12400f
win11
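
A rough estimate of weight memory helps judge whether the model can sit entirely on the 8 GB card listed above. The sketch assumes about 14 billion parameters for Phi-3-medium and counts weights only (no KV cache, no activations, and no allowance for layers that stay unquantized), so it is a lower bound under stated assumptions, not a measurement.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


N_PARAMS = 14e9   # assumption: approximate parameter count of Phi-3-medium
VRAM_GIB = 8      # from "3050-8g" in System Info

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    gib = weight_bytes(N_PARAMS, bits) / 2**30
    print(f"{label}: {gib:.1f} GiB of weights vs {VRAM_GIB} GiB of VRAM")
```

Under those assumptions the 4-bit weights alone come to roughly 6.5 GiB, which leaves little headroom on an 8 GiB card. With device_map 'auto' and llm_int8_enable_fp32_cpu_offload enabled, some layers may therefore end up on the CPU, which would be consistent with the offload hook appearing in the traceback.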

gloomiebloomie commented 5 months ago

Try Phi-3-vision. For some reason that one loads even though the others won't. You still can't use its vision features in text-generation-webui, but it should serve as a workaround for now. My Phi-3-small won't even load.