win10ogod opened 5 months ago
Try Phi-3 Vision: for some reason that one loads even though the others won't. Vision itself still can't be used in text-generation-webui for that model, but it should provide a workaround for now. My Phi-3-small won't even load.
Describe the bug
```
14:38:32-701185 INFO     Loading "microsoft_Phi-3-medium-128k-instruct"
14:38:32-710507 INFO     TRANSFORMERS_PARAMS=
{   'low_cpu_mem_usage': True,
    'torch_dtype': torch.bfloat16,
    'trust_remote_code': True,
    'use_flash_attention_2': True,
    'device_map': 'auto',
    'quantization_config': BitsAndBytesConfig {
        "_load_in_4bit": true,
        "_load_in_8bit": false,
        "bnb_4bit_compute_dtype": "bfloat16",
        "bnb_4bit_quant_storage": "uint8",
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": false,
        "llm_int8_enable_fp32_cpu_offload": true,
        "llm_int8_has_fp16_weight": false,
        "llm_int8_skip_modules": null,
        "llm_int8_threshold": 6.0,
        "load_in_4bit": true,
        "load_in_8bit": false,
        "quant_method": "bitsandbytes"
    }
}

C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\configuration_utils.py:525: UserWarning: `do_sample` is set to `False`. However, `min_p` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `min_p`.
  warnings.warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 6/6 [01:29<00:00, 14.87s/it]
14:40:02-478826 INFO     Loaded "microsoft_Phi-3-medium-128k-instruct" in 89.78 seconds.
14:40:02-479823 INFO     LOADER: "Transformers"
14:40:02-480824 INFO     TRUNCATION LENGTH: 131072
14:40:02-480824 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
INFO:     127.0.0.1:64368 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64368 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     127.0.0.1:64389 - "GET /queue/data?session_hash=9c013f3v8al HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\text_generation.py", line 376, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 1736, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 2375, in _sample
    outputs = self(
              ^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 1286, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 1164, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 885, in forward
    attn_outputs, self_attn_weights, present_key_value = self.self_attn(
                                                         ^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\.cache\huggingface\modules\transformers_modules\microsoft_Phi-3-medium-128k-instruct\modeling_phi3.py", line 473, in forward
    qkv = self.qkv_proj(hidden_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\hooks.py", line 347, in pre_forward
    set_module_tensor_to_device(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\accelerate\utils\modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([7680, 5120]) in "weight" (which has shape torch.Size([19660800, 1])), this look incorrect.
Output generated in 31.52 seconds (0.03 tokens/s, 1 tokens, context 73, seed 1363723015)
```

Is there an existing issue for this?
Reproduction
```
14:38:32-701185 INFO     Loading "microsoft_Phi-3-medium-128k-instruct"
14:38:32-710507 INFO     TRANSFORMERS_PARAMS=
{   'low_cpu_mem_usage': True,
    'torch_dtype': torch.bfloat16,
    'trust_remote_code': True,
    'use_flash_attention_2': True,
    'device_map': 'auto',
    'quantization_config': BitsAndBytesConfig {
        "_load_in_4bit": true,
        "_load_in_8bit": false,
        "bnb_4bit_compute_dtype": "bfloat16",
        "bnb_4bit_quant_storage": "uint8",
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": false,
        "llm_int8_enable_fp32_cpu_offload": true,
        "llm_int8_has_fp16_weight": false,
        "llm_int8_skip_modules": null,
        "llm_int8_threshold": 6.0,
        "load_in_4bit": true,
        "load_in_8bit": false,
        "quant_method": "bitsandbytes"
    }
}
```
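The same load can be attempted outside the webui with a short standalone script. This is a sketch of what I believe is the equivalent `transformers` call, with the flags taken from the `TRANSFORMERS_PARAMS` log above (it assumes `transformers`, `accelerate`, `bitsandbytes`, and flash-attn are installed, and it downloads the full model, so it needs a GPU box to actually run; note the webui's `use_flash_attention_2` kwarg corresponds to `attn_implementation="flash_attention_2"` in recent `transformers`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "microsoft/Phi-3-medium-128k-instruct"

# Mirror the quantization_config from the log above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# In the webui run, generation is where the ValueError is raised.
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0]))
```

If this script hits the same `set_module_tensor_to_device` ValueError, the bug is reproducible in plain `transformers`/`accelerate` rather than something webui-specific.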
Screenshot
No response
Logs
System Info
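One more data point on the ValueError: the two shapes it reports are consistent with bitsandbytes 4-bit packing, where two nf4 values are stored per `uint8` byte and the packed weight is kept as a flattened column vector. So it looks like accelerate is copying an unquantized bf16 shard onto a `qkv_proj` parameter that has already been 4-bit packed. That interpretation is my guess, but the arithmetic checks out:

```python
# qkv_proj weight shape reported in the ValueError (unpacked bf16 tensor)
rows, cols = 7680, 5120

# bitsandbytes nf4 packs two 4-bit values into each uint8 byte and
# flattens the result into a column vector of shape [n // 2, 1].
packed_elements = (rows * cols) // 2

print(packed_elements)  # 19660800 -- exactly the [19660800, 1] shape in the error
```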