Open Torcelllo opened 1 month ago
I have the same error
Well, I don't know if you have the same problem as I did, but I have two GPUs. I barely know what I'm doing, but adding the line `os.environ['CUDA_VISIBLE_DEVICES'] = '0'` directly below the `import os` line in `gradio_app.py` worked for me.
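For anyone unsure where that line goes: a minimal sketch of the workaround described above. The key detail is that the environment variable must be set before anything initializes CUDA, or it has no effect.

```python
import os

# Restrict the process to the first GPU. This must run before torch
# (or any library that initializes CUDA) is imported; setting it after
# CUDA initialization has no effect.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

print(os.environ['CUDA_VISIBLE_DEVICES'])  # → 0
```

After this, `torch` (imported further down in `gradio_app.py`) will only see one device, which avoids multi-GPU dispatch issues.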
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
This problem can be solved as described in https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/34#:~:text=I%20solved%20my%20problem%20by%20replacing
Try the following fix; it resolved the same problem for me.

Step 1: edit pipeline.py. Open the pipeline.py file and find this line:
`alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)`
Change it to:
`alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32).clone().detach()`

Step 2: edit modeling_llama.py. Find the modeling_llama.py file, usually located in the E:\Omost\python\lib\site-packages\transformers\models\llama\ directory, and find this line:
`causal_mask = torch.triu(causal_mask, diagonal=1)`
Change it to:
`causal_mask = torch.triu(causal_mask.to(torch.float32), diagonal=1)`
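The second edit works around a CUDA `triu` kernel that is missing for bfloat16 in some torch builds by upcasting the mask to float32 first. A minimal sketch of the same upcast pattern as a standalone helper (the `safe_triu` name is my own, not part of transformers; unlike the in-place patch above, this version also casts back to the original dtype):

```python
import torch

def safe_triu(mask: torch.Tensor, diagonal: int = 1) -> torch.Tensor:
    # The CUDA triu kernel is not implemented for bfloat16 in some torch
    # builds, so upcast to float32, take the upper triangle, and cast back.
    return torch.triu(mask.to(torch.float32), diagonal=diagonal).to(mask.dtype)

m = torch.ones(3, 3, dtype=torch.bfloat16)
print(safe_triu(m))
```

With `diagonal=1` this zeroes the main diagonal and everything below it, which is exactly what `_update_causal_mask` needs.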
This worked, thanks! Only I didn't have a python folder in the C:\Omost\ directory. I had to create a venv virtual environment and look for the lib\site-packages\transformers\models\llama\ directory inside it.
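If you can't find where a package is installed (the site-packages location differs between a system Python and a venv), you can ask Python directly. A small sketch using only the standard library; it is demonstrated here with the stdlib `json` package, but substituting `"transformers"` would print the folder containing models\llama\modeling_llama.py:

```python
import importlib.util
import os

def package_dir(name: str) -> str:
    """Return the directory a package is installed in."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(name)
    return os.path.dirname(spec.origin)

# Substitute "transformers" to locate the directory to edit.
print(package_dir("json"))
```

Run this with the same interpreter (or activated venv) that launches gradio_app.py, so it reports the copy of the package that is actually being used.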
Same error for me as well:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
File "C:\AI\Omost\venv\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\AI\Omost\venv\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\AI\Omost\venv\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\AI\Omost\venv\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\AI\Omost\venv\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\AI\Omost\venv\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "C:\AI\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\AI\Omost\venv\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\AI\Omost\venv\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\AI\Omost\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\AI\Omost\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\AI\Omost\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\AI\Omost\venv\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\AI\Omost\venv\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\AI\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\AI\Omost\venv\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "C:\Python310\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
did you fix it?
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
You shouldn't move a model that is dispatched using accelerate hooks.
Load to GPU: LlamaForCausalLM
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Exception in thread Thread-11 (generate):
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2397, in _sample
outputs = self(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
outputs = self.model(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 940, in forward
causal_mask = self._update_causal_mask(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 1061, in _update_causal_mask
causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "X:\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 164, in chat_fn
for text in streamer:
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Last assistant response is not valid canvas: expected string or bytes-like object
Traceback (most recent call last):
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "X:\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\Users\john_\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "X:\Omost\gradio_app.py", line 116, in chat_fn
np.random.seed(int(seed))
File "numpy\random\mtrand.pyx", line 4806, in numpy.random.mtrand.seed
File "numpy\random\mtrand.pyx", line 250, in numpy.random.mtrand.RandomState.seed
File "_mt19937.pyx", line 168, in numpy.random._mt19937.MT19937._legacy_seeding
File "_mt19937.pyx", line 182, in numpy.random._mt19937.MT19937._legacy_seeding
ValueError: Seed must be between 0 and 2**32 - 1
Last assistant response is not valid canvas: expected string or bytes-like object
(The same attention-mask warnings, the "triu_tril_cuda_template" RuntimeError, and the _queue.Empty traceback then repeat for Thread-12 and Thread-13.)
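The `ValueError: Seed must be between 0 and 2**32 - 1` in the log above happens because `np.random.seed` only accepts 32-bit unsigned integers. One possible workaround is to fold the seed into that range before passing it in; a minimal pure-Python sketch (the `clamp_seed` helper name is my own, not part of Omost):

```python
def clamp_seed(seed: int) -> int:
    # np.random.seed() only accepts integers in [0, 2**32 - 1], so fold
    # any larger (or negative) value into that range with a modulo.
    return int(seed) % (2 ** 32)

print(clamp_seed(12345))        # → 12345
print(clamp_seed(2 ** 32 + 7))  # → 7
print(clamp_seed(-1))           # → 4294967295
```

In `gradio_app.py` this would mean calling `np.random.seed(clamp_seed(seed))` instead of `np.random.seed(int(seed))` (line 164 in the traceback refers to a different call site; the seed is used at line 116).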