Open marksher opened 3 hours ago
You can set `max_new_tokens=4096` before the following function. This is because of a version difference; I will fix it later.
https://github.com/nv-tlabs/LLaMA-Mesh/blob/main/app.py#L142-L149
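For reference, a minimal self-contained sketch of what that fix looks like with the usual transformers streaming pattern (the model id and surrounding names here are assumptions, not necessarily the repo's exact code at those lines):

```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "Zhengyi/LLaMA-Mesh"  # assumption: the checkpoint app.py loads
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("Create a 3D model of a chair.", return_tensors="pt")

generate_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=4096,  # explicit cap; without it generate() falls back to the default max_length=20
)
# generate() runs in a background thread while the streamer yields text
Thread(target=model.generate, kwargs=generate_kwargs).start()
for text in streamer:
    print(text, end="", flush=True)
```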
Getting somewhere! Thanks! After that I set the environment variable `TOKENIZERS_PARALLELISM=false`, which cleared up another problem. None of them are "errors". Could my environment be the issue? I created a clean virtual environment and then just ran `pip install -r requirements.txt`.
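(Side note for anyone searching later: that variable can also be set from Python, as long as it happens before transformers/tokenizers is first imported — a minimal sketch:)

```python
import os

# Must run before the first `import transformers`, otherwise the
# tokenizers fork/parallelism warning setting may be ignored.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```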
Gradio isn't included in that file. Is there a specific version I should try?
```
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```
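Those warnings come from generate() being called without an explicit attention mask or pad token id. A hedged sketch of the fix transformers is asking for (`tokenizer` and `model` stand in for whatever app.py actually loads; the prompt is illustrative):

```python
prompt = "Create a 3D model of a chair."  # illustrative input
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # pass the mask explicitly
    pad_token_id=tokenizer.eos_token_id,      # Llama has no pad token, so reuse EOS
    max_new_tokens=4096,
)
```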
```
(venv) ➜ LLaMA-Mesh git:(main) ✗ python app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.96s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
warnings.warn(
* Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Traceback (most recent call last):
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
response = await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
first_response = await async_iteration(generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/app.py", line 159, in chat_llama3_8b
for text in streamer:
^^^^^^^^
File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
raise Empty
_queue.Empty
```
BTW, this error pops up about 10 seconds after clicking one of the example buttons. I'm down to just trying the 9000 * 9000 one to see if anything will get going before trying the bigger stuff.
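The ~10 second delay is a useful clue: `_queue.Empty` is raised when `TextIteratorStreamer`'s internal queue times out before the background generate() thread has produced any text. If app.py constructs the streamer with `timeout=10.0` (an assumption, but it would match the delay), relaxing the timeout keeps slow, disk-offloaded generation from being cut off:

```python
from transformers import TextIteratorStreamer

# timeout=None makes the iterator block until text arrives instead of
# raising _queue.Empty after a fixed wait; the other kwargs mirror the
# usual streaming-chat setup and may differ from app.py's actual call.
streamer = TextIteratorStreamer(
    tokenizer,  # the tokenizer app.py already loaded
    timeout=None,
    skip_prompt=True,
    skip_special_tokens=True,
)
```

Note this only hides the symptom if the generate() thread actually died; check the console for a separate exception from that thread (as in the full trace below, where it raised a ValueError).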
Hello! I'm getting these errors when trying to generate one of the examples. Any thoughts? This is trying to run on a light MacBook Air M2 with 16 GB of memory, so could that be the issue?
Full trace:

```
(venv) ➜ LLaMA-Mesh git:(main) ✗ python3 app.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.88s/it]
Some parameters are on the meta device because they were offloaded to the disk.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/components/chatbot.py:225: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py:1375: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-9 (generate):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2068, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 1383, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 21, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
Traceback (most recent call last):
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 2015, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/blocks.py", line 1574, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 815, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/chat_interface.py", line 678, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 710, in async_iteration
    return await anext(iterator)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 704, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/gradio/utils.py", line 687, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/app.py", line 158, in chat_llama3_8b
    for text in streamer:
                ^^^^^^^^
  File "/Users/marksher/working/LLaMA-Mesh/venv/lib/python3.12/site-packages/transformers/generation/streamers.py", line 223, in __next__
    value = self.text_queue.get(timeout=self.timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/queue.py", line 179, in get
    raise Empty
_queue.Empty
```
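On the 16 GB question: the line "Some parameters are on the meta device because they were offloaded to the disk" means the 8B model does not fit in RAM, so weights are paged in from disk on every forward pass, which alone can make the first token take far longer than any streamer timeout. A hedged sketch of a lighter-weight load (the model id and placement strategy are assumptions to check against app.py):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Zhengyi/LLaMA-Mesh",       # assumption: the checkpoint app.py loads
    torch_dtype=torch.float16,  # ~16 GB of weights for an 8B model, vs ~32 GB in fp32
    device_map="auto",          # let accelerate place layers on mps/cpu before spilling to disk
)
```

Even in fp16 an 8B model is tight on a 16 GB machine, so some offloading (and slow generation) may remain.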