theroyallab / tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast
GNU Affero General Public License v3.0

[BUG] Job requires 33 pages (only 32 available) and cannot be enqueued #119

Closed. kreolsky closed this 1 month ago

kreolsky commented 1 month ago

Disclaimer: GitHub Issues are only for code-related bugs. If you do not understand how to start up or use TabbyAPI, please ask in the Discord Server

Describe the bug

ERROR: Task exception was never retrieved
ERROR: future: <Task finished name='Task-11' coro=<_stream_collector() done, defined at /home/text-generation/tabby/endpoints/OAI/utils/chat_completion.py:205> exception=AssertionError('Job requires 33 pages (only 32 available) and cannot be enqueued. Total cache allocated is 32 * 256 = 8192 tokens')>
ERROR: Traceback (most recent call last):
ERROR:   File "/home/text-generation/tabby/endpoints/OAI/utils/chat_completion.py", line 215, in _stream_collector
ERROR:     async for generation in new_generation:
ERROR:   File "/home/text-generation/tabby/backends/exllamav2/model.py", line 1060, in generate_gen
ERROR:     job = ExLlamaV2DynamicJobAsync(
ERROR:           ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:   File "/home/serge/.miniconda/envs/tabby-chat/lib/python3.11/site-packages/exllamav2/generator/dynamic_async.py", line 70, in __init__
ERROR:     self.generator.enqueue(self)
ERROR:   File "/home/serge/.miniconda/envs/tabby-chat/lib/python3.11/site-packages/exllamav2/generator/dynamic_async.py", line 37, in enqueue
ERROR:     self.generator.enqueue(job.job)
ERROR:   File "/home/serge/.miniconda/envs/tabby-chat/lib/python3.11/site-packages/exllamav2/generator/dynamic.py", line 729, in enqueue
ERROR:     job.prepare_for_queue(self, self.job_serial)
ERROR:   File "/home/serge/.miniconda/envs/tabby-chat/lib/python3.11/site-packages/exllamav2/generator/dynamic.py", line 1875, in prepare_for_queue
ERROR:     assert total_pages <= self.generator.max_pages, \
ERROR:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR: AssertionError: Job requires 33 pages (only 32 available) and cannot be enqueued. Total cache allocated is 32 * 256 = 8192 tokens
INFO: Shutting down
ERROR: ASGI callable returned without completing response.
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.

To Reproduce

  1. clone tabby
  2. create a conda env
  3. activate the conda env
  4. python start.py "$@"
  5. tabby starts correctly (model: llama3-8b)
  6. try to use the API
  7. wait some time with no answer
  8. cancel the process
  9. see the error

Expected behavior Works correctly :)

Logs If applicable, add logs and tracebacks to help explain your problem.

System info (Bugs without this information will go lower on our priority list!)

DocShotgun commented 1 month ago

This error means that the allocated k/v cache is too small for the generation you're requesting. Looking into this will require additional details, such as:

  1. Model and the configuration it was loaded with
  2. Prompt length and generation settings
kreolsky commented 1 month ago

Sorry, it's my fault. I created a new config.yml from scratch and it works fine! Thanks for your work!

kreolsky commented 1 month ago

I figured it out! It does not work when using negative_prompt and a cfg_scale override > 1.0 in sample_preset.yml
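
For reference, a minimal sketch of the kind of preset that triggers this; the key layout here is an assumption for illustration, not copied from TabbyAPI's shipped sampler override files:

```yaml
# Hypothetical sampler override preset (key names assumed for illustration).
# Any cfg_scale above 1.0 enables CFG, which is what changes the cache math.
cfg_scale:
  override: 1.5
negative_prompt:
  override: "A generic negative prompt"
```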

turboderp commented 1 month ago

When using CFG you need to make sure your cache size is twice your max sequence length, since CFG runs at a batch size of 2.
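
Concretely, a sketch of the relevant config.yml model settings, assuming the standard max_seq_len and cache_size keys; the 8192 figure matches the 32 * 256 = 8192 tokens reported in the error above:

```yaml
model:
  max_seq_len: 8192
  # Without CFG, cache_size can simply equal max_seq_len. With cfg_scale > 1.0,
  # the positive and negative prompts each occupy their own cache pages, so the
  # same request needs roughly twice the pages and overflows a 32-page cache.
  cache_size: 16384   # 2 * max_seq_len
```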

DocShotgun commented 1 month ago

Not a bug.

Per the documentation for cache_size in config_sample.yml:

"For CFG, set this to 2 * max_seq_len to make room for both positive and negative prompts."