print_timings: prompt eval time = 6960.61 ms / 3323 tokens ( 2.09 ms per token, 477.40 tokens per second)
print_timings: eval time = 1162042209.58 ms / 0 runs ( inf ms per token, 0.00 tokens per second)
print_timings: total time = 1162049170.19 ms
slot 0 released (3324 tokens in cache)
I haven't yet caught exactly what is happening at the request level to cause this; on the bridge side, the logs look like this:
INFO | 2023-12-21 22:21:58 | __main__:bridge:159 - Job received from https://aihorde.net for 300 tokens and 4096 max context. Starting generation...
INFO | 2023-12-21 22:22:06 | __main__:bridge:159 - Job received from https://aihorde.net for 300 tokens and 4096 max context. Starting generation...
INFO | 2023-12-21 22:22:14 | __main__:bridge:159 - Job received from https://aihorde.net for 300 tokens and 4096 max context. Starting generation...
INFO | 2023-12-21 22:22:23 | __main__:bridge:159 - Job received from https://aihorde.net for 300 tokens and 4096 max context. Starting generation...
INFO | 2023-12-21 22:22:31 | __main__:validate_kai:57 - llama.cpp server model=koboldcpp/openhermes-2.5-mistral-7b.Q5_K_M n_ctx=4096
INFO | 2023-12-21 22:22:31 | __main__:bridge:159 - Job received from https://aihorde.net for 300 tokens and 4096 max context. Starting generation...
ERROR | 2023-12-21 22:22:39 | __main__:bridge:85 - Exceeded retry count 4 for generation id 694a2561-e374-4a97-a55e-b1e8f5524599. Aborting generation!
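One way to catch the offending request would be to put a tiny logging proxy between the bridge and the llama.cpp server and dump each payload before forwarding it. Below is a minimal sketch (not part of either project): it assumes the worker can be pointed at http://127.0.0.1:5001 while the real server stays on http://127.0.0.1:8080 (both placeholder addresses), and that the requests are plain non-streaming POSTs.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Address of the real llama.cpp server -- placeholder, adjust to your setup.
UPSTREAM = "http://127.0.0.1:8080"


class LoggingProxy(BaseHTTPRequestHandler):
    """Dump every POST body, forward it unchanged to UPSTREAM, relay the response."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)

        # Log the raw payload the bridge sent (pretty-printed if it is JSON).
        try:
            print(f"\n--> POST {self.path}\n{json.dumps(json.loads(body), indent=2)}")
        except ValueError:
            print(f"\n--> POST {self.path}\n{body!r}")

        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers={"Content-Type": self.headers.get("Content-Type", "application/json")},
            method="POST",
        )
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            resp = err  # relay error responses too instead of crashing the proxy

        payload = resp.read()
        print(f"<-- {resp.getcode()} ({len(payload)} bytes)")

        self.send_response(resp.getcode())
        self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    # The bridge's endpoint would be pointed here instead of directly at the
    # server; 5001 is an arbitrary local port.
    HTTPServer(("127.0.0.1", 5001), LoggingProxy).serve_forever()
```

With something like that in place, the body of the job that ends with "0 runs" / "inf ms per token" on the server should show up in the proxy output and can be compared against the jobs that complete normally.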
From the server side, they look like this: