Closed jlamypoirier closed 1 year ago
I think it happens if a request arrives too early. Here are some (self-explanatory) logs I added for calls to methods in cache.py, for a good run:
2023-05-05T01:37:41.497346Z INFO text_generation_launcher: Starting shard 0
2023-05-05T01:37:49.486106Z INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
rank=0
2023-05-05T01:37:49.505990Z INFO text_generation_launcher: Shard 0 ready in 8.006671363s
2023-05-05T01:37:49.603061Z INFO text_generation_launcher: Starting Webserver
2023-05-05T01:37:49.850832Z INFO shard-manager: text_generation_launcher: Clear
rank=0
2023-05-05T01:37:49.852124Z INFO text_generation_router: router/src/main.rs:174: Connected
2023-05-05T01:37:50.393492Z INFO shard-manager: text_generation_launcher: Clear
rank=0
2023-05-05T01:37:50.418931Z INFO shard-manager: text_generation_launcher: A 0
rank=0
2023-05-05T01:37:51.575388Z INFO shard-manager: text_generation_launcher: Set 0
rank=0
2023-05-05T01:37:51.575475Z INFO shard-manager: text_generation_launcher: B 0
rank=0
2023-05-05T01:37:51.576425Z INFO shard-manager: text_generation_launcher: Pop 0
rank=0
2023-05-05T01:37:52.015578Z INFO shard-manager: text_generation_launcher: Set 0
rank=0
And for a bad run:
2023-05-05T01:39:21.974282Z INFO text_generation_launcher: Starting shard 0
2023-05-05T01:39:30.411806Z INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
rank=0
2023-05-05T01:39:30.482883Z INFO text_generation_launcher: Shard 0 ready in 8.506887441s
2023-05-05T01:39:30.511533Z INFO shard-manager: text_generation_launcher: Clear
rank=0
2023-05-05T01:39:30.532513Z INFO shard-manager: text_generation_launcher: A 0
rank=0
2023-05-05T01:39:30.580146Z INFO text_generation_launcher: Starting Webserver
2023-05-05T01:39:31.757543Z INFO shard-manager: text_generation_launcher: Set 0
rank=0
2023-05-05T01:39:31.757626Z INFO shard-manager: text_generation_launcher: B 0
rank=0
2023-05-05T01:39:31.758666Z INFO shard-manager: text_generation_launcher: Pop 0
rank=0
2023-05-05T01:39:32.208231Z INFO shard-manager: text_generation_launcher: Set 0
rank=0
2023-05-05T01:39:32.208297Z INFO shard-manager: text_generation_launcher: B 0
rank=0
2023-05-05T01:39:32.208990Z INFO shard-manager: text_generation_launcher: Clear
rank=0
2023-05-05T01:39:32.214262Z INFO shard-manager: text_generation_launcher: Pop 0
rank=0
It looks like the request came in before the server was fully operational, and the startup then cleared the cache, which broke the request.
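To make the ordering problem concrete, here is a minimal sketch that replays the Set/Pop/Clear sequences from the two logs above against a toy cache. `ToyCache` and `replay` are hypothetical names made up for illustration; this is not the real cache.py, and the "A 0" / "B 0" markers are skipped because they do not touch the cache.

```python
from typing import Dict, List, Optional


class ToyCache:
    """Hypothetical stand-in for the batch cache; not the real cache.py."""

    def __init__(self) -> None:
        self._batches: Dict[int, str] = {}

    def set(self, batch_id: int, batch: str) -> None:
        self._batches[batch_id] = batch

    def pop(self, batch_id: int) -> Optional[str]:
        # Returns None when the batch is missing, e.g. after a clear().
        return self._batches.pop(batch_id, None)

    def clear(self) -> None:
        self._batches.clear()


def replay(events: List[str]) -> List[str]:
    """Replay 'set' / 'pop' / 'clear' events on batch 0 and collect errors."""
    cache = ToyCache()
    errors: List[str] = []
    for event in events:
        if event == "set":
            cache.set(0, "batch-0")
        elif event == "clear":
            cache.clear()
        elif event == "pop" and cache.pop(0) is None:
            errors.append("Batch ID 0 not found in cache")
    return errors


# Event order copied from the logs above.
good_run = ["clear", "clear", "set", "pop", "set"]
bad_run = ["clear", "set", "pop", "set", "clear", "pop"]

print(replay(good_run))
print(replay(bad_run))
```

In the bad run, the late Clear lands between the second Set and the following Pop, so the Pop finds nothing for batch 0.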
@jlamypoirier I think you just need to wait until the server has fully started before running the benchmark command, i.e. only after this line has been logged:
2023-05-05T01:37:49.852124Z INFO text_generation_router: router/src/main.rs:174: Connected
The benchmark tool is hitting the internal API and I don't think this problem would happen in "real" use.
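As a rough sketch of that workaround, a wrapper script could scan the launcher output and only start the benchmark once the router's "Connected" line has appeared. `router_connected` is a hypothetical helper, not part of the launcher or the benchmark tool, and matching on the module name plus the trailing message (rather than the exact `main.rs:174` location) is an assumption on my part.

```python
from typing import Iterable


def router_connected(log_lines: Iterable[str]) -> bool:
    """Return True once a router log line ending in 'Connected' is seen.

    Hypothetical helper: the match rule is an assumption based on the
    log line quoted above, not a documented readiness signal.
    """
    for line in log_lines:
        if "text_generation_router" in line and line.rstrip().endswith("Connected"):
            return True
    return False


# Lines copied from the good run above.
sample = [
    "2023-05-05T01:37:49.603061Z INFO text_generation_launcher: Starting Webserver",
    "2023-05-05T01:37:49.852124Z INFO text_generation_router: router/src/main.rs:174: Connected",
]
print(router_connected(sample))
```

Passing a streaming iterator (for example the launcher's stdout) would make the call return as soon as the line shows up.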
Yes, the benchmarking starts before the router has had the opportunity to call this block. Since the router is unaware that another process is using the internal gRPC API, it removes the current batch of the benchmarking process.
It's still a bug; shouldn't this be prevented in some way?
I am experiencing loops of this bug that last forever (in what I would consider real use), so it's not just waiting for the server to start up.
Same problem. Any updates about this?
System Info
Running on a DGX-A100 server with the provided docker image, with the unrelated modifications in #272.
Reproduction
The first batch sometimes fails randomly (~5-10% of the time) with "Batch ID 0 not found in cache". I observed it a few times with the benchmarking tool; no idea when exactly it can happen.
Expected behavior
Things should work.