huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Batch ID 0 not found in cache #283

Closed: jlamypoirier closed this 1 year ago

jlamypoirier commented 1 year ago

System Info

Running on a DGX-A100 server with the provided docker image, with the unrelated modifications in #272.

Reproduction

The first batch sometimes fails randomly (~5-10% of the time) with Batch ID 0 not found in cache. I observed it a few times with the benchmarking tool; I don't know exactly when it can happen.

/usr/src$ text-generation-launcher --model-id bigcode/gpt_bigcode-santacoder  --port 8080
# (In other terminal, launch benchmark or send request):
# /usr/src$ text-generation-benchmark --tokenizer-name bigcode/santacoder -s 128 -d 100
[...]
2023-05-05T01:12:02.363971Z ERROR shard-manager: text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/src/server/text_generation_server/cli.py", line 58, in serve
    server.serve(model_id, revision, sharded, quantize, uds_path)
  File "/usr/src/server/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.9/site-packages/grpc_interceptor/server.py", line 159, in invoke_intercept_method
    return await self.intercept(
> File "/usr/src/server/text_generation_server/interceptor.py", line 20, in intercept
    return await response
  File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.9/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/usr/src/server/text_generation_server/server.py", line 79, in Decode
    raise ValueError(f"Batch ID {batch_pb.id} not found in cache.")
ValueError: Batch ID 0 not found in cache.
 rank=0

Expected behavior

Things should work.
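
For reference, the "send request" alternative mentioned in the reproduction can be exercised with a single call to the router's /generate endpoint right after launch. The payload below follows the documented TGI REST API, though exact fields may vary by version.

import requests

# Send one generation request as soon as the launcher reports the shard ready.
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "def hello_world():", "parameters": {"max_new_tokens": 20}},
)
print(resp.status_code, resp.json())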

jlamypoirier commented 1 year ago

I think it happens if a request arrives too early. I added some (self-explanatory) logging for the calls to the methods in cache.py. For a good run:

2023-05-05T01:37:41.497346Z  INFO text_generation_launcher: Starting shard 0
2023-05-05T01:37:49.486106Z  INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
 rank=0
2023-05-05T01:37:49.505990Z  INFO text_generation_launcher: Shard 0 ready in 8.006671363s
2023-05-05T01:37:49.603061Z  INFO text_generation_launcher: Starting Webserver
2023-05-05T01:37:49.850832Z  INFO shard-manager: text_generation_launcher: Clear
 rank=0
2023-05-05T01:37:49.852124Z  INFO text_generation_router: router/src/main.rs:174: Connected
2023-05-05T01:37:50.393492Z  INFO shard-manager: text_generation_launcher: Clear
 rank=0
2023-05-05T01:37:50.418931Z  INFO shard-manager: text_generation_launcher: A 0
 rank=0
2023-05-05T01:37:51.575388Z  INFO shard-manager: text_generation_launcher: Set 0
 rank=0
2023-05-05T01:37:51.575475Z  INFO shard-manager: text_generation_launcher: B 0
 rank=0
2023-05-05T01:37:51.576425Z  INFO shard-manager: text_generation_launcher: Pop 0
 rank=0
2023-05-05T01:37:52.015578Z  INFO shard-manager: text_generation_launcher: Set 0
 rank=0

And for a bad run:

2023-05-05T01:39:21.974282Z  INFO text_generation_launcher: Starting shard 0
2023-05-05T01:39:30.411806Z  INFO shard-manager: text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
 rank=0
2023-05-05T01:39:30.482883Z  INFO text_generation_launcher: Shard 0 ready in 8.506887441s
2023-05-05T01:39:30.511533Z  INFO shard-manager: text_generation_launcher: Clear
 rank=0
2023-05-05T01:39:30.532513Z  INFO shard-manager: text_generation_launcher: A 0
 rank=0
2023-05-05T01:39:30.580146Z  INFO text_generation_launcher: Starting Webserver
2023-05-05T01:39:31.757543Z  INFO shard-manager: text_generation_launcher: Set 0
 rank=0
2023-05-05T01:39:31.757626Z  INFO shard-manager: text_generation_launcher: B 0
 rank=0
2023-05-05T01:39:31.758666Z  INFO shard-manager: text_generation_launcher: Pop 0
 rank=0
2023-05-05T01:39:32.208231Z  INFO shard-manager: text_generation_launcher: Set 0
 rank=0
2023-05-05T01:39:32.208297Z  INFO shard-manager: text_generation_launcher: B 0
 rank=0
2023-05-05T01:39:32.208990Z  INFO shard-manager: text_generation_launcher: Clear
 rank=0
2023-05-05T01:39:32.214262Z  INFO shard-manager: text_generation_launcher: Pop 0
 rank=0

It looks like the request came in before the server was fully operational, and the cache clear performed during startup broke the request.
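
A minimal sketch of that interleaving, using a plain dict-based cache in the spirit of cache.py (the set/pop/clear names mirror the Set/Clear/Pop log lines above; everything else is illustrative, not the actual TGI code):

class Cache:
    def __init__(self):
        self.cache = {}

    def set(self, batch_id, batch):       # "Set 0" in the logs
        self.cache[batch_id] = batch

    def pop(self, batch_id):               # "Pop 0" in the logs
        return self.cache.pop(batch_id, None)

    def clear(self):                       # "Clear" in the logs
        self.cache.clear()


cache = Cache()
cache.set(0, "prefill batch")   # benchmark's Prefill caches batch 0
cache.clear()                   # router connects and clears the whole cache
batch = cache.pop(0)            # benchmark's Decode asks for batch 0 again
if batch is None:
    raise ValueError("Batch ID 0 not found in cache.")  # the observed error

In the good run both Clears land before Set 0, so the Pop still finds the batch; in the bad run the second Clear lands between a Set and the following Pop.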

njhill commented 1 year ago

@jlamypoirier I think you just need to wait until the server has fully started before running the benchmark command, i.e. only after this has been logged:

2023-05-05T01:37:49.852124Z  INFO text_generation_router: router/src/main.rs:174: Connected

The benchmark tool is hitting the internal API and I don't think this problem would happen in "real" use.
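
One way to automate that wait instead of watching the logs is to poll the router before starting the benchmark. The sketch below assumes the router's /health route; it only approximates "router fully started", but it keeps requests out of the early window.

import time
import requests

def wait_for_router(url="http://localhost:8080/health", timeout=120.0):
    # Poll until the router answers, or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return
        except requests.RequestException:
            pass
        time.sleep(1)
    raise TimeoutError(f"Router not healthy after {timeout}s")

wait_for_router()
# Now launch: text-generation-benchmark --tokenizer-name bigcode/santacoder -s 128 -d 100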

OlivierDehaene commented 1 year ago

Yes, the benchmark starts before the router has had the opportunity to call this block. Since the router is unaware that another process is using the internal gRPC API, it removes the current batch of the benchmarking process.

jlamypoirier commented 1 year ago

It's still a bug though; shouldn't this be prevented in some way?
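
One conceivable direction, purely as an illustrative sketch and not what TGI implements: scope the clear operation to the batch ids a client actually owns, so a router connecting at startup cannot evict batches created by another process. All names below are hypothetical.

class Cache:
    def __init__(self):
        self._batches = {}

    def set(self, batch_id, batch):
        self._batches[batch_id] = batch

    def pop(self, batch_id):
        return self._batches.pop(batch_id, None)

    def clear(self, batch_ids=None):
        # Scoped clear: remove only the given ids; a full wipe must be
        # requested explicitly by passing None.
        if batch_ids is None:
            self._batches.clear()
            return
        for bid in batch_ids:
            self._batches.pop(bid, None)


cache = Cache()
cache.set(0, "benchmark batch")    # created by the benchmark process
cache.clear(batch_ids=[])          # router startup clears only its own ids (none yet)
assert cache.pop(0) is not None    # the benchmark's batch survives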

Holmes5 commented 1 year ago

I am experiencing endless loops of this bug (in what I would consider real use), so it's not just a matter of waiting for the server to start up.

BrightXiaoHan commented 10 months ago

Same problem. Any updates about this?