Open yunmanger1 opened 1 week ago
Stacktrace:
2024-06-22T01:35:24.940390508Z 2024-06-22T01:35:24.940129Z ERROR lorax_launcher: interceptor.py:41 Method Prefill encountered an error.
2024-06-22T01:35:24.940438547Z Traceback (most recent call last):
2024-06-22T01:35:24.940442357Z File "/opt/conda/bin/lorax-server", line 8, in <module>
2024-06-22T01:35:24.940444837Z sys.exit(app())
2024-06-22T01:35:24.940447727Z File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
2024-06-22T01:35:24.940450087Z return get_command(self)(*args, **kwargs)
2024-06-22T01:35:24.940452947Z File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
2024-06-22T01:35:24.940455237Z return self.main(*args, **kwargs)
2024-06-22T01:35:24.940457377Z File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
2024-06-22T01:35:24.940459427Z return _main(
2024-06-22T01:35:24.940461527Z File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
2024-06-22T01:35:24.940463547Z rv = self.invoke(ctx)
2024-06-22T01:35:24.940465637Z File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
2024-06-22T01:35:24.940467637Z return _process_result(sub_ctx.command.invoke(sub_ctx))
2024-06-22T01:35:24.940469767Z File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
2024-06-22T01:35:24.940471797Z return ctx.invoke(self.callback, **ctx.params)
2024-06-22T01:35:24.940473867Z File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
2024-06-22T01:35:24.940475907Z return __callback(*args, **kwargs)
2024-06-22T01:35:24.940477937Z File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
2024-06-22T01:35:24.940479937Z return callback(**use_params) # type: ignore
2024-06-22T01:35:24.940481977Z File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 89, in serve
2024-06-22T01:35:24.940483977Z server.serve(
2024-06-22T01:35:24.940486097Z File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 321, in serve
2024-06-22T01:35:24.940488187Z asyncio.run(
2024-06-22T01:35:24.940490297Z File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
2024-06-22T01:35:24.940492517Z return loop.run_until_complete(main)
2024-06-22T01:35:24.940494587Z File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
2024-06-22T01:35:24.940496737Z self.run_forever()
2024-06-22T01:35:24.940498877Z File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
2024-06-22T01:35:24.940500997Z self._run_once()
2024-06-22T01:35:24.940503147Z File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
2024-06-22T01:35:24.940505417Z handle._run()
2024-06-22T01:35:24.940507627Z File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
2024-06-22T01:35:24.940509857Z self._context.run(self._callback, *self._args)
2024-06-22T01:35:24.940518256Z File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
2024-06-22T01:35:24.940521216Z return await self.intercept(
2024-06-22T01:35:24.940523476Z > File "/opt/conda/lib/python3.10/site-packages/lorax_server/interceptor.py", line 38, in intercept
2024-06-22T01:35:24.940525606Z return await response
2024-06-22T01:35:24.940527986Z File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
2024-06-22T01:35:24.940530426Z raise error
2024-06-22T01:35:24.940532486Z File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
2024-06-22T01:35:24.940534576Z return await behavior(request_or_iterator, context)
2024-06-22T01:35:24.940538416Z File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 88, in Prefill
2024-06-22T01:35:24.940540536Z batch = self.model.batch_type.from_pb(
2024-06-22T01:35:24.940542666Z File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 272, in from_pb
2024-06-22T01:35:24.940544706Z adapter_indices = torch.cat(adapter_indices_list).to(dtype=torch.int64, device=device)
2024-06-22T01:35:24.940550316Z RuntimeError: CUDA error: device-side assert triggered
2024-06-22T01:35:24.940552636Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2024-06-22T01:35:24.940554636Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2024-06-22T01:35:24.940556746Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
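The last lines of the trace note that CUDA reports kernel errors asynchronously, so the `torch.cat` frame may not be the real failure site. Below is a minimal sketch of building an environment with `CUDA_LAUNCH_BLOCKING=1` for a debug run, which should make the reported frame accurate. The helper name `blocking_env` is hypothetical, and the launch command is left to whatever is normally used to start the server.

```python
import os


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # The variable must be present before the child process creates its CUDA context.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical usage: pass the result as env= when starting the server process,
# e.g. subprocess.run(usual_launch_command, env=blocking_env())
```

Synchronous launches slow inference noticeably, so this is meant for a reproduction run only.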
cc @tgaddair
System Info
We are using the streaming v1 chat completions API. After some number of requests, or after a single request with a large enough context, the lorax server stops responding, and all subsequent requests fail as well.
We are running it in Docker with 1 GPU (A100 PCIe) on runpod.io:
full request log:
Reproduction
Expected behavior
If one request fails, subsequent requests should not fail.
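A device-side assert generally leaves the process's CUDA context unusable, which would explain why every later request fails until the server is restarted. One common trigger is an out-of-range index reaching a GPU kernel. Whether that is the cause here is an assumption. Below is a minimal sketch of rejecting such input on the host, so that only the offending request fails. `check_indices` and its parameters are hypothetical and not taken from the lorax code base.

```python
def check_indices(indices, upper_bound):
    """Raise ValueError on the host if any index falls outside [0, upper_bound)."""
    bad = [i for i in indices if not 0 <= i < upper_bound]
    if bad:
        raise ValueError(f"indices out of range [0, {upper_bound}): {bad}")
    return indices
```

A check like this would run before any tensor is moved to the GPU, turning a fatal assert into an ordinary per-request error.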