ray-project / ray-llm

RayLLM - LLMs on Ray
https://aviary.anyscale.com
Apache License 2.0

Following the doc to deploy Llama 2 70B throws an error #58

Closed YQ-Wang closed 11 months ago

YQ-Wang commented 11 months ago

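I followed the guide at https://github.com/ray-project/ray-llm/blob/master/docs/kuberay/deploy-on-eks.md to deploy Llama 2 70B.

Environment: the latest Aviary Docker image and kuberay-operator 0.6.0.

A request to /v1/chat/completions fails with the following log: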

(HTTPProxyActor pid=373) ERROR 2023-09-22 16:14:03,473 http_proxy 10.0.136.122 2e2d3085-f043-473d-920b-fbebe1572747 /v1/chat/completions router http_proxy.py:1282 - Unexpected ASGI message 'http.response.start' sent, after response already completed.
(HTTPProxyActor pid=373) Traceback (most recent call last):
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/http_proxy.py", line 1258, in send_request_to_replica_streaming
(HTTPProxyActor pid=373)     status_code = await self._consume_and_send_asgi_message_generator(
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/http_proxy.py", line 1174, in _consume_and_send_asgi_message_generator
(HTTPProxyActor pid=373)     await send(asgi_message)
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/http_proxy.py", line 1331, in send_with_request_id
(HTTPProxyActor pid=373)     await send(message)
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 544, in send
(HTTPProxyActor pid=373)     raise RuntimeError(msg % message_type)
(HTTPProxyActor pid=373) RuntimeError: Unexpected ASGI message 'http.response.start' sent, after response already completed.
(HTTPProxyActor pid=373) Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::ServeReplica:router:Router.handle_request_streaming() (pid=265, ip=10.0.150.166, actor_id=783369e101eb514895ce0ed202000000, repr=<ray.serve._private.replica.ServeReplica:router:Router object at 0x7f8df213e7c0>)
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
(HTTPProxyActor pid=373)     return self.__get_result()
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
(HTTPProxyActor pid=373)     raise self._exception
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
(HTTPProxyActor pid=373)     result = self.fn(*self.args, **self.kwargs)
(HTTPProxyActor pid=373)   File "stringsource", line 67, in cfunc.to_py.__Pyx_CFunc_object____object____StreamingGeneratorExecutionContext___to_py.wrap
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/exceptions.py", line 32, in to_bytes
(HTTPProxyActor pid=373)     serialized_exception=pickle.dumps(self),
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
(HTTPProxyActor pid=373)     cp.dump(obj)
(HTTPProxyActor pid=373)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
(HTTPProxyActor pid=373)     return Pickler.dump(self, obj)
(HTTPProxyActor pid=373) TypeError: can't pickle multidict._multidict.CIMultiDictProxy objects
(AviaryTGIInferenceWorker:meta-llama/Llama-2-70b-chat-hf pid=1345, ip=10.0.150.166) [INFO 2023-09-22 16:01:28,721] tgi_worker.py: 663  Model finished warming up (max_batch_total_tokens=19840) and is ready to serve requests. [repeated 7x across cluster]
(AviaryTGIInferenceWorker:meta-llama/Llama-2-70b-chat-hf pid=1345, ip=10.0.150.166) [INFO 2023-09-22 16:01:26,554] tgi_worker.py: 650  Model is warming up. Num requests: 2 Prefill tokens: 6000 Max batch total tokens: 19831 [repeated 3x across cluster]
(ServeReplica:router:Router pid=406) INFO 2023-09-22 16:14:03,465 Router router#Router#GRIFiv 1cb0cd39-1775-4947-9ce0-1207279eb553 /meta-llama--Llama-2-70b-chat-hf/stream router replica.py:741 - __CALL__ OK 5.1ms
(ServeReplica:router:Router pid=265, ip=10.0.150.166) /home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/routers/router_app.py:285: DeprecationWarning: with timeout() is deprecated, use async with timeout() instead
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   with async_timeout.timeout(TIMEOUT):
(ServeReplica:router:Router pid=265, ip=10.0.150.166) [INFO 2023-09-22 16:14:03,467] router_query_engine.py: 120  No tokens produced. Id: 6cf85a69d3d77edbc8df8bd4d5af98b6
(ServeReplica:router:Router pid=265, ip=10.0.150.166) ERROR 2023-09-22 16:14:03,470 Router router#Router#shBjkX 2e2d3085-f043-473d-920b-fbebe1572747 /v1/chat/completions router replica.py:733 - Request failed due to RayTaskError:
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 730, in wrap_user_method_call
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     yield
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 870, in call_user_method
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise e from None
(ServeReplica:router:Router pid=265, ip=10.0.150.166) ray.exceptions.RayTaskError: ray::ServeReplica:router:Router() (pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/anyio/streams/memory.py", line 98, in receive
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     return self.receive_nowait()
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/anyio/streams/memory.py", line 93, in receive_nowait
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise WouldBlock
(ServeReplica:router:Router pid=265, ip=10.0.150.166) anyio.WouldBlock
(ServeReplica:router:Router pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166) During handling of the above exception, another exception occurred:
(ServeReplica:router:Router pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166) ray::ServeReplica:router:Router() (pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/base.py", line 78, in call_next
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     message = await recv_stream.receive()
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/anyio/streams/memory.py", line 118, in receive
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise EndOfStream
(ServeReplica:router:Router pid=265, ip=10.0.150.166) anyio.EndOfStream
(ServeReplica:router:Router pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166) During handling of the above exception, another exception occurred:
(ServeReplica:router:Router pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166) ray::ServeReplica:router:Router() (pid=265, ip=10.0.150.166)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/utils.py", line 225, in wrap_to_ray_error
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise exception
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/replica.py", line 851, in call_user_method
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     result = await method_to_call(*request_args, **request_kwargs)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/http_util.py", line 437, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self._asgi_app(
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/fastapi/applications.py", line 290, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await super().__call__(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.middleware_stack(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise exc
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive, _send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 576, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, otel_receive, otel_send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/base.py", line 108, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     response = await self.dispatch_func(request, call_next)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/routers/middleware.py", line 12, in add_request_id
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     return await call_next(request)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/base.py", line 84, in call_next
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise app_exc
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/base.py", line 70, in coro
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive_or_disconnect, send_no_error)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/cors.py", line 83, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise exc
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive, sender)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise e
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await route.handle(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     await self.app(scope, receive, send)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     response = await func(request)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raw_response = await run_endpoint_function(
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     return await dependant.call(**values)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/routers/router_app.py", line 286, in chat
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     results = await self.query_engine.query(body.model, prompt, request)
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/plugins/router_query_engine.py", line 48, in query
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     responses = [resp async for resp in response_stream]
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/plugins/router_query_engine.py", line 48, in <listcomp>
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     responses = [resp async for resp in response_stream]
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/observability/fn_call_metrics.py", line 192, in new_gen
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     async for x in async_generator_fn(*args, **kwargs):
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/plugins/router_query_engine.py", line 98, in stream
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     async for response in stream_model_responses(url, json=json):
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aviary/backend/server/utils.py", line 192, in stream_model_responses
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     async with session.post(
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aiohttp/client.py", line 1141, in __aenter__
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     self._resp = await self._coro
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aiohttp/client.py", line 643, in _request
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     resp.raise_for_status()
(ServeReplica:router:Router pid=265, ip=10.0.150.166)   File "/home/ray/anaconda3/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
(ServeReplica:router:Router pid=265, ip=10.0.150.166)     raise ClientResponseError(
(ServeReplica:router:Router pid=265, ip=10.0.150.166) aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('http://localhost:8000/meta-llama--Llama-2-70b-chat-hf/stream')
(ServeReplica:router:Router pid=265, ip=10.0.150.166) INFO 2023-09-22 16:14:03,471 Router router#Router#shBjkX 2e2d3085-f043-473d-920b-fbebe1572747 /v1/chat/completions router replica.py:741 - __CALL__ ERROR 49.6ms
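
Reading the trace bottom-up, the first failure appears to be the 404 that aiohttp raises for http://localhost:8000/meta-llama--Llama-2-70b-chat-hf/stream. The `TypeError: can't pickle multidict._multidict.CIMultiDictProxy objects` looks like a secondary effect: the exception being serialized presumably still holds the response headers, and that headers object cannot be pickled. The sketch below reproduces this pattern with the standard library only. `FakeResponseError`, `to_picklable`, and `can_pickle` are hypothetical names introduced for illustration, and `MappingProxyType` is only a stand-in for `CIMultiDictProxy`; this is not how Ray or Aviary handle the error.

```python
import pickle
from types import MappingProxyType


class FakeResponseError(Exception):
    """Hypothetical stand-in for an HTTP client error that keeps the headers."""

    def __init__(self, status, message, headers):
        super().__init__(status, message)
        self.status = status
        self.message = message
        # Assumption: a read-only mapping that pickle rejects, playing the
        # role of CIMultiDictProxy from the log above.
        self.headers = MappingProxyType(dict(headers))


def to_picklable(exc):
    """Return a plain exception that carries only the status and message."""
    return RuntimeError(f"upstream returned {exc.status}: {exc.message}")


def can_pickle(obj):
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


err = FakeResponseError(404, "Not Found", {"Content-Type": "text/plain"})
print(can_pickle(err))                # the headers attribute blocks pickling
print(can_pickle(to_picklable(err)))  # the plain RuntimeError can be pickled
```

If this reading is right, the pickling error only hides the real problem, so the 404 on the `/stream` route is the part worth investigating first.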

YQ-Wang commented 11 months ago

cc @kevin85421