meta-llama / llama-stack

Model components of the Llama Stack APIs

run client failed with httpx.ReadError #218

Open · alexhegit opened 17 hours ago

alexhegit commented 17 hours ago

Step 1: Start the server with Docker:

docker run --rm -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu

Step 2: Run the client:

python -m llama_stack.apis.inference.client localhost 5000

Logs:

(LStack) opea@acc:~/Repo/llama-stack$ python -m llama_stack.apis.inference.client localhost 5000
User>hello world, write me a 2 sentence poem about the moon
Traceback (most recent call last):
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
    yield
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
    return await self._connection.handle_async_request(request)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
    raise exc
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
    ) = await self._receive_response_headers(**kwargs)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
    event = await self._receive_event(timeout=timeout)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/opea/Repo/llama-stack/llama_stack/apis/inference/client.py", line 178, in <module>
    fire.Fire(main)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/opea/Repo/llama-stack/llama_stack/apis/inference/client.py", line 174, in main
    asyncio.run(run_main(host, port, stream, model, logprobs))
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/opea/Repo/llama-stack/llama_stack/apis/inference/client.py", line 134, in run_main
    async for log in EventLogger().log(iterator):
  File "/home/opea/Repo/llama-stack/llama_stack/apis/inference/event_logger.py", line 32, in log
    async for chunk in event_generator:
  File "/home/opea/Repo/llama-stack/llama_stack/apis/inference/client.py", line 70, in chat_completion
    async with client.stream(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_client.py", line 1628, in stream
    response = await self.send(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_transports/default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/opea/anaconda3/envs/LStack/lib/python3.10/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadError
Alevs2R commented 14 hours ago

Same error here.

Alevs2R commented 14 hours ago

The problem is that the server inside the container listens on IPv6 only, but I don't know how to disable that via docker run.
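
This diagnosis matches the traceback above: the TCP connection is established and the failure happens while reading the response headers. Here is a minimal probe (my own sketch, not part of the repo; it assumes the server was published on localhost:5000 as in Step 1) to check which loopback address actually answers:

import socket

def probe(family, host, port=5000):
    # Open a TCP connection, send a bare-bones HTTP request, and try to
    # read a response; a reset on read here mirrors the httpx.ReadError
    # in the traceback above.
    with socket.socket(family, socket.SOCK_STREAM) as s:
        s.settimeout(5)
        try:
            s.connect((host, port))
            s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
            data = s.recv(1024)
            print(f"{host}: got {len(data)} bytes back")
        except OSError as exc:
            print(f"{host}: {exc!r}")

probe(socket.AF_INET, "127.0.0.1")  # IPv4 loopback
probe(socket.AF_INET6, "::1")       # IPv6 loopback

If the IPv4 probe connects but fails on recv while the IPv6 probe gets bytes back, the server inside the container is listening on IPv6 only.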

Alevs2R commented 13 hours ago

Use:

docker run --rm -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
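
(Anything placed after the image name is passed to the container's entrypoint rather than to Docker itself, so --disable-ipv6 is consumed by the llama-stack server process and makes it bind over IPv4, where the published port can reach it.)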