Environment: docker: hpcaitech/energon-ai:latest
Working directory (inside the container): /workspace/EnergonAI/examples/opt
Command: python opt_fastapi.py opt-125m
Log at service startup:
==> Args:
model = opt-125m
tp = 1
master_host = localhost
master_port = 19990
rpc_port = 19980
max_batch_size = 8
pipe_size = 1
queue_size = 0
http_host = 0.0.0.0
http_port = 7070
checkpoint = None
cache_size = 0
cache_list_size = 1
[W ProcessGroupGloo.cpp:685] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[05/06/23 10:09:02] INFO colossalai - colossalai - INFO:
/opt/conda/lib/python3.9/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[05/06/23 10:09:03] INFO colossalai - colossalai - INFO:
/opt/conda/lib/python3.9/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is
ParallelMode.DATA.
INFO colossalai - colossalai - INFO:
/opt/conda/lib/python3.9/site-packages/colossalai/initialize.py:117 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1,
pipeline parallel size: 1, tensor parallel size: 1
[05/06/23 10:09:03] INFO colossalai - energonai - INFO:
/opt/conda/lib/python3.9/site-packages/energonai/model/model_factory.py:195
create_pipeline_model
INFO colossalai - energonai - INFO: ==> Rank 0 built layer 0-12 / total 12
INFO colossalai - energonai - INFO:
/opt/conda/lib/python3.9/site-packages/energonai/model/model_factory.py:200
create_pipeline_model
INFO colossalai - energonai - INFO: Rank0/0 model size = 0.327696384 GB
[W ProcessGroupGloo.cpp:685] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W tensorpipe_agent.cpp:180] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
[W tensorpipe_agent.cpp:180] Failed to look up the IP address for the hostname (EAI_NONAME: unknown node or service (this error originated at tensorpipe/transport/uv/utility.cc:97)), defaulting to 127.0.0.1
INFO colossalai - energonai - INFO: /opt/conda/lib/python3.9/site-packages/energonai/worker.py:55
init
INFO colossalai - energonai - INFO: worker0 start
[05/06/23 10:09:04] INFO colossalai - energonai - INFO: /opt/conda/lib/python3.9/site-packages/energonai/engine.py:60
init
INFO colossalai - energonai - INFO: Engine start
INFO: Started server process [1705]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7070 (Press CTRL+C to quit)
Problem description:
When I send a request with: curl -XPOST -d '{"prompt": "What is the name of the largest continent on earth?","max_tokens": 128}' -H 'Content-type:application/json;charset=UTF-8' "http://xxxxxip:7070/generation", the server blocks in opt_fastapi.py, inside async def generate(data: GenerationTaskReq, request: Request):, at the line output = await engine.wait(uid), and never returns a result. Could you please help me find out what is causing this? Thanks.
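To narrow down whether the await itself never completes, one option is to bound it with a timeout so the handler reports a hang instead of blocking forever. The following is only a minimal sketch, not code from opt_fastapi.py: wait_with_timeout, never_returns and returns_quickly are hypothetical names, and the two coroutines are stand-ins for engine.wait(uid), whose real behaviour is not reproduced here.

```python
import asyncio


async def wait_with_timeout(awaitable, timeout_s: float):
    """Await `awaitable`; return (True, result), or (False, None) on timeout."""
    try:
        result = await asyncio.wait_for(awaitable, timeout=timeout_s)
        return True, result
    except asyncio.TimeoutError:
        # The awaited call did not finish within timeout_s seconds.
        return False, None


async def never_returns(uid):
    # Hypothetical stand-in for a wait call that never completes.
    await asyncio.Event().wait()


async def returns_quickly(uid):
    # Hypothetical stand-in for a wait call that completes normally.
    await asyncio.sleep(0)
    return f"output for {uid}"


def demo():
    hung = asyncio.run(wait_with_timeout(never_returns(0), 0.05))
    ok = asyncio.run(wait_with_timeout(returns_quickly(1), 1.0))
    return hung, ok
```

In the handler, the same pattern would presumably wrap the existing await, for example await wait_with_timeout(engine.wait(uid), 30.0), so that a timeout can be logged and returned to the client as an error.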