marvik-ai / triton-llama2-adapter

MIT License

Timeout when starting client.py #2

Open dantepalacio opened 7 months ago

dantepalacio commented 7 months ago

I copied the python_backend folder completely. The server starts, and everything looks fine:

```
I0221 07:28:34.912813 342 python_be.cc:2136] TRITONBACKEND_ModelInstanceInitialize: instance initialization successful llamav2_0_0 (device 0)
I0221 07:28:34.912929 342 backend_model_instance.cc:734] Starting backend thread for llamav2_0_0 at nice 0 on device 0...
I0221 07:28:34.913053 342 backend_model.cc:536] Created model instance named 'llamav2_0_0' with device id '0'
I0221 07:28:34.913175 342 dynamic_batch_scheduler.cc:295] Starting dynamic-batcher thread for llamav2 at nice 0...
I0221 07:28:34.913180 342 model_lifecycle.cc:676] OnLoadComplete() 'llamav2' version 1
I0221 07:28:34.913188 342 model_lifecycle.cc:714] OnLoadFinal() 'llamav2' for all version(s)
I0221 07:28:34.913191 342 model_lifecycle.cc:819] successfully loaded 'llamav2'
I0221 07:28:34.913209 342 model_lifecycle.cc:286] VersionStates() 'llamav2'
I0221 07:28:34.913223 342 model_lifecycle.cc:286] VersionStates() 'llamav2'
I0221 07:28:34.913243 342 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0221 07:28:34.913272 342 server.cc:631]
+---------+---------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                    | Config                                                                                                                                                       |
+---------+---------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {}                                                                                                                                                           |
| python  | /opt/tritonserver/backends/python/libtriton_python.so   | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0221 07:28:34.913278 342 model_lifecycle.cc:265] ModelStates()
I0221 07:28:34.913287 342 server.cc:674]
+---------+---------+--------+
| Model   | Version | Status |
+---------+---------+--------+
| llamav2 | 1       | READY  |
+---------+---------+--------+

I0221 07:28:34.924965 342 metrics.cc:810] Collecting metrics for GPU 0: NVIDIA GeForce RTX 4060 Ti
I0221 07:28:34.925061 342 metrics.cc:703] Collecting CPU metrics
I0221 07:28:34.925130 342 tritonserver.cc:2435]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                            |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                           |
| server_version                   | 2.37.0                                                                                                                                                                                                           |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging   |
| model_repository_path[0]         | /models                                                                                                                                                                                                          |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                        |
| strict_model_config              | 0                                                                                                                                                                                                                |
| rate_limit                       | OFF                                                                                                                                                                                                              |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                         |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                              |
| strict_readiness                 | 1                                                                                                                                                                                                                |
| exit_timeout                     | 30                                                                                                                                                                                                               |
| cache_enabled                    | 0                                                                                                                                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0221 07:28:34.925389 342 grpc_server.cc:2345]
+----------------------------------------------+---------+
| GRPC KeepAlive Option                        | Value   |
+----------------------------------------------+---------+
| keepalive_time_ms                            | 7200000 |
| keepalive_timeout_ms                         | 20000   |
| keepalive_permit_without_calls               | 0       |
| http2_max_pings_without_data                 | 2       |
| http2_min_recv_ping_interval_without_data_ms | 300000  |
| http2_max_ping_strikes                       | 2       |
+----------------------------------------------+---------+

I0221 07:28:34.925676 342 grpc_server.cc:101] Ready for RPC 'Check', 0
I0221 07:28:34.925689 342 grpc_server.cc:101] Ready for RPC 'ServerLive', 0
I0221 07:28:34.925692 342 grpc_server.cc:101] Ready for RPC 'ServerReady', 0
I0221 07:28:34.925695 342 grpc_server.cc:101] Ready for RPC 'ModelReady', 0
I0221 07:28:34.925698 342 grpc_server.cc:101] Ready for RPC 'ServerMetadata', 0
I0221 07:28:34.925701 342 grpc_server.cc:101] Ready for RPC 'ModelMetadata', 0
I0221 07:28:34.925705 342 grpc_server.cc:101] Ready for RPC 'ModelConfig', 0
I0221 07:28:34.925709 342 grpc_server.cc:101] Ready for RPC 'SystemSharedMemoryStatus', 0
I0221 07:28:34.925712 342 grpc_server.cc:101] Ready for RPC 'SystemSharedMemoryRegister', 0
I0221 07:28:34.925716 342 grpc_server.cc:101] Ready for RPC 'SystemSharedMemoryUnregister', 0
I0221 07:28:34.925719 342 grpc_server.cc:101] Ready for RPC 'CudaSharedMemoryStatus', 0
I0221 07:28:34.925721 342 grpc_server.cc:101] Ready for RPC 'CudaSharedMemoryRegister', 0
I0221 07:28:34.925724 342 grpc_server.cc:101] Ready for RPC 'CudaSharedMemoryUnregister', 0
I0221 07:28:34.925728 342 grpc_server.cc:101] Ready for RPC 'RepositoryIndex', 0
I0221 07:28:34.925732 342 grpc_server.cc:101] Ready for RPC 'RepositoryModelLoad', 0
I0221 07:28:34.925734 342 grpc_server.cc:101] Ready for RPC 'RepositoryModelUnload', 0
I0221 07:28:34.925738 342 grpc_server.cc:101] Ready for RPC 'ModelStatistics', 0
I0221 07:28:34.925742 342 grpc_server.cc:101] Ready for RPC 'Trace', 0
I0221 07:28:34.925747 342 grpc_server.cc:101] Ready for RPC 'Logging', 0
I0221 07:28:34.925754 342 grpc_server.cc:350] Thread started for CommonHandler
I0221 07:28:34.925786 342 infer_handler.cc:703] New request handler for ModelInferHandler, 0
I0221 07:28:34.925792 342 infer_handler.h:1048] Thread started for ModelInferHandler
I0221 07:28:34.925879 342 infer_handler.cc:703] New request handler for ModelInferHandler, 0
I0221 07:28:34.925890 342 infer_handler.h:1048] Thread started for ModelInferHandler
I0221 07:28:34.925993 342 stream_infer_handler.cc:128] New request handler for ModelStreamInferHandler, 0
I0221 07:28:34.926003 342 infer_handler.h:1048] Thread started for ModelStreamInferHandler
I0221 07:28:34.926007 342 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I0221 07:28:34.926127 342 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I0221 07:28:34.966992 342 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
```

But when I start the client side and run client.py, I get the following.

Server side:

```
I0221 07:49:36.738491 342 http_server.cc:3452] HTTP request: 2 /v2/models/llamav2/versions/1/infer
I0221 07:49:36.738501 342 model_lifecycle.cc:328] GetModel() 'llamav2' version 1
I0221 07:49:36.738504 342 model_lifecycle.cc:328] GetModel() 'llamav2' version 1
I0221 07:49:36.738520 342 infer_request.cc:751] [request id: ] prepared: [0x0x7fc098009e10] request id: , model: llamav2, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 3, priority: 0, timeout (us): 0
original inputs:
[0x0x7fc098005b18] input: prompt, type: BYTES, original shape: [3,1], batch + shape: [3,1], shape: [1]
override inputs:
inputs:
[0x0x7fc098005b18] input: prompt, type: BYTES, original shape: [3,1], batch + shape: [3,1], shape: [1]
original requested outputs:
generated_text
requested outputs:
generated_text

I0221 07:49:39.731564 342 python_be.cc:1270] model llamav2, instance llamav2_0_0, executing 2 requests
I0221 07:50:57.538984 342 infer_response.cc:167] add response output: output: generated_text, type: BYTES, shape: [1,3]
I0221 07:50:57.539009 342 http_server.cc:1123] HTTP using buffer for: 'generated_text', size: 1099, addr: 0x7fc0c80052c0
I0221 07:50:57.539032 342 http_server.cc:1197] HTTP release: size 1099, addr 0x7fc0c80052c0
I0221 07:50:57.539038 342 infer_response.cc:167] add response output: output: generated_text, type: BYTES, shape: [1,3]
I0221 07:50:57.539042 342 http_server.cc:1123] HTTP using buffer for: 'generated_text', size: 1099, addr: 0x7fc0c8008040
I0221 07:50:57.539047 342 http_server.cc:1197] HTTP release: size 1099, addr 0x7fc0c8008040
I0221 07:50:57.539073 342 python_be.cc:2237] TRITONBACKEND_ModelInstanceExecute: model instance name llamav2_0_0 released 2 requests
```

Client side:

```
Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 1577, in wrapped_post
    return self._post(request_uri, request_body, headers, query_params)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 290, in _post
    response = self._client_stub.post(
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 272, in post
    return self.request(METHOD_POST, request_uri, body=body, headers=headers)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 253, in request
    response = HTTPSocketPoolResponse(sock, self._connection_pool,
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 292, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 164, in __init__
    self._read_headers()
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 184, in _read_headers
    data = self._sock.recv(self.block_size)
  File "/usr/local/lib/python3.10/dist-packages/gevent/_socketcommon.py", line 666, in recv
    self._wait(self._read_event)
  File "src/gevent/_hub_primitives.py", line 317, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 322, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 313, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 314, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 55, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TimeoutError: timed out
2024-02-21T07:50:36Z <Greenlet at 0x7f2817908ae0: wrapped_post('v2/models/llamav2/versions/1/infer', b'{"inputs":[{"name":"prompt","shape":[3,1],"datat, {'Inference-Header-Content-Length': 172}, None)> failed with TimeoutError
```

```
Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 1577, in wrapped_post
    return self._post(request_uri, request_body, headers, query_params)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 290, in _post
    response = self._client_stub.post(
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 272, in post
    return self.request(METHOD_POST, request_uri, body=body, headers=headers)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 253, in request
    response = HTTPSocketPoolResponse(sock, self._connection_pool,
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 292, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 164, in __init__
    self._read_headers()
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 184, in _read_headers
    data = self._sock.recv(self.block_size)
  File "/usr/local/lib/python3.10/dist-packages/gevent/_socketcommon.py", line 666, in recv
    self._wait(self._read_event)
  File "src/gevent/_hub_primitives.py", line 317, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 322, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 313, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 314, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 55, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TimeoutError: timed out
2024-02-21T07:50:36Z <Greenlet at 0x7f2817909d00: wrapped_post('v2/models/llamav2/versions/1/infer', b'{"inputs":[{"name":"prompt","shape":[3,1],"datat, {'Inference-Header-Content-Length': 172}, None)> failed with TimeoutError
```

```
Traceback (most recent call last):
  File "/workspace/sc_office_server/test_triton/client.py", line 35, in <module>
    result = r.get_result()
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 92, in get_result
    response = self._greenlet.get(block=block, timeout=timeout)
  File "src/gevent/greenlet.py", line 805, in gevent._gevent_cgreenlet.Greenlet.get
  File "src/gevent/greenlet.py", line 373, in gevent._gevent_cgreenlet.Greenlet._raise_exception
  File "/usr/local/lib/python3.10/dist-packages/gevent/_compat.py", line 48, in reraise
    raise value.with_traceback(tb)
  File "src/gevent/greenlet.py", line 908, in gevent._gevent_cgreenlet.Greenlet.run
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 1577, in wrapped_post
    return self._post(request_uri, request_body, headers, query_params)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/http/_client.py", line 290, in _post
    response = self._client_stub.post(
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 272, in post
    return self.request(METHOD_POST, request_uri, body=body, headers=headers)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/client.py", line 253, in request
    response = HTTPSocketPoolResponse(sock, self._connection_pool,
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 292, in __init__
    super(HTTPSocketPoolResponse, self).__init__(sock, **kw)
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 164, in __init__
    self._read_headers()
  File "/usr/local/lib/python3.10/dist-packages/geventhttpclient/response.py", line 184, in _read_headers
    data = self._sock.recv(self.block_size)
  File "/usr/local/lib/python3.10/dist-packages/gevent/_socketcommon.py", line 666, in recv
    self._wait(self._read_event)
  File "src/gevent/_hub_primitives.py", line 317, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 322, in gevent._gevent_c_hub_primitives.wait_on_socket
  File "src/gevent/_hub_primitives.py", line 313, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 314, in gevent._gevent_c_hub_primitives._primitive_wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 46, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_hub_primitives.py", line 55, in gevent._gevent_c_hub_primitives.WaitOperationsGreenlet.wait
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
TimeoutError: timed out
```
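The timestamps in the two logs already suggest what is happening. The server received the request at 07:49:36 and only had the responses ready at 07:50:57 (~81 s of inference), while the client's TimeoutError was logged at 07:50:36, exactly 60 s after the request was sent. That matches the documented 60 s defaults of tritonclient's HTTP client (`connection_timeout`/`network_timeout`), so the socket read appears to give up before the LLM finishes generating. A quick sanity check of the arithmetic:

```python
from datetime import datetime

# Timestamps copied from the server and client logs above (2024-02-21, UTC).
fmt = "%H:%M:%S"
request_received = datetime.strptime("07:49:36", fmt)  # http_server.cc: HTTP request arrives
client_gave_up   = datetime.strptime("07:50:36", fmt)  # client-side TimeoutError is logged
response_ready   = datetime.strptime("07:50:57", fmt)  # infer_response.cc: outputs produced

print(client_gave_up - request_received)   # 0:01:00 -- the client waited exactly 60 s
print(response_ready - request_received)   # 0:01:21 -- but inference took ~81 s
```

So the server is healthy; the client simply stops listening ~21 s before the answer arrives.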

Directory: [screenshot of the model repository layout attached in the original issue]
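Given the timing above, one likely workaround is to raise the HTTP client's timeouts above the observed inference latency. This is a sketch, not taken from this repo's client.py: the URL, model/tensor names, and timeout values mirror the logs, but the prompt strings are placeholders, and it assumes `tritonclient[http]` is installed (its `InferenceServerClient` accepts `connection_timeout` and `network_timeout` keyword arguments, both defaulting to 60.0 s).

```python
import numpy as np
import tritonclient.http as httpclient

# Raise both timeouts well above the ~81 s the server needed for this batch.
client = httpclient.InferenceServerClient(
    url="localhost:8000",
    connection_timeout=300.0,  # seconds to establish the connection
    network_timeout=300.0,     # seconds to wait for the HTTP response
)

# Shape [3, 1] BYTES input named "prompt", matching the server-side request log.
prompts = np.array([["..."], ["..."], ["..."]], dtype=object)
inp = httpclient.InferInput("prompt", prompts.shape, "BYTES")
inp.set_data_from_numpy(prompts)

result = client.infer("llamav2", inputs=[inp])
print(result.as_numpy("generated_text"))
```

If client.py uses `async_infer` (as the `r.get_result()` frame in the last traceback suggests), the same client-level timeouts apply, since the greenlet still reads the socket through the client's connection pool; `get_result(timeout=...)` alone only bounds the wait on the greenlet, not the underlying socket.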