Closed: zbloss closed this issue 6 months ago.
I appreciate your feedback and I’m sorry to hear that you are having trouble with PyTriton and Redis cache. I have some suggestions that might help you resolve the issue.
First, please make sure that you are using the latest version of PyTriton, which is 0.4.1 as of now.
I tried to reproduce your issue with PyTriton 0.4.1 on Ubuntu 22.04 (AMD64). The Redis cache works exactly as expected: the first inference is processed by the model and all subsequent requests are served from the cache.
I used interactive Docker mode and ipython to run all the steps, so it is easier to inspect what is going on. Please use my reproduction path as a reference for your own testing.
Docker was started in interactive mode:
docker run -ti --network=host --platform linux/amd64 --ulimit core=-1 --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --cap-add=SYS_PTRACE --shm-size 2G nvcr.io/nvidia/tritonserver:23.10-pyt-python-py3 bash
Install poetry
root:/opt/tritonserver# pip install poetry
Install dependencies
root:/opt/tritonserver# DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y python3 python3-distutils python-is-python3 git \
build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev \
openssh-client cmake rapidjson-dev
Create the /app folder and change into it:
mkdir /app
cd /app/
Clone redis cache:
git clone https://github.com/triton-inference-server/redis_cache.git
Compile redis cache:
root:/app/redis_cache# ./build.sh
Copy the compiled cache library from the build folder into the Triton caches directory:
root:/app/redis_cache# cp /app/redis_cache/build/libtritoncache_redis.so /opt/tritonserver/caches/redis/libtritoncache_redis.so
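If the caches/redis directory is not already present in the container (an assumption; the stock image may ship an empty caches directory), create it before copying:
mkdir -p /opt/tritonserver/caches/redis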
Install the tools needed to add the redis repository:
apt install lsb-release curl gpg
Add redis repository:
curl -fsSL https://packages.redis.io/gpg | gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/redis.list
Install redis
apt-get update
apt-get install redis
Start redis:
redis-server
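Note that redis-server started this way stays in the foreground. In a single interactive session you may prefer to run it in the background (or in a second shell), for example:
redis-server --daemonize yes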
Install current version of pytriton and ipython:
pip install nvidia-pytriton
pip install ipython
Start ipython:
ipython
Enable logging:
import logging

logging.basicConfig(
    level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(name)s: %(message)s"
)
Create inference callable:
import time
import numpy as np


def _infer_fn(requests):
    # print(requests)
    if len(requests) > 1:
        raise Exception("Only one request is supported")
    request = requests[0]
    # Decode the single-element bytes tensor into a Python string.
    text = np.char.decode(request["text"].astype("bytes"), "utf-8").item()
    # Return only the first whitespace-separated token of the input text.
    for response in text.split():
        return_value = {
            "text": np.char.encode(response, "utf-8"),
        }
        return [return_value]
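As a quick sanity check, the callable can be exercised directly in the same ipython session before starting Triton. This is only a sketch: it passes a plain dict with the (1, 1) string shape the client below sends, whereas inside Triton the requests are PyTriton request objects.
# hypothetical local smoke test of _infer_fn, outside of Triton
sample = {"text": np.array([["Test text ".encode("utf-8")]])}
print(_infer_fn([sample]))
# expected output: [{'text': array(b'Test', dtype='|S4')}]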
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig
Prepare config with redis:
triton_config = TritonConfig(
    cache_config=["redis,host=localhost", "redis,port=6379"],
    cache_directory="/opt/tritonserver/caches",
)
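PyTriton passes TritonConfig options through to the tritonserver binary it launches, so the configuration above corresponds roughly to the following command-line flags (shown for reference only):
tritonserver --cache-config redis,host=localhost --cache-config redis,port=6379 --cache-directory /opt/tritonserver/caches ...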
Create the Triton instance and bind the model:
triton = Triton(config=triton_config)

triton.bind(
    model_name="Test",
    infer_func=_infer_fn,
    inputs=[
        Tensor(name="text", dtype=bytes, shape=(-1,)),
    ],
    outputs=[
        Tensor(name="text", dtype=bytes, shape=(-1,)),
    ],
    config=ModelConfig(max_batch_size=1, response_cache=True),
)
Run Triton:
triton.run()
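run() starts the server in the background and returns control to the ipython session, which is why the client calls later in this walkthrough can be issued from the same process. In a standalone script you would typically block instead, or stop the server explicitly when done; a minimal sketch assuming the standard PyTriton API:
# blocking alternative for standalone scripts:
# triton.serve()
# explicit shutdown after run():
# triton.stop()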
Log from starting Triton:
2023-11-30 10:29:38,466 - DEBUG - pytriton.triton: Preparing Triton Inference Server binaries and libs for execution.
2023-11-30 10:29:38,492 - DEBUG - pytriton.triton: Triton Inference Server binaries copied to /root/.cache/pytriton/workspace__lhrs25_/tritonserver without stubs.
2023-11-30 10:29:38,492 - DEBUG - pytriton.utils.distribution: Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
2023-11-30 10:29:38,492 - DEBUG - pytriton.utils.distribution: Obtained pytriton stubs path for 3.10: /usr/local/lib/python3.10/dist-packages/pytriton/tritonserver/python_backend_stubs/3.10/triton_python_backend_stub
2023-11-30 10:29:38,492 - DEBUG - pytriton.triton: Copying stub for version 3.10 from /usr/local/lib/python3.10/dist-packages/pytriton/tritonserver/python_backend_stubs/3.10/triton_python_backend_stub to /root/.cache/pytriton/workspace__lhrs25_/tritonserver/backends/python/triton_python_backend_stub
2023-11-30 10:29:38,494 - DEBUG - pytriton.triton: Triton Inference Server binaries ready in /root/.cache/pytriton/workspace__lhrs25_/tritonserver
2023-11-30 10:29:38,494 - DEBUG - pytriton.utils.distribution: Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
2023-11-30 10:29:38,494 - DEBUG - pytriton.utils.distribution: Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
2023-11-30 10:29:38,494 - DEBUG - pytriton.utils.distribution: pytriton is installed in editable mode: False
2023-11-30 10:29:38,494 - DEBUG - pytriton.utils.distribution: Obtained nvidia_pytriton.libs path: /usr/local/lib/python3.10/dist-packages/nvidia_pytriton.libs
2023-11-30 10:29:38,495 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:38,495 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:38,495 - DEBUG - pytriton.triton: Starting Triton Inference
2023-11-30 10:29:38,495 - DEBUG - pytriton.server.triton_server: Triton Server binary /root/.cache/pytriton/workspace__lhrs25_/tritonserver/bin/tritonserver. Environment:
{
"NPP_VERSION": "12.2.1.4",
"SHELL": "/bin/bash",
"NVIDIA_VISIBLE_DEVICES": "all",
"DALI_BUILD": "9783408",
"CUSOLVER_VERSION": "11.5.2.141",
"CUBLAS_VERSION": "12.2.5.6",
"HOSTNAME": "piotrmubuntu2204",
"DCGM_VERSION": "2.4.7",
"NVIDIA_REQUIRE_CUDA": "cuda>=9.0",
"CUFFT_VERSION": "11.0.8.103",
"CUDA_CACHE_DISABLE": "1",
"NCCL_VERSION": "2.19.3",
"CUSPARSE_VERSION": "12.1.2.141",
"ENV": "/etc/shinit_v2",
"PWD": "/opt/tritonserver",
"OPENUCX_VERSION": "1.15.0",
"NSIGHT_SYSTEMS_VERSION": "2023.3.1.92",
"NVIDIA_DRIVER_CAPABILITIES": "compute,utility,video",
"POLYGRAPHY_VERSION": "0.49.0",
"TF_ENABLE_WINOGRAD_NONFUSED": "1",
"TRT_VERSION": "8.6.1.6+cuda12.0.1.011",
"NVIDIA_PRODUCT_NAME": "Triton Server",
"RDMACORE_VERSION": "39.0",
"HOME": "/root",
"LS_COLORS": "rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:",
"CUDA_VERSION": "12.2.2.009",
"CURAND_VERSION": "10.3.3.141",
"TCMALLOC_RELEASE_RATE": "200",
"CUTENSOR_VERSION": "1.7.0.1",
"TRITON_SERVER_GPU_ENABLED": "1",
"HPCX_VERSION": "2.16rc4",
"LESSCLOSE": "/usr/bin/lesspipe %s %s",
"TERM": "xterm",
"TRITON_SERVER_VERSION": "2.39.0",
"GDRCOPY_VERSION": "2.3",
"LESSOPEN": "| /usr/bin/lesspipe %s",
"OPENMPI_VERSION": "4.1.5rc2",
"NVJPEG_VERSION": "12.2.2.4",
"LIBRARY_PATH": "/usr/local/cuda/lib64/stubs:",
"SHLVL": "1",
"BASH_ENV": "/etc/bash.bashrc",
"TF_AUTOTUNE_THRESHOLD": "2",
"CUDNN_VERSION": "8.9.5.29",
"NVIDIA_TRITON_SERVER_BASE_VERSION": "23.10",
"NSIGHT_COMPUTE_VERSION": "2023.2.2.3",
"DALI_VERSION": "1.30.0",
"NVIDIA_TRITON_SERVER_VERSION": "23.10",
"LD_LIBRARY_PATH": "/opt/hpcx/ucc/lib/:/opt/hpcx/ucx/lib/:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/lib/python3.10/dist-packages/nvidia_pytriton.libs",
"NVIDIA_BUILD_ID": "72127154",
"OMPI_MCA_coll_hcoll_enable": "0",
"OPAL_PREFIX": "/opt/hpcx/ompi",
"CUDA_DRIVER_VERSION": "535.104.05",
"TRANSFORMER_ENGINE_VERSION": "0.12",
"_CUDA_COMPAT_PATH": "/usr/local/cuda/compat",
"NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS": "",
"PATH": "/usr/bin:/opt/tritonserver/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin",
"TRITON_SERVER_USER": "triton-server",
"MOFED_VERSION": "5.4-rdmacore39.0",
"TRTOSS_VERSION": "23.10",
"DEBIAN_FRONTEND": "noninteractive",
"TF_ADJUST_HUE_FUSED": "1",
"TF_ADJUST_SATURATION_FUSED": "1",
"UCX_MEM_EVENTS": "no",
"_": "/usr/local/bin/ipython",
"LC_CTYPE": "C.UTF-8"
}
2023-11-30 10:29:38,526 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=119.9999487400055)
I1130 10:29:38.546780 696 cache_manager.cc:174] Creating TritonCache with name: 'redis', libpath: '/opt/tritonserver/caches/redis/libtritoncache_redis.so', cache_config: '{"host":"localhost","port":"6379"}'
I1130 10:29:38.718499 696 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f5438000000' with size 268435456
I1130 10:29:38.718715 696 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1130 10:29:38.719366 696 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1130 10:29:38.719381 696 server.cc:619]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+
I1130 10:29:38.719391 696 server.cc:662]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I1130 10:29:38.777869 696 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA RTX A6000
I1130 10:29:38.778072 696 metrics.cc:710] Collecting CPU metrics
I1130 10:29:38.778203 696 tritonserver.cc:2458]
+----------------------------------+------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository |
| | model_repository(unload_dependents) sch |
| | edule_policy model_configuration system_ |
| | shared_memory cuda_shared_memory binary_ |
| | tensor_data parameters statistics trace |
| | logging |
| model_repository_path[0] | /root/.cache/pytriton/workspace__lhrs25_ |
| model_control_mode | MODE_EXPLICIT |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 1 |
+----------------------------------+------------------------------------------+
I1130 10:29:38.779899 696 grpc_server.cc:2513] Started GRPCInferenceService at 0.0.0.0:8001
I1130 10:29:38.780082 696 http_server.cc:4497] Started HTTPService at 0.0.0.0:8000
I1130 10:29:38.825539 696 http_server.cc:270] Started Metrics Service at 0.0.0.0:8002
2023-11-30 10:29:39,532 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:39,533 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:39,533 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=119.99999141693115)
2023-11-30 10:29:39,535 - DEBUG - pytriton.client.client: Closing ModelClient
[INFO/BlocksStoreManager-2] child process calling self.run()
[INFO/BlocksStoreManager-2] manager serving at '/root/.cache/pytriton/workspace__lhrs25_/data_store.sock'
2023-11-30 10:29:39,880 - DEBUG - pytriton.proxy.communication: Started remote block store at /root/.cache/pytriton/workspace__lhrs25_/data_store.sock (pid=733)
2023-11-30 10:29:39,880 - DEBUG - pytriton.models.manager: Crating model Test with version 1.
2023-11-30 10:29:39,882 - DEBUG - pytriton.proxy.inference_handler: Binding IPC socket at ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test_0.
2023-11-30 10:29:39,884 - DEBUG - pytriton.proxy.communication: Already connectd to remote block store at /root/.cache/pytriton/workspace__lhrs25_/data_store.sock
2023-11-30 10:29:39,885 - DEBUG - pytriton.proxy.inference_handler: Waiting for requests from proxy model for Test.
2023-11-30 10:29:39,885 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:39,886 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:39,886 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=119.99999284744263)
I1130 10:29:39.894993 696 model_lifecycle.cc:461] loading: Test:1
I1130 10:29:41.234660 696 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: Test_0_0 (CPU device 0)
2023-11-30 10:29:41,529 - DEBUG - pytriton.models.model: Closing handshake socket
2023-11-30 10:29:41,536 - DEBUG - pytriton.client.client: Closing ModelClient
2023-11-30 10:29:41,537 - DEBUG - pytriton.models.manager: Done.
2023-11-30 10:29:41,537 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:41,538 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:41,538 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=59.99999165534973)
2023-11-30 10:29:41,541 - DEBUG - pytriton.client.utils: Waiting for model Test/1 to be ready (timeout=59.99741291999817)
2023-11-30 10:29:41,542 - DEBUG - pytriton.client.client: Closing ModelClient
2023-11-30 10:29:41,542 - INFO - pytriton.triton: Infer function available as model: `/v2/models/Test`
2023-11-30 10:29:41,542 - INFO - pytriton.triton: Status: `GET /v2/models/Test/ready/`
2023-11-30 10:29:41,542 - INFO - pytriton.triton: Model config: `GET /v2/models/Test/config/`
2023-11-30 10:29:41,543 - INFO - pytriton.triton: Inference: `POST /v2/models/Test/infer/`
2023-11-30 10:29:41,543 - INFO - pytriton.triton: Read more about configuring and serving models in documentation: https://triton-inference-server.github.io/pytriton.
2023-11-30 10:29:41,543 - INFO - pytriton.triton: (Press CTRL+C or use the command `kill -SIGINT 503` to send a SIGINT signal and quit)
I1130 10:29:41.536328 696 model_lifecycle.cc:818] successfully loaded 'Test'
The log indicates that the cache is active and that the cache library was loaded.
First client run with empty cache:
from pytriton.client import ModelClient
import numpy as np
cl = ModelClient("localhost", "Test")
cl.infer_batch(np.array([["Test text ".encode('utf-8')]]))
Log output:
2023-11-30 10:29:54,252 - DEBUG - pytriton.client.utils: Adding http scheme to localhost
2023-11-30 10:29:54,253 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://localhost:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:54,253 - DEBUG - pytriton.client.utils: Adding http scheme to localhost
2023-11-30 10:29:54,253 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://localhost:8000 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-30 10:29:54,254 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=299.9999930858612)
2023-11-30 10:29:54,256 - DEBUG - pytriton.client.utils: Waiting for model Test/<latest> to be ready (timeout=299.99828147888184)
2023-11-30 10:29:54,256 - DEBUG - pytriton.client.utils: Obtaining model Test config
2023-11-30 10:29:54,258 - DEBUG - pytriton.model_config.parser: Parsing Triton config model from dict:
{
"name": "Test",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1,
"input": [
{
"name": "text",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "text",
"data_type": "TYPE_STRING",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
1
],
"max_queue_delay_microseconds": 0,
"preserve_ordering": false,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "Test_0",
"kind": "KIND_CPU",
"count": 1,
"gpus": [],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"shared-memory-socket": {
"string_value": "ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test"
}
},
"model_warmup": [],
"response_cache": {
"enable": true
}
}
2023-11-30 10:29:54,258 - DEBUG - pytriton.model_config.parser: backend_parameters_config is a dictionary: {'shared-memory-socket': {'string_value': 'ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test'}}
2023-11-30 10:29:54,259 - DEBUG - pytriton.client.utils: Model config: TritonModelConfig(model_name='Test', model_version=1, max_batch_size=1, batching=True, batcher=DynamicBatcher(max_queue_delay_microseconds=0, preferred_batch_size=[1], preserve_ordering=False, priority_levels=0, default_priority_level=0, default_queue_policy=None, priority_queue_policy=None), instance_group={<DeviceKind.KIND_CPU: 'KIND_CPU'>: 1}, decoupled=False, backend_parameters={'shared-memory-socket': 'ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test'}, inputs=[TensorSpec(name='text', shape=(-1,), dtype=<class 'numpy.bytes_'>, optional=False)], outputs=[TensorSpec(name='text', shape=(-1,), dtype=<class 'numpy.bytes_'>, optional=False)], response_cache=ResponseCache(enable=True))
2023-11-30 10:29:54,259 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=299.9999933242798)
2023-11-30 10:29:54,259 - DEBUG - pytriton.client.utils: Waiting for model Test/<latest> to be ready (timeout=299.9993267059326)
2023-11-30 10:29:54,260 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=299.9989459514618)
2023-11-30 10:29:54,260 - DEBUG - pytriton.client.utils: Waiting for model Test/<latest> to be ready (timeout=299.9983859062195)
2023-11-30 10:29:54,261 - DEBUG - pytriton.client.utils: Obtaining model Test config
2023-11-30 10:29:54,263 - DEBUG - pytriton.model_config.parser: Parsing Triton config model from dict:
{
"name": "Test",
"platform": "",
"backend": "python",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 1,
"input": [
{
"name": "text",
"data_type": "TYPE_STRING",
"format": "FORMAT_NONE",
"dims": [
-1
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "text",
"data_type": "TYPE_STRING",
"dims": [
-1
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
1
],
"max_queue_delay_microseconds": 0,
"preserve_ordering": false,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "Test_0",
"kind": "KIND_CPU",
"count": 1,
"gpus": [],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.py",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {
"shared-memory-socket": {
"string_value": "ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test"
}
},
"model_warmup": [],
"response_cache": {
"enable": true
}
}
2023-11-30 10:29:54,263 - DEBUG - pytriton.model_config.parser: backend_parameters_config is a dictionary: {'shared-memory-socket': {'string_value': 'ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test'}}
2023-11-30 10:29:54,263 - DEBUG - pytriton.client.utils: Model config: TritonModelConfig(model_name='Test', model_version=1, max_batch_size=1, batching=True, batcher=DynamicBatcher(max_queue_delay_microseconds=0, preferred_batch_size=[1], preserve_ordering=False, priority_levels=0, default_priority_level=0, default_queue_policy=None, priority_queue_policy=None), instance_group={<DeviceKind.KIND_CPU: 'KIND_CPU'>: 1}, decoupled=False, backend_parameters={'shared-memory-socket': 'ipc:///root/.cache/pytriton/workspace__lhrs25_/ipc_proxy_backend_Test'}, inputs=[TensorSpec(name='text', shape=(-1,), dtype=<class 'numpy.bytes_'>, optional=False)], outputs=[TensorSpec(name='text', shape=(-1,), dtype=<class 'numpy.bytes_'>, optional=False)], response_cache=ResponseCache(enable=True))
2023-11-30 10:29:54,264 - DEBUG - pytriton.client.client: Sending inference request to Triton Inference Server
2023-11-30 10:29:54,300 - DEBUG - pytriton.proxy.inference_handler: Preparing inputs for Test.
2023-11-30 10:29:54,300 - DEBUG - pytriton.proxy.inference_handler: Processing inference callback for Test.
2023-11-30 10:29:54,301 - DEBUG - pytriton.proxy.inference_handler: Validating outputs for Test.
2023-11-30 10:29:54,301 - DEBUG - pytriton.proxy.validators: Outputs: [{'text': array(b'Test', dtype='|S4')}]
2023-11-30 10:29:54,301 - DEBUG - pytriton.proxy.validators: Response: {'text': array(b'Test', dtype='|S4')}
2023-11-30 10:29:54,301 - DEBUG - pytriton.proxy.validators: text: b'Test'
2023-11-30 10:29:54,301 - DEBUG - pytriton.proxy.inference_handler: Copying outputs to shared memory for Test.
2023-11-30 10:29:54,303 - DEBUG - pytriton.proxy.inference_handler: Sending response: InferenceHandlerResponses(responses=[MetaRequestResponse(idx=0, data={'text': 'psm_75d63589:65'}, parameters=None, eos=False)], error=None)
2023-11-30 10:29:54,304 - DEBUG - pytriton.proxy.inference_handler: Send eos response to proxy model for Test.
2023-11-30 10:29:54,304 - DEBUG - pytriton.proxy.communication: Releasing shared memory block for tensor psm_75d63589:0
2023-11-30 10:29:54,304 - DEBUG - pytriton.proxy.inference_handler: Waiting for requests from proxy model for Test.
Out[12]: {'text': array(b'Test', dtype=object)}
The inference callable was called here, so the cache was not used.
Second run, with the cache now containing the response:
cl.infer_batch(np.array([["Test text ".encode('utf-8')]]))
Log output:
In [7]: cl.infer_batch(np.array([["Test text ".encode('utf-8')]]))
2023-11-30 10:43:56,994 - DEBUG - pytriton.client.client: Sending inference request to Triton Inference Server
Out[7]: {'text': array(b'Test', dtype=object)}
Only the client log line is present because the cache served the answer.
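To double-check that the entry actually lives in Redis, you can query the server directly from another shell, for example:
redis-cli DBSIZE
redis-cli --scan
Both should report at least one key after the first inference.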
Thanks for helping out @piotrm-nvidia, I am using nvidia-pytriton==0.4.1 with the nvcr.io/nvidia/tritonserver:23 image (see my docker-compose.yml).
Also, would you mind sharing how large your compiled docker image is (docker image ls)?
With redis disabled I'm building images that are roughly 21GB, and I'm not sure why.
I used this docker image for AMD64 Linux:
nvcr.io/nvidia/tritonserver:23.10-pyt-python-py3
I can list my container instance with docker ps:
$ docker ps --all --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
6d7a1947efcb nvcr.io/nvidia/tritonserver:23.10-pyt-python-py3 "/opt/nvidia/nvidia_…" 27 hours ago Exited (0) 27 hours ago reverent_joliot 51.3kB (virtual 9.72GB)
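If you want to see which layers contribute most to your 21GB image, docker history is a quick way to check, for example:
docker history --no-trunc <your_image>
Note that the base image alone is already about 9.7GB (the virtual size in the listing above), so a large part of your image size likely comes from the base rather than from PyTriton or the redis cache.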
This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Description
Running the docker image (below) with redis cache settings enabled via cache_config causes the triton server to fail on launch. When I comment out both the cache_config and cache_directory options, the server starts and runs successfully but does not utilize the redis cache. When I uncomment the cache_config and cache_directory options, the server fails to find the libtritonserver.so shared library file.
To reproduce
Fails with unable to find shared library libtritonserver.so
Does not fail, but does not utilize Redis
Observed results and expected behavior
Please describe the observed results as well as the expected results. If possible, attach relevant log output to help analyze your problem. If an error is raised, please paste the full traceback of the exception.
Environment
Additional context
Add any other context about the problem here.