triton-inference-server / local_cache

Implementation of a local in-memory cache for Triton Inference Server's TRITONCACHE API
BSD 3-Clause "New" or "Revised" License

Triton Local Cache

This repo contains an example TRITONCACHE API implementation for caching data locally in-memory.

Ask questions or report problems on the main Triton issues page.

Build the Cache

Building requires a recent version of CMake. First, install the required dependencies:

$ apt-get install libboost-dev rapidjson-dev

To build the cache:

$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install

The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag is used for each repo, but the following CMake arguments can be used to override it.

Configuring the Cache

Like other TRITONCACHE implementations, this cache is configured through the tritonserver --cache-config CLI arg or through the TRITONSERVER_SetCacheConfig API.
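As a rough illustration of the CLI route, a server launch might look like the sketch below. It assumes the cache is registered under the name `local` and accepts a `size` field in bytes, as in Triton's response cache documentation; the 1 MiB size and the model repository path are placeholder values, not recommendations.

```shell
# Sketch only: start Triton with the local cache enabled.
# CACHE_SIZE_BYTES and MODEL_REPO are hypothetical placeholder values.
CACHE_SIZE_BYTES=1048576          # 1 MiB of CPU memory for the cache
MODEL_REPO=/path/to/models        # replace with a real model repository

tritonserver \
  --model-repository="${MODEL_REPO}" \
  --cache-config "local,size=${CACHE_SIZE_BYTES}"
```

The value passed to `--cache-config` takes the form `<cache name>,<field>=<value>`, so further fields would be supplied by repeating the flag.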

Currently, the following config fields are supported:

Metrics

When this cache is built with TRITON_ENABLE_METRICS enabled (the default), it checks whether the running Triton server also has metrics enabled. If so, the cache publishes additional cache-specific metrics to Triton's metrics endpoint through the Custom Metrics API.

Cache Metrics

The following metrics are reported by this cache implementation:

| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| Utilization | Total Cache Utilization | nv_cache_util | Total cache utilization rate (0.0 - 1.0) | Server-wide | Per interval |
| Count | Total Cache Entry Count | nv_cache_num_entries | Total number of entries stored in cache | Server-wide | Per interval |
| Count | Total Cache Lookup Count | nv_cache_num_lookups | Total number of cache lookups done by Triton | Server-wide | Per interval |
| Count | Total Cache Hit Count | nv_cache_num_hits | Total number of cache hits | Server-wide | Per interval |
| Count | Total Cache Miss Count | nv_cache_num_misses | Total number of cache misses | Server-wide | Per interval |
| Count | Total Cache Eviction Count | nv_cache_num_evictions | Total number of cache evictions | Server-wide | Per interval |
| Latency | Total Cache Lookup Time | nv_cache_lookup_duration | Cumulative time spent doing cache lookups (microseconds) | Server-wide | Per interval |
| Latency | Total Cache Insertion Time | nv_cache_insertion_duration | Cumulative time spent doing cache insertions (microseconds) | Server-wide | Per interval |
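
These counters can be combined into derived figures such as a hit rate (hits divided by lookups). The helper below is a minimal sketch, not part of this repo: `cache_hit_rate` is a hypothetical function name, and it assumes the metrics arrive in Prometheus text format with the sample value as the last field on each line.

```shell
# Hypothetical helper: read Prometheus-format metrics on stdin and print
# nv_cache_num_hits / nv_cache_num_lookups, or "n/a" if there were no lookups.
cache_hit_rate() {
  awk '
    # Match the metric name with or without a trailing {label="..."} set.
    # "# HELP" and "# TYPE" comment lines are skipped because $1 is "#".
    $1 ~ /^nv_cache_num_hits([{]|$)/    { hits = $NF }
    $1 ~ /^nv_cache_num_lookups([{]|$)/ { lookups = $NF }
    END {
      if (lookups > 0) printf "%.4f\n", hits / lookups
      else print "n/a"
    }
  '
}
```

Assuming Triton's metrics endpoint is on its default port, this could be used as `curl -s localhost:8002/metrics | cache_hit_rate`. Because the counters are cumulative, the result is the hit rate since the server started rather than over a recent window.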