This repo contains an example TRITONCACHE API implementation for caching data locally in-memory.
Ask questions or report problems on the main Triton issues page.
Use a recent version of CMake to build. First, install the required dependencies.
$ apt-get install libboost-dev rapidjson-dev
To build the cache:
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install
The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag is used for each repo, but the following CMake arguments can be used to override it.
-D TRITON_CORE_REPO_TAG=[tag]
-D TRITON_COMMON_REPO_TAG=[tag]
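As a minimal sketch of how the overrides combine with the build command above, the snippet below assembles the cmake invocation and prints it for inspection rather than running it. The tag value r24.01 and the variable names are illustrative assumptions; substitute a tag or branch that actually exists in the Triton repositories.

```shell
# Illustrative placeholder: replace with a tag/branch that exists in the Triton repos.
TRITON_REPO_TAG="r24.01"

# Same install prefix as the default build, plus the two override arguments.
CMAKE_ARGS="-DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/install"
CMAKE_ARGS="${CMAKE_ARGS} -D TRITON_CORE_REPO_TAG=${TRITON_REPO_TAG}"
CMAKE_ARGS="${CMAKE_ARGS} -D TRITON_COMMON_REPO_TAG=${TRITON_REPO_TAG}"

# Print the command; run it from the build directory once it looks right.
echo "cmake ${CMAKE_ARGS} .."
```

Using the same tag for both repositories is a choice made here for simplicity; the two arguments can be set independently.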
Like other TRITONCACHE implementations, this cache is configured through the tritonserver --cache-config CLI arg or through the TRITONSERVER_SetCacheConfig API.
Currently, the following config fields are supported:
size: The fixed size (in bytes) of CPU memory allocated to the cache upfront. If this value is too large (e.g., greater than available memory) or too small (e.g., smaller than the required overhead of roughly 1-2 KB), initialization may fail.
tritonserver --cache-config local,size=1048576
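Because size is given in bytes, it can be easier to compute it from a more readable unit. The sketch below derives the byte count for a 256 MiB cache and prints the resulting command. The 256 MiB figure and the variable names are illustrative assumptions, not recommendations.

```shell
# Illustrative cache size in MiB; pick a value that fits available memory.
CACHE_SIZE_MIB=256

# The size field expects bytes, so convert MiB to bytes.
CACHE_SIZE_BYTES=$((CACHE_SIZE_MIB * 1024 * 1024))

# Print the command so the value can be checked before starting the server.
echo "tritonserver --cache-config local,size=${CACHE_SIZE_BYTES}"
```

For comparison, the 1048576 used in the example above corresponds to 1 MiB.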
When TRITON_ENABLE_METRICS is enabled for this cache (it is enabled by default), the cache checks whether the running Triton server has metrics enabled as well. If so, the cache publishes additional cache-specific metrics to Triton's metrics endpoint through the Custom Metrics API.
The following metrics are reported by this cache implementation:
Category | Metric | Metric Name | Description | Granularity | Frequency |
---|---|---|---|---|---|
Utilization | Total Cache Utilization | nv_cache_util | Total cache utilization rate (0.0 - 1.0) | Server-wide | Per interval |
Count | Total Cache Entry Count | nv_cache_num_entries | Total number of entries stored in cache | Server-wide | Per interval |
Count | Total Cache Lookup Count | nv_cache_num_lookups | Total number of cache lookups done by Triton | Server-wide | Per interval |
Count | Total Cache Hit Count | nv_cache_num_hits | Total number of cache hits | Server-wide | Per interval |
Count | Total Cache Miss Count | nv_cache_num_misses | Total number of cache misses | Server-wide | Per interval |
Count | Total Cache Eviction Count | nv_cache_num_evictions | Total number of cache evictions | Server-wide | Per interval |
Latency | Total Cache Lookup Time | nv_cache_lookup_duration | Cumulative time spent doing cache lookups (microseconds) | Server-wide | Per interval |
Latency | Total Cache Insertion Time | nv_cache_insertion_duration | Cumulative time spent doing cache insertions (microseconds) | Server-wide | Per interval |
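One way to use these counters is to derive an overall hit rate from nv_cache_num_hits and nv_cache_num_lookups. The helper below is a minimal sketch, not part of this repo: cache_hit_rate is a hypothetical function name, and it assumes the metrics arrive on stdin in Prometheus text format, with or without labels after the metric name.

```shell
# Hypothetical helper: reads Prometheus-style metrics text on stdin and
# prints hits / lookups to three decimals, or "n/a" if no lookups were seen.
cache_hit_rate() {
  awk '
    /^nv_cache_num_hits[ {]/    { hits = $NF }     # value is the last field
    /^nv_cache_num_lookups[ {]/ { lookups = $NF }
    END {
      if (lookups > 0) printf "%.3f\n", hits / lookups
      else print "n/a"
    }'
}

# Example usage, assuming the server exposes metrics at localhost:8002/metrics:
#   curl -s localhost:8002/metrics | cache_hit_rate
```

Since the counts are totals, the result reflects the hit rate over the lifetime of the counters rather than over a recent window; a windowed rate would require differencing two samples.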