triton-inference-server / fil_backend
FIL backend for the Triton Inference Server
Apache License 2.0
Integrate experimental FIL in the FIL backend #366
Open

hcho3 opened this issue 1 year ago

hcho3 commented 1 year ago
- Use the `use_experimental_optimizations` flag to selectively enable the new FIL.
- Enable the new FIL for both CPU and GPU inference workloads. Note: requires https://github.com/rapidsai/cuml/pull/5559 to function.
- Implement common postprocessors to process the output of the new FIL.
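As a sketch of how the first item might look from a user's perspective, the flag could be passed through the model's `config.pbtxt`, since Triton backends typically receive string-valued options via the `parameters` field. The key name comes from this issue; its placement in `parameters` and the `"true"`/`"false"` string convention are assumptions, not the confirmed interface.

```protobuf
# Hypothetical config.pbtxt fragment for a FIL model.
# Assumption: the backend reads `use_experimental_optimizations`
# from the model parameters and treats "true" as opt-in.
backend: "fil"
max_batch_size: 32768
parameters [
  {
    key: "use_experimental_optimizations"
    value: { string_value: "true" }
  }
]
```

With this shape, the new FIL stays off by default and existing model configurations keep their current behavior unless they opt in explicitly.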