styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License
288 stars 30 forks

infstreams questions #70

Closed ylab604 closed 4 months ago

ylab604 commented 4 months ago

Thank you for the great work! I have a question: what is the purpose of specifying the --infStreams option when building an engine with trtexec? In my experience, setting infStreams=4 on a 4090 did not produce any speed improvement. Why is this argument used?

styler00dollar commented 4 months ago

My recommended commands are based on benchmarking I did a while ago. I tested a lot of combinations and just picked the fastest one for the readme.

  --infStreams=N              Instantiate N engines to run inference concurrently (default = 1)

I tested it again with TensorRT 9.3 on a 4090 at 1080p.

Engine 1:

trtexec --bf16 --fp16 --onnx=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim_infStreams1.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=1

Engine 2:

trtexec --bf16 --fp16 --onnx=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim_infStreams4.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=4
VapourSynth inference (same script for both engines):

clip = core.trt.Model(
    clip,
    engine_path="/workspace/tensorrt/engine.engine",
    num_streams=4,  # num_streams=4 for both engines
)
infStreams results:
--infStreams=1: 2210 frames in 29.37 seconds (75.24 fps)
--infStreams=4: 2210 frames in 29.01 seconds (76.19 fps)
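For reference, the fps values above follow directly from frame count and wall time, and the relative gain works out to roughly 1%. A small sketch of that arithmetic (the numbers are taken from the benchmark above):

```python
# Sanity check of the reported throughput: fps = frames / seconds.
frames = 2210
runs = {1: 29.37, 4: 29.01}  # infStreams setting -> wall time in seconds

for streams, seconds in runs.items():
    fps = frames / seconds
    print(f"infStreams={streams}: {fps:.2f} fps")

# frames cancels out, so the speedup is just the ratio of wall times.
speedup = runs[1] / runs[4]
print(f"speedup: {speedup:.3f}x")  # roughly a 1% gain
```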

infStreams=4 seems marginally faster here, but the gain may depend on the model, resolution, or other factors.