styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License
288 stars 30 forks

infstreams questions #70

Closed ylab604 closed 4 months ago

ylab604 commented 4 months ago

Thank you for the great work! I have a question: what is the purpose of specifying the --infStreams option when building an engine with trtexec? In my experience, setting infStreams=4 on a 4090 did not produce any speed improvement. Why is this argument used?

styler00dollar commented 4 months ago

My recommended commands are based on benchmarking I did a while ago. I tested a lot of combinations and just picked the fastest one for the readme.

  --infStreams=N              Instantiate N engines to run inference concurrently (default = 1)

I tested it again with TensorRT 9.3 on a 4090 at 1080p.

Engine 1:

trtexec --bf16 --fp16 --onnx=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim_infStreams1.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=1

Engine 2:

trtexec --bf16 --fp16 --onnx=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=2x_AnimeJaNai_HD_V3_Sharp1_UltraCompact_425k_clamp_fp16_op18_onnxslim_infStreams4.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5 --infStreams=4
VapourSynth inference (same script for both engines):

clip = core.trt.Model(
    clip,
    engine_path="/workspace/tensorrt/engine.engine",
    num_streams=4,  # num_streams=4 for both engines
)
infStreams results:
--infStreams=1: 2210 frames in 29.37 seconds (75.24 fps)
--infStreams=4: 2210 frames in 29.01 seconds (76.19 fps)
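For reference, the fps values above follow directly from frame count and wall time, and the relative gain works out to roughly 1%. A small sketch of that arithmetic (the numbers are taken from the benchmark above):

```python
# Sanity check of the reported throughput: fps = frames / seconds.
frames = 2210
runs = {1: 29.37, 4: 29.01}  # infStreams setting -> wall time in seconds

for streams, seconds in runs.items():
    fps = frames / seconds
    print(f"infStreams={streams}: {fps:.2f} fps")

# frames cancels out, so the speedup is just the ratio of wall times.
speedup = runs[1] / runs[4]
print(f"speedup: {speedup:.3f}x")  # roughly a 1% gain
```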

infStreams=4 seems marginally faster here, but the gain may depend on the model, resolution, or other factors.