styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License
286 stars 30 forks

chain upscaling and vfi on different gpus #47

Closed efschu closed 9 months ago

efschu commented 1 year ago

I'm chaining upscaling with vfi by this

clip = vfi_inference(
    model_inference=model_inference,
    clip=core.trt.Model(
        clip,
        engine_path="/workspace/tensorrt/RealESRGAN_x2plus_2080ti_1080x1920.engine",
        num_streams=1,
        device_id=0,
    ),
    multi=2,
    metric_thresh=0.999,
)

which works as expected, but actually I want the jobs to be split up: upscaling done by GPU 1, VFI done by GPU 0.

But if I change the device_id, I get:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

So doing it that way is not possible. Is there a different way to do it?

styler00dollar commented 1 year ago

PyTorch inference, by which I mean the usage of vfi_inference, is probably the reason; or rather, whatever you used as model_inference. I assume it will work if you use TensorRT for both:

clip = core.trt.Model(clip, engine_path = "sr_engine.engine", num_streams = 1, device_id = 0)
clip = rife_trt(clip, multi = 2, scale = 1.0, device_id = 1, num_streams = 2, engine_path = "rife_engine.engine")

Since I only have one GPU in my system, I can't verify this code. I would still recommend doing it the way my readme describes for better utilization. Only TensorRT and ncnn have a device-select variable for now.

efschu commented 1 year ago

Ok, I created the engine as mentioned here: https://github.com/styler00dollar/VSGAN-tensorrt-docker/issues/18 with this command:

trtexec --onnx=models/rife46_ensembleTrue_op17.onnx --minShapes=input:1x8x64x64 --optShapes=input:1x8x720x1280 --maxShapes=input:1x8x2160x3840 --saveEngine=rife46_ensembleTrue_op17.onnx.max2160x3840.P40.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --buildOnly --device=1
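As an aside, the three shape flags all follow the same `name:NxCxHxW` pattern, varying only the resolution. A small hypothetical helper (not part of the repo; the tensor name, batch size, and channel count depend on the ONNX model) can generate them consistently:

```python
# Hypothetical helper: build trtexec dynamic-shape flags from
# (min, opt, max) resolutions given as (height, width) pairs.
# The "input:NxCxHxW" pattern matches the command above; the RIFE 4.6
# ONNX used here takes an 8-channel input.
def shape_flags(channels, min_hw, opt_hw, max_hw, batch=1, name="input"):
    def fmt(hw):
        return f"{name}:{batch}x{channels}x{hw[0]}x{hw[1]}"

    return [
        f"--minShapes={fmt(min_hw)}",
        f"--optShapes={fmt(opt_hw)}",
        f"--maxShapes={fmt(max_hw)}",
    ]

print(shape_flags(8, (64, 64), (720, 1280), (2160, 3840)))
# ['--minShapes=input:1x8x64x64', '--optShapes=input:1x8x720x1280', '--maxShapes=input:1x8x2160x3840']
```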

But when using it with:

clip = core.trt.Model(clip, engine_path="/workspace/tensorrt/RealESRGAN_x2plus_2080ti_1080x1920.engine", num_streams=2, device_id=1)
clip = rife_trt(clip, multi=2, scale=1.0, device_id=0, num_streams=2, engine_path="/workspace/tensorrt/rife46_ensembleTrue_op17.onnx.max2160x3840.P40.engine")

it errors out with:

File "/workspace/tensorrt/inference_config_both.py", line 185, in inference_clip
    clip = rife_trt(clip, multi = 2, scale = 1.0, device_id = 0, num_streams = 2, engine_path = "/workspace/tensorrt/rife46_ensembleTrue_op17.onnx.max2160x3840.P40.engine")
File "/workspace/tensorrt/src/rife_trt.py", line 32, in rife_trt
    if check_model_precision_trt(engine_path) == "float32":
File "/workspace/tensorrt/src/rife_trt.py", line 18, in check_model_precision_trt
    with runner:
File "/usr/local/lib/python3.11/site-packages/polygraphy/backend/base/runner.py", line 60, in __enter__
    self.activate()
File "/usr/local/lib/python3.11/site-packages/polygraphy/backend/base/runner.py", line 95, in activate
    self.activate_impl()
File "/usr/local/lib/python3.11/site-packages/polygraphy/util/util.py", line 694, in wrapped
    return func(*args, **kwargs)
File "/usr/local/lib/python3.11/site-packages/polygraphy/backend/trt/runner.py", line 106, in activate_impl
    G_LOGGER.critical(
File "/usr/local/lib/python3.11/site-packages/polygraphy/logger/logger.py", line 597, in critical
    raise PolygraphyException(message) from None
polygraphy.exception.exception.PolygraphyException: Invalid Engine or Context. Please ensure the engine was built correctly. See error log for details.

So somehow I'm too stupid to use RIFE as a TensorRT engine...

styler00dollar commented 10 months ago

I think I am now able to reproduce this, but I need more time to look into it. As a temporary workaround, you can set the precision manually in rife_trt.py. What you need to set depends on which ONNX model you used.

# In rife_trt.py, comment out the automatic precision check and set the value yourself:
"""
    if check_model_precision_trt(engine_path) == "float32":
        grayPrecision = vs.GRAYS
    else:
        grayPrecision = vs.GRAYH
"""
grayPrecision = vs.GRAYH  # set manually: GRAYH for an fp16 engine, GRAYS for fp32
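The mapping behind the check is simple: a float32 engine needs vs.GRAYS (32-bit float grayscale) frames, while a float16 engine needs vs.GRAYH (16-bit half grayscale). As a sketch only, the manual override could be made self-documenting with a small helper; the function name is illustrative, not from the repo:

```python
# Hypothetical sketch: map an engine's precision string to the name of the
# VapourSynth grayscale format it requires. vs.GRAYS is 32-bit float gray,
# vs.GRAYH is 16-bit half-precision gray.
def gray_format_for(precision):
    formats = {"float32": "GRAYS", "float16": "GRAYH"}
    try:
        return formats[precision]
    except KeyError:
        raise ValueError(f"unexpected engine precision: {precision}")

# In rife_trt.py one would then write, e.g.:
# grayPrecision = getattr(vs, gray_format_for("float16"))
```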

styler00dollar commented 9 months ago

The Invalid Engine or Context issue was a TensorRT and Polygraphy version mismatch, and it currently works in this branch. The precision check is fixed. Regarding multiple GPUs, I will only work on that once I have such a system myself.