styler00dollar / VSGAN-tensorrt-docker

Using VapourSynth with super resolution and interpolation models and speeding them up with TensorRT.
BSD 3-Clause "New" or "Revised" License

error: bits per sample mismatch #82

Closed: abcnorio closed this issue 1 week ago

abcnorio commented 1 week ago

Hello,

the build went fine, but it did not find some libs that were physically there and therefore did not work; I will post about that separately. Until then I tried to use, via docker pull, the image

styler00dollar/vsgan_tensorrt:latest_no_avx512 1c70b0b0ce36

GPU is an RTX 4090, CPU an Intel(R) Core(TM) i9-14900, OS Debian trixie, Docker version 26.1.5+dfsg1, build a72d7cd.

input video file (originally S-VHS):

Input #0, matroska,webm, from 'UNtalk.mkv':
  Metadata:
    encoder         : libebml v1.3.0 + libmatroska v1.4.1
    creation_time   : 2016-06-20T08:18:08.000000Z
  Duration: 00:14:53.56, start: 0.000000, bitrate: 3176 kb/s
  Stream #0:0: Video: h264 (High), yuv420p(progressive), 720x576, SAR 1:1 DAR 5:4, 25 fps, 25 tbr, 1k tbn (default)
  Stream #0:1: Audio: mp2, 48000 Hz, stereo, s16p, 224 kb/s (default)

output with error:


# vspipe -c y4m inference.py -
Script evaluation failed:
Python exception: operator(): bits per sample mismatch

Traceback (most recent call last):
  File "src/cython/vapoursynth.pyx", line 3387, in vapoursynth._vpy_evaluate
  File "src/cython/vapoursynth.pyx", line 3388, in vapoursynth._vpy_evaluate
  File "inference.py", line 10, in <module>
    clip = inference_clip(video_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/tensorrt/inference_config.py", line 22, in inference_clip
    clip = core.trt.Model(
           ^^^^^^^^^^^^^^^
  File "src/cython/vapoursynth.pyx", line 3123, in vapoursynth.Function.__call__
vapoursynth.Error: operator(): bits per sample mismatch

I used the following model and converted it to model.engine as outlined (no problems):

RealESRGANv2-animevideo-xsx2.onnx

with trtexec --bf16 --fp16 --onnx=RealESRGANv2-animevideo-xsx2.onnx --minShapes=input:1x3x8x8 --optShapes=input:1x3x720x1280 --maxShapes=input:1x3x1080x1920 --saveEngine=model.engine --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference --useCudaGraph --noDataTransfers --builderOptimizationLevel=5
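
As a side note, one can verify which input precision the finished engine actually expects before wiring it into vapoursynth. A minimal sketch, assuming the TensorRT Python bindings (8.5-or-newer API) and the model.engine built above:

import tensorrt as trt

# deserialize the engine built by trtexec and list its I/O tensor precisions
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    # DataType.HALF means the clip must be fed as fp16, DataType.FLOAT as fp32
    print(name, engine.get_tensor_dtype(name))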

inference.py and inference_config.py use the default values; I only changed the video input:

inference.py

# cat inference.py 
import warnings

warnings.filterwarnings("ignore")
import sys

sys.path.append("/workspace/tensorrt/")
from inference_config import inference_clip

video_path = "/workspace/tensorrt/UNtalk.mkv"
clip = inference_clip(video_path)
clip.set_output()

inference_config.py

# cat inference_config.py
import sys

sys.path.append("/workspace/tensorrt/")
import vapoursynth as vs

core = vs.core
core.num_threads = 4  # can influence ram usage
# only needed if you are inside docker
core.std.LoadPlugin(path="/usr/local/lib/libvstrt.so")

def inference_clip(video_path="UNtalk.mkv", clip=None):
    if clip is None:
        clip = core.bs.VideoSource(source=video_path)
        # clip = core.lsmas.LWLibavSource(source=video_path)
        # clip = core.ffms2.Source(source=video_path, cache=False)

    # convert colorspace
    clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")

    # vs-mlrt (you need to create the engine yourself, read the readme)
    clip = core.trt.Model(
        clip,
        engine_path="/workspace/tensorrt/model.engine",
        # tilesize=[854, 480],
        overlap=[0, 0],
        num_streams=4,
    )

    clip = vs.core.resize.Bicubic(clip, format=vs.YUV420P8, matrix_s="709")
    return clip

So what is the problem? Is there some filter to apply before actually running the trt model? A wrong colorspace? Thanks!

styler00dollar commented 1 week ago

I wrote what to do in my readme. Your input precision is wrong. Input precision (by default the same as the onnx precision) and model precision (the arguments you specify in trtexec) are two different things.

clip = vs.core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")  # RGBS means fp32, RGBH means fp16
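
In other words, the format passed to resize.Bicubic in inference_config.py has to match the engine's input precision. A minimal sketch of such a helper (the engine_is_fp16 flag is a hypothetical parameter you set to match how your onnx/engine was exported):

import vapoursynth as vs

def to_model_input(clip, engine_is_fp16=True):
    # RGBH = half precision float (fp16), RGBS = single precision float (fp32)
    fmt = vs.RGBH if engine_is_fp16 else vs.RGBS
    return vs.core.resize.Bicubic(clip, format=fmt, matrix_in_s="709")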
abcnorio commented 1 week ago


Yes, thanks, that much is obvious now, but is there a reason? The vapoursynth page lists RGBH but without any technical explanation of what it is, and a web search turns up nothing on an fp16 RGBH format. One would expect 24 or 32 bit based on (2^8)^3 RGB values. Such things should be documented where they appear (this is not aimed at you, but at vapoursynth). So is the problem here on the vapoursynth side?

Question: how does it decide which format the tensorrt uses? It is obviously not the model precision, so the engine-build call is unrelated to that; is that a correct assumption?

How can I find that out?

Thanks for a hint on where to find this information. Often it is "trivial", poorly documented things that cause delays. I had a similar issue with fine-tuning and dataset prep for upscaling, where gt_size was not properly documented and only later did I find out, en passant, that the internal engine crops a little more, so it is unrelated to the earlier cropping step used to prepare the training images. One sentence would be enough.

Thanks and best!


styler00dollar commented 1 week ago

without any technical explanation what this is

The terminology "Single" and "Half" refers to IEEE 754.

[Image: table of IEEE 754 floating-point formats; "half" is the 16-bit binary16 format, "single" the 32-bit binary32 format. Source linked in the original comment.]

It doesn't seem to be written in the vapoursynth documentation.
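
VapourSynth itself can report what these presets mean, though. A small sketch, assuming VapourSynth R55+ (API 4):

import vapoursynth as vs

core = vs.core
for preset in (vs.RGBH, vs.RGBS):
    f = core.get_video_format(preset)
    # prints e.g. "RGBH 16 True": three planar float channels at 16 bits per sample
    print(f.name, f.bits_per_sample, f.sample_type == vs.FLOAT)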

How does it decide which format the tensorrt uses?

By default the input precision is the onnx model precision: if the onnx is fp16, the input will be fp16. But you can customize it with trtexec arguments:

  --inputIOFormats=spec              Type and format of each of the input tensors (default = all inputs in fp32:chw)
                                     See --outputIOFormats help for the grammar of type and format list.
                                     Note: If this option is specified, please set comma-separated types and formats for all
                                           inputs following the same order as network inputs ID (even if only one input
                                           needs specifying IO format) or set the type and format once for broadcasting.
  --outputIOFormats=spec             Type and format of each of the output tensors (default = all outputs in fp32:chw)
                                     Note: If this option is specified, please set comma-separated types and formats for all
                                           outputs following the same order as network outputs ID (even if only one output
                                           needs specifying IO format) or set the type and format once for broadcasting.
                                     IO Formats: spec  ::= IOfmt[","spec]
                                                 IOfmt ::= type:fmt
                                               type  ::= "fp32"|"fp16"|"bf16"|"int32"|"int64"|"int8"|"uint8"|"bool"
                                               fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8"|
                                                          "cdhw32"|"hwc"|"dla_linear"|"dla_hwc4")["+"fmt]

To have fp16 input for a fp32 onnx, it would be --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw.
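
And to check which precision an onnx file declares for its input (and therefore what the engine will default to), a sketch using the onnx Python package on the model from this issue:

import onnx

model = onnx.load("RealESRGANv2-animevideo-xsx2.onnx")
elem_type = model.graph.input[0].type.tensor_type.elem_type
# FLOAT (fp32) -> feed RGBS by default, FLOAT16 -> feed RGBH
print(onnx.TensorProto.DataType.Name(elem_type))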

One sentence would be enough.

In my readme I wrote:

- If you use the FP16 onnx you need to use RGBH colorspace, if you use FP32 onnx you need to use RGBS colorspace in inference_config.py .

Isn't that enough?

abcnorio commented 1 week ago


Honestly, this makes it much clearer!

I will think it over carefully and, after reading all the links, send you a suggestion, so that maybe one sentence can be added as a further explanation to your readme. Shame on me, I am not from the IT world, so quite a few things are not always clear to me even when I read them carefully. But I still have to use the technology (we are talking about volunteer work here, not a professional commercial setting).

Best wishes and many thanks! Very much appreciated!
