tongyuantongyu / vs-NNVISR

Neural Network Video Interpolation / Super Resolution Filter for VapourSynth
BSD 3-Clause "New" or "Revised" License

Usage of custom onnx models #1

Open styler00dollar opened 1 year ago

styler00dollar commented 1 year ago

First of all, thanks for doing this. You seem to be the first person to make a TensorRT onnx VapourSynth filter that allows batch size > 1. I tried to use it, but it seems like rgb_1_1/fe_n1_2x2_l1.onnx is hardcoded.

clip = core.nnvisr.Super(clip, scale_factor=2, batch_size_extract=2, use_fp16=True, model="cugan_pro-denoise3x-up2x_fp16_opset18_clamp_and_colorfix.onnx", model_path="/workspace/tensorrt/")

Information: NNVISR: building engine for current resolution. This will take some time.
Warning: Check failed /workspace/tensorrt/vs-NNVISR/src/optimize.cpp:193: input_fe.is_open(), Source model file not exist:"/workspace/tensorrt/models/cugan_pro-denoise3x-up2x_fp16_opset18_clamp_and_colorfix.onnx/rgb_1_1/fe_n1_2x2_l1.onnx"
Segmentation fault (core dumped)

Would it be possible to just set paths directly? Renaming everything to fe_n{input_count}_{scale_factor}x{scale_factor_h}_l{extraction_layers}.onnx and ff_n{input_count}_{scale_factor}x{scale_factor_h}_l{extraction_layers}.onnx sounds bothersome. If configurations are required, they should be passed via the Python API instead of being encoded in file names.

From what I understand, this repository aims to be an alternative to mlrt and accepts upscaling or interpolation onnx models, with fe for single-image inference and ff for interpolation tasks. Would it be possible to just specify one or two onnx files? For example, an image upscaling onnx and an interpolation onnx, like cugan and rife.

tongyuantongyu commented 1 year ago

Thanks for your interest!

From what I understand, this repository aims to be an alternative to mlrt and accepts upscaling or interpolation onnx models, with fe for single-image inference and ff for interpolation tasks

Not quite right here. We are not aiming to be an alternative to vs-mlrt; we focus mainly on models that do multiple-frames-to-multiple-frames enhancement, which vs-mlrt currently cannot do, or at least not easily.

fe and ff are abbreviations for "feature extract" and "feature fusion" respectively, and combining both gives one whole network. Networks that do enhancement and interpolation do not simply stack the two tasks but fuse them in some way, so the output of fe is almost never "frames"; the intention of the separation is to avoid repeated work. So, fe and ff cannot do single-frame enhancement or interpolation on their own.
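
As one plausible reading of that split (a conceptual sketch only, not NNVISR's actual code; run_fe and run_ff are hypothetical stand-ins for executing the two ONNX graphs):

# Conceptual sketch: "feature extract" runs once per input frame and yields
# intermediate feature tensors rather than displayable frames; "feature
# fusion" combines the features of neighbouring frames into output frames,
# so features computed once are reused across overlapping windows.
def enhance(frames, run_fe, run_ff, window_size=2):
    features = [run_fe(f) for f in frames]          # fe: per-frame features
    outputs = []
    for i in range(len(features) - window_size + 1):
        window = features[i:i + window_size]        # neighbouring features
        outputs.extend(run_ff(window))              # ff: fused output frames
    return outputs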

Would it be possible to just set paths directly?

"model" here does not specify the model file path, but a name indicating where to find the model.

We consider specifying a model path to be more repetition for the user (these configurations would then live in both the model path and the other arguments) and more error prone, since it is easy to point at the wrong model or to set unsupported parameters (usually, for any parameter in the onnx name, changing it requires at least regenerating the onnx, or even retraining the network).

We would like network authors/packagers to release onnx models (as a folder with the correct structure) for users, and users simply download the models and use them. It may feel like more work if you are doing both roles yourself.
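
For reference, a sketch of how the lookup appears to work, pieced together from the error message earlier in this thread; this is an inference from that path rather than documented behaviour, and "my_model" is a placeholder name:

from pathlib import Path

# Inferred from the error path above: model_path + "models/" + model is treated
# as a folder, and the fe/ff files inside it are selected by the current
# configuration. "rgb_1_1" is copied verbatim from the error message and
# presumably encodes the colour format variant.
model_path = Path("/workspace/tensorrt/")
model = "my_model"                                   # placeholder model name
input_count, scale_factor, scale_factor_h, extraction_layers = 1, 2, 2, 1

folder = model_path / "models" / model / "rgb_1_1"
fe = folder / f"fe_n{input_count}_{scale_factor}x{scale_factor_h}_l{extraction_layers}.onnx"
ff = folder / f"ff_n{input_count}_{scale_factor}x{scale_factor_h}_l{extraction_layers}.onnx"
# e.g. /workspace/tensorrt/models/my_model/rgb_1_1/fe_n1_2x2_l1.onnx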

styler00dollar commented 1 year ago

fe and ff are abbreviations for "feature extract" and "feature fusion"

I meant that it could be used that way. fe could generate a "feature" that is simply a 3-channel tensor, i.e. just an RGB image, which is then passed into ff. Most models do separate tasks, and being able to combine them efficiently would be an interesting practical use case, for example upscaling with cugan and interpolation with rife. Nearly no custom-trained models do both tasks at once, which limits the practical usage. People train either upscaling or interpolation models for their specific use case, so there are models specialized in each, but rarely in both at once.

Doing this is possible with two separate core.trt.Model calls and a bit of Python, but I think it would be more efficient if both were combined within C++, like your approach.
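
For context, a rough sketch of that two-pass route using the vsmlrt Python wrappers instead of raw core.trt.Model calls (wrapper and parameter names are assumed from the vsmlrt package; the source filter and colour matrix are illustrative):

import vapoursynth as vs
from vsmlrt import CUGAN, RIFE, Backend

core = vs.core
clip = core.lsmas.LWLibavSource("input.mkv")                         # any source filter works
clip = core.resize.Bicubic(clip, format=vs.RGBS, matrix_in_s="709")  # vsmlrt expects RGB input

backend = Backend.TRT(fp16=True)
clip = CUGAN(clip, noise=-1, scale=2, backend=backend)   # pass 1: 2x upscale
clip = RIFE(clip, multi=2, backend=backend)              # pass 2: 2x interpolation
# Each pass is a separate TensorRT engine; frames round-trip through
# system memory between the two engines.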

So, fe and ff can not do single frame and interpolation on their own

To do just one task at a time, an onnx that simply returns its input could be used. I am mainly looking for the speedup from batching, since mlrt never implemented it.

and more error prone

I was imagining a scenario where you have multiple onnx files, and every time you want to use a model you need to rename it. That makes it impossible to tell later what the model actually was, unless you make sure to store each one in its own folder and never rename the folders. Think of training a model for a while, for a certain number of iterations, exporting a new onnx, and then renaming it every time.

tongyuantongyu commented 1 year ago

I meant that it could be used that way.

I can understand your use case, but unfortunately that's not the use case we designed and optimized for.

Nearly no custom-trained models do both tasks at once

We are trying to facilitate the training and inferencing workflow of such models, and hopefully there will be more in the future.

I think it would be more efficient if both were combined within C++

The potential efficiency improvement probably just comes from eliminating copies between system and device memory. In my personal opinion, that alone is not a sufficient reason to break the benefits of modularity: every VapourSynth plugin is a reusable part of the processing pipeline. With that said, I probably won't put effort into a special design for such a use case.

In the long term, I believe the improvement should be made in VapourSynth, to have better support for heterogeneous computation, like VideoFrames that reside in device memory.

mainly looking for the speedup from batching, since mlrt never implemented it.

Batching means losing random access, so I doubt vs-mlrt will ever implement it. I wonder if using multiple streams, which vs-mlrt does support, can give you a similar speed-up to batching.
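
For reference, the multiple-streams knob is exposed on the vs-mlrt backend objects (parameter name assumed from the vsmlrt package):

from vsmlrt import Backend

# Several TensorRT execution streams let vs-mlrt run inference on different
# frames concurrently, recovering some of the throughput that batching
# would otherwise provide.
backend = Backend.TRT(fp16=True, num_streams=2)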

NNVISR is forced to process frames sequentially anyway due to the nature of interpolation networks, so we implemented batch support as well because there was nothing more to lose.

where you have multiple onnx files

Actually, the onnx files are just details that only the model author should care about. Users can simply put the provided model folder into their model path and then forget about it.

And again, fe and ff are not intended to be meaningful models by themselves; instead, the parent folder's name (the parent's parent, to be precise) indicates what the model, i.e. the combination of the two onnx files, is. It just happens that combining two standalone models also fits in our design. As a workaround, you may ln/mklink fe/ff to the actual onnx file, so that you won't lose track of the origin.
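
A minimal sketch of that workaround on Linux (the checkpoint filename and model folder are illustrative; the destination layout follows the error message earlier in the thread, and mklink would be the Windows equivalent):

import os

src = "/workspace/onnx/my_net_iter50000.onnx"                 # illustrative exported checkpoint
dst = "/workspace/tensorrt/models/my_model/rgb_1_1/fe_n1_2x2_l1.onnx"

os.makedirs(os.path.dirname(dst), exist_ok=True)
os.symlink(src, dst)   # NNVISR sees the expected name; the link target keeps the real one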