saic-violet / bilayer-model

Mozilla Public License 2.0
245 stars · 49 forks

Too few FPS #7

Open nagadit opened 4 years ago

nagadit commented 4 years ago

The speed does not correspond to the declared one, or I am doing something wrong. GPU: 2080 Ti. Please help with this.

[Screenshot: 2020-10-19 at 02:10:12]
noyami2033 commented 4 years ago

@nagadit Me too. I tested the runner in the InferenceWrapper class and got about 380 ms per frame on a GTX 1060.

nagadit commented 4 years ago

@egorzakharov we need your intervention

egorzakharov commented 4 years ago

Hi! @nagadit

First of all, in this pipeline you are evaluating the full model (initialization + inference) plus the external cropping function, not just inference. The cropping function consists of a face detector and a landmarks detector (the face-alignment library), which could be optimized further; we just did not do that in this repository. For a real-time application, you need to train the model from scratch using a face and landmarks detector that works in real time (like Google's MediaPipe). Note that this issue is common to all methods that use keypoints as their pose representation.
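Just to illustrate what a real-time landmark detector looks like, here is a minimal MediaPipe Face Mesh loop (untested sketch; this is not what this repo ships with, and the model would have to be retrained on these landmarks, as said above):

```python
# Minimal real-time landmark extraction with MediaPipe Face Mesh.
# NOTE: illustrative only -- this repo uses the slower face-alignment
# detectors, and its model is trained on those landmarks.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,  # track across frames instead of re-detecting
    max_num_faces=1)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark  # 468 points
cap.release()
```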

You can crop the data externally via the infer.InferenceWrapper.preprocess_data function and call forward with crop_data=False; then you will measure only initialization + inference speed. Moreover, I would recommend running one basic optimization, which I simply forgot to include in this inference example: module.apply(runner.utils.remove_spectral_norm).
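Putting that together, the measurement would look roughly like this (untested sketch; the exact InferenceWrapper constructor, preprocess_data, and forward signatures may differ slightly from what is in infer.py, and the rn_utils import path is an assumption):

```python
import time

import torch

from infer import InferenceWrapper
from runners import utils as rn_utils  # import path is an assumption

module = InferenceWrapper(args_dict)  # args_dict: your experiment config

# The basic optimization mentioned above: strip spectral norm everywhere.
module.apply(rn_utils.remove_spectral_norm)

# Crop once, outside the timed loop, so the face and landmark detectors
# do not contribute to the measurement.
data_dict = module.preprocess_data(source_imgs, target_imgs)  # placeholders

n_runs = 100
torch.cuda.synchronize()
start = time.time()
for _ in range(n_runs):
    module(data_dict, crop_data=False)  # initialization + inference only
torch.cuda.synchronize()
print(f'{(time.time() - start) / n_runs * 1000:.1f} ms per call')
```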

Lastly, if you want to measure the speed of the inference generator only, you need to perform a forward pass of just this network, as mentioned in our article. We additionally speed it up by calling the runner.utils.prepare_for_mobile_inference function after it has been initialized with adaptive parameters.
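A rough sketch of timing just that network (the inference_generator attribute name and the input shape are assumptions; check the runner and networks code for the real ones):

```python
import time

import torch

# Fold the adaptive parameters into plain layers first; this must run
# AFTER an initialization pass has filled them in.
runner.apply(rn_utils.prepare_for_mobile_inference)

gen = runner.inference_generator           # assumed attribute name
pose = torch.randn(1, 3, 256, 256).cuda()  # assumed pose-encoding input

with torch.no_grad():
    for _ in range(10):  # warm-up
        gen(pose)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        gen(pose)
    torch.cuda.synchronize()

print(f'{(time.time() - start) / 100 * 1000:.1f} ms per frame')
```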

Hope this helps!

egorzakharov commented 4 years ago

This closely follows the pipeline we developed for our mobile application: the computationally heavy initialization part runs separately in the PyTorch Mobile framework, and we then optimize the personalized inference generator by converting it to ONNX, followed by SNPE, for real-time frame-by-frame inference.
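The ONNX step itself is a single torch.onnx.export call; roughly (the input shape and opset are illustrative, and the ONNX -> SNPE conversion is done separately with Qualcomm's snpe-onnx-to-dlc tool, which is not shown):

```python
import torch

# Export the personalized inference generator after the adaptive
# parameters have been folded in (see prepare_for_mobile_inference above).
gen.eval()
dummy_pose = torch.randn(1, 3, 256, 256)  # assumed input shape

torch.onnx.export(
    gen, dummy_pose, 'inference_generator.onnx',
    input_names=['pose'], output_names=['image'],
    opset_version=11)
```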

By the way, I have pushed the remove_spectral_norm hack to master.

ak9250 commented 4 years ago

@egorzakharov could you also share the ONNX weights?

nagadit commented 4 years ago

@egorzakharov

Thank you very much for such an informative answer, I will try to do something about it.

egorzakharov commented 4 years ago

@ak9250 I will ask my colleagues for approval, but I believe the conversion to ONNX was very simple; ONNX -> SNPE was much trickier.

noyami2033 commented 4 years ago

@egorzakharov Thank you very much! I tested the remove_spectral_norm function, and it does speed up the process from 380 ms to 260 ms per frame. But when using runner.apply(rn_utils.prepare_for_mobile_inference), I ran into the problems below:

```
in prepare_for_mobile_inference
    gamma = module.weight.data.squeeze().detach().clone()
AttributeError: 'NoneType' object has no attribute 'data'
```

```
in prepare_for_mobile_inference
    mod.weight.data = module.weight.data[0].detach().clone()
torch.nn.modules.module.ModuleAttributeError: 'AdaptiveConv2d' object has no attribute 'weight'
```

Could you please also push prepare_for_mobile_inference to master, or give some suggestions?

noyami2033 commented 4 years ago

I tested G_inf on an NVIDIA GTX 1060 and got about 15 ms per frame. Thanks for the advice.