vignetteapp / vignette

The open source VTuber software. ❤
https://www.vignetteapp.org

Lag Compensation for Prediction Data to Live2D #46

Closed sr229 closed 3 years ago

sr229 commented 4 years ago

As part of #28, we discussed how raw prediction data results in jittery, rough output, even if the neural network used is theoretically as precise as a human eye at predicting the facial movements of the subject. To compensate for this jittery input, we will implement a form of lag-compensation algorithm.

Background

John Carmack's work on latency mitigation for virtual reality devices (source) explains that the latency between the user's physical head movement and the corresponding update reaching the eyes is critical to the experience. While the document is aimed mainly at virtual reality, one can argue that the methodologies used to provide a seamless VR experience also apply to a face-tracking application, since face tracking, like an HMD, is a very demanding "human-in-the-loop" interface.

Byeong-Doo Choi et al.'s work on frame interpolation uses a novel motion-prediction algorithm, adaptive OBMC, to enhance a target video's temporal resolution. According to the paper, this interpolation technique has been shown to give better results than the algorithms currently used for frame interpolation in the market.

Strategy

As stated in the background, there are many ways to perform lag compensation on the jittery raw prediction data coming from the neural network; for our purposes, we have narrowed it down to these two strategies:

Frame Interpolation by Motion Prediction

Byeong-Doo Choi et al. achieve frame interpolation as follows:

First, we propose the bilateral motion estimation scheme to obtain the motion field of an interpolated frame without yielding the hole and overlapping problems. Then, we partition a frame into several object regions by clustering motion vectors. We apply the variable-size block MC (VS-BMC) algorithm to object boundaries in order to reconstruct edge information with a higher quality. Finally, we use the adaptive overlapped block MC (OBMC), which adjusts the coefficients of overlapped windows based on the reliabilities of neighboring motion vectors. The adaptive OBMC (AOBMC) can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking

According to their experiments, this method produces better image quality for the interpolated frames, which would be helpful for our neural network's predictions. However, it comes at the cost of processing the video at runtime, since their experiments were performed only on pre-rendered video frames.
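To make the core idea concrete, here is a minimal sketch of motion-compensated midframe interpolation on a 1-D "frame" (a list of pixel intensities). The real AOBMC pipeline works on 2-D blocks with bilateral estimation and overlapped windows; this only illustrates the basic mechanism of estimating a motion vector by block matching and placing the block halfway. All function names and parameters here are illustrative, not taken from the paper.

```python
def block_match(prev, curr, start, size, search):
    """Find the shift (motion vector) that best aligns a block of
    `curr` starting at `start` with a block in `prev`."""
    block = curr[start:start + size]
    best_shift, best_err = 0, float("inf")
    for shift in range(-search, search + 1):
        s = start + shift
        if s < 0 or s + size > len(prev):
            continue  # candidate block falls outside the frame
        err = sum((a - b) ** 2 for a, b in zip(prev[s:s + size], block))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift

def interpolate_midframe(prev, curr, size=4, search=4):
    """Build the frame halfway between prev and curr by moving each
    block of curr half of its estimated motion vector."""
    mid = list(prev)  # fall back to prev where no block lands
    for start in range(0, len(curr) - size + 1, size):
        mv = block_match(prev, curr, start, size, search)
        for i in range(size):
            dst = start + mv // 2 + i     # halfway position
            src = start + mv + i          # matched position in prev
            if 0 <= dst < len(mid):
                a = prev[src] if 0 <= src < len(prev) else curr[start + i]
                mid[dst] = (a + curr[start + i]) / 2

    return mid

prev = [0] * 4 + [9] * 4 + [0] * 8   # block of "pixels" at index 4
curr = [0] * 8 + [9] * 4 + [0] * 4   # same block moved to index 8
mid = interpolate_midframe(prev, curr)
# mid places the block at index 6, halfway between the two frames
```

The runtime cost the paper warns about is visible even here: every block in every interpolated frame requires a search over candidate shifts, which is exactly the work we would have to do live rather than on pre-rendered footage.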

View Bypass/Time Warping

John Carmack's work on reducing input latency for VR HMDs suggests a number of methods. One of them is view bypass, a method achieved by taking a newer sample of the input.

To achieve this, the input is sampled once but used by both the simulation and the rendering task, reducing the latency of the displayed view. However, the input and game threads must run in parallel, and the programmer must be careful not to reference the game state from the rendering side; otherwise it would cause a race condition.
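One way to sketch the idea in our face-tracking setting (single-threaded here for clarity; the tracker, pose fields, and function names are all illustrative assumptions, not the actual Vignette pipeline): the renderer consumes a fresher tracker sample directly, bypassing the older pose recorded in the simulation state.

```python
class FakeTracker:
    """Stand-in for the face-tracking pipeline; returns pose samples
    that advance monotonically, so "fresher" is easy to see."""
    def __init__(self):
        self.t = 0.0

    def sample(self):
        self.t += 1.0
        return {"yaw": self.t}

def simulate(state, pose):
    # Puppet/game logic keeps its own copy of the pose it was given.
    state["sim_pose"] = pose
    return state

def render(state, fresh_pose):
    # Rendering bypasses state["sim_pose"] and uses the latest sample,
    # shaving the simulation step's latency off the displayed pose.
    return fresh_pose["yaw"]

tracker = FakeTracker()
state = {}
displayed = []
for _ in range(3):
    pose = tracker.sample()        # sample consumed by simulation
    state = simulate(state, pose)
    fresh = tracker.sample()       # newer sample taken just before render
    displayed.append(render(state, fresh))
```

Note that `render` never reads mutable simulation state for the view pose; that discipline is what lets the input sampling eventually move to its own thread without racing the game loop.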

Another method mentioned by Carmack is time warping, which he describes as follows:

After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters. Using that transform, warp the rendered image into an updated form on screen that reflects the new input. If there are two dimensional overlays present on the screen that need to remain fixed, they must be drawn or composited in after the warp operation, to prevent them from incorrectly moving as the view parameters change.

There are different warping methods, namely forward warping and reverse warping, and they can be used alongside view bypass. The added complexity of sampling input concurrently with the main loop is manageable, since the input loop is entirely independent of the game state.
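A toy sketch of the warp step, adapted to 2-D landmark output rather than a rendered image (a real implementation would warp pixels on the GPU; the pose representation and names here are illustrative assumptions): the frame is produced with the pose known at render time, then cheaply translated to match a fresher pose sample fetched at display time.

```python
def render_landmarks(base_points, pose):
    """Pretend-render: place landmark points relative to the head
    position given by pose = (x_offset, y_offset)."""
    return [(x + pose[0], y + pose[1]) for x, y in base_points]

def time_warp(points, render_pose, latest_pose):
    """Approximate the frame we *would* have rendered with latest_pose
    by shifting the already-rendered points by the pose delta."""
    dx = latest_pose[0] - render_pose[0]
    dy = latest_pose[1] - render_pose[1]
    return [(x + dx, y + dy) for x, y in points]

base = [(0, 0), (1, 0), (0, 1)]
render_pose = (10.0, 5.0)   # pose when rendering started
latest_pose = (11.0, 5.5)   # fresher sample fetched at display time

frame = render_landmarks(base, render_pose)
warped = time_warp(frame, render_pose, latest_pose)
# warped matches what a full render at latest_pose would have produced
```

For a pure translation the warp is exact; for rotation or depth changes it is only an approximation, which is why Carmack notes that fixed 2-D overlays must be composited after the warp.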

Conclusion

The strategies mentioned above would give us a smoother experience; however, based on my analysis, I found Carmack's solutions more feasible for a project of our scale. We simply don't have the team or the technical resources for from-camera video interpolation, as it would be too computationally expensive to implement with minimal overhead.

gmlwns2000 commented 4 years ago

Do we need low latency?

I think we are making VTuber streaming software or something like that. If so, it would be good to use Kalman filters and buffered inputs to parallelize face tracking for stabilization.

However, frame interpolation seems definitely needed, because it may make it possible to stream a VTuber at 4K60.

sr229 commented 4 years ago

> Do we need low latency?
>
> I think we are making VTuber streaming software or something like that. If so, it would be good to use Kalman filters and buffered inputs to parallelize face tracking for stabilization.
>
> However, frame interpolation seems definitely needed, because it may make it possible to stream a VTuber at 4K60.

We definitely need these, as landmark input by itself is very raw and rough on its own. @LeNitrous give Kalman filters a look?

sr229 commented 4 years ago

I gave Kalman filtering a look, and judging from how it is explained:

Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe

We definitely need this, though I'm not sure whether it should be applied at the input level, the frame level, or the neural-network level.
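For reference, here is what the simplest useful version looks like at the input level: a 1-D Kalman filter smoothing one jittery landmark coordinate under a constant-position model (motion is absorbed into process noise). The noise values `q` and `r` are illustrative and would need tuning against real tracker output.

```python
class Kalman1D:
    def __init__(self, q=1e-3, r=1e-1):
        self.q = q      # process noise: how fast the face can move
        self.r = r      # measurement noise: how jittery the tracker is
        self.x = None   # current estimate
        self.p = 1.0    # estimate variance

    def update(self, z):
        if self.x is None:       # initialise on the first measurement
            self.x = z
            return self.x
        # predict: state unchanged, uncertainty grows by process noise
        self.p += self.q
        # update: blend prediction and measurement via the Kalman gain
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x

kf = Kalman1D()
noisy = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.3]
smooth = [kf.update(z) for z in noisy]
# smooth jumps around far less than noisy, at the cost of a small lag
```

The trade-off matters for this thread: a heavier filter (larger `r` relative to `q`) means less jitter but more perceived latency, which is exactly the tension the view-bypass and time-warp strategies above try to hide.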

sr229 commented 4 years ago

Re-assigned to AB2

LeNitrous commented 3 years ago

Closing as direction for this project has changed towards an abstract and modular approach to puppet renderers.