liuziwei7 / voxel-flow

Video Frame Synthesis using Deep Voxel Flow (ICCV 2017 Oral)
https://liuziwei7.github.io/projects/VoxelFlow
215 stars · 50 forks

extrapolation #5

Open huliyu1203 opened 6 years ago

huliyu1203 commented 6 years ago

Thanks for sharing. I want to know how to modify the code to train an extrapolation model — how should I do it?

hangg7 commented 6 years ago

Good question. It's a little confusing that in the paper the authors simply claim the model can do extrapolation without giving any detailed explanation. I don't think the modification from the interpolation design would be trivial at all.

I understand the paper has been published for a while now and the authors might be busy with new projects. But the current repo is just not enough for reproducibility.

Please respond and try to add the missing parts :)

liuziwei7 commented 6 years ago

Actually, regardless of the conditioning inputs (e.g., two frames are used in the paper), the learned voxel flow can be applied to any single frame to synthesize a warped frame, which can serve as either interpolation or extrapolation/prediction.

The "recurrent" extension of voxel flow for extrapolation/prediction is straightforward: predict one frame with voxel flow at each time step, then feed the predicted frame back in as one of the inputs for the next time step.
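A minimal, runnable sketch of that recurrent rollout (not the authors' code): `predict_voxel_flow` is a hypothetical placeholder for the trained network — here it just emits a constant one-pixel rightward shift so the example runs — and `warp` is a simple nearest-neighbor backward warp rather than the trilinear sampling used in the paper.

```python
import numpy as np

def predict_voxel_flow(frame_a, frame_b):
    # Placeholder for the trained network. A real model would predict a
    # per-pixel (dy, dx) field from the two conditioning frames; here we
    # return a constant one-pixel rightward shift so the sketch is runnable.
    h, w = frame_a.shape
    flow = np.zeros((h, w, 2))
    flow[..., 1] = 1.0  # dx = 1
    return flow

def warp(frame, flow):
    # Nearest-neighbor backward warping: output[y, x] = frame[y - dy, x - dx].
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def extrapolate(frames, n_future):
    # Recurrent rollout: each predicted frame becomes one of the
    # conditioning inputs at the next time step.
    frames = list(frames)
    for _ in range(n_future):
        flow = predict_voxel_flow(frames[-2], frames[-1])
        frames.append(warp(frames[-1], flow))
    return frames[-n_future:]
```

With a fixed-size input window (drop the oldest frame as each prediction is appended), the network's input channel count never changes, no matter how many future frames are rolled out.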

The "non-recurrent" extension of voxel flow to handle, e.g., a 2-frame-input, 2-frame-output extrapolation/prediction setting would be:

1. Conditioned on the two inputs, two different voxel flow fields are learned. The two flow fields are then applied to the two inputs respectively to obtain two warped frames.
2. Next, a fusion mask is learned to combine the two warped frames into the current prediction.
3. The same procedure can be applied to the other output frames. It can also be simplified by exploiting the time-interval relations between frames, so that the voxel flow fields can simply be rescaled.
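The steps above can be sketched as follows. This is an illustrative stand-in, not the released code: `warp` is a minimal nearest-neighbor warp, and the flow fields and fusion mask would in practice come from the trained network rather than being passed in by hand.

```python
import numpy as np

def warp(frame, flow):
    # Nearest-neighbor backward warping: output[y, x] = frame[y - dy, x - dx].
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def fuse_prediction(frame_a, frame_b, flow_a, flow_b, mask):
    # Step 1: warp each input with its own predicted flow field.
    warped_a = warp(frame_a, flow_a)
    warped_b = warp(frame_b, flow_b)
    # Step 2: blend the two warped candidates with a per-pixel fusion
    # mask in [0, 1] to form the current prediction.
    return mask * warped_a + (1.0 - mask) * warped_b

def rescale_flow(flow, factor):
    # Simplification for step 3: instead of learning a fresh flow field for
    # each output frame, rescale one field by the relative time interval
    # (e.g., factor = 2.0 for a frame twice as far in the future).
    return flow * factor
```

Repeating `fuse_prediction` with rescaled flows then yields the second (and further) output frames without changing the network's input format.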

zzwei1 commented 3 years ago

> Good question. It's a little confusing that in the paper the authors simply claim the model can do extrapolation without giving any detailed explanation. I don't think the modification from the interpolation design would be trivial at all.
>
> I understand the paper has been published for a while now and the authors might be busy with new projects. But the current repo is just not enough for reproducibility.
>
> Please respond and try to add the missing parts :)

Have you figured out how to predict multiple frames? What confuses me is how to do multi-frame prediction with Voxel-Flow. When I have the first 5 frames as input, I concatenate them along the time dimension and then follow the Voxel-Flow procedure to generate the first predicted frame. That much I understand. But what should I do next to generate the second and third frames? Should I concatenate the first predicted frame with the 5 ground-truth frames to form a new input? I don't think that's correct, because the input channel count would change if I did so.