CVPR 2018 paper - Githubissues

nationalflag commented 6 years ago

Great works! Can't wait for the code of your new CVPR 2018 paper~

sniklaus commented 6 years ago

Thank you for your feedback. 🙂

It might take a while but we are planning to eventually release it just like we did with SepConv, stay tuned!

scimunk commented 6 years ago

this is one of my favorite AI, I was wondering how well it would perform with 3D renderer image, could this AI be used to reduce the number of frame to render and create intermediary frame for free ? would it also be possible to use the motion data generated by the renderer to produce even better interpolated frame using the AI ? could we also even use the depth or even ID map ?

I'm a software developer and I'd really like to get started into AI, how difficult would it be to use your new context aware AI ?

sorry for all the question, i'm very excited about the idea to halve the amount of time to render 3D animation !

sniklaus commented 6 years ago

Thank you for your interest in our work.

Yes, it is possible to render only a subset of the total frames and interpolate the missing ones, thus potentially obtaining the rendered video more quickly.
Yes, if the renderer provides per-pixel motion data then the synthesis network can leverage it more effectively than the optical flow estimation, effectively improving the interpolation results.
Yes, the synthesis network can be augmented with depth as well as semantic maps, allowing it to leverage this additional information to potentially improve the interpolation results.

Our newest paper is rather elaborate and requires custom layers, I would thus recommend waiting until we release our implementation.

Jasas9754 commented 6 years ago

CtxSyn code not ready yet? I'm really looking forward to it.

sniklaus commented 6 years ago

I am afraid that I have not received the approval to do so yet. I am, like you, very interested in getting the code out there as soon as possible. After all, I would like to see our work be put to good use.

dagf2101 commented 6 years ago

I guess that your new method and a new video card will be viable (and probably awesome) way of viewing/converting personal daily movies.

Do you think that your methods or something similar could also be used to estimate pixels in x/y (super-resolution) instead of estimating pixels in time (between frames) ?

Finally in a other subject are you aware of any interesting 3d reconstruction papers/projects using continuous image sequences ( videos of a single scene ) ?

Thanks for sharing your awesome researches.

sniklaus commented 6 years ago

Single-image super resolution is an interesting topic that is orthogonal to our work. However, video super resolution and video frame interpolation are related in that they both need to perform motion compensation. One could thus try to apply ideas from one area to the other.

Scene reconstruction, like single-image super resolution, is an interesting topic that is orthogonal to our work. Fortunately, once could make use COLMAP which is a magnificent open-source project. The assumption of a continuous image sequence is commonly made when finding corresponding points between images.

dlwtojd26 commented 6 years ago

It’s great work!, I also looking forward to your code. But waiting is not my favorite, I'm trying to implement your work on my own I have some questions when I've training this model. Could you answer me some quenstions? it would be great pleasure if you answer.

Should flow generating network(like pwc-net in your paper) be pre-trained? When I trained your model from scratch it didnt work well but when i used pretrained model. It seems to starting work.
I'm using open dataset (UCF-101) and resize to small size( 384 x 384). trained model works well(I think) when input size is smaller than 384x384 but larger than this size interpolated output seems not good. It seems predicting flow is failed and the output is blurred even if theres no movement. Should I rescale the output of the flow generating net? when I test different resolution inputs?

Thanks for reading I'm not native speaker in English so sentences could be weird.

sniklaus commented 6 years ago

My apologies for not not been able to publish our reference implementation yet, and thank you for being interested in our work.

The methodology described in the training section only applies to the synthesis network, we trained the optical flow in advance and treat it as an off-the-shelf component.
It sounds like you are changing the aspect ratio when resizing the training data. I would recommend not to change the aspect ratio and to select training data through cropping instead. Note that the flow as well as the synthesis model are fully-convolutional, which is why we do not need to resize the input during inference.

dlwtojd26 commented 6 years ago

@sniklaus Thanks for your fast reply.

Oh, that was my misunderstanding. I thought only context extractor network(vgg or resnet) needed to be trained and the estimating bidirectional flow is just used in front of synthesis net. (after training flownet, generating frame works better, but network generates frame before training flownet though)
Thanks for your advice. My question was not good.

frame

This is an example image to describe my problem when I test trained model using this image for example. network generates star between stars in frame 0.5(256x256), but It fails when I used high resolution frames. It generates star like frame 0.5(512x512) I don't understand this(I think result should be same).

Thanks for your fast reply again. and have a nice weekend. thanks

sampiet commented 6 years ago

@sniklaus, thanks again for your answer, your paper is very interesting, and I have one question about spatial warping block : is it the same warping block as PWC warping bloc ? Thanks your very much in advance for your answer .

sniklaus commented 6 years ago

@dlwtojd26 If you double the resolution then everything will be twice as far away in terms of pixels. My guess is that your network cannot handle the increased optical flow and hence produces the artifacts that you are experiencing.

@sampiet Thank you for being interest in our work. We perform forward-warping for the spatial warping block whereas PWC-Net relies on backward-warping.

sampiet commented 6 years ago

@sniklaus Thank you for your reply. In fact, I implemented the entire network using Tensorflow, so that's why PWC-Net block is based on (https://github.com/daigo0927/PWC-Net_tf) . So, I used the same PWC-Net warping block for your spatial warping (is it make sense for your ?). But the problem is that I have a lot of artifact in my images out of the network. I think the problem comes from this spatial warping block. So my question how to correct it based on your previous comment? Second, how to manage occlusions in the spatial warping block? Thank you in advance, sorry to take a lot of your time.

dlwtojd26 commented 6 years ago

@sniklaus thanks for reply. I will check the implementation. I’m using this implementation.(https://github.com/daigo0927/PWC-Net_tf)

B-Step62 commented 6 years ago

@sniklaus Great works! I'm sorry, I am trying to reproduce your work according to your paper, but I have a question about that how do you implement the spatial warping function which has three inputs(context map, image, optical flow) and two outputs(image, context map).Could you tell me what the warping function works inside? Thank you!

hengchuan commented 6 years ago

@sniklaus It's a great work about video frame interpolation! I'm trying to implement it by myself. While I am a little confused about some details of your spatial warping.

How to pad the patch when warping, zeros or other methods?
If the holes after warping need to be filled?
Could you provide more explanation about the process "measure the flow quality by checking the brightness constancy and discard contributions from flow vectors that significantly violate this constraint."?

I'm sorry if I missed something in the paper. Thank you for reading patiently. Have a good weekends!

tommm994 commented 5 years ago

Any news about the release date of your implementation ?

sniklaus commented 5 years ago

My apologies for still not being able to release the reference implementation of CtxSyn. Please note that I am eager to do so, but have unfortunately not gotten the approval yet.

l2009312042 commented 5 years ago

hello sniklaus, the implementation of ctxsyn is ok now?

l2009312042 commented 5 years ago

It’s great work!, I also looking forward to your code. But waiting is not my favorite, I'm trying to implement your work on my own @dlwtojd26 ，hello dlwtojd26，can you share your implement of the cyxsyn work？ it would be great，thanks ！

lhao0301 commented 5 years ago

Look forward to the code of cvpr 2018 paper.

asheroin commented 4 years ago

My apologies for still not being able to release the reference implementation of CtxSyn. Please note that I am eager to do so, but have unfortunately not gotten the approval yet.

Is it possible for you to give a more detailed description or reference here about the spatial warping in your paper? It still confused me for how to perform such a warping strategy even after reading your paper for many times.

sniklaus commented 4 years ago

For everyone interested in the forward warping, please consider taking a look at our recently released softmax splatting: https://github.com/sniklaus/softmax-splatting

asheroin commented 4 years ago

I'm trying to re-produce the results base on the DAIN's code, so I rewrite the CuPy interface into a CUDA coda as the extension for pytorch. I have check the forward and backward output with the image and optical flow file provided in your repo and is almost the same with CuPy version( the mean L1 loss may 1e-14?).

However, when replace the DAIN's flow projection part with Softsplat-average and give the optical flow net a 0.01 learning rate coef, the training become very unstable, which means that the PSNR drops to 10~ after just a few epochs. I wonder if there is some magic setting for the module like batch size or learning rate coef.

sniklaus commented 1 year ago

We just released the full inference code to our CVPR 2020 paper on softmax splatting which is the direct successor to our CVPR 2018 paper on the context-aware synthesis here: https://github.com/sniklaus/softmax-splatting

I am hence closing this issue for now, thanks everyone for your patience and my apologies you had to wait this long!

sniklaus / sepconv-slomo

CVPR 2018 paper #10