haochen-rye / HNeRV

Official PyTorch implementation for HNeRV: a hybrid video neural representation (CVPR 2023)
https://haochen-rye.github.io/HNeRV/

The difference between HNeRV and AutoEncoder #1

Closed dawnlh closed 1 year ago

dawnlh commented 1 year ago

Hi~ @haochen-rye Thanks for sharing your nice work. After reading the paper, I find that the network structure and design of HNeRV seem similar to an auto-encoder (AE). Although AEs were originally used mainly for supervised/unsupervised learning, applying them to data fitting/compression is also a direct and valid idea. For classical NeRF (or NeRV from your other work), one can use a coordinate to query the corresponding pixel or frame values. But for HNeRV, the input is actually the video frame itself rather than a coordinate, which means one cannot query the desired data from an explicit coordinate; instead, one must already have the frame embedding from the encoder in order to reconstruct the frame.

I think this is the main difference between HNeRV and conventional NeRF, NeRV, and E-NeRV. Have I misunderstood something? And what is your view on this difference?
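To make the contrast above concrete, here is a minimal sketch (not the authors' code; all class names, dimensions, and weights are hypothetical toy stand-ins) of the two query interfaces: a NeRV-style model decodes any frame from its index alone, while an HNeRV-style model first needs a content-adaptive embedding produced from the frame itself.

```python
import numpy as np

rng = np.random.default_rng(0)

class NeRVStyle:
    """Index-based: decode frame t from a positional encoding of t alone."""
    def __init__(self, dim=16, out=8):
        self.W = rng.standard_normal((dim, out))
        self.freqs = 2.0 ** np.arange(dim // 2)

    def decode(self, t):
        # Positional encoding of the frame index is the only input needed.
        pe = np.concatenate([np.sin(t * self.freqs), np.cos(t * self.freqs)])
        return pe @ self.W

class HNeRVStyle:
    """Hybrid: an encoder maps the frame itself to a tiny embedding,
    which the decoder expands back -- no free-standing coordinate query."""
    def __init__(self, frame_dim=8, embed_dim=2):
        self.enc = rng.standard_normal((frame_dim, embed_dim))
        self.dec = rng.standard_normal((embed_dim, frame_dim))

    def encode(self, frame):
        return frame @ self.enc   # content-adaptive embedding, stored per frame

    def decode(self, embedding):
        return embedding @ self.dec  # decoding requires the stored embedding

nerv = NeRVStyle()
print(nerv.decode(3).shape)       # any index t can be queried directly

hnerv = HNeRVStyle()
frame = rng.standard_normal(8)
emb = hnerv.encode(frame)         # must come from the frame itself
print(emb.shape, hnerv.decode(emb).shape)
```

The toy linear maps stand in for the real convolutional encoder/decoder; the point is only the interface difference the question describes.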

BTW, I wonder how long it takes to train NeRV and HNeRV. I didn't find the absolute training time in the paper. Thanks.

haochen-rye commented 1 year ago

HNeRV vs NeRV (or NeRF etc.):

HNeRV vs auto-encoder:

dawnlh commented 1 year ago

Got it. Then what about the training time? The paper gives the relative time consumption result, but I wonder how long it takes to finish the data fitting.

haochen-rye commented 1 year ago

We conduct all experiments in PyTorch on RTX 2080 Ti GPUs, where it takes around 8 s per epoch to train on a 130-frame video of size 640 × 1280.
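For a rough sense of the absolute cost, the per-epoch figure above can be extrapolated to a full run. The epoch count below is a hypothetical placeholder (the repo's actual training schedule may differ):

```python
sec_per_epoch = 8      # figure quoted above for a 130-frame 640x1280 video
epochs = 300           # hypothetical; check the repo's default training schedule
total_min = sec_per_epoch * epochs / 60
print(f"~{total_min:.0f} min total")  # -> ~40 min total
```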

dawnlh commented 1 year ago

OK, thanks for your prompt reply.