semchan / UPST-NeRF

code for UPST-NeRF: Universal Photorealistic Style Transfer of Neural Radiance Fields for 3D Scene
https://semchan.github.io/UPST_NeRF/

How to get the consistency measurement? #2

Open · kigane opened this issue 1 year ago

kigane commented 1 year ago

In the consistency metric E(O_i, O_j) = LPIPS(O_i, M_{i,j}, W_{i,j}(O_j)), how do you obtain the mask M_{i,j}, and how do you apply it to LPIPS?

semchan commented 1 year ago

> In the consistency metric E(O_i, O_j) = LPIPS(O_i, M_{i,j}, W_{i,j}(O_j)), how do you obtain the mask M_{i,j}, and how do you apply it to LPIPS?

We followed the code at https://github.com/phoenix104104/fast_blind_video_consistency to calculate LPIPS; please find the details there. Thanks.
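For reference, generic usage of the official `lpips` package looks roughly like this (a minimal sketch of standard usage; I have not checked whether the linked repo calls the package in exactly this way, and the image tensors below are placeholders):

```python
import torch
import lpips

# LPIPS with an AlexNet backbone (net='vgg' or 'squeeze' also work)
loss_fn = lpips.LPIPS(net='alex')

# LPIPS expects (B, 3, H, W) tensors scaled to [-1, 1]
img0 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder for one rendered frame
img1 = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder for another frame

distance = loss_fn(img0, img1)  # shape (1, 1, 1, 1); lower means more similar
print(distance.item())
```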

kigane commented 1 year ago

I have read their code. In evaluate_LPIPS.py they use LPIPS to get the perceptual distance between the processed image P and their model output O, but P and O are the same frame of the video. In evaluate_WarpError.py, they use optical flow predicted by FlowNet2 between frame1 and frame2 to warp frame2 to frame1, and then calculate the L2 distance on the non-occluded pixels. They do not use a mask in the LPIPS metric. As far as I know, LPIPS uses a VGG/SqueezeNet/AlexNet backbone to extract feature maps from different layers of the two input images and then computes an L2 distance on those features. So I am really confused about the mask M_{i,j} used in the equation. Could you please explain this detail more clearly? Thank you.
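To make the question concrete, here is how I would compute the metric if the mask is simply applied to both images before LPIPS. This is only my guess at the intended procedure, not something confirmed by the authors; `warp` and `occlusion_mask` are hypothetical helpers (backward warping via `grid_sample` and a forward-backward flow consistency check), and the flow itself is assumed to come from FlowNet2 or similar:

```python
import torch
import torch.nn.functional as F
import lpips

def warp(img, flow):
    """Backward-warp img (B, C, H, W) with flow (B, 2, H, W), given in pixels."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                            # sampling positions
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                      # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def occlusion_mask(flow_fw, flow_bw, thresh=1.0):
    """Non-occlusion mask from a forward-backward flow consistency check."""
    flow_bw_warped = warp(flow_bw, flow_fw)
    diff = (flow_fw + flow_bw_warped).norm(dim=1, keepdim=True)  # ~0 where flows agree
    return (diff < thresh).float()                               # 1 = keep this pixel

lpips_fn = lpips.LPIPS(net='vgg')

def consistency(o_i, o_j, flow_ij, flow_ji):
    """E(O_i, O_j): LPIPS between O_i and the warped O_j on non-occluded pixels."""
    o_j_warped = warp(o_j, flow_ij)                 # W_{i,j}(O_j)
    m = occlusion_mask(flow_ij, flow_ji)            # M_{i,j}
    # one interpretation: zero out occluded pixels in both inputs, then run LPIPS
    # (inputs are assumed to already be scaled to [-1, 1])
    return lpips_fn(o_i * m, o_j_warped * m)
```

Whether the paper means zeroing out occluded pixels before the backbone forward pass, or applying the mask to the per-pixel distance maps inside LPIPS, is exactly the detail I am unsure about.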

koolo233 commented 1 year ago

@kigane Have you solved this problem? It is strange that none of StylizedNeRF, StyleRF, Learning to Stylize Novel Views, etc. provide a calculation method for consistency.

zAuk000 commented 1 year ago

> @kigane Have you solved this problem? It is strange that none of StylizedNeRF, StyleRF, Learning to Stylize Novel Views, etc. provide a calculation method for consistency.

I have the same doubt. Why hasn't the calculation method for the quantitative metric been provided, even though it is the only evaluation criterion?

zAuk000 commented 1 year ago

> I have read their code. In evaluate_LPIPS.py they use LPIPS to get the perceptual distance between the processed image P and their model output O, but P and O are the same frame of the video. In evaluate_WarpError.py, they use optical flow predicted by FlowNet2 between frame1 and frame2 to warp frame2 to frame1, and then calculate the L2 distance on the non-occluded pixels. They do not use a mask in the LPIPS metric. As far as I know, LPIPS uses a VGG/SqueezeNet/AlexNet backbone to extract feature maps from different layers of the two input images and then computes an L2 distance on those features. So I am really confused about the mask M_{i,j} used in the equation. Could you please explain this detail more clearly? Thank you.

Have you tried evaluating the generated results with the code from evaluate_WarpError.py in that repo? If so, are your results close to those reported in the paper?
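For what it's worth, my reading of the warping error described above boils down to something like the following (a rough sketch only, reusing the hypothetical `warp` and `occlusion_mask` helpers from the earlier comment; I have not checked it against evaluate_WarpError.py line by line):

```python
def warp_error(frame1, frame2, flow_12, flow_21):
    """Mean squared error between frame1 and the warped frame2 on non-occluded pixels."""
    frame2_warped = warp(frame2, flow_12)            # align frame2 to frame1
    mask = occlusion_mask(flow_12, flow_21)          # 1 = non-occluded
    sq_diff = (frame1 - frame2_warped) ** 2 * mask   # error only where the warp is valid
    return sq_diff.sum() / (mask.sum() * frame1.shape[1] + 1e-8)
```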