Thank you, @orrblue, for your attention to detail while reading our paper!
> Not sure if this is the right place to ask, but the paper mentions supplementary material that provides "detailed comparisons of the framework’s requirements against competitive methods". Would you be able to point me to it? Sounds like very valuable information!
Any questions are welcome! Sorry we forgot to upload it along with the paper itself. I've attached it here: NeRFPlayer_supp.pdf. It may not be that helpful to you, though, as the field has changed a lot -- lots of great methods like K-Planes and HyperReel have been proposed recently.
> Additionally, I'm not sure if it was intentional, but the paper website doesn't link to this repository -- only to nerfstudio. It just makes it a little difficult to navigate to this repo.
Thank you so much! I'll update it after the code refactoring. I've been playing with diffusion models lately; they're so much fun that I totally forgot to keep up with the NeRF stuff 😆
Thank you for your kind words, and for providing the supplementary materials. If I may ask another question, since you mentioned K-Planes: I'm trying to learn about novel view synthesis methods in order to incorporate them into my research. I haven't considered K-Planes in depth yet, but I probably should -- so far I've only tried NeRFPlayer for dynamic scenes, since it supports monocular views (unlike HyperReel).
My application needs dynamic view synthesis and would benefit from monocular RGB input views -- I tried Instant NGP and found it quite convenient to simply run COLMAP on an RGB video without needing multi-view or depth (though COLMAP could provide depth, I suppose). I also need real-time rendering, and my scenes only involve simple rigid transformations. Would K-Planes or another method be recommended?
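For context, this is roughly the monocular preprocessing I had in mind for Instant NGP -- just a sketch of my setup, with placeholder paths and an arbitrary frame rate:

```python
# Sketch of monocular preprocessing: extract frames from an RGB video with
# ffmpeg, then let COLMAP estimate intrinsics and camera poses.
# "capture.mp4" and the directory names are placeholders for my setup.
import subprocess
from pathlib import Path

video = Path("capture.mp4")    # input RGB video (placeholder)
frames = Path("frames")        # extracted frames go here
workspace = Path("colmap_ws")  # COLMAP database + sparse model go here

frames.mkdir(exist_ok=True)
workspace.mkdir(exist_ok=True)

# 1. Sample frames from the video (2 fps is an arbitrary choice).
subprocess.run(
    ["ffmpeg", "-i", str(video), "-vf", "fps=2", str(frames / "%05d.png")],
    check=True,
)

# 2. Run COLMAP's automatic reconstructor to recover camera poses.
subprocess.run(
    [
        "colmap", "automatic_reconstructor",
        "--workspace_path", str(workspace),
        "--image_path", str(frames),
    ],
    check=True,
)
```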
Also, I found HOSNeRF to be quite intriguing, as it supports full 360-degree views (for humans). I wonder if you might know of more general methods, or whether HOSNeRF could be tweaked to support other objects that have known priors.
My compute is limited to one GPU: an RTX A4500 (~20 GB, comparable to an RTX 3080).
Thank you again for your help and suggestions!
To clarify, I prefer monocular input views, but I could probably make do with multi-view -- as long as it's 2-4 views max. I definitely can't rig 15-50 cameras together for my research, the way many of the multi-view datasets were captured.
(These are just my personal thoughts, and it's possible that others may strongly disagree.)
In my opinion, the current methods for the monocular setting are not very effective. They only seem to work well for a limited number of real-life scenarios. Moreover, the datasets commonly used are often biased and may not accurately represent true monocular conditions. If you're interested in delving deeper into this topic, I recommend checking out the analysis provided by DyCheck.
As for how many cameras are needed in this context, I don't believe any study has addressed this question yet. (Finding an answer could have a significant impact on the field!) Additionally, apart from using more cameras, incorporating depth input, similar to the approach in the DyCheck dataset, could prove helpful. Another promising direction to explore is video depth, and DynamicStereo (https://dynamic-stereo.github.io/) seems to be a noteworthy resource in this area.
Thank you!