Open danie1ll opened 2 months ago
A good metric could be frame blurriness. Here's an example from Stack Overflow that uses the variance of the Laplacian to quantify image blurriness and select the "least blurry" frame every second: https://stackoverflow.com/questions/65949172/how-to-extract-clear-frames-from-video-file
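For reference, the idea from that Stack Overflow answer looks roughly like this. This is a sketch, not the exact code from the answer: it uses a plain numpy convolution in place of `cv2.Laplacian(...).var()` so it has no OpenCV dependency, and the `least_blurry` helper name is made up for illustration.

```python
import numpy as np

# Discrete 3x3 Laplacian kernel (the same operator cv2.Laplacian applies).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def laplacian_variance(gray):
    """Variance of the Laplacian response over a grayscale frame.
    Blurry frames have weak edges, so their response variance is low."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return out.var()

def least_blurry(frames):
    """Return the index of the sharpest frame in a list of grayscale arrays,
    e.g. the frames decoded from one second of video."""
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))
```

Running `least_blurry` once per second of decoded frames gives the per-second selection described in the answer.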
@LucasArmand this looks very promising! It could also be combined with accumulated visual change between frames, so that instead of only selecting the "least blurry image" every second, we select the most important frames: ones that are different enough from each other and are not blurry.
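One way the two heuristics could be combined is a greedy pass that accumulates inter-frame change and only keeps a frame once enough change has built up and the frame is sharp enough. The function and threshold names below are illustrative assumptions, not an existing API; `sharpness[i]` could be any per-frame score such as the Laplacian variance above.

```python
import numpy as np

def frame_change(a, b):
    """Mean absolute pixel difference between two grayscale frames."""
    return np.abs(a.astype(np.float64) - b.astype(np.float64)).mean()

def select_keyframes(frames, sharpness, change_threshold=10.0, blur_threshold=50.0):
    """Greedy keyframe selection: keep a frame once the visual change
    accumulated since the last kept frame exceeds `change_threshold`,
    but skip frames whose sharpness score is below `blur_threshold`."""
    kept = [0]
    accumulated = 0.0
    for i in range(1, len(frames)):
        accumulated += frame_change(frames[i - 1], frames[i])
        if accumulated >= change_threshold and sharpness[i] >= blur_threshold:
            kept.append(i)
            accumulated = 0.0  # reset after keeping a frame
    return kept
```

A blurry frame is not kept, but the change it contributes still accumulates, so the next sharp frame after it tends to be selected instead.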
@danie1ll I wonder how far we can go in optimizing the `ns-process-data` process. When our source is a video, we usually select only a low percentage of the total frames, so there are many unique combinations of frames to choose from for training. Some of these combinations would certainly result in higher-quality trained scenes than others. Heuristics like "image blurriness" and "accumulated visual change" will probably give us a better combination of frames for training, but is it possible to find the best combination?
The scripts at https://github.com/SharkWipf/nerf_dataset_preprocessing_helper could perhaps be integrated into `ns-process-data video`?
Whatever solution is decided on for "smarter" processing, I think there is room for a fixed-seed option. I am working on a PR to implement it.
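A fixed-seed option could be as simple as threading a `seed` parameter into the sampling step. This is a minimal sketch of the idea, not nerfstudio's actual code; the function name and signature are made up for illustration.

```python
import random

def sample_frame_indices(num_frames, num_frames_target, seed=None):
    """Reproducibly sample `num_frames_target` frame indices from a video
    with `num_frames` frames. Passing the same `seed` yields the same
    subset on every run; `seed=None` keeps the current random behavior."""
    rng = random.Random(seed)  # local RNG avoids touching global random state
    return sorted(rng.sample(range(num_frames), num_frames_target))
```

Using a local `random.Random` instance rather than seeding the global module keeps the reproducibility scoped to frame sampling.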
From my understanding, right now calling `ns-process-data video` just randomly samples `--num-frames-target` images from the video. This is suboptimal for two reasons: