Closed theophanemayaud closed 1 year ago
Ideas :
Conclusion: The question is one of performance. Which is faster: letting the OS find the videos and then matching them against the cache, or selecting entries from the database by path and then loading them? I think the latter is faster; a DB query with the LIKE operator seems feasible and fast. I also think the cache is mostly used within the confines of a specific folder, not to store data over a very long time. People should delete the cache from time to time, and mainly use it to resume comparisons of a large dataset of videos in a given folder. In that case there would be many videos in the folder, with only a subset cached successfully. The second approach should thus be faster in general.
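A minimal sketch of the "select by path with LIKE" idea, using only the C++ standard library. The helper name `likePatternForFolder`, the table name `videos` and the column name `path` are assumptions made for illustration, not the project's actual schema or code. It also assumes paths use `/` separators.

```cpp
#include <string>

// Build a LIKE pattern that matches every path located under `folder`.
// '%' and '_' are LIKE wildcards, so they are escaped with '\' (as is '\' itself).
// The query must then declare the escape character: ... LIKE ? ESCAPE '\'
std::string likePatternForFolder(const std::string &folder)
{
    std::string pattern;
    pattern.reserve(folder.size() + 2);
    for (char c : folder) {
        if (c == '%' || c == '_' || c == '\\')
            pattern += '\\';
        pattern += c;
    }
    // End with exactly one separator so that "/videos/a" does not match "/videos/ab/x.mp4".
    if (pattern.empty() || pattern.back() != '/')
        pattern += '/';
    pattern += '%';
    return pattern;
}

// Hypothetical query text: the table and column names are assumptions.
const char *const kSelectCachedInFolder =
    "SELECT path FROM videos WHERE path LIKE ? ESCAPE '\\'";
```

The pattern would be bound to the `?` placeholder of the query. Note that SQLite's LIKE is case-insensitive for ASCII characters by default, so on a case-sensitive file system the selected paths may still need an exact prefix check afterwards.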
Steps :
[x] create radio selection for using cache
[x] adapt the yes/no cache options to behave as they did before
skip scanning of videos, instead only using cached video locations, but make sure to still only compare videos within the user-chosen folders/paths:
  - MainWindow::on_findDuplicates_clicked calls findVideos(dir). But we should skip this.
  - processVideos(), via Video::run, meaning for each video:
    - getMetadata(filename): metadata is loaded from cache or retrieved with QFileInfo and ffmpeg. NB: the DB doesn't cache the modified date.
    - takeScreenCaptures(cache): screen captures are loaded from cache or taken with ffmpeg. NB: in cutEnds, screen captures are checked to see if they're all black.
[x] compare only cached videos
[x] check for bugs 🐜 🐞 🐛 🪲
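The "only compare cached videos within the user-chosen folders" step could be sketched as below. This is an illustration under stated assumptions, not the code from the closing commit: `isInsideFolder` and `cachedVideosToCompare` are hypothetical names, paths are assumed to use `/` separators, and the cached paths are assumed to be already loaded from the cache database.

```cpp
#include <string>
#include <vector>

// True if `path` lies inside `folder` (both assumed to use '/' separators).
bool isInsideFolder(const std::string &path, const std::string &folder)
{
    if (folder.empty())
        return false;
    std::string prefix = folder;
    if (prefix.back() != '/')
        prefix += '/';
    // Comparing against "folder/" keeps "/videos2/b.mp4" out of "/videos".
    return path.size() > prefix.size() && path.compare(0, prefix.size(), prefix) == 0;
}

// Cache-only mode: instead of scanning the disk, keep the cached paths that
// fall inside any of the user-chosen folders.
std::vector<std::string> cachedVideosToCompare(const std::vector<std::string> &cachedPaths,
                                               const std::vector<std::string> &chosenFolders)
{
    std::vector<std::string> selected;
    for (const std::string &path : cachedPaths) {
        for (const std::string &folder : chosenFolders) {
            if (isInsideFolder(path, folder)) {
                selected.push_back(path);
                break; // avoid duplicates when the chosen folders overlap
            }
        }
    }
    return selected;
}
```

In cache-only mode, the resulting list would take the place of the list normally produced by the disk scan, so the comparison phase only sees cached videos inside the selected folders.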
closed by d922af2db584195f8e4a05295b72e3361c61e186
The cache is for video thumbnails and metadata. When scanning again, it checks which videos are on disk, and if it finds a match in the cache it skips thumbnail and metadata retrieval, which considerably speeds up the scan. But it still scans. I'm writing down the idea you're implying, of only using cached data. That would indeed be great for a very large number of videos. There are a few hurdles because of the current implementation. If only cached data is used, we shouldn't be able to change the thumbnail type, only keep the cached ones. Otherwise it can simply be done with an "Only use cached data" switch, which would skip the scan phase and go directly to the comparison phase. Thanks for the great idea! It'll also help me a lot in my workflows! I don't have 1TB but still > 200GB, which takes ~10 mins to rescan even when cached. Without cache it's more like 30 mins to an hour, but your method could reduce it even further.
Originally posted by @theophanemayaud in https://github.com/theophanemayaud/video-simili-duplicate-cleaner/discussions/80#discussioncomment-3750178