qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

Add an option to auto select the lowest quality items in Similar Video results #868

Open newadventure079 opened 1 year ago

newadventure079 commented 1 year ago

The Similar Video screen lists items in groups of matches, but it is not easy to see which file should be deleted. The user has to double-click each item, open it, and compare the files manually.

#808 should be implemented so we can easily see basic characteristics of the video files: resolution, bit rate, file size, etc.

Taking it a step further, an "auto select" button should be added that ticks the check box of the lower quality items in each group. This would make it much faster to clean up the lower quality files and keep the best one.

Farmadupe commented 1 year ago

From my own experience creating vid_dup_finder_lib: it's actually quite hard to automatically guess which duplicated video is the 'best' to keep. You can make a copy of an original 'low quality' video with 'high quality' settings (e.g. increased resolution, bitrate, etc.). A 'guesser' feature would look at the metadata and tell the user to keep the copy and delete the original.
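The failure mode described above can be sketched in a few lines. This is only an illustration, not Czkawka's or vid_dup_finder_lib's code: the `VideoInfo` record, its field names, and the ranking rule (pixel count first, then bitrate) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInfo:
    # Hypothetical metadata record; the field names are illustrative only.
    path: str
    width: int
    height: int
    bitrate_kbps: int


def metadata_score(video: VideoInfo) -> tuple[int, int]:
    """Naive ranking: pixel count first, then bitrate."""
    return (video.width * video.height, video.bitrate_kbps)


def select_for_deletion(group: list[VideoInfo]) -> list[VideoInfo]:
    """Return every item in the group except the top-scoring one."""
    keep = max(group, key=metadata_score)
    return [video for video in group if video is not keep]


# The true original, and a re-encode of it with inflated settings.
original = VideoInfo("original.mp4", 640, 480, 1200)
upscaled_copy = VideoInfo("upscaled_copy.mp4", 1920, 1080, 8000)

to_delete = select_for_deletion([original, upscaled_copy])
print([video.path for video in to_delete])  # ['original.mp4']
```

The metadata-only rule marks the true original for deletion, because the upscaled copy carries no extra detail but reports better numbers.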

I think what Czkawka would need is a utility that chooses the highest quality video by looking at the pixels and frames of the videos. This is called objective video quality estimation. However, the usual tools of this kind compare a video against a reference, so to use them you already have to know which video is the original/best.

According to Wikipedia, the tools suitable for Czkawka are called no-reference video quality estimators. There are few such tools, and I have not experimented with them. I haven't fully read any research papers, but I am worried that they will simply tell you that white noise is 'low quality' and a non-moving shape is 'high quality'.

In the past I tried to guess which video was the original by JPEG-encoding some frames and choosing the video with the largest JPEG file size. I think it worked better than random, but I think JPEG is not very good at compressing video artefacts, so with cartoons, for example, it always thought that poor quality copies were actually the original.
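A rough sketch of that compressed-size idea, under heavy assumptions: to keep the example dependency-free, `zlib` from the Python standard library stands in for a JPEG encoder, and the frames are synthetic byte strings rather than decoded video. A real version would decode sample frames and JPEG-encode them with an imaging library. The function names are hypothetical.

```python
from __future__ import annotations

import random
import zlib


def compressed_size(frame: bytes) -> int:
    """Byte size of a frame after compression (zlib stands in for JPEG)."""
    return len(zlib.compress(frame, 9))


def guess_original(videos: dict[str, list[bytes]]) -> str:
    """Pick the video whose sampled frames compress to the most bytes."""
    return max(
        videos,
        key=lambda name: sum(compressed_size(frame) for frame in videos[name]),
    )


rng = random.Random(0)

# Three synthetic 64x64 grayscale frames with a lot of fine detail.
detailed = [bytes(rng.randrange(256) for _ in range(64 * 64)) for _ in range(3)]

# Simulate a degraded copy by quantising every pixel to four grey levels.
degraded = [bytes((pixel // 64) * 64 for pixel in frame) for frame in detailed]

videos = {"detailed.mp4": detailed, "degraded.mp4": degraded}
print(guess_original(videos))  # detailed.mp4
```

The heuristic only measures how hard the frames are to compress, so anything that adds hard-to-compress content, including encoding artefacts, can push a poor copy ahead of the original.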

JoshuaVandaele commented 8 months ago

Should I create a similar issue for images, or would the same limitations apply?