qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

Similar video not working as before #968

Open antonhagg opened 1 year ago

antonhagg commented 1 year ago

I have tested the Similar video function in 5.0.2 with success, but now I just get the error message "failed to hash file, reason too short:". I'm not sure what causes this, but if I scan just 2 files that I know are the same, it finds them. Could it be that it stops looking when it hits the first file that is too short?

qarmin commented 1 year ago

This is strange, because this module was not changed recently.

The error should always be the same for videos shorter than 30 s.
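To rule out genuinely short clips before blaming the scanner, the duration of each file can be checked by hand. The following is a minimal sketch, not czkawka's own logic: the 30-second threshold is taken from the comment above, the helper names are hypothetical, and `ffprobe` is assumed to be installed and on the PATH.

```python
import subprocess

# Threshold taken from the comment above; an assumption, not read from czkawka's source.
MIN_DURATION_S = 30.0


def parse_duration(ffprobe_output: str) -> float:
    """Parse the single number printed by ffprobe's format=duration query.

    Raises ValueError if ffprobe printed something non-numeric (e.g. "N/A").
    """
    return float(ffprobe_output.strip())


def is_too_short(duration_s: float, minimum_s: float = MIN_DURATION_S) -> bool:
    """Return True when the clip is shorter than the assumed minimum."""
    return duration_s < minimum_s


def probe_duration(path: str) -> float:
    """Ask ffprobe for the container duration in seconds."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_duration(result.stdout)
```

If `is_too_short(probe_duration(path))` is False for a file that still triggers the "too short" message, the clip length itself is probably not the cause.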

antonhagg commented 1 year ago

I have done some more tests and there seems to be some issue in 5.0.2 as well. I prepared a folder with 2 files that are the same, then added 10 more files that were of different lengths but unique, and suddenly it doesn't find the first two files that I know are the same any more. It gives me no error or anything. So it seems there might be some issue here as well. The videos are from a GoPro Hero 11, if that helps.

The-Istar commented 1 year ago

I am seeing similar behavior. The weird thing is that I only see it on my main machine, with both a locally installed client and the Flatpak installation. Both return "too short" errors. However, when I check the same files on my server, where I run a Docker installation of Czkawka, it seems to work.

Both the Docker installation and the Flatpak are on the latest version. I also checked, and both installations use the same ffmpeg version as well (ffmpeg version 5.1.3).

So I am not sure where the difference is, or why it works on one but not on the other.
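One way to compare environments is to list every `ffmpeg` executable visible on the PATH and print its version banner. This is only a diagnostic sketch with hypothetical helper names; it would have to be run inside the sandbox or container in question to see what that installation sees, and an identical version string does not guarantee an identical build.

```python
import os
import subprocess


def find_executables(name: str, path_dirs: list[str]) -> list[str]:
    """Return every executable file called `name` in the given directories, in order."""
    found = []
    for directory in path_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


def first_version_line(binary: str) -> str:
    """Run `<binary> -version` and return the first line of its output."""
    result = subprocess.run([binary, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    for binary in find_executables("ffmpeg", dirs):
        print(binary, "->", first_version_line(binary))
```

If more than one path is printed, the first entry is the one a plain `ffmpeg` call would normally resolve to.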

antonhagg commented 1 year ago

Hmm, strange. Could it be something with permissions?

Farmadupe commented 1 year ago

I have not done any actual investigation, but I suspect that if different binaries of ffmpeg are used, then ffmpeg may return different frames to vid_dup_finder_lib even when called with the same arguments. Sometimes the hashing algorithm will produce very different hashes for similar-but-not-identical frames. This is an inherent problem with hashing (but it can be minimized with better algorithms).
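As a generic illustration of why slightly different frames matter, perceptual hashes are usually compared by counting differing bits against a tolerance rather than by exact equality. The sketch below is not vid_dup_finder_lib's algorithm; the 64-bit values, the function names, and the tolerance are made up for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def hashes_match(a: int, b: int, tolerance_bits: int = 0) -> bool:
    """Treat two hashes as the same video if they differ in at most `tolerance_bits` bits."""
    return hamming_distance(a, b) <= tolerance_bits


# Hypothetical 64-bit hashes of the same video decoded by two ffmpeg builds.
hash_a = 0xF0F0F0F0F0F0F0F0
hash_b = hash_a ^ 0x111  # three bits flipped by slightly different frames

print(hamming_distance(hash_a, hash_b))                # 3
print(hashes_match(hash_a, hash_b))                    # False: exact comparison fails
print(hashes_match(hash_a, hash_b, tolerance_bits=5))  # True: within tolerance
```

If decoder differences flip more bits than the tolerance allows, the two copies stop being reported as similar even though nothing visibly changed.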

Sometimes a different version of ffmpeg could fail to return the correct number of frames (especially if the video is 30-32 seconds long). This will cause an error.

If Czkawka does not bundle ffmpeg binaries, this may occur if some ancient LTS distro is used for docker builds, as some ancient version of ffmpeg may be used.

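There is a nearly-finished update to vid_dup_finder_lib with relevant improvements:

  • no >30 second requirement (video must have at least 64 frames)
  • ability to skip a certain amount (to avoid hashing some black frames at the start of a video, or an intro sequence)
  • Even if ffmpeg returns slightly different frames, hashes are more likely to match.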

I probably do not have time to polish this lib, so I may upload to github in an 'as-is' state.

The-Istar commented 1 year ago

Thanks for the response Farmadupe.

Like I said, I checked the ffmpeg versions used both locally and in the container, and both were the same version. Happy to check further if you need me to check something specific.

Glad to hear a new version of the lib is coming. Can't wait to test it. :)

biggestsonicfan commented 1 year ago

Also excited for this update. I've been using the AppImage for a while but just migrated to Flatpak, and I too am getting the "too short" errors for videos that are 32 min long.

The-Istar commented 1 year ago

Inspired by this, I tried testing with the AppImage and, surprisingly, it does find the duplicates properly. So whatever is different in the AppImage should give us a clue about what is not working.

Th3EvilGod commented 11 months ago

I have not done any actual investigation, but I suspect that if different binaries of ffmpeg are used, then ffmpeg may return different frames to vid_dup_finder_lib even when called with the same arguments. Sometimes the hashing algorithm will produce very different hashes for similar-but-not-identical frames. This is an inherent problem with hashing (but it can be minimized with better algorithms).

Sometimes a different version of ffmpeg could fail to return the correct number of frames (especially if the video is 30-32 seconds long). This will cause an error.

If Czkawka does not bundle ffmpeg binaries, this may occur if some ancient LTS distro is used for docker builds, as some ancient version of ffmpeg may be used.

There is a nearly-finished update to vid_dup_finder_lib with relevant improvements:

  • no >30 second requirement (video must have at least 64 frames)
  • ability to skip a certain amount (to avoid hashing some black frames at the start of a video, or an intro sequence)
  • Even if ffmpeg returns slightly different frames, hashes are more likely to match.

I probably do not have time to polish this lib, so I may upload to github in an 'as-is' state.

Any news on when you can release the new update? Thanks in advance for your amazing work!

biggestsonicfan commented 11 months ago

You didn't need to quote the whole reply for that, and @Farmadupe has had no activity on GitHub since January, so you may be out of luck.

Th3EvilGod commented 11 months ago

Thanks for letting me know. Bummer. I hope we can find an alternative for videos <30 seconds.

ShareBugreports commented 10 months ago

I created a ticket as well and investigated it a bit. It just seems to be a random issue: I did multiple runs and got different results. So I suspect a multithreading issue, or some variable that is not reset in a loop, or something like that.