kimono-koans / dano

A hashdeep/md5tree (but much more) for media files
https://crates.io/crates/dano
Mozilla Public License 2.0

[feature request]: add xxhash hash algorithm #7

Open sandersantema opened 1 year ago

sandersantema commented 1 year ago

According to xxhash's own benchmarks (which can be found on the project's GitHub page, https://github.com/Cyan4973/xxHash), it's a very fast hashing algorithm. If this speed translates into dano performance, I think it would be nice to have it as an option.

kimono-koans commented 1 year ago

Thank you for filing an issue.

dano is currently just a frontend for ffmpeg, which means that, right now, it only supports what ffmpeg supports. Could dano support xxhash? Yes. Would that take a great deal of effort? Yes. And, unfortunately, more effort than I want to expend on it right now. I'm currently just trying to get it to a place where it works really well, with decent semantics, most of the time (this app is turning out to be a pain in the balls to write!).

Another option: You could also ask ffmpeg to support xxhash as a hashing format?

This could be a great project for someone else though! I'll leave this open to see if anyone else wants to try their hand.

Thanks!

sandersantema commented 1 year ago

Ah I see, I was looking through the code and couldn't find how the hashing was implemented, so I thought it might just be a call to a binary, which I suppose would have been easier to implement. I did guess that my understanding of the code could be wrong, not knowing the first thing about Rust, hence the feature request.

If it would take more than a trivial amount of effort, I don't think it would be worth it, given that murmur3 is quite fast anyway and I suppose little hashing is done once the first batch of files has been hashed.

Anyways thanks for the effort of making this frontend, I completely missed ffmpeg was able to do this. It will come in handy since I've got music files consumed by multiple programs spread across different computers, this will make it easy to identify them.

This might not be the right place to ask, but do you know why the output hash differs between these two?

ffmpeg -i baz.mp3 -f hash -hash murmur3 -hide_banner -loglevel warning -
murmur3=e78028cb329d75b3077a033d592d4be5

dano --dry-run --write baz.mp3
murmur3=21ada31d036615f9599eeeb80019f0f2 : "baz.mp3"

I suppose it's just a difference in the way ffmpeg is called but if not I can open a bug report.

kimono-koans commented 1 year ago

This is a bug/regression that was fixed with: https://github.com/kimono-koans/dano/releases/tag/0.4.8

The issue is/was the default selection of streams when you don't select audio only or video only. I thought -map 0 was the default selection; many files will verify with -map 0, but it is not always the default. Unfortunately, there is no easy fix. I've reverted to the previous behavior of using ffmpeg's default stream selection. I recommend recreating those hashes, and confirming them by hand if you're truly paranoid. Sorry.
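To make the stream-selection point concrete, here is a minimal Python sketch that builds the two ffmpeg invocations being compared: one relying on ffmpeg's default stream selection, and one with an explicit -map argument. It is not dano's actual code; the helper names (hash_command, parse_hash_line, stream_hash) are hypothetical, and the assumption that the only difference between the two runs is the -map 0 argument is mine. The ffmpeg options themselves are the ones shown in the commands above, plus -map.

```python
import subprocess


def hash_command(path, stream_map=None, algorithm="murmur3"):
    """Build an ffmpeg argv that writes a stream hash to stdout.

    stream_map=None leaves stream selection to ffmpeg's defaults;
    stream_map="0" asks for every stream of the first input.
    """
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "warning", "-i", path]
    if stream_map is not None:
        # -map is an output option, so it goes after the input file.
        cmd += ["-map", stream_map]
    cmd += ["-f", "hash", "-hash", algorithm, "-"]
    return cmd


def parse_hash_line(output):
    """Split a line such as 'murmur3=e780...' into (algorithm, digest)."""
    algorithm, _, digest = output.strip().partition("=")
    return algorithm, digest


def stream_hash(path, stream_map=None, algorithm="murmur3"):
    """Run ffmpeg (assumed to be on PATH) and return the parsed hash."""
    result = subprocess.run(
        hash_command(path, stream_map, algorithm),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_hash_line(result.stdout)
```

Calling stream_hash("baz.mp3") and stream_hash("baz.mp3", stream_map="0") on the same file is one way to confirm a hash by hand: if the two digests differ, the file is one where -map 0 and the default selection pick different streams.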

But you're right -- it is perhaps worth a major/middle version change, or a note in the README, and a link to this issue?

> Anyways thanks for the effort of making this frontend, I completely missed ffmpeg was able to do this. It will come in handy since I've got music files consumed by multiple programs spread across different computers, this will make it easy to identify them.

I'm glad it might be useful for you. I'll try to keep the breakages to a minimum/none in the future! There is an entire apparatus for versioning, but it failed here obviously. I'll be more careful.

sandersantema commented 1 year ago

No worries, I haven't used dano yet :) I'll have to create some infrastructure around it first, which I think should be able to handle changes in the checksum whenever they occur. I'll be able to deal with this because, in my infrastructure, the checksums aren't meant to check for errors but rather to serve as an identifier that is invariant to metadata changes. So even if the checksum scheme changes, two files which have the same audio stream but differ in metadata will still have the same checksum as each other.
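As a rough illustration of that identifier idea, the following Python sketch groups file paths by their stream hash, so files that share an audio stream but differ in metadata end up in the same group. The function name and the file paths are hypothetical, and the digests are assumed to have been computed elsewhere (for example by dano or by ffmpeg's hash output); this is only one way such infrastructure might look.

```python
from collections import defaultdict


def group_by_stream_hash(hashes):
    """Group file paths by digest.

    `hashes` maps a file path to the digest of its audio stream.
    Returns a dict mapping each digest to a sorted list of paths.
    """
    groups = defaultdict(list)
    for path, digest in hashes.items():
        groups[digest].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


# Hypothetical example: two copies of the same audio with different tags.
example = {
    "laptop/baz.mp3": "e78028cb329d75b3077a033d592d4be5",
    "desktop/baz-retagged.mp3": "e78028cb329d75b3077a033d592d4be5",
    "desktop/other.mp3": "21ada31d036615f9599eeeb80019f0f2",
}
groups = group_by_stream_hash(example)
```

Because only equality between digests matters here, a change in how the hashes are produced just means recomputing all of them with the same tool version before grouping.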

Another approach to breaking changes might be to open a dedicated breaking-changes issue. I think a nice example is the one for neovim: https://github.com/neovim/neovim/issues/14090#issuecomment-1257302774

Then, if you're interested in staying up to date with breaking changes, you simply subscribe to the issue.