avuserow opened this issue 10 months ago
Thanks, I'll take a look. I think that's entirely doable.
Having looked at this, I just realised that the result of `.fingerprint` is in fact the fingerprint encoded as a string. So whilst the encode and decode methods might be useful, getting the raw fingerprint directly as 32-bit ints may be more useful for the soundalike-style comparison. Also, if you are doing bulk comparisons, the hash function may be useful for discarding completely dissimilar files before doing anything more intensive.
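A rough illustration of the pre-filtering idea. libchromaprint has a `chromaprint_hash_fingerprint` function that reduces a raw fingerprint to a single 32-bit hash. My understanding (an assumption, not checked against the library source here) is that it is a simhash: each output bit is a majority vote over the same bit in every 32-bit value. Two hashes can then be compared by Hamming distance, and pairs that differ in too many bits can be skipped. The function names and the `MAX_HASH_DISTANCE` threshold below are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Majority-vote simhash over a raw fingerprint (an assumed stand-in for
 * chromaprint_hash_fingerprint): bit j of the result is set when bit j
 * is set in more than half of the 32-bit values. */
uint32_t simhash_fingerprint(const uint32_t *fp, size_t size)
{
    long votes[32] = {0};
    for (size_t i = 0; i < size; i++) {
        for (int j = 0; j < 32; j++) {
            votes[j] += ((fp[i] >> j) & 1u) ? 1 : -1;
        }
    }
    uint32_t hash = 0;
    for (int j = 0; j < 32; j++) {
        if (votes[j] > 0) {
            hash |= (uint32_t)1 << j;
        }
    }
    return hash;
}

/* Number of bit positions in which two hashes differ. */
int hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int count = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical cut-off: skip the expensive comparison when the hashes
 * differ in more than this many bits. Would need tuning on real data. */
#define MAX_HASH_DISTANCE 10

int worth_comparing(uint32_t hash_a, uint32_t hash_b)
{
    return hamming_distance(hash_a, hash_b) <= MAX_HASH_DISTANCE;
}
```

A single 32-bit hash throws away almost all of the fingerprint, so this can only cheaply reject obvious non-matches; anything that passes still needs a real comparison.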
I haven't worked with this code in a while, but I remember not finding the hash very useful for comparison. I did write a comparison algorithm in C based on a few other projects, and it works reasonably well. I can give you the C implementation if you'd like to try it out.
I also found a project that implements a quick comparison across a very large number of files. It's written in Go and uses an in-memory hash table; I adapted that approach to use a SQLite database. If that's of interest, I can provide a link to the project. (That seems a bit out of scope for your module, but maybe it will be interesting to you in some other area.)
(Sent by email in reply to Jonathan Stowe's comment of Sun, Mar 10, 2024, 16:26.)
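I don't know exactly how the Go project mentioned above structures its hash table, so the following is only a sketch of the general idea under my own assumptions: index every 32-bit fingerprint value by the file it came from, and treat files that share many exact values as candidates for a full comparison. A sorted array stands in for the in-memory hash table here; in SQLite the equivalent would presumably be an indexed table of (value, file_id) rows. The `struct entry` type, `count_shared_values`, and the `MAX_FILES` limit are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FILES 16 /* hypothetical limit; every file_id must be < MAX_FILES */

/* One indexed entry: a fingerprint value and the file it came from. */
struct entry {
    uint32_t value;
    int file_id;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    if (x->value != y->value)
        return x->value < y->value ? -1 : 1;
    return (x->file_id > y->file_id) - (x->file_id < y->file_id);
}

/* Sort the entries by value, then for every run of equal values add one
 * to shared[a][b] for each pair of entries from distinct files a and b.
 * A value repeated within one file is counted once per occurrence. */
void count_shared_values(struct entry *entries, size_t n,
                         int shared[MAX_FILES][MAX_FILES])
{
    for (int a = 0; a < MAX_FILES; a++)
        for (int b = 0; b < MAX_FILES; b++)
            shared[a][b] = 0;

    qsort(entries, n, sizeof *entries, cmp_entry);

    size_t start = 0;
    while (start < n) {
        size_t end = start + 1;
        while (end < n && entries[end].value == entries[start].value)
            end++;
        /* entries[start..end) all share the same value */
        for (size_t i = start; i < end; i++) {
            for (size_t j = i + 1; j < end; j++) {
                int a = entries[i].file_id, b = entries[j].file_id;
                if (a != b) {
                    shared[a][b]++;
                    shared[b][a]++;
                }
            }
        }
        start = end;
    }
}
```

Very common values (silence, for instance) produce long runs and quadratic work in this sketch, so a real implementation would probably skip or cap values that appear in too many files.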
I've added `encode-fingerprint` and `decode-fingerprint` methods, and also enabled access to the raw calculated fingerprint.
I think it's probably best to add any features not provided by libchromaprint as separate modules.
These two methods would let me store compressed fingerprints in the database and then decompress them for comparison. The compression is very effective, especially if you can store the raw (non-base64) output: something like a 4:1 compression ratio.
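A minimal sketch of why the compression works, assuming my reading of the Chromaprint compressor is right: consecutive 32-bit values usually differ in only a few bits, so each value is first XOR-ed with the one before it, leaving mostly-zero words whose few set bits can then be stored compactly. Only that first, reversible step is shown; it is not the actual encoded format, which `chromaprint_encode_fingerprint` and `chromaprint_decode_fingerprint` handle. The function names are hypothetical. (Base64 then turns every 3 bytes into 4 characters, which is why storing the raw output is smaller.)

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each value after the first with its XOR against the previous
 * original value. Walks backwards so fp[i - 1] is still unmodified. */
void xor_delta_encode(uint32_t *fp, size_t size)
{
    for (size_t i = size; i-- > 1;) {
        fp[i] ^= fp[i - 1];
    }
}

/* Inverse: walk forwards, XOR-ing each delta with the already
 * restored previous value. */
void xor_delta_decode(uint32_t *fp, size_t size)
{
    for (size_t i = 1; i < size; i++) {
        fp[i] ^= fp[i - 1];
    }
}

/* Total set bits across the array: fewer set bits means less to store. */
size_t total_set_bits(const uint32_t *fp, size_t size)
{
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        for (uint32_t x = fp[i]; x; x &= x - 1)
            count++;
    }
    return count;
}
```

For the three values 0xA5A5A5A5, 0xA5A5A5A7, 0xA5A5A5A6, the set-bit count drops from 49 to 18 after the XOR step, and decoding restores the originals exactly.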
I would also be interested in the comparison functions, but I read that Chromaprint's comparison API is not fully implemented and is not exposed in the C headers. I didn't have much luck when I tried to use it either.
I do have a simple comparison algorithm I could contribute. It's written in C and based on algorithms I've seen in other projects (https://codeberg.org/derat/soundalike for one), so maybe it's better off as a separate module so that this one doesn't require a C compiler :shrug:
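For a sense of what a simple comparison can look like, here is a sketch of one common approach. It is not the C implementation offered above, and soundalike may score things differently. It slides one raw fingerprint against the other over a limited range of offsets, counts the differing bits between overlapping 32-bit values, and keeps the best score. The `compare_fingerprints` name and its `max_offset` and `min_overlap` parameters are my own assumptions.

```c
#include <stddef.h>
#include <stdint.h>

static int popcount32(uint32_t x)
{
    int count = 0;
    for (; x; x &= x - 1)
        count++;
    return count;
}

/* Similarity of two raw fingerprints in [0, 1]; 1 means the overlapping
 * parts are identical at the best alignment. a[i] is aligned with
 * b[i + offset] for offset in -max_offset..+max_offset. Alignments with
 * fewer than min_overlap overlapping values are ignored (hypothetical
 * guard against tiny, noisy overlaps). Returns 0 if none qualify. */
double compare_fingerprints(const uint32_t *a, size_t a_size,
                            const uint32_t *b, size_t b_size,
                            long max_offset, size_t min_overlap)
{
    double best = 0.0;
    for (long offset = -max_offset; offset <= max_offset; offset++) {
        size_t a_start = offset < 0 ? (size_t)(-offset) : 0;
        size_t b_start = offset > 0 ? (size_t)offset : 0;
        if (a_start >= a_size || b_start >= b_size)
            continue;

        size_t overlap = a_size - a_start;
        if (b_size - b_start < overlap)
            overlap = b_size - b_start;
        if (overlap == 0 || overlap < min_overlap)
            continue;

        /* bit error count over the overlapping region */
        size_t errors = 0;
        for (size_t i = 0; i < overlap; i++)
            errors += (size_t)popcount32(a[a_start + i] ^ b[b_start + i]);

        double score = 1.0 - (double)errors / (32.0 * (double)overlap);
        if (score > best)
            best = score;
    }
    return best;
}
```

With a = {1, 2, 3, 4} and b = {2, 3, 4}, a `max_offset` of 1 finds the perfect alignment (score 1.0), while a `max_offset` of 0 compares them unshifted and scores 0.9375. The cost grows with fingerprint length times the number of offsets tried, which is where a cheap pre-filter helps.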