tchamberlin / lyrics_search

CLI tool to generate playlists of songs whose lyrics contain a given query
MIT License

Enhanced duplicate song detection via lyric similarity comparisons #1

Open tchamberlin opened 3 years ago

tchamberlin commented 3 years ago

Currently the de-duplication mechanism is quite crude and error-prone. It treats each (artist, track) combination as unique, so cover songs and other re-recordings are never recognized as duplicates.

A good example of this is the song "Emily" by Frank Sinatra. It has been covered many dozens of times, and the covers rarely identify themselves as such. Further, it has a very generic name, so filtering on the title alone is not an option. The only real way forward I see is to perform pairwise similarity comparisons between lyrics (perhaps limited to tracks with identical song names, to keep the number of comparisons down), via fuzzywuzzy or similar.
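A rough sketch of what that comparison could look like. This is not code from the repository: the `tracks` structure, its keys, the `find_duplicate_pairs` name, and the 0.9 threshold are all assumptions made for illustration, and the standard library's `difflib.SequenceMatcher` stands in for fuzzywuzzy so the snippet has no dependencies.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def find_duplicate_pairs(tracks, threshold=0.9):
    """Return index pairs of tracks whose lyrics are near-identical.

    `tracks` is a list of dicts with hypothetical keys "artist", "track"
    and "lyrics". Only tracks sharing a normalized song name are compared.
    """
    # Bucket track indices by song name to avoid comparing everything to everything.
    by_name = defaultdict(list)
    for index, track in enumerate(tracks):
        by_name[normalize(track["track"])].append(index)

    pairs = []
    for indices in by_name.values():
        for a, b in combinations(indices, 2):
            ratio = SequenceMatcher(
                None,
                normalize(tracks[a]["lyrics"]),
                normalize(tracks[b]["lyrics"]),
            ).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs
```

The threshold would need tuning against real lyrics, since transcriptions of the same song can differ in punctuation, repeated choruses, and ad-libs.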

I think we could then grab the publication date for all duplicates and use only the earliest one.
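As a minimal sketch of that step, assuming the duplicates have already been grouped and that each track carries a publication date: the `released` key and the `keep_earliest` name are hypothetical, not something the current API is known to provide.

```python
from datetime import date


def keep_earliest(duplicate_groups):
    """From each group of duplicate tracks, keep the earliest-published one.

    Each track is a dict with a hypothetical "released" key holding a
    datetime.date. Empty groups are skipped.
    """
    return [
        min(group, key=lambda track: track["released"])
        for group in duplicate_groups
        if group
    ]


# Example with made-up data: the 1990 track wins over the 2005 cover.
groups = [
    [
        {"artist": "Cover Artist", "track": "Emily", "released": date(2005, 3, 1)},
        {"artist": "Original Artist", "track": "Emily", "released": date(1990, 6, 15)},
    ]
]
kept = keep_earliest(groups)
```

Tracks with a missing or unknown date would need separate handling, for example by treating them as later than any dated track.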

tchamberlin commented 3 years ago

This won't be possible until I can get access to the full MusixMatch API, or find a different API that gives full lyrics. The current API only gives me snippets, and not necessarily the same "slice" of the lyrics, so the comparisons are meaningless.
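To illustrate the problem with made-up text (not real lyrics or real API output): two snippets taken from different parts of the same song score poorly against each other, even though the full lyrics are identical. `difflib.SequenceMatcher` is used here only as a stand-in for whatever similarity measure ends up being chosen.

```python
from difflib import SequenceMatcher

# Placeholder "lyrics"; both tracks are the same song.
full_lyrics = "one two three four five six seven eight nine ten eleven twelve"

# Hypothetical snippets: each track exposes a different slice of the song.
snippet_a = "one two three four"
snippet_b = "nine ten eleven twelve"

full_ratio = SequenceMatcher(None, full_lyrics, full_lyrics).ratio()
snippet_ratio = SequenceMatcher(None, snippet_a, snippet_b).ratio()

print(f"full vs full:       {full_ratio:.2f}")
print(f"snippet vs snippet: {snippet_ratio:.2f}")
```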

But maybe one day.