Open snejus opened 5 days ago
https://github.com/user-attachments/assets/598277b4-be15-4606-a51b-31646ae51c9e
That's the widget I mentioned, you can see how it depends on correct timestamps.
Your song's album name is "A State Of Trance Classics 14". The album name of the track with ID 12429604 is "A State of Trance: Classics, Volume 14". After normalization, these become "a state of trance classics 14" and "a state of trance classics volume 14". Because of the extra word "volume", LRCLIB doesn't consider ID 12429604 a match. It then retries without the album name, and you finally get ID 1029622.
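To make the comparison concrete, here is a minimal Python sketch of this kind of normalization. It is not LRCLIB's actual code: the function name `normalize` and the exact rules (strip accents, lowercase, turn punctuation into spaces, collapse whitespace) are assumptions chosen to reproduce the strings quoted in this thread.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: strip accents, lowercase, drop punctuation."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a letter, digit or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]|_", " ", no_accents.lower())
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())


query_album = normalize("A State Of Trance Classics 14")
record_album = normalize("A State of Trance: Classics, Volume 14")
print(query_album)   # a state of trance classics 14
print(record_album)  # a state of trance classics volume 14
print(query_album == record_album)  # False, because of the extra word "volume"
```

Under these assumed rules the two track titles from the query and the response collapse to the same string, while the two album names do not.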
The best way to resolve this in my opinion is resubmitting the correct lyrics for your song's metadata, for example with LRCGET:
Find "Drowning (Avicii Remix)" in the LRCGET song list, then use the search lyrics feature for this song
How come does it match album "Mirage (The Remixes) [Bonus Tracks Edition]" instead?
In addition to this, neither the track name nor the duration returned by the /get endpoint matches the query. Meanwhile, there is a record in the database that matches them exactly.
I was wondering how the matching/comparison logic works internally; which fields are prioritised for the comparison?
How come does it match album Mirage (The Remixes) [Bonus Tracks Edition] instead?
It just retries one more time, ignoring the album name parameter. ID 1029622 is probably the first record that matches the criteria. The duration difference of 472 vs 473 seconds is considered good enough (±2 seconds).
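The two-pass behaviour described here can be sketched in Python as follows. This is only an illustration of the logic as explained in this comment, not LRCLIB's implementation: the function names, the in-memory `records` list, and the artist names on the sample records are made up, and the normalizer is an assumed approximation.

```python
import re
import unicodedata

TOLERANCE_SECONDS = 2  # the ±2 second window mentioned above


def normalize(text):
    # Assumed normalization: strip accents, lowercase, drop punctuation.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^\w\s]|_", " ", no_accents.lower()).split())


def find_match(records, track, artist, album, duration):
    def matches(record, check_album):
        return (
            normalize(record["trackName"]) == normalize(track)
            and normalize(record["artistName"]) == normalize(artist)
            and abs(record["duration"] - duration) <= TOLERANCE_SECONDS
            and (not check_album
                 or normalize(record["albumName"]) == normalize(album))
        )

    # First pass requires the album name to match; the retry ignores it.
    for check_album in (True, False):
        for record in records:
            if matches(record, check_album):
                return record  # first record that satisfies the criteria
    return None


# Sample data loosely based on this thread; artist names are illustrative.
records = [
    {"id": 1029622, "trackName": "Drowning - Avicii Remix",
     "artistName": "Armin van Buuren Feat. Laura V",
     "albumName": "Mirage (The Remixes) [Bonus Tracks Edition]",
     "duration": 472},
    {"id": 12429604, "trackName": "Drowning (Avicii Remix)",
     "artistName": "Armin van Buuren Feat. Laura V",
     "albumName": "A State of Trance: Classics, Volume 14",
     "duration": 473},
]

hit = find_match(records, "Drowning (Avicii Remix)",
                 "Armin van Buuren Feat. Laura V",
                 "A State Of Trance Classics 14", 473)
print(hit["id"])  # 1029622
```

With this sample data neither album matches on the first pass, so the retry returns whichever qualifying record comes first, which is the behaviour reported in this issue.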
I was aware of the duration comparison, but it's surprising to me that the difference in the trackName is ignored, since my query is "Drowning (Avicii Remix)" but it returns "Drowning - Avicii Remix".
Do you reckon we could prioritize exact matches here?
I would be more than happy to contribute!
Meanwhile, there is a record in the database that matches them exactly.
Unfortunately it is not really exact, because of the extra word "volume".
LRCLIB doesn't deduplicate metadata. It is a very difficult problem that also requires contributions from the community, and someone else already does this better (MusicBrainz). Even if it could, there might still be minor syncing issues because of differences between CD rips and music downloaded from digital/streaming platforms.
I know it sucks, I hate the fact that there are usually multiple duplicated lyrics records for the same song in LRCLIB. But this issue is almost impossible to resolve.
I was aware of the duration comparison, but it's surprising to me that the difference in the trackName is ignored, since my query is "Drowning (Avicii Remix)" but it returns "Drowning - Avicii Remix".
All of the strings are normalized (converted to lowercase, with special characters removed and accents stripped from accented characters). In your case:
"Drowning (Avicii Remix)" will become "drowning avicii remix"
"Drowning - Avicii Remix" will become "drowning avicii remix"
So they are considered an exact match.
The part of the code that does the normalization is here:
I would be more than happy to contribute!
I'd love to have your contribution! But, we need to come to an agreement on the best way to address this first.
This makes a lot of sense.
My last straw then is the duration: given that normalised artist and track names are the same, could we prioritize results that match the duration exactly?
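As a rough illustration of this proposal (not existing LRCLIB behaviour), the candidates that already pass the normalized artist/track check could be ordered by how far their duration is from the query, so that an exact match wins over a merely close one. The function name `pick_closest_duration` and the candidate dictionaries are hypothetical.

```python
def pick_closest_duration(candidates, duration):
    """Return the candidate whose duration is closest to the query."""
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so the original order
    # still acts as a tie-breaker.
    return min(candidates, key=lambda record: abs(record["duration"] - duration))


# Both records normalize to the same artist and track name and fall within
# the ±2 second window; only the second matches the duration exactly.
candidates = [
    {"id": 1029622, "duration": 472},
    {"id": 12429604, "duration": 473},
]

best = pick_closest_duration(candidates, 473)
print(best["id"])  # 12429604
```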
Hi @tranxuanthang, following your suggestion under https://github.com/beetbox/beets/pull/5406 I now attempt to find matching lyrics using the /get endpoint, and only perform /search if they could not be found.
Thanks to the synced lyrics available in this database, the other day I added a lyrics display to my music widget, which depends on accurate timestamps, and I noticed that lyrics are out of sync for some tracks.
One of them was this track:
Artist: Armin van Buuren Feat. Laura V
Title: Drowning (Avicii Remix)
Album: A State Of Trance Classics 14
Duration: 473.0
I checked and found its lyrics were fetched using the /get endpoint:
Note that I receive the same data when I provide the duration field:
When I perform the search for the artist and title
I see the following data
The lyrics I'm after are under id 12429604, and it seems like it should be the closest match to my query. I can provide more examples if required.
The results ranking algorithm I added in https://github.com/beetbox/beets/pull/5406 picks up the correct lyrics.
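For readers unfamiliar with the flow described at the top of this report, here is a minimal Python sketch of "try /get first, fall back to /search". It is not the beets implementation: `fetch_lyrics`, `build_url` and the injected `fetch_json` callable are hypothetical names, the query parameter names are my understanding of the LRCLIB API and should be checked against its documentation, and taking the first search result stands in for the ranking step mentioned above.

```python
from urllib.parse import urlencode

BASE_URL = "https://lrclib.net/api"


def build_url(endpoint, params):
    # Build a GET URL such as https://lrclib.net/api/get?artist_name=...
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def fetch_lyrics(fetch_json, artist, title, album, duration):
    """Try /get with the full metadata, then fall back to /search.

    fetch_json is any callable that takes a URL and returns the decoded
    JSON body, or None when nothing was found.
    """
    get_params = {
        "artist_name": artist,
        "track_name": title,
        "album_name": album,
        "duration": int(duration),
    }
    record = fetch_json(build_url("get", get_params))
    if record:
        return record

    search_params = {"artist_name": artist, "track_name": title}
    results = fetch_json(build_url("search", search_params)) or []
    # Placeholder for a real ranking step: take the first result.
    return results[0] if results else None
```

Passing the HTTP call in as `fetch_json` keeps the sketch testable offline; in practice it would wrap whatever HTTP client the application already uses.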