jitwxs / 163MusicLyrics

Windows cloud music lyrics fetcher [NetEase Cloud Music, QQ Music]
Apache License 2.0
1.91k stars, 101 forks

Extra/Duplicate Original Lyric Lines in Translated Lyrics #115

Closed daoxi closed 1 year ago

daoxi commented 2 years ago

As of now there are 3 options for handling untranslated lyric lines. When I choose the last option ("填充原文", i.e. fill with the original text) together with the "仅显示译文" (show translation only) lyrics mode, something strange happens:

For this song, for example, the translated lyrics contain 3 extra/duplicate original lyric lines, at [01:12.78], [01:19.02], and [03:18.90]. At each of these 3 timestamps the translated lyric already exists (although its time is off by 0.01 second ([00:00.01]) for an unknown reason), so the original lyric shouldn't be used to fill them in.

This issue also seems to happen in the other lyrics modes. Another test example is this song, which has even more extra/duplicate original lyric lines in the translated lyrics.

Version tested: v4.8

jitwxs commented 2 years ago

The lyric timestamps differ between the original and the translation because the source files on the server do not match; that is the [00:00.01] offset you are describing.

If the timestamps differ, the program can't match the original and translated lyric lines in pairs. This scenario is not easy to deal with.
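
To illustrate why an exact comparison produces the extra lines, here is a minimal Python sketch. It is not the project's actual code: the function name `fill_untranslated_exact`, the dictionary layout (timestamps in centiseconds mapped to text), and the sample lyric strings are all assumptions made for the example.

```python
def fill_untranslated_exact(original, translated):
    """Return translated lines, filling gaps with original lines.

    Hypothetical helper: both arguments map a timestamp in centiseconds
    to lyric text. An original line counts as translated only when a
    translated line has exactly the same timestamp.
    """
    merged = dict(translated)
    for ts, text in original.items():
        if ts not in merged:  # exact comparison only
            merged[ts] = text
    return sorted(merged.items())


# [01:12.78] is 7278 centiseconds; its translation sits 0.01 s later.
original = {7278: "original line A", 7902: "original line B"}
translated = {7279: "translated line A", 7902: "translated line B"}

result = fill_untranslated_exact(original, translated)
for ts, text in result:
    print(ts, text)
```

Because 7278 and 7279 are different keys, the original line A is kept next to its own translation, which matches the duplicate lines reported above.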

daoxi commented 2 years ago

I see, something is probably wrong on the server side of the lyrics sources (which is definitely the fault of QQ/NetEase Music).

Would it be possible to solve this issue by matching the original and translated lyric timestamps within a user-defined tiny interval (e.g. [00:00.01], or [00:00.02], or [00:00.05], etc.)? That would make the original/translated lyrics matching more tolerant of inaccuracy.

For example, if the interval is [00:00.02], then the original lyric at [00:14.47] will be matched with the translated lyric at [00:14.45]/[00:14.46]/[00:14.47]/[00:14.48]/[00:14.49] (there's typically (99%) only one match, because the lyrics' timestamps are typically more than [00:00.02] apart).
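
The proposal above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the program's implementation: `to_centiseconds`, `fill_untranslated`, and the `tolerance` parameter are hypothetical names, and timestamps are assumed to use a two-digit fraction as in the examples in this thread.

```python
def to_centiseconds(stamp):
    """Convert a tag such as "[00:14.47]" to centiseconds.

    Assumes the [mm:ss.xx] form with a two-digit fraction.
    """
    minutes, rest = stamp.strip("[]").split(":")
    seconds, hundredths = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * 100 + int(hundredths)


def fill_untranslated(original, translated, tolerance=2):
    """Return translated lines, filling gaps with original lines.

    Hypothetical helper: both arguments map centiseconds to lyric text.
    An original line counts as translated when any translated timestamp
    lies within `tolerance` centiseconds of it.
    """
    merged = dict(translated)
    for ts, text in original.items():
        if not any(abs(ts - t) <= tolerance for t in translated):
            merged[ts] = text  # no nearby translation, keep the original
    return sorted(merged.items())


original = {
    to_centiseconds("[00:14.47]"): "original line A",
    to_centiseconds("[00:20.00]"): "original line B",
}
translated = {to_centiseconds("[00:14.45]"): "translated line A"}

tolerant = fill_untranslated(original, translated, tolerance=2)
exact = fill_untranslated(original, translated, tolerance=0)
```

With `tolerance=2` the original line at [00:14.47] is treated as already translated by the line at [00:14.45], while `tolerance=0` falls back to exact matching and keeps both. The linear scan over `translated` is chosen for clarity; a sorted list with binary search would scale better for long lyrics.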

I understand this isn't a perfect solution, but I can't think of a better way to deal with this issue.

jitwxs commented 1 year ago

See the usage guide: https://github.com/jitwxs/163MusicLyrics/wiki/4.x-Guide#424-%E8%AF%91%E6%96%87%E5%8C%B9%E9%85%8D%E7%B2%BE%E5%BA%A6