vanilla-music / vanilla

Vanilla Music Player for Android
GNU General Public License v3.0
1.15k stars 291 forks source link

Incorrect ID3 parsing. #785

Open jonnyrobbie opened 6 years ago

jonnyrobbie commented 6 years ago

ID3v2.3 standard defines / - forward slash character as separator for multiple entries in TALB, TEXT and TPE1 frames. ID3v2.4 standard further improves the standard, defining the \x00 - null character separator for all T*** text frames as a multiple field per frame separator.

As far as I could tell, Vanilla Music completely ignores this part of the specification.

Steps to reproduce:

1) Using ID3v2.4, tag a song with foo\x00bar data in a TPE1 frame. 2) Import this file to a Vanilla Music

Expected result:

1) There are two artists: foo and bar in the media library.

Observed result:

1) There is only a single artist: foo bar.

Please, consider doing a proper implementation of ID3v2.3 and ID3v2.4.

adrian-bl commented 6 years ago

We actually do split tags by null-byte: https://github.com/vanilla-music/vanilla/blob/master/app/src/main/java/ch/blinkenlights/bastp/ID3v2File.java#L171

Are you sure you are running the latest version? (This was added around March) Could you provide a sample file? (Feel free to mail it to adrian@blinkenlights.ch)

jonnyrobbie commented 6 years ago

Hmm, you are correct that I did not have properly formatted ID3 and after proper formatting, I see a change, but not what I'd expect. Now, when I have foo\0bar, I see only foo and bar is completely ignored. My version is latest f-droid: 1.0.61 from May 8th. I will send two test files (first is Utada Hikaru's Michi from the album Fantome, where I have null separated romaji and kanji, and the second is Eminem's Stan, where I have null separated Eminem and Dido as a featuring artist.):

$ hexdump -C 01.\ 道.mp3 | head -n15; echo "---"; hexdump -C 03\ -\ Stan.mp3 | head
00000000  49 44 33 04 00 00 00 00  1f 76 54 50 55 42 00 00  |ID3......vTPUB..|
00000010  00 39 00 00 01 fe ff 00  4a 00 41 00 53 00 52 00  |.9......J.A.S.R.|
00000020  41 00 43 00 20 00 3b 00  20 00 56 00 69 00 72 00  |A.C. .;. .V.i.r.|
00000030  67 00 69 00 6e 00 20 00  4d 00 75 00 73 00 69 00  |g.i.n. .M.u.s.i.|
00000040  63 00 20 00 28 00 4a 00  50 00 4e 00 29 54 49 54  |c. .(.J.P.N.)TIT|
00000050  32 00 00 00 0a 00 00 03  4d 69 63 68 69 00 e5 8e  |2.......Michi...|
00000060  90 54 50 45 31 00 00 00  20 00 00 03 55 74 61 64  |.TPE1... ...Utad|
00000070  61 20 48 69 6b 61 72 75  00 e5 ae 87 e5 a4 9a e7  |a Hikaru........|
00000080  94 b0 e3 83 92 e3 82 ab  e3 83 ab 54 50 45 32 00  |...........TPE2.|
00000090  00 00 20 00 00 03 55 74  61 64 61 20 48 69 6b 61  |.. ...Utada Hika|
000000a0  72 75 00 e5 ae 87 e5 a4  9a e7 94 b0 e3 83 92 e3  |ru..............|
000000b0  82 ab e3 83 ab 54 41 4c  42 00 00 00 09 00 00 03  |.....TALB.......|
000000c0  46 61 6e 74 c3 b4 6d 65  54 50 4f 53 00 00 00 03  |Fant..meTPOS....|
000000d0  00 00 03 30 31 54 52 43  4b 00 00 00 06 00 00 03  |...01TRCK.......|
000000e0  30 31 2f 31 31 00 00 00  00 00 00 00 00 00 00 00  |01/11...........|
---
00000000  49 44 33 04 00 00 00 08  31 53 54 50 55 42 00 00  |ID3.....1STPUB..|
00000010  00 1f 00 00 00 41 66 74  65 72 6d 61 74 68 20 2f  |.....Aftermath /|
00000020  20 49 6e 74 65 72 73 63  6f 70 65 20 52 65 63 6f  | Interscope Reco|
00000030  72 64 73 54 49 54 32 00  00 00 05 00 00 03 53 74  |rdsTIT2.......St|
00000040  61 6e 54 50 45 31 00 00  00 0c 00 00 03 45 6d 69  |anTPE1.......Emi|
00000050  6e 65 6d 00 44 69 64 6f  54 50 45 32 00 00 00 07  |nem.DidoTPE2....|
00000060  00 00 03 45 6d 69 6e 65  6d 54 41 4c 42 00 00 00  |...EminemTALB...|
00000070  18 00 00 03 54 68 65 20  4d 61 72 73 68 61 6c 6c  |....The Marshall|
00000080  20 4d 61 74 68 65 72 73  20 4c 50 54 50 4f 53 00  | Mathers LPTPOS.|
00000090  00 00 06 00 00 03 30 31  2f 30 31 54 52 43 4b 00  |......01/01TRCK.|

What I'd expect is to have:

Eminem
Dido
Utada Hikaru
宇多田 ヒカル

in Artists, and

Stan
Michi
道

in Songs. Instead I only have Eminem, Utada Hikaru in artists and Stan, Michi in Songs. songs

artists

jonnyrobbie commented 6 years ago

I managed to hack the database at /data/data/ch.blinkenlights.android.vanilla/databases/media-library.db to include multiple songs.title entries with the same songs._id and multiple contributors._contributor entries with the same contributors._id. The Vanilla Music somewhat deals with it, with just some slight duplication issue. Example included (though careful about songs._path) media-library.db.tar.gz. So the question is how would I actually want the player to behave in case of multiple entries of the same song/artist/album and how would I want it to result the db lookup duplicates.

adrian-bl commented 6 years ago

So the splitting of tags by null byte seems to work - but we only add a single album / artist per file to the library - this is a known 'feature': The relevant code is here: https://github.com/vanilla-music/vanilla/blob/master/app/src/main/java/ch/blinkenlights/android/medialibrary/MediaScanner.java#L544

As you already noted, inserting multiple albums/artists per song does somewhat work (we fully support multiple genres per song): I suppose that some joins would produce strange results, showing a song with multiple albums twice (however: one could argue that this is actually correct). But note that song._id is calculated from the path name: having the same file in the database show up multiple times (where one of them would have an invalid _id), will break things for sure - in non fun ways.

jonnyrobbie commented 6 years ago

Regarding your second paragraph, I actually gave a pretty good thought to what I'd imagine I'd like the duplicity issue to be be handled. One one hand, you are right that when you'd have a song with two artists (like 'Utada Hikaru' and '宇多田 ヒカル'), I'd like both be shown in the 'Artist' tab in media library. So that would be a wanted duplicity. But the album frame of the song contains only one field: 'Fantôme'. So the duplicity of 'Utada Hikaru-Fantôme' and '宇多田 ヒカル-Fantôme' in the albums tab is an unwanted one. I believe that the breakage would not be that severe (just look at my example hacked database. It knows of only two song files, but the duplicities didn't set my phone on fire yet). You'd just have to adjust the database so that '-id'-like columns in tables are not required to be unique (as I did with my hack) and to adjust the sql query to resolve the duplicities.

adrian-bl commented 6 years ago

I believe that the breakage would not be that severe (just look at my example hacked database. It knows of only two song files, but the duplicities didn't set my phone on fire yet).

Removing duplicate songs and albums could probably be done with some GROUP BY magic - feel free to give it a try if you feel like digging into sqlite.

You'd just have to adjust the database so that '-id'-like columns in tables are not required to be unique (as I did with my hack) and to adjust the sql query to resolve the duplicities.

Having two song entries pointing to the same file is definitively not something i want to support. As i already said: We use the filename to calculate song._id, this is used for various shortcuts by the indexer - also i don't really see the appeal of having a file included twice as our current schema already (kind of) supports multiple albums and artists per file.

jonnyrobbie commented 6 years ago

Having two song entries pointing to the same file is definitively not something i want to support.

That's fair. It was just a hacky proof of concept - workaround so I wouldn't need to change anything in the app and nothing substantial in the database itself. To properly implement multiple artists/albums/songnames/other text frames would probably need to robustly improve and reorganize the database itself together with more involved DB lookups in the app itself.

as our current schema already (kind of) supports multiple albums and artists per file.

Can I have more details on that?

Also, I'm thinking about completely bypassing media scanner and just hacking together my own media_db. Am I correct that the App actually doesn't do any difficult sql querries when reading and just completely relying on database views? Is there some documentations which columns are important and which are just auxillary?

adrian-bl commented 6 years ago

Can I have more details on that?

We have the contributors_songs table: https://github.com/vanilla-music/vanilla/blob/master/app/src/main/java/ch/blinkenlights/android/medialibrary/MediaSchema.java#L67

This schema supports having multiple artists (a contributor type) per song. I was wrong about the albums - but we could change the schema to have a similar mapping

Am I correct that the App actually doesn't do any difficult sql querries when reading and just completely relying on database views?

Depends on your definition of 'difficult'. The non-trivial part is at https://github.com/vanilla-music/vanilla/blob/master/app/src/main/java/ch/blinkenlights/android/medialibrary/MediaLibraryBackend.java#L234 - all other code just does simple SELECT * from X where A=? style queries

mddvul22 commented 1 year ago

How do you actually separate tag values with a null byte. I have some ogg files with multiple genres values:

GENRE=gothic rock;alternative rock

I used vorbiscomment to extract the comments into a text file and then used the Vim editor to replace the semicolon with a null byte. (insert mode, ctrl+v, ctrl+@, write, quit. Then, I used vorbiscomment to insert the comments back into the ogg file, but then, vanilla player doesn't even see the the genre, at all.

mddvul22 commented 1 year ago

Never mind. I answered my own question. For ogg files, the metadata takes multiple GENRE lines. Modifying the file to have one GENRE line for gothic rock and a second GENRE line for alternative rock works. Thanks!

xeruf commented 1 year ago

Please extend the title to be more descriptive, like:

Of multiple artists in id3 tags only first is used

I can concur on this issue.

This song is only shown under Feint:

❯ hexdump -C Feint\ \&\ Boyinaband\ \&\ Veela\ -\ Time\ Bomb\ \(Dave\ Rap\ Version\).mp3 | head
00000000  49 44 33 04 00 00 00 00  12 42 54 49 54 32 00 00  |ID3......BTIT2..|
00000010  00 1e 00 00 00 54 69 6d  65 20 42 6f 6d 62 20 28  |.....Time Bomb (|
00000020  44 61 76 65 20 52 61 70  20 56 65 72 73 69 6f 6e  |Dave Rap Version|
00000030  29 00 54 50 45 31 00 00  00 18 00 00 03 46 65 69  |).TPE1.......Fei|
00000040  6e 74 00 42 6f 79 69 6e  61 62 61 6e 64 00 56 65  |nt.Boyinaband.Ve|
00000050  65 6c 61 00 54 44 52 43  00 00 00 06 00 00 00 32  |ela.TDRC.......2|
00000060  30 31 32 00 54 43 4f 4e  00 00 00 0d 00 00 03 44  |012.TCON.......D|
00000070  6e 42 3b 20 4c 69 71 75  69 64 00 54 42 50 4d 00  |nB; Liquid.TBPM.|
00000080  00 00 05 00 00 00 31 33  30 00 54 50 45 32 00 00  |......130.TPE2..|
00000090  00 07 00 00 03 46 65 69  6e 74 00 54 45 4e 43 00  |.....Feint.TENC.|

I also have to concur on @mddvul22 - I also used to separate genres with semicolons and now have to convert everything over - maybe you can add support for that?