quodlibet / mutagen

Python module for handling audio metadata
https://mutagen.readthedocs.io
GNU General Public License v2.0

ID3: Merge duplicate text frames on loading (ID3hack) #172

Closed lazka closed 8 years ago

lazka commented 10 years ago

Originally reported by: Christoph Reiter (Bitbucket: lazka, GitHub: lazka)


From dgasa...@gmail.com on March 12, 2014 05:02:32

It appears that when reading tags from an MP3 through scripts using mutagen, or when using mid3v2, I only get the last COMM frame from the file. For example, this is what mid3v2 displays for a file given three comments in foobar2000:

[dave@nymph Ant Videos]$ mid3v2 00\ -\ Little\ Sur.mp3
IDv2 tag info for 00 - Little Sur.mp3:
COMM=='eng'= https://soundcloud.com/jhnmyr/little-sur
TALB=[non-album tracks]
TCOM=John Mayer
TIT2=Little Sur
TMCL=[unrepresentable data]
TPE1=John Mayer
TSOP=Mayer, John
TXXX=MusicBrainz Artist Id=144ef525-85e9-40c3-8335-02c32d0861f3
TXXX=MusicBrainz Work Id=869a9d8f-15e4-471a-a2bd-19078d15b639
UFID=http://musicbrainz.org=62a1f0f6-77e5-4111-a219-40577333daed

[dave@nymph Ant Videos]$ id3ted -l 00\ -\ Little\ Sur.mp3 
ID3v2.4 - 12 frames:
TIT2: Little Sur
TPE1: John Mayer
TALB: [non-album tracks]
UFID: 
TMCL: electric bass guitar Pino Palladino piano Chick Corea electric guitar John Mayer drums Steve Jordan trumpet Wallace Roney
TSOP: Mayer, John
TXXX: [MusicBrainz Artist Id]: 144ef525-85e9-40c3-8335-02c32d0861f3
TXXX: [MusicBrainz Work Id]: 869a9d8f-15e4-471a-a2bd-19078d15b639
TCOM: John Mayer
COMM: [](eng): Recorded on 2014-02-22.
COMM: [](eng): Released to soundcloud and tumblr on 2014-03-10.
COMM: [](eng): https://soundcloud.com/jhnmyr/little-sur

I'm sure this is because the ID3 spec says "There may be more than one comment frame in each tag, but only one with the same language and content descriptor." Nevertheless, some software (foobar2000) is writing them, so I propose that mutagen support reading (but not writing) them.
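
A plausible reading of the output above is that frames are stored under a key built from frame ID, content descriptor and language, so a later frame with the same key replaces an earlier one. The sketch below models that with a plain dict. `Comment`, its `key` layout and `load_last_wins` are hypothetical stand-ins for illustration, not mutagen's actual classes or loading code.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stand-in for a COMM frame (not a mutagen class)."""
    lang: str
    desc: str
    text: str

    @property
    def key(self) -> str:
        # Assumed key layout: frame ID, content descriptor, language.
        return f"COMM:{self.desc}:{self.lang}"


def load_last_wins(frames):
    """Store frames by key; a later frame replaces an earlier one."""
    tags = {}
    for frame in frames:
        tags[frame.key] = frame
    return tags


# The three comments from the id3ted listing: same language, empty descriptor.
frames = [
    Comment("eng", "", "Recorded on 2014-02-22."),
    Comment("eng", "", "Released to soundcloud and tumblr on 2014-03-10."),
    Comment("eng", "", "https://soundcloud.com/jhnmyr/little-sur"),
]
tags = load_last_wins(frames)
```

Under this model all three frames share one key, so only the soundcloud URL is left, which matches what mid3v2 printed.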

Original issue: http://code.google.com/p/mutagen/issues/detail?id=172


lazka commented 8 years ago

Original comment by Sophist UK (Bitbucket: Sophist-UK, GitHub: Sophist-UK):


Yup - have already done that.

lazka commented 8 years ago

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


You can merge those values yourself if the default behavior isn't what you want.

lazka commented 8 years ago

Original comment by Sophist UK (Bitbucket: Sophist-UK, GitHub: Sophist-UK):


Ah yes - though for COMM and USLT tags, a newline character might make more sense as the separator.
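
A minimal sketch of that idea, for anyone merging the values themselves: pick the separator by frame ID, using a newline for the free-text frames. The helper name `join_values`, the `MULTILINE_FRAMES` set and the `"/"` default for other frames are assumptions made for illustration, not part of mutagen.

```python
# Frame IDs whose values are free text and read better one per line.
MULTILINE_FRAMES = {"COMM", "USLT"}


def join_values(frame_id, values, default_sep="/"):
    """Join the strings of a multi-value frame into one display string.

    default_sep is an assumed separator for all other frames.
    """
    sep = "\n" if frame_id in MULTILINE_FRAMES else default_sep
    return sep.join(values)


comment = join_values(
    "COMM",
    ["Recorded on 2014-02-22.",
     "Released to soundcloud and tumblr on 2014-03-10."],
)
```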

lazka commented 8 years ago

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


I am not sure what currently happens if you have a multi-string ID3v24 COMM tag and then try to save it as ID3v23 where the spec does not support multiple strings.

See the v23_sep argument on save() https://mutagen.readthedocs.org/en/latest/api/id3.html#mutagen.id3.ID3.save
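
As a rough model of what that argument is documented to do: multiple strings are joined with the separator when saving as ID3v2.3, and left alone when the separator is None. The function below is a pure-Python illustration of that rule, not mutagen's code. `values_for_v23` is a hypothetical name, the `"/"` default is assumed from the linked documentation, and whether the rule is applied to COMM frames is exactly the open question quoted above.

```python
def values_for_v23(values, v23_sep="/"):
    """Model how multiple strings might be collapsed for an ID3v2.3 save.

    With a separator, all strings are joined into a single value.
    With None, the list is returned unchanged.
    """
    if v23_sep is None:
        return list(values)
    return [v23_sep.join(values)]


comments = ["Recorded on 2014-02-22.", "https://soundcloud.com/jhnmyr/little-sur"]
joined = values_for_v23(comments, v23_sep="\n")
untouched = values_for_v23(comments, v23_sep=None)
```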

lazka commented 8 years ago

Original comment by Sophist UK (Bitbucket: Sophist-UK, GitHub: Sophist-UK):


ID3v2.4 spec says "There may be more than one comment frame in each tag, but only one with the same language and content descriptor."

The example above shows multiple comments with the same language (eng) and content descriptor [].

I would support loading tags saved in this way, treating them as if they had been a single ID3v2.4 COMM tag with multiple null-terminated strings as per spec.
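
A sketch of that merge-on-load behaviour, using plain Python objects rather than mutagen's frame classes: when a frame arrives whose key is already present, its strings are appended to the existing frame instead of replacing it. `Comment`, its `key` layout and `load_merging` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical stand-in for a multi-string COMM frame."""
    lang: str
    desc: str
    text: list = field(default_factory=list)

    @property
    def key(self) -> str:
        # Assumed key layout: frame ID, content descriptor, language.
        return f"COMM:{self.desc}:{self.lang}"


def load_merging(frames):
    """Merge frames that share a key into one multi-string frame."""
    tags = {}
    for frame in frames:
        existing = tags.get(frame.key)
        if existing is None:
            # Copy so the caller's frame objects are not mutated later.
            tags[frame.key] = Comment(frame.lang, frame.desc, list(frame.text))
        else:
            existing.text.extend(frame.text)
    return tags


frames = [
    Comment("eng", "", ["Recorded on 2014-02-22."]),
    Comment("eng", "", ["Released to soundcloud and tumblr on 2014-03-10."]),
    Comment("eng", "", ["https://soundcloud.com/jhnmyr/little-sur"]),
]
merged = load_merging(frames)
```

The result holds one frame with three strings in file order, which is the shape a spec-conforming ID3v2.4 writer could then save as a single COMM frame.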

However mutagen should (IMO) always write tags which conform to the specification, and so should not write tags in this way.

I am not sure what currently happens if you have a multi-string ID3v24 COMM tag and then try to save it as ID3v23 where the spec does not support multiple strings.

lazka commented 9 years ago

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


Also see issue #167

lazka commented 10 years ago

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


Current foobar2000, at least, doesn't create such files when adding multi-value tags and switching to ID3v2.4.

Quod Libet supports reading them using the ID3hack subclass: https://code.google.com/p/quodlibet/source/browse/quodlibet/quodlibet/formats/_id3.py#28

I tend to agree that we should read them.