tinytag / tinytag

Python library for reading audio file metadata
MIT License
689 stars 102 forks source link

[BUG] ID3: Only first WOAR frame is kept #206

Closed aw-was-here closed 3 weeks ago

aw-was-here commented 5 months ago

Describe the bug According to the ID3 spec, the WOAR frame may appear multiple times

To Reproduce

  1. tinytag open an MP3 with multiple WOAR frame
  2. see that extra["url"] only has the first entry and not the second

Expected behavior extra["url"] should be an (ordered) list wit all WOAR frame

Sample File example

aw-was-here commented 5 months ago

Looks like most of the website tags have the same problem.

mathiascode commented 5 months ago

This will require special handling similar to artists and genres. Which types of URLs are you interested in?

aw-was-here commented 5 months ago

I'm particularly interested in the artist websites (e.g., equivalent of WOAR under ID3). Since 2.0 is going to be incompatible, maybe this would be a good time for TinyTag to automatically put things in lists if it detects more than on frame for a field, regardless of the frame type?

EDIT: I'm hitting the same problem with fields like 'musicbrainz_artistid' in flac's and that's an absolute killer for my app. I won't be able to use TinyTag exclusively without getting all of the values for fields that are allowed to be repeated. ☹️

mathiascode commented 5 months ago

Since 2.0 is going to be incompatible, maybe this would be a good time for TinyTag to automatically put things in lists if it detects more than on frame for a field, regardless of the frame type?

I don't think doing this automatically this is a good idea. Returned types should be predictable and consistent, we can't just switch between a string/integer and list.

How about keeping non-extra fields as-is, but always using lists for extra fields? There will be a lot of single-item lists, but we can also guarantee that a list will always contain at least one item.

aw-was-here commented 5 months ago

I don't think doing this automatically this is a good idea. Returned types should be predictable and consistent, we can't just switch between a string/integer and list.

You could always just make them always lists.

How about keeping non-extra fields as-is, but always using lists for extra fields? There will be a lot of single-item lists, but we can also guarantee that a list will always contain at least one item.

That sounds reasonable, esp if the non-extra lists also appear in extra as lists.

mathiascode commented 3 months ago

Can you test https://github.com/devsnd/tinytag/pull/209 and verify that it works as expected?

New behavior:

aw-was-here commented 3 months ago

Sorry this completely slipped my mind (I was at the Cruel World festival). I'll try to bang on it here over the weekend.

aw-was-here commented 3 months ago
  • Non-extra attributes (e.g. album, artist, genre) store the first value. If any additional values are found for these attributes, they are stored in extra lists (e.g. extra.album, extra.artist, extra.genre).

Including the one value that is stored in non-extra, correct? Just want to verify what I'm seeing.

mathiascode commented 3 months ago

Including the one value that is stored in non-extra, correct? Just want to verify what I'm seeing.

Excluding the first non-extra value (except for composer, but I decided to undeprecate it, which will also fix the behavior there).

The idea is that you use the non-extra fields to get common info about a file as a single value (in part for backwards compatibility, to avoid breakage for every existing user out there, but also for consistency). For more "advanced" use cases, the extra dict provides any remaining, nonduplicate info in the form of lists.

aw-was-here commented 3 months ago

Ok then there is either a bug or I'm doing something wrong. With a track with a single artist, I'm seeing the artist primary value also showing up in extra.artists. FWIW, it'd make my life easier to just read that one field but in the end appending doesn't make much difference. :smile:

{'album': 'Ghosts I-IV',
 'albumartist': 'Nine Inch Nails',
 'artist': 'Nine Inch Nails',
 'bitdepth': 24,
 'bitrate': 1058.4,
 'channels': 2,
 'comment': None,
 'disc': 1,
 'disc_total': 1,
 'duration': 113.148,
 'extra': {'ab:genre': ['Jazz'],
           'ab:mood': ['Not sad'],
           'acoustid id': ['02d23182-de8b-493e-a6e1-e011bfdacbcf'],
           'arranger': ['Atticus Ross'],
           'artists': ['Nine Inch Nails'],
           'asin': ['B00158SHD8'],
           'barcode': ['766929908628'],
           'catalognumber': ['halo twenty six LE'],
           'encoded_by': ['Lavf59.27.100'],
           'engineer': ['Alan Moulder'],
           'isrc': ['USTC40852243'],
           'label': ['The Null Corporation'],
           'license': ['https://creativecommons.org/licenses/by-nc-sa/3.0/us/'],
           'media': ['Digital Media'],
           'mixer': ['Alan Moulder'],
           'musicbrainz album artist id': ['b7ffd2af-418f-4be2-bdd1-22f8b48613da'],
           'musicbrainz album id': ['3af7ec8c-3bf4-4e6d-9bb3-1885d22b2b6a'],
           'musicbrainz album release country': ['XW'],
           'musicbrainz album status': ['official'],
           'musicbrainz album type': ['album'],
           'musicbrainz artist id': ['b7ffd2af-418f-4be2-bdd1-22f8b48613da'],
           'musicbrainz release group id': ['550bf4b9-92ca-30ba-9ea2-8b45f9d081f1'],
           'musicbrainz release track id': ['168cb2db-5626-30c5-b822-dbf2324c2f49'],
           'musicbrainz track id': ['2d7f08e1-be1c-4b86-b725-6e675b7b6de0'],
           'originaldate': ['2008-03-02'],
           'originalyear': ['2008'],
           'performer': ['Trent Reznor'],
           'producer': ['Atticus Ross'],
           'script': ['Latn'],
           'website': ['https://www.nin.com/']},
 'filename': '/Users/aw/Src/whats-now-playing/tests/audio/15_Ghosts_II_64kb_füllytâgged.m4a',
 'filesize': 11036760,
 'genre': 'Industrial',
 'images': {'back_cover': [],
            'extra': {},
            'front_cover': [{'data': b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x01\x00H\x00H\x00\x00\xff\xfe\x00\x0cAppleMark\n\xff\xdb\x00\x84\x00\x02\x02\x02\x02\x02\x02..',
                             'description': None,
                             'mime_type': 'image/jpeg',
                             'name': 'front_cover'},
                            {'data': b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x01\x00H\x00H\x00\x00\xff\xe1\t\xf9Exif\x00\x00MM\x00*\x00\x00\x00\x08\x00\x0e\x01\x00\x00\x03\x00..',
                             'description': None,
                             'mime_type': 'image/jpeg',
                             'name': 'front_cover'}],
            'leaflet': [],
            'media': [],
            'other': []},
 'samplerate': 22050,
 'title': '15 Ghosts II',
 'track': 15,
 'track_total': 36,
 'year': '2008-03-02'}
mathiascode commented 3 months ago

artists must be some format-specific field. Maybe I have to map it to extra.artist.

mathiascode commented 3 months ago

Should be fixed in the latest revision of the PR.

mathiascode commented 3 months ago

FWIW, it'd make my life easier to just read that one field but in the end appending doesn't make much difference. 😄

You should now be able to do this with tag.as_dict(flatten=True). The as_dict() method returns the tag as a dictionary, and the flatten parameter merges the fields in the extra dict with the non-extra ones.

Edit: flatten is now true by default, so you can just use tag.as_dict().

mathiascode commented 2 months ago

Have you been able to test the latest changes?

mathiascode commented 3 weeks ago

I'll assume the changes in the PR are fine, and merge it. Let me know if something is still missing.

mathiascode commented 1 week ago

Updated documentation for 2.0.0 is available here: https://github.com/mathiascode/tinytag/blob/2.0.0-docs/README.md

Let me know if something is unclear, missing or incorrect, and I'll make improvements before the final release.