polyfloyd / rust-id3

A rust library for reading and writing ID3 metadata
MIT License
240 stars 46 forks source link

Ignore NULL byte following TXXX frame? #135

Open randoragon opened 5 months ago

randoragon commented 5 months ago

Consider a sample mp3 file created in the following way:

ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t 5 -q:a 9 -acodec libmp3lame -id3v2_version 0 sample.mp3
mid3v2 --TXXX Key:Value sample.mp3

The first command creates an empty, 5 seconds long mp3 file with no ID3v2 tag. The second utilizes the mid3v2 utility based on the Python Mutagen module to add an ID3v2.4 TXXX frame with description "Key" and value "Value".

If we now print the tag using id3...

use id3::{Tag, TagLike};

fn main() {
    let tag = Tag::read_from_path("../sample.mp3").unwrap();
    println!("{:#?}", tag);
}

...this is the output:

Tag {
    frames: [
        Frame {
            id: Valid(
                "TXXX",
            ),
            content: ExtendedText(
                ExtendedText {
                    description: "Key",
                    value: "Value\0",
                },
            ),
            tag_alter_preservation: false,
            file_alter_preservation: false,
            encoding: Some(
                UTF8,
            ),
        },
    ],
    version: Id3v24,
}

As you can see, there is a '\0' at the end of the TXXX value. This '\0' is not reported by any other ID3 tool I've tried (mid3v2, eyeD3, id3ted).

I've investigated this issue a lot since yesterday, I've read the ID3 Informal specification a few times, looked at issues in Mutagen (particularly https://github.com/quodlibet/mutagen/issues/379 is interesting). It seems this inconsistency between tooling is sadly a result of the ID3 specification itself being pretty vague on the subject. For instance, my not-very-educated stance on this is that TXXX is not a text frame and the optional NULL termination rules do not apply to it (justification: https://github.com/dhamaniasad/mutagen/issues/179#issuecomment-2043319114). However, a maintainer over at Mutagen's closed issue had a different opinion (https://github.com/quodlibet/mutagen/issues/379#issuecomment-486852503), claiming that:

The v2.3 spec explicitly states that text frames (TXXX) can have optional termination, (...)

...which I strongly disagree with, unless I see where this "explicitness" is coming from (I was unable to find it).

Still, as an author of a command-line ID3 tool (rsid3) based on rust-id3, and especially as a user, I do care about compliance between tools. Since Mutagen is probably the most used and popular ID3 library at the moment, I think it's especially beneficial to be compliant with it, so that users can smoothly transition and expect the same results. This is all assuming that Mutagen is compliant with the standard and the differences only arise from varying interpretations of the standard.

I guess my suggestion is to keep some kind of list of differences with Mutagen (and possibly other popular ID3 libraries, if they exist), maybe even include some remarks in the documentation. And I'd love to hear what you think about this, should this project aim for compliance with Mutagen (within the ramifications of the standard), or do we not care about it at all? Is this a bug, a feature, or neither?

randoragon commented 5 months ago

A small correction: The problem is mid3v2 specifically, not Mutagen. It produces the NULL termination, which is a known issue (https://github.com/quodlibet/mutagen/issues/384, https://github.com/quodlibet/mutagen/issues/379). I ran the same experiment with eyeD3 and id3ted, and the NULL byte was not added (and eyeD3 is based on Mutagen, so this is not a Mutagen problem).

The important difference is that all other tools ignore the NULL when reading TXXX, while rust-id3 includes it as part of the data.

polyfloyd commented 5 months ago

Thank you for your research on this matter!

When the spec is vague about something, I think it is important to adhere to whatever all other popular tooling does. Mutagen is certainly a commonly found implementation of the id3 format, so if it strips off the null terminator then so should this library.