robinbowes / flac2mp3

flac2mp3 is a tool to convert audio files from flac to mp3 format including the copying of tags.
GNU General Public License v3.0
143 stars 29 forks source link

Multibyte tags corrupts ID3v2.3 tags in mp3 #58

Open sktt opened 3 years ago

sktt commented 3 years ago

If the input flac contains strange multibyte characters it results in the output mp3 to have bad id3v2.3 tags.

Some tag parsers do not care about this. mediainfo shows the full tags. but ffprobe won't it stops outputting tags after the first bad tag.

How to reproduce?

Make an empty directory for the test input

mkdir input-dir
cd input-dir

Find a random flac, eg.

curl -O https://filesamples.com/samples/audio/flac/sample1.flac

Add the problematic tag

metaflac sample1.flac --set-tag='TITLE=It’s Broken'
# add some other tags to show the effect of the bug
metaflac sample1.flac --set-tag='ARTIST=Some Artist'
metaflac sample1.flac --set-tag='ALBUM=Some Album'

cd ..

Convert like normal

flac2mp3.pl input-dir output-dir

Check the tags in ffprobe: ffprobe -v quiet -show_format -print_format ini output-dir/sample1.mp3

# ffprobe output

[format]
filename=output-dir/sample1.mp3
nb_streams=1
nb_programs=0
format_name=mp3
format_long_name=MP2/3 (MPEG audio layer 2/3)
start_time=0.025057
duration=122.122449
size=2489344
bit_rate=163071
probe_score=51

[format.tags]
album=Some Album
title=It’s Broke

title is missing its last character and artist is completely missing. The problem seems to be the character in the title, which is 3 bytes long.

Tags not gone, however: mediainfo -f --Details=1 output-lol/sample1.mp3 | grep TIT2 --after=10

00001F  TIT2 - Title/songname/content description (36 bytes)
00001F   Header (10 bytes)
00001F    Frame ID:                            TIT2
000023    Size:                                25 (0x00000019)
000027    Flags:                               0 (0x0000)
000029    Tag alter preservation:              0 (0x0000000000000000)
000029    File alter preservation:             0 (0x0000000000000000)
000029    Read only:                           0 (0x0000000000000000)
000029    Compression:                         0 (0x0000000000000000)
000029    Encryption:                          0 (0x0000000000000000)
000029    Grouping identity:                   0 (0x0000000000000000)
000029   Text_encoding:                        1 (0x01)
00002A   Information:                          It’s Broken

Here Size=25 seems to be off by 1. This seems to result in some media players showing the correct tags (they don't care about the Size header) for example VLC. Others will instead just render the tags until the faulty frame, for example deadbeef.

sktt commented 3 years ago

After heaps of debugging.

It should be off by 1. At least in some implementations. But not by others. The added byte comes from unsynchronizing, ie removing byte patterns that looks like audio synchronization frames by adding an extra \x00 to it.

It turns out, however, that the spec is not clear on whether tags should be un-unsynced (ie removing that added \x00 byte) before or after parsing the id3 tags.

libid3tag (eg. libmad) for example assumes (just like MP3/Tag) that the idv3 tags are "un-unsynced" before parsing them. Whereas libmpg123 (eg. ffmpeg) appear to assume that they should be parsed "as-is".

Edit: It seems that the few mp3s with special characters that I checked (russian, korean, chinese) don't set the unsynchronisation flag. So my proposed solution would be disable id3v2_unsync. flac2all (which sets tags using with lame) for example does not enable unsync as it seems.