quodlibet / mutagen

Python module for handling audio metadata
https://mutagen.readthedocs.io
GNU General Public License v2.0

Embedded APE pictures have corrupted metadata/header when extracted #550

Closed ghost closed 2 years ago

ghost commented 2 years ago

Writing the binary data of an embedded picture in an APE tag to a file produces an image that other programs do not recognize as an image.

For example, if I embed a JPEG file in an APE tag and then extract the embedded image back into a JPEG file, other programs do not recognize the file as a JPEG image and will not display it.

Looking at the file with a hex editor, I found that the exported APE picture has an extra 00 at the beginning of the file, which is not present in functioning JPEG files. Removing the 00 fixes the image.

I've included some sample code below which will demonstrate the issue. I have tested this with a few different JPEG files, so the particular image doesn't seem to make any difference.

>>> import mutagen
>>> apetags = mutagen.File('test.ape')
>>> with open('output.jpg', 'wb') as output_image:
...     output_image.write(apetags['COVER ART (FRONT)'].value)
... 
19596
>>>

The output.jpg file will not be readable by image viewers. The first line of output.jpg (in hex) will look like this:

0000:0000 00 FF D8 FF E0 00 10 4A 46 49 46 ...

Normal JPEGs will have this instead:

0000:0000 FF D8 FF E0 00 10 4A 46 49 46 ...

Please let me know if you need any further information to help fix this.

Thank you for your work on Mutagen!

Regards, blueblots

lazka commented 2 years ago

The values also contain a description in front of the image data (null-separated), see https://github.com/quodlibet/quodlibet/blob/4f3f0e3567b801e15c2135f92767b75b05827e18/quodlibet/formats/_apev2.py#L31
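A minimal sketch of how the value could be split under that convention, assuming the description is UTF-8 text terminated by the first null byte. The helper name `split_ape_cover` is hypothetical and not part of mutagen:

```python
from typing import Tuple


def split_ape_cover(value: bytes) -> Tuple[str, bytes]:
    """Split a cover-art value into (description, image_data).

    Assumes the convention "description, one null byte, image data".
    Hypothetical helper, not a mutagen API. A value written without
    the description prefix would be split at the wrong place.
    """
    # Everything before the first null byte is the description;
    # everything after it is the raw image.
    description, _sep, image_data = value.partition(b"\x00")
    return description.decode("utf-8", errors="replace"), image_data


# The value from the report starts with 00, i.e. an empty description.
sample = bytes.fromhex("00FFD8FFE000104A464946")
description, image_data = split_ape_cover(sample)
print(repr(description), image_data[:4].hex())
```

Passing `apetags['COVER ART (FRONT)'].value` from the report to such a helper and writing only the second element to `output.jpg` should give a file that starts with `FF D8`.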

ghost commented 2 years ago

Thanks for responding promptly. I didn't know that there is a field name and description for the APE cover art, since it's not mentioned in the documentation.

Does mutagen offer a method or interface for accessing and setting the field name and description, or should I be parsing and writing the .value attribute manually?

lazka commented 2 years ago

It's not part of the APE tag spec, just a convention introduced by some random software 15 years ago :) https://hydrogenaud.io/index.php?topic=40603.msg504669#msg504669

mutagen currently doesn't provide any code for this.
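Until something like that exists, the value can be assembled by hand. A rough sketch, assuming the same "description, null byte, image data" convention. The helper `build_ape_cover` is hypothetical, and the commented-out lines show how the result might be stored as a binary item:

```python
def build_ape_cover(description: str, image_data: bytes) -> bytes:
    """Build a cover-art value following the convention above.

    Hypothetical helper, not a mutagen API.
    """
    if "\x00" in description:
        # A null byte inside the description would end it early on read.
        raise ValueError("description must not contain a null byte")
    return description.encode("utf-8") + b"\x00" + image_data


# Possible usage (untested sketch; file names are placeholders):
#
#   import mutagen
#   from mutagen.apev2 import APEValue, BINARY
#
#   audio = mutagen.File('test.ape')
#   with open('cover.jpg', 'rb') as f:
#       payload = build_ape_cover('cover.jpg', f.read())
#   audio['COVER ART (FRONT)'] = APEValue(payload, BINARY)
#   audio.save()

payload = build_ape_cover("cover.jpg", b"\xff\xd8\xff\xe0")
print(payload[:12])
```

Using the file name as the description is only an assumption here; an empty description also fits the convention.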

ghost commented 2 years ago

Yeah, I read that thread; it's linked in your first message (inside _apev2.py). Do you know if there would be any interest in a pull request that provides some form of .description or field name property for the APE tags, in case I decide to write a patch?

I'll close the issue now, thanks for your help.