quodlibet / mutagen

Python module for handling audio metadata
https://mutagen.readthedocs.io
GNU General Public License v2.0
1.54k stars 158 forks source link

Recurse m4a atoms to search for ilst #625

Open sezanzeb opened 1 year ago

sezanzeb commented 1 year ago

This makes it possible to read m4a files written by gstreamer, which might store tags incorrectly: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/issues/759#note_2041293

closes https://github.com/quodlibet/mutagen/issues/480

An alternative to recursion could be to specify multiple possible paths.

I don't know if there could be ilst atoms that mutagen shouldn't read in mp4/m4a containers, so I don't know if this is really ok.

Assuming that gstreamer is really doing something wrong there, I hope that they fix it instead eventually. Still, adding a change to mutagen would help reading old m4a files.

sezanzeb commented 1 year ago

Alternative:

    def load(self, atoms: Atoms, fileobj):
        try:
            self._load_from_path(atoms.path(b"moov", b"udta", b"meta", b"ilst"), fileobj)
        except KeyError as key:
            raise MP4MetadataError(key)

        try:
            # for reading m4a files written by gstreamer
            self._load_from_path(atoms.path(b"moov", b"trak", b"udta", b"meta", b"ilst"), fileobj)
        except KeyError:
            pass

    def _load_from_path(self, path: List[Atom], fileobj):
        free = _find_padding(path)
        self._padding = free.datalength if free is not None else 0

        ilst = path[-1]
        for atom in ilst.children:
            ok, data = atom.read(fileobj)
            if not ok:
                raise MP4MetadataError("Not enough data")

            try:
                if atom.name in self.__atoms:
                    info = self.__atoms[atom.name]
                    info[0](self, atom, data)
                else:
                    # unknown atom, try as text
                    self.__parse_text(atom, data, implicit=False)
            except MP4MetadataError:
                # parsing failed, save them so we can write them back
                self._failed_atoms.setdefault(_name2key(atom.name), []).append(data)
sezanzeb commented 1 year ago

According to https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/issues/759#note_2041908 there can be multiple tracks, and each track can have its own set of metadata.

It would be interesting to have such a file to figure out how to correctly go over each track. In any case, the variation that recurses over everything (which is currently commited in this PR) would find all of them.