google / wuffs

Wrangling Untrusted File Formats Safely

wuffs_base__magic_number_guess_fourcc can't identify VP8X #61

Closed pjanx closed 2 years ago

pjanx commented 2 years ago
        if (y == 0x56503820) {         // 'VP8 'be
          return 0x57503820;           // 'WP8 'be
        } else if (y == 0x5650384C) {  // 'VP8L'be
          return 0x5750384C;           // 'WP8L'be
        }

A WEBP file in the "Extended file format" starts with a "VP8X" chunk; VP8/VP8L chunks may come after that.

https://github.com/webmproject/libwebp/blob/master/doc/webp-container-spec.txt

I'm not entirely sure what the current WEBP magic detection is for. If it reported just "WEBP", it would be workable.

nigeltao commented 2 years ago

I thought it might be nice for wuffs_base__magic_number_guess_fourcc to discriminate lossy from lossless. But you're right that that's not 100% do-able (when looking at a fixed number of opening bytes) because of VP8X and its optional but arbitrarily long ICCP chunk before we hit the image data bitstream.
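To illustrate why a fixed-size prefix is not enough, here is a rough sketch in C that walks the top-level RIFF chunks until it reaches a "VP8 " or "VP8L" chunk. The function and helper names (`sketch_first_bitstream`, `sketch_load_u32le`, `sketch_load_u32be`) are made up for this example and are not part of Wuffs. It assumes the layout described in the container spec: a 12-byte "RIFF"/size/"WEBP" header followed by chunks, each with a 4-byte FourCC, a little-endian 32-bit payload size, and padding to an even length. It does not look inside ANMF frames.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helpers for this sketch only (not Wuffs API).
static uint32_t sketch_load_u32le(const uint8_t* p) {
  return ((uint32_t)p[0]) | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t sketch_load_u32be(const uint8_t* p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | ((uint32_t)p[3]);
}

// Returns 0x56503820 ('VP8 'be) or 0x5650384C ('VP8L'be) for the first such
// top-level chunk, or 0 if the n available bytes end before one is reached
// (or the data is not RIFF/WEBP at all).
static uint32_t sketch_first_bitstream(const uint8_t* p, size_t n) {
  if ((n < 12) || (memcmp(p, "RIFF", 4) != 0) ||
      (memcmp(p + 8, "WEBP", 4) != 0)) {
    return 0;
  }
  size_t i = 12;  // Invariant: i <= n.
  while ((n - i) >= 8) {
    uint32_t tag = sketch_load_u32be(p + i);
    uint32_t size = sketch_load_u32le(p + i + 4);
    if ((tag == 0x56503820u) || (tag == 0x5650384Cu)) {
      return tag;
    }
    // Skip the chunk header, its payload and the optional padding byte.
    uint64_t step = 8u + (uint64_t)size + (uint64_t)(size & 1u);
    if (step > (uint64_t)(n - i)) {
      return 0;  // The chunk extends past the bytes we have.
    }
    i += (size_t)step;
  }
  return 0;
}
```

With a "VP8X" file, the first 16 bytes alone make this return 0. The answer only appears once the caller supplies enough bytes to get past VP8X and any ICCP chunk, and the ICCP payload can be arbitrarily long.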

If I'm reading that linked WebP spec right (pretty HTML version is at https://developers.google.com/speed/webp/docs/riff_container), an animated WebP can actually mix lossy and lossless between frames.

As you said, we should probably just group them all as "WEBP".
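A minimal sketch of that grouping, in C, could look like the following. It is not the actual Wuffs change: `sketch_guess_webp` is a hypothetical name, and the only assumption is the 12-byte header from the container spec ("RIFF", a 4-byte file size, then "WEBP"). The first chunk's FourCC is deliberately not inspected, so "VP8 ", "VP8L" and "VP8X" files all get the same answer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper for this sketch only (not Wuffs API). Returns
// 0x57454250 ('WEBP'be) if the buffer starts with a RIFF/WEBP header, or 0
// if it does not (including when fewer than 12 bytes are available).
static uint32_t sketch_guess_webp(const uint8_t* p, size_t n) {
  if (n < 12) {
    return 0;
  }
  // Bytes 4..7 hold the RIFF payload size and are ignored here.
  if ((memcmp(p, "RIFF", 4) != 0) || (memcmp(p + 8, "WEBP", 4) != 0)) {
    return 0;
  }
  return 0x57454250u;  // 'WEBP'be
}
```

A caller that needs to know whether the image data is lossy or lossless would then have to parse the chunks itself instead of relying on the magic number guess.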