google / wuffs

Wrangling Untrusted File Formats Safely

Decode PNG text chunks (tEXt, zTXt, iTXt) #55

Closed nigeltao closed 2 years ago

nigeltao commented 2 years ago

Spun out of https://github.com/google/wuffs/issues/13#issuecomment-965769040

"I need PNG metadata readout... complete support of text chunks (for thumbnails)."

@pjanx, can you clarify what "complete support" means? According to https://www.w3.org/TR/2003/REC-PNG-20031110/#11tEXt such chunks are actually key-value pairs. Do you need the keys too, or only the values? iTXt chunks also carry language tags (RFC 3066 tags such as ISO 639 codes, spelled in ISO 646 characters), and the key can also be translated (e.g. from English to Japanese). Do you need that too?

It may be helpful if you can attach some example PNG files (with text chunks) and say what you need to crack out of them.

pjanx commented 2 years ago

Complete means tEXt, zTXt and iTXt. The former two use Latin-1, the last one UTF-8. There's also a minor complication in that these chunks can be at both ends of the file; the end user probably doesn't want to keep that distinction.

libpng simply accumulates them (though you need to read the start and end chunks both explicitly, unless using the constrained API) and then you can read an array out of the info structure. spng is rather similar.

What I'm doing is requesting values for particular keys, and I'm not interested in “translated keywords”. There can be multiple values for a given key, and the simplest viable high-level API is func(keyword string)[]string. ISO 8859-1 is trivially converted to UTF-8, so that defines the encoding. A low-level API would call back with:

struct {
    keyword, text string
    languageCode, translatedKeyword *string
}
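The high-level func(keyword string)[]string shape above can be sketched in C++ roughly as follows. This is only an illustration: TextChunks, Add and Lookup are hypothetical names, not part of Wuffs, libpng or spng. It assumes every keyword and text has already been converted to UTF-8, and it keeps pairs in file order without recording whether a chunk came before or after the image data.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical container (not a Wuffs, libpng or spng type). It accumulates
// text chunks and deliberately forgets which end of the file they came from.
struct TextChunks {
  // (keyword, text) pairs in the order they were added.
  std::vector<std::pair<std::string, std::string>> pairs;

  void Add(std::string keyword, std::string text) {
    pairs.emplace_back(std::move(keyword), std::move(text));
  }

  // Returns every text recorded for keyword, in file order. A keyword may
  // legitimately appear more than once, hence the vector result.
  std::vector<std::string> Lookup(const std::string& keyword) const {
    std::vector<std::string> out;
    for (const auto& kv : pairs) {
      if (kv.first == keyword) {
        out.push_back(kv.second);
      }
    }
    return out;
  }
};
```

A linear scan is used because thumbnails typically carry only a handful of text chunks; a std::multimap would also work but would lose file order across different keywords.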

Have a look at ~/.cache/thumbnails in a GNOME/KDE system. Specification.

nigeltao commented 2 years ago

Thanks for the comment. Just a quick reply to the idea of string, *string and []string in the APIs...

Wuffs' higher-level auxiliary C++ API can do this, but one memory-safety constraint on the lower-level C API is that it cannot allocate memory. Specifically, the callee cannot create or hold arbitrarily long strings. C also just doesn't have a good string type. I'm not sure which level you had in mind but, if your program is C and not C++, then you're obviously restricted to the C API.

In terms of API design, I'll also copy/paste from what I just wrote at https://github.com/google/wuffs/issues/39#issuecomment-966685935

Wuffs' image, image metadata and color correction APIs span many file formats... so there's some abstraction that might look weird at first glance.

In particular, languageCode and translatedKeyword may be part of PNG iTXt chunks, but IIRC they're not part of GIF, JPEG, etc. comments.

pjanx commented 2 years ago

The content encoding can simply be tagged; that would also be viable. I haven't needed to learn about GIF or JPEG.

zTXt is compressed, so I'm not sure how you want to handle the decompression there.

nigeltao commented 2 years ago

iCCP payload is also zlib-compressed and we already handle that.

pjanx commented 2 years ago

Where/by whom is the decompression buffer allocated?

nigeltao commented 2 years ago

Throughout Wuffs' C API, it's always the caller (not the callee) who allocates variable-length buffers. Pass that caller-allocated buffer as the wuffs_base__io_buffer* a_dst argument to wuffs_base__image_decoder__tell_me_more or wuffs_png__decoder__tell_me_more.

If the buffer is too short, that call will return wuffs_base__suspension__short_write and it's up to the caller to re-allocate and call again, or otherwise abort (because it cannot or will not allocate more memory).
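The caller-side "re-allocate and call again" pattern might look something like the sketch below. It does not use the real Wuffs API: Status, Callee, StandInCallee and CallWithGrowingBuffer are all hypothetical names, and the stand-in callee restarts from scratch on every call, which a real Wuffs call may not need to do. The point is only that the caller owns the buffer, decides how far it is willing to grow it, and aborts once that limit is reached.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

enum class Status { kOk, kShortWrite };

// Hypothetical callee shape (not a Wuffs signature). The callee never
// allocates: it either writes its whole payload into dst[0..cap) and reports
// the length, or reports that cap was too small.
using Callee =
    std::function<Status(uint8_t* dst, size_t cap, size_t* n_written)>;

// Stand-in callee for illustration: its payload is a fixed 12-byte keyword.
Status StandInCallee(uint8_t* dst, size_t cap, size_t* n_written) {
  static const char kPayload[] = "Thumb::MTime";
  const size_t n = sizeof(kPayload) - 1;  // Exclude the trailing NUL.
  if (cap < n) {
    return Status::kShortWrite;
  }
  std::memcpy(dst, kPayload, n);
  *n_written = n;
  return Status::kOk;
}

// Calls callee with a caller-owned buffer, doubling the buffer after each
// short write. Returns false once max_len has been tried without success.
bool CallWithGrowingBuffer(const Callee& callee, size_t initial_len,
                           size_t max_len, std::vector<uint8_t>* out) {
  size_t len = (initial_len > 0) ? initial_len : 1;
  if (len > max_len) {
    len = max_len;
  }
  while (true) {
    out->assign(len, 0);
    size_t n = 0;
    if ((callee(out->data(), out->size(), &n) == Status::kOk) &&
        (n <= out->size())) {
      out->resize(n);  // Trim to the bytes actually written.
      return true;
    }
    if (len >= max_len) {
      out->clear();
      return false;  // The caller cannot or will not allocate more.
    }
    len = (len > max_len / 2) ? max_len : (len * 2);
  }
}
```

For instance, CallWithGrowingBuffer(StandInCallee, 4, 64, &out) tries buffers of 4, 8 and then 16 bytes before succeeding, while a max_len of 8 makes it give up.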

The C++ API (e.g. used by example/imageviewer) manages the memory for you, so that the C++ callback (e.g. imageviewer's MyDecodeImageCallbacks::HandleMetadata) gets a contiguous (ptr, len) pair: the raw arg. If you want to dig into the C++ code, start with this line:

https://github.com/google/wuffs/blob/f4c2d4652e843ffbf63ab7e9898353044fdf59ba/internal/cgen/auxiliary/base.cc#L214

pjanx commented 2 years ago

That sounds like ISO Latin-1 could still be just another decoder. Sadly, with zlib compression, that might mean two levels of transformation.

Code for the encoding conversion is a trivial exercise; having it done automatically would just make the API less awkward to use.
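For reference, the conversion in question is short because every ISO 8859-1 byte maps to the Unicode code point of the same value, so bytes below 0x80 pass through and the rest become two-byte UTF-8 sequences. A minimal sketch follows; Latin1ToUtf8 is a hypothetical helper name, not a Wuffs function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Wuffs): converts ISO 8859-1 bytes to UTF-8.
std::string Latin1ToUtf8(const uint8_t* ptr, size_t len) {
  std::string out;
  out.reserve(len);  // Lower bound; non-ASCII bytes take two bytes each.
  for (size_t i = 0; i < len; i++) {
    const uint8_t c = ptr[i];
    if (c < 0x80) {
      // ASCII is identical in both encodings.
      out.push_back(static_cast<char>(c));
    } else {
      // U+0080 ..= U+00FF encode as 110000xx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```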

Speaking of awkward, it looks like both libpng and spng make you assume the encoding based on the ‘type’ of the chunk, which is exposed in their structures. Luckily for me, everything above ASCII happens to be percent-encoded in thumbnails…

nigeltao commented 2 years ago

Wuffs' PNG decoder should now be able to give you text chunks (whether iTXt, tEXt or zTXt). If you're using the C++ API, this patch shows how to get the data. Starting at:

https://github.com/google/wuffs/blob/66cfd3e02f8c085963ca7ec122500f366226abe3/example/imageviewer/imageviewer.cc#L178

Apply:

diff --git a/example/imageviewer/imageviewer.cc b/example/imageviewer/imageviewer.cc
index 96938453..b0327414 100644
--- a/example/imageviewer/imageviewer.cc
+++ b/example/imageviewer/imageviewer.cc
@@ -175,6 +175,23 @@ class MyDecodeImageCallbacks : public wuffs_aux::DecodeImageCallbacks {
               1e5 / (g_flags.screen_gamma * minfo.metadata_parsed__gama());
           break;
       }
+    } else {
+      const char* name = nullptr;
+      switch (minfo.metadata__fourcc()) {
+        case WUFFS_BASE__FOURCC__KVPK:
+          name = "Key";
+          break;
+        case WUFFS_BASE__FOURCC__KVPV:
+          name = "Val";
+          break;
+      }
+      static char buf[4096];
+      if (name && (raw.len < 4096)) {
+        // Convert raw (a wuffs_base__slice_u8) to a NUL-terminated C string.
+        memcpy(buf, raw.ptr, raw.len);
+        buf[raw.len] = 0x00;
+        printf("    %s %s\n", name, buf);
+      }
     }
     return wuffs_aux::DecodeImageCallbacks::HandleMetadata(minfo, raw);
   }
@@ -244,6 +261,7 @@ load_image(const char* filename) {
   uint64_t dia_flags = 0;
   if (g_flags.screen_gamma > 0) {
     dia_flags |= wuffs_aux::DecodeImageArgFlags::REPORT_METADATA_GAMA;
+    dia_flags |= wuffs_aux::DecodeImageArgFlags::REPORT_METADATA_KVP;
   }

   MyDecodeImageCallbacks callbacks;

For example, running ./build-example.sh example/imageviewer && gen/bin/example-imageviewer ~/.cache/thumbnails/large/etc.png prints:

    Key Thumb::URI
    Val file:///usr/share/images/desktop-base/desktop-grub.png
    Key Thumb::MTime
    Val 1559986823
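Judging by the output above, each key arrives in its own callback, followed by its value. If that ordering holds, a callback could stitch the two halves back into pairs along these lines. This is a sketch only: Part and PairCollector are hypothetical stand-ins, with Part::kKey and Part::kValue playing the role of the KVPK and KVPV fourcc values, and nothing here calls into Wuffs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the KVPK (key) and KVPV (value) fourcc values.
enum class Part { kKey, kValue };

// Hypothetical helper: rebuilds (key, value) pairs from alternating
// key and value callbacks, assuming each value follows its key.
class PairCollector {
 public:
  void Handle(Part part, const std::string& bytes) {
    if (part == Part::kKey) {
      // Remember the key until its value shows up. A key that is followed
      // by another key is silently replaced.
      pending_key_ = bytes;
      have_key_ = true;
    } else if (have_key_) {
      pairs_.emplace_back(pending_key_, bytes);
      have_key_ = false;
    }
    // A value with no preceding key is dropped.
  }

  const std::vector<std::pair<std::string, std::string>>& pairs() const {
    return pairs_;
  }

 private:
  std::string pending_key_;
  bool have_key_ = false;
  std::vector<std::pair<std::string, std::string>> pairs_;
};
```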

If you're using the C API, test_wuffs_png_decode_metadata_kvp has some code you can study (link below). To keep the test simple, it assumes that the entire PNG input can fit into memory at once, as will each key/value pair within. If you're streaming, then you'll need to take care of Wuffs' usual "short read/write" suspensions.

https://github.com/google/wuffs/blob/66cfd3e02f8c085963ca7ec122500f366226abe3/test/c/std/png.c#L730


The callee, not the caller, translates from Latin-1 to UTF-8 when necessary.