pnggroup / libpng

LIBPNG: Portable Network Graphics support, official libpng repository
http://libpng.sf.net
Other
1.25k stars 611 forks source link

Reject reads that exceed our buffer length in png_push_fill_buffer. #552

Closed johnstiles-google closed 4 months ago

johnstiles-google commented 4 months ago

Previously, png_push_fill_buffer would simply truncate the read, leaving the remainder of the output buffer in an uninitialized state. This can lead to undefined behavior when handling truncated or malformed inputs.

johnstiles-google commented 4 months ago

Reproducer:

void EnsureTruncatedPNGWorksInMSAN() {
  png_byte png_data[] = {
      0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d,
      0x49, 0x48, 0x44, 0x52, 0x00, 0x00, 0x00, 0xf0, 0x00, 0x00, 0x00, 0xf0,
      0x08, 0x06, 0x00, 0x00, 0x00, 0x3e, 0x55, 0xe9, 0x92, 0x00, 0x00, 0x00,
      0x95, 0x65, 0x58, 0x49, 0x66, 0x89, 0x47, 0x50, 0x4e, 0x0d, 0x0a, 0x1a,
      0x0a, 0x00, 0x61, 0x61, 0x61, 0x61, 0x61, 0x61, 0x61, 0x61, 0x61, 0x61,
      0x61, 0x61, 0x61, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f,
      0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0x26, 0x0b, 0x13, 0x01,
      0x00, 0x9a, 0x9c, 0x18, 0x00, 0x00, 0x00, 0x07, 0x74, 0x49, 0x4d, 0x45,
      0x07, 0x7d, 0x01, 0x1a, 0x16, 0x3b, 0x05, 0xc3, 0xff, 0x6f, 0x00, 0x00,
      0x00, 0x19, 0x74, 0x45, 0x58, 0x74, 0xb2, 0x43, 0x6f, 0x6d, 0x2d, 0x65,
      0xa0, 0x6e, 0x74, 0x00, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x00, 0x43,
      0x72, 0x65, 0x61, 0x74, 0x65, 0x64, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20,
      0x47, 0x49, 0x4d, 0xe2, 0x35, 0x87, 0xc3, 0xa1, 0x00, 0x00, 0x00, 0x49,
      0x45, 0x4e, 0x44, 0xef, 0x04, 0x3e, 0x00, 0xbf, 0x00, 0xae, 0x49, 0x44,
      0x41, 0x54, 0x68, 0x81, 0xed, 0xd5, 0x6b, 0x99, 0x25, 0x2e, 0xff, 0xff,
      0x00, 0xae, 0x79, 0x79, 0x79, 0x42, 0x60, 0x69, 0x82, 0x79, 0x79, 0x79,
      0xf0, 0x7e,
  };
  static_assert(sizeof(png_data) == 194);

  png_structp png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, nullptr,
                                               nullptr, nullptr);
  assert(png_ptr);

  png_infop info_ptr = png_create_info_struct(png_ptr);
  assert(info_ptr);

  if (setjmp(png_jmpbuf(png_ptr))) {
    // If we reach here, libpng correctly flagged the PNG data as corrupt.
    return;
  }

  png_set_progressive_read_fn(png_ptr, nullptr, nullptr, nullptr, nullptr);

  png_process_data(png_ptr, info_ptr, png_data, sizeof(png_data));
}

Without the patch, MSAN reports a failure in the CRC reading code, which calls png_read_data into an otherwise uninitialized buffer, then uses the CRC value:

==2249552==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5571bfe24e93 in png_crc_finish third_party/libpng/pngrutil.c:229:8
    #1 0x5571bfe2c609 in png_handle_eXIf third_party/libpng/pngrutil.c:2082:10
    #2 0x5571bfdf3a2a in png_push_read_chunk third_party/libpng/pngpread.c:301:7
    #3 0x5571bfdf0d0d in png_process_some_data third_party/libpng/pngpread.c:109:10
    #4 0x5571bfdf0d0d in png_process_data third_party/libpng/pngpread.c:46:7
    #5 0x5571be5aaa20 in EnsureTruncatedPNGWorksInMSAN ...

  Uninitialized value was created by an allocation of 'crc_bytes.i' in the stack frame
    #0 0x5571bfe24acf in png_crc_error third_party/libpng/pngrutil.c:253:4
    #1 0x5571bfe24acf in png_crc_finish third_party/libpng/pngrutil.c:229:8

SUMMARY: MemorySanitizer: use-of-uninitialized-value third_party/libpng/pngrutil.c:229:8 in png_crc_finish
johnstiles-google commented 4 months ago

@jbowler When you have a moment, PTAL? The tests won't run until a maintainer initiates them.

jbowler commented 4 months ago

I have the same access as you. I can't believe this is a bug in libpng, please provide a repro. I.e. a test file and minimal test case.

johnstiles-google commented 4 months ago

https://github.com/pnggroup/libpng/pull/552#issuecomment-2052435114 is a minimal repro case.

jbowler commented 4 months ago

#552 (comment) is a minimal repro case.

My apologies - my bad (using github on a cellphone. There is a law about that somewhere, don't github and cellphone?)

This is weird; careful analysis will be required because your code change is clearly correct if we believe in "self documenting" code (i.t. buffer_size means what it suggests, not "last valid buffer index").

The problem is that the bug implies that any user of the progressive reader that reads to end-of-stream would detect an error and any user of the progressive reader that used libpng to read over a PNG embedded in a stream would have seen detectably weird behavior.

This suggests a regression. The bug also confirms the lack of a working unit test for the progressive reader.

Nasty. Can't see a CVE but there might be one.

jbowler commented 4 months ago

Also, there have to be two bugs don't there? The code change in question causes libpng to read an extra (initialized) byte from the buffer; one it should have read in the first place.

But why, without the fix, is there an "uninitialized" byte available? That doesn't make any sense to me.

johnstiles-google commented 4 months ago

png_push_fill_buffer is used as a read callback. It is set as such by png_set_progressive_read_fn.

Read callbacks are given a buffer and a number of bytes to read. There is no mechanism for partially successful reads—either the read is fully satisfied, or png_error should be called. As an example, see png_image_memory_read.

If you run the reproducer in MSAN you will see that png_crc_finish accesses memory that was never initialized. This happens because this read only partially fills this buffer, but the read function returns successfully anyway instead of calling png_error.

My proposed change should cause png_error to be called instead when a read is partially satisfied.

As you deduced, in cases where the read can be fully satisfied, the logic is unchanged.

johnstiles-google commented 4 months ago

Caveat: I don't know how to run the libpng unit tests locally, and don't have the permissions to run them on github. As far as I can see, Chrome and Skia don't show any negative repercussions from this change from a unit-test perspective. I cannot reason through a way that the previous code could be correct—leaving the buffer partially unwritten seems like it could not be correct ever. However I am not an expert in libpng so there may be nuances that I am unfamiliar with.

johnstiles-google commented 4 months ago

Bear in mind that this is C, not python: indentation does not matter.

I am not sure what you mean by this—the indentation isn't changed from before, nor is it abnormal.

jbowler commented 4 months ago

The comment about indentation was misleading, but look at Guy's original code (which is substantially unaltered in this interface)

Look at Guy's original code; it hasn't changed since Guy wrote it. This may help you understand why it works and why it doesn't read uninitialized data.

jbowler commented 4 months ago

I believe this bug is in the original code, which only implemented progressive reading for the IDAT chunks. If I am correct the bug arises because of a call to png_push_fill_buffer which is not preceded by the required check for available bytes. This is certainly true of the initial signature check (in 1.6.44) but that's not where this file seems to fail.

There is another possibility which is much worse and which I won't mention here.

In either case it looks like an internal libpng error and it looks like it might be fairly serious (possible CVE level).

This is an update; I will investigate more.

johnstiles-google commented 4 months ago

Just to follow up, I have now run the libpng unit tests locally. (All you need to do is make test! Refreshingly simple!) I can now confirm that this change doesn't impact any unit test results in libpng.

That being said, I'm admittedly not an expert in libpng design philosophy—if there is a better way to fix the issue, that's fine too. I'm highly interested in solving the MSAN issue, but much less concerned about how we fix it. This seemed to be the simplest approach to me, and seemed to get to the root problem in a way that didn't add complexity, but maybe there is a better way.

~/libpng $ make test
/Applications/Xcode_15.2.app/Contents/Developer/usr/bin/make  pngtest pngunknown pngstest pngvalid pngimage pngcp timepng
make[1]: `pngtest' is up to date.
make[1]: `pngunknown' is up to date.
make[1]: `pngstest' is up to date.
make[1]: `pngvalid' is up to date.
make[1]: `pngimage' is up to date.
make[1]: `pngcp' is up to date.
make[1]: `timepng' is up to date.
/Applications/Xcode_15.2.app/Contents/Developer/usr/bin/make  check-TESTS
PASS: tests/pngtest-all
PASS: tests/pngvalid-gamma-16-to-8
PASS: tests/pngvalid-gamma-alpha-mode
PASS: tests/pngvalid-gamma-background
PASS: tests/pngvalid-gamma-expand16-alpha-mode
PASS: tests/pngvalid-gamma-expand16-background
PASS: tests/pngvalid-gamma-expand16-transform
PASS: tests/pngvalid-gamma-sbit
PASS: tests/pngvalid-gamma-threshold
PASS: tests/pngvalid-gamma-transform
PASS: tests/pngvalid-progressive-size
PASS: tests/pngvalid-progressive-interlace-standard
PASS: tests/pngvalid-transform
PASS: tests/pngvalid-progressive-standard
PASS: tests/pngvalid-standard
PASS: tests/pngstest-1.8
PASS: tests/pngstest-1.8-alpha
PASS: tests/pngstest-linear
PASS: tests/pngstest-linear-alpha
PASS: tests/pngstest-none
PASS: tests/pngstest-none-alpha
PASS: tests/pngstest-sRGB
PASS: tests/pngstest-sRGB-alpha
PASS: tests/pngunknown-IDAT
PASS: tests/pngunknown-discard
PASS: tests/pngunknown-if-safe
PASS: tests/pngunknown-sAPI
PASS: tests/pngunknown-sTER
PASS: tests/pngunknown-save
PASS: tests/pngunknown-vpAg
PASS: tests/pngimage-quick
PASS: tests/pngimage-full
============================================================================
Testsuite summary for libpng 1.6.44.git
============================================================================
# TOTAL: 32
# PASS:  32
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
~/libpng $ git diff                                        
diff --git a/pngpread.c b/pngpread.c
index ffab19c08..1f81ccea6 100644
--- a/pngpread.c
+++ b/pngpread.c
@@ -455,11 +455,11 @@ png_push_fill_buffer(png_structp png_ptr, png_bytep buffer, size_t length)
    {
       size_t save_size;

-      if (length < png_ptr->current_buffer_size)
+      if (length <= png_ptr->current_buffer_size)
          save_size = length;

       else
-         save_size = png_ptr->current_buffer_size;
+         png_error(png_ptr, "Truncated data");

       memcpy(ptr, png_ptr->current_buffer_ptr, save_size);
       png_ptr->buffer_size -= save_size;
~/libpng $ 
jbowler commented 4 months ago

@johnstiles-google I am completely unable to repro using your "reproducer". I attach a version which compiles against .libs/libpng16.a (GCC 13.2.1 with -g and -O2 for libpng). I pulled your patch for libpng but when I run the program it exits normally; not via png_error/setjmp as it would if the repro worked.

pr552-orig.cpp.txt

Please try this yourself. I examined the data; it contains two complete chunks and nothing else, so the save buffer never gets used (the pointer remains NULL) and there is always sufficient data in current_buffer_ptr. The read loop just exits because it runs out of data.

I've attached the data as a PNG file:

pr552.png.txt

But it's not much use because it doesn't seem to correspond to the bug you are reporting; libpng doesn't care about the bad CRC or the actual eXIf contents, it just warns that they are wrong.

johnstiles-google commented 4 months ago

I just reconfirmed that the error case is reached. Instead of png_error, I changed the else case to an abort(). Here is the stack crawl.

7 0x7fef207104b2 abort

8 0x558b364631e8 png_push_fill_buffer [../../third_party/libpng/pngpread.c:461:10]

9 0x558b3646b74f png_read_data [../../third_party/libpng/pngrio.c:37:7]

10 0x558b364762ef png_crc_error [../../third_party/libpng/pngrutil.c:275:4]

11 0x558b36476148 png_crc_finish [../../third_party/libpng/pngrutil.c:229:8]

12 0x558b36478752 png_handle_eXIf [../../third_party/libpng/pngrutil.c:2082:10]

13 0x558b36462a54 png_push_read_chunk [../../third_party/libpng/pngpread.c:301:7]

14 0x558b36461eeb png_process_some_data [../../third_party/libpng/pngpread.c:109:10]

15 0x558b36461e03 png_process_data [../../third_party/libpng/pngpread.c:46:7]

16 0x558b35655d48 gfx::PNGCodec_EnsureTruncatedPNGWorksInMSAN_Test::TestBody() [../../ui/gfx/codec/png_codec_unittest.cc:1111:3]

There is a chance that the disconnect comes from our config settings. Here is the pnglibconf.h that we use in Chrome: https://source.chromium.org/chromium/chromium/src/+/main:third_party/libpng/pnglibconf.h?q=pnglibconf.h&ss=chromium

My initial hunch was that this was the reason for the disparity: PNG_USER_CHUNK_CACHE_MAX

However, I've tried changing that back to 1000 and it doesn't affect the outcome.

I did find that if you disable PNG_READ_eXIf_SUPPORTED, then no error occurs. It may be possible that you haven't enabled this flag.

jbowler commented 4 months ago

Your version of png_handle_eXIf is borked somehow. It's reading too many bytes. First thing to do is to document the 'length' parameters to each of the calls to png_push_fill_buffer; either note them down under the debugger or just add a printf (well, fprintf(stderr, if you still have the abort in). Then we can compare with mine. If the break is a compiler issue then an fprintf might hide it but then the bug would not repro so that would prove something too.

So just post a list of the arguments in turn.

johnstiles-google commented 4 months ago

I've made this change and kept the abort() on under-read in place. Here's what I see.

void PNGCBAPI
png_push_fill_buffer(png_structp png_ptr, png_bytep buffer, size_t length)
{
   png_bytep ptr;

   if (png_ptr == NULL)
      return;

   printf("png_push_fill_buffer: length=%zu, save_buffer_size=%zu, current_buffer_size=%zu \n",
          length, png_ptr->save_buffer_size, png_ptr->current_buffer_size);
[ RUN      ] PNGCodec.EnsureTruncatedPNGWorksInMSAN
png_push_fill_buffer: length=8, save_buffer_size=0, current_buffer_size=194 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=186 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=182 
png_push_fill_buffer: length=13, save_buffer_size=0, current_buffer_size=178 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=165 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=161 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=157 
png_push_fill_buffer: length=1, save_buffer_size=0, current_buffer_size=153 
png_push_fill_buffer: length=1, save_buffer_size=0, current_buffer_size=152 
png_push_fill_buffer: length=149, save_buffer_size=0, current_buffer_size=151 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=2 
Received signal 6
johnstiles-google commented 4 months ago

Note also, we are on libpng 1.6.37. There is a small chance that this bug has already been discovered and fixed, and we just need to update.

jbowler commented 4 months ago

This is the bug:

png_push_fill_buffer: length=149, save_buffer_size=0, current_buffer_size=151

It's from png_handle_eXIf as are the two previous 1 byte reads but there are only 149 bytes in the whole of the eXIf data. So png_handle_eXIf reads two extra bytes and after that point the whole stream is out-of-whack.

This is the "[other] possibility which is much worse and which I won't mention here" because it affects everything. A completely correct PNG with an eXIf chunk which triggers that case will be reported as corrupted by every one of the libpng read APIs as soon as the next chunk is read. There's no point adding an extra check in png_push_fill_buffer; it's a big bad obvious bug that must be fixed.

Here's my sequence which, as you can see, is identical apart from the 149 byte read and adds up to the actual length:

8+4+4+13+4+4+4+1+1+147+4 194

(That's just fprintf(stderr, "%lu+", length) ; easy to pass to bc to get the total.)

Ok, so it isn't in my 1.6.44-git (i.e. origin/libpng16 HEAD). So next is for you to checkout the origin/libpng16 branch and see if the problem goes away. I recommend using the github tool gh (it does require a little setup but it's worth it). You do this in your current clone of pnggroup/libpng:

gh pr checkout 522

It gives you what would actually be checked in. It is very important to test any pull release against a clean completely up-to-date clone (local branch) of the origin. For libpng thats configure/make check or cmake/make test or both for build system changes.

johnstiles-google commented 4 months ago

OK, I will try updating our version to latest code and see if the issue goes away. That would be a nice resolution and if so I'm happy to drop this PR.

That being said, as a counterpoint to this comment:

There's no point adding an extra check in png_push_fill_buffer; it's a big bad obvious bug that must be fixed.

Detecting internal errors as soon as possible is generally worth doing. If the caller is requesting more bytes than can be satisfied, and it's minimal cost to detect the issue and stop processing immediately, it seems like a good idea to do so. In a library that's used everywhere, a simple bounds check that can protect us from a CVE down the road seems valuable to me—especially since we already had a bounds check here, just an ineffective one.

johnstiles-google commented 4 months ago

(FYI: it's a weekend here, and I expect updating libpng will take some time; I don't expect to get to it until Monday.)

jbowler commented 4 months ago

The bug is in 1.6.37. png_handle_eXIf is calling png_crc_finish with an incorrect "length" parameter (length not length-2) when the byte-order check fails. Depends on an invalid first byte in the data.

CAUSE: No unit test! So no one actually tested that code path. EFFECT: Destroys the whole PNG stream (should error out on the next chunk). TEST: I'll attach a modified pr552.png which as an IEND chunk appended and a modified repro program which takes a test file (and is written in C).

DATES (I'm doing this quickly, I may be wrong):

Bug created by Glenn in 2017:

cf713fb0ab (Glenn Randers-Pehrson 2017-08-06 10:24:04 -0500 2082) png_crc_finish(png_ptr, length);

Bug fixed by Cosmin in (I think):

2733482d8e (Cosmin Truta 2022-09-14 22:00:42 +0300 2082) png_crc_finish(png_ptr, length - 2);

So "not repro" because it is fixed and no change necessary because the current code is adequate and consistent for any error of this type (chunk handler reading too many bytes).

There's a separate bug in the signature handling but I have to get a repro before suggesting a fix. It's not particularly important.

We have missing unit file tests; otherwise valid PNG files with an invalid eXIf chunk (invalid byte order specifier with valid CRC). If I can generate those I think I can add them easily to the exist pngtest tests.

johnstiles-google commented 4 months ago

Agreed—I just did the same analysis as you.

The actual bug fix actually happened a little sooner, in 2021:

https://github.com/pnggroup/libpng/commit/c4bd411c35290f9568da44a7faf7a09505db78db Fix decode fail on image with invalid eXIf chunk

Cosmin's 2022 change just simplifies the logic.

Thank you for adding the unit tests. I will get Chrome updated.

johnstiles-google commented 4 months ago

(Instead of png_error, is there a png_unreachable or something we could use to indicate an internal logic error in this code?)

jbowler commented 4 months ago

@johnstiles-google can you test the attached pr552-fixed.png against your current build (1.6.37 with or without your change):

pr552-fixed.png.txt

I changed the the repro prog to be in C and accept a file argument:

pr552.c.txt

It should fail with 1.6.37 and complain about a bad chunk name; the 'fix' to the original test is just an IEND chunk at the end (well, there's a spurious \n too but that doesn't matter.)

jbowler commented 4 months ago

(Instead of png_error, is there a png_unreachable or something we could use to indicate an internal logic error in this code?)

No, not in this version of libpng (IRC I added png_assert to 1.7). I don't want to do that because it produces a horrible situation where exactly the same libpng build with exactly the same file produces two different error messages depending on just the buffering of the input (with the progressive reader)!

The check also hardly ever works; your test file had both a broken eXIf chunk and truncation exactly at the end of that chunk. The same check can be performed more generally in png_read_data but why, what does that achieve? Just a marginally better error message which is normally not displayed (well, png_assert, IRC was disabled in release but still).

The whole architecture of relying on a complex chunk handler to get the read of the chunk data exactly right is broken; the fix is so validate individual reads against the actual data length and make png_crc_finish not take a length parameter at all. Probably a fail-safe internal change that could be done quite quickly.

johnstiles-google commented 4 months ago

I ran your "pr552-fixed.png.txt" and got this output. It did not hit the abort in png_push_fill_buffer.

[ RUN      ] PNGCodec.EnsureTruncatedPNGWorksInMSAN
png_push_fill_buffer: length=8, save_buffer_size=0, current_buffer_size=207 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=199 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=195 
png_push_fill_buffer: length=13, save_buffer_size=0, current_buffer_size=191 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=178 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=174 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=170 
png_push_fill_buffer: length=1, save_buffer_size=0, current_buffer_size=166 
png_push_fill_buffer: length=1, save_buffer_size=0, current_buffer_size=165 
png_push_fill_buffer: length=149, save_buffer_size=0, current_buffer_size=164 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=15 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=11 
png_push_fill_buffer: length=4, save_buffer_size=0, current_buffer_size=7 
*** png_error called ***
jbowler commented 4 months ago

Good, thanks. That matches what happens when I recreate the bug in 1.6.44. The error message is:

ND[00][00]: invalid chunk type

As I expected.

johnstiles-google commented 4 months ago

I've updated Chromium to libpng 1.6.43 and confirmed that the issue no longer reproduces. Thanks again for your debugging assistance.