Security: LZ4_uncompress can crash on invalid input #2

GoogleCodeExporter commented 9 years ago

In LZ4_uncompress:
By making a specially crafted input, with a literal length tag equal to the 
value of -(op), it's possible to make 'cpy' have the value NULL.

cpy = op+length;  // cpy is now NULL

This will make this line crash:
LZ4_READ_LITTLEENDIAN_16(ref,cpy,ip); ip+=2;

Original issue reported on code.google.com by strig...@gmail.com on 5 Dec 2012 at 7:50

GoogleCodeExporter commented 9 years ago

Due to the way "length" is build, it can only be positive.

Now if it wraps around 32-bits (larger than (2<<31)-1), it could become 
negative.
Maybe by making it "size_t" instead of "int" it would reduce the issue.

Now, as a practical attacks, it seems quite improbable : "op" is unknown to the 
potential attacker, it could have any value. It's therefore completely 
improbable that it can generate a content which creates a length of exactly 
"-op" value (after wrapping beyond MAX_INT).

Original comment by yann.col...@gmail.com on 6 Dec 2012 at 1:27

GoogleCodeExporter commented 9 years ago

It does not have to be exactly equal to -op. 

For example if you make an input file that contains 9000000 0xFF and then some 
zeros, 'length' will wrap and will become a negative value, which can cause 
thngs to crash. 

When I tried yesterday I could easily make it crash by feeding it millions of 
0xff.

Original comment by strig...@gmail.com on 6 Dec 2012 at 1:31

GoogleCodeExporter commented 9 years ago

OK, so if this apply to an extended range of values, not "only -op", then it's 
a practical attack. I'll look into it.

Original comment by yann.col...@gmail.com on 6 Dec 2012 at 1:33

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

This crashes for me on windows 32-bit mode:

#include "lz4.h"

char output[20<<20];
char input[20<<20];
int main(int argc, char* argv[]) {
  input[0] = 0x0F;
  input[1] = 0x00;
  input[2] = 0x00;
  for(int i = 3; i < 16800000; i++)
    input[i] = 0xff;
  LZ4_uncompress(input, output, 20<<20);
  return 0;
}

Original comment by strig...@gmail.com on 6 Dec 2012 at 1:58

GoogleCodeExporter commented 9 years ago

Thanks for the sample. I'll look into it.

Original comment by yann.col...@gmail.com on 6 Dec 2012 at 2:00

GoogleCodeExporter commented 9 years ago

This program also crashes once in a while, depending on ASLR (Address Space 
Layout Randomization)

#include "lz4.h"
#include <stdlib.h>
#include <string.h>

char input[20<<20];
char output[20<<20];
int main(int argc, char* argv[]) {
  for(int j = 0; j < 10; j++) {
    int n = 16700000 + j * 50000;
    char *p = input;
    *p++ = 0xF0;
    memset(p, 0xff, n); p += n;
    *p++ = 0x0;
    LZ4_uncompress(input, output, 20<<20);
  }
  return 0;
}

Original comment by strig...@gmail.com on 6 Dec 2012 at 2:20

GoogleCodeExporter commented 9 years ago

Here's a possible fix for the first half of this bug:
        if ((length=(token>>ML_BITS)) == RUN_MASK)  { 
          len = *ip++, length += len;
          if (len == 255) {
            do len = *ip++, length += len; while (len == 255);
            // verify that cpy=op+length below will not wrap.
            if unlikely(op + length < op) goto _output_error;
          }
        }

I unrolled the code a bit, so this code is actually FASTER than the old code.

Original comment by strig...@gmail.com on 6 Dec 2012 at 10:59

GoogleCodeExporter commented 9 years ago

Also, it should be size_t.

size_t len, length;

Original comment by strig...@gmail.com on 6 Dec 2012 at 11:01

GoogleCodeExporter commented 9 years ago

Here's a possible fix for the second half of the bug:

        // get matchlength
        if ((length=(token&ML_MASK)) == ML_MASK) {
          length += *ip;
          if (*ip++ == 255) {
            do length += *ip; while (*ip++ == 255);
            // verify that "cpy = op + length + 4" below does not wrap.
            if unlikely(op + length + 4 < op) goto _output_error;
          }
        }

It's also fast in my benchmark.

Original comment by strig...@gmail.com on 6 Dec 2012 at 11:16

GoogleCodeExporter commented 9 years ago

That's impressive.
My dev station is currently down, but i'll repair it and give it shot a.s.a.p

Original comment by yann.col...@gmail.com on 7 Dec 2012 at 12:00

GoogleCodeExporter commented 9 years ago

Hi

I tried your suggested code,
but for some reason,
it did not result in performance improvement on my station.

In fact, it's the other way round : 
benchmark measure a loss, about 1-2%.

Original comment by yann.col...@gmail.com on 8 Dec 2012 at 10:33

GoogleCodeExporter commented 9 years ago

Another feedback;

I've switched to Visual 2012. 
I expect the optimisations of VC2012 and VC2010 to be different, which could 
explain why in my previous experiment, i witnessed a loss of performance 
instead of your reported win.

To my surprise, the loss of performance was larger under VC2012, at 
approximately -10%.

So now i'm wondering, maybe i'm doing it wrong.

In my experiment, i've replaced the line :
        if ((length=(token>>ML_BITS)) == RUN_MASK)  { size_t len; for (;(len=*ip++)==255;length+=255){} length += len; }

with your suggestion :
        if ((length=(token>>ML_BITS)) == RUN_MASK)  
        { 
          BYTE len = *ip++; length += len;
          if (len == 255) 
          {
            do len = *ip++, length += len; while (len == 255);
            // verify that cpy=op+length below will not wrap.
            if unlikely(op + length < op) goto _output_error;
          }
        }

Both are therefore followed by :
        cpy = op+length;
        if unlikely(cpy>oend-COPYLENGTH)
        {
            if (cpy != oend) goto _output_error;         // Error : not enough place for another match (min 4) + 5 literals
            memcpy(op, ip, length);
            ip += length;
            break;                                       // EOF
        }

Is that the way it is supposed to be implemented ?

Original comment by yann.col...@gmail.com on 10 Dec 2012 at 10:05

GoogleCodeExporter commented 9 years ago

This is the code I test with. You see my benchmark numbers in the comments, 
it's very close, but I'm surprised I don't get any negative effects from the 
added out of bounds check. I don't know why you get worse number. Maybe our 
CPUs are different.

#if 1
        // Running this three times gave the following: 2176MB/s 2184MB/s 2204MB/s
        if ((length=(token>>ML_BITS)) == RUN_MASK)  { 
          len = *ip++, length += len;
          if (len == 255) {
            do len=*ip++, length += len; while (len == 255);
            // verify that cpy=op+length below will not wrap.
            if unlikely((uintptr_t)op + length < (uintptr_t)op) goto _output_error;
          }
        }
#else
        // Running this three times gave the following: 2180MB/s 2166MB/s 2169MB/s
        if ((length=(token>>ML_BITS)) == RUN_MASK)  { for (;(len=*ip++)==255;length+=255){} length += len; }
#endif

Original comment by strig...@gmail.com on 10 Dec 2012 at 10:20

GoogleCodeExporter commented 9 years ago

Small comment (almost a note to myself) :
it's worth remembering that 
LZ4_compress() only authorizes input sizes <= ~1.9GB.

As a consequence, it's not possible for any copy size to be > 2GB.
So, if nonetheless such a value is encoded in the compressed stream,
then it must be a malformed/malicious data.

Original comment by yann.col...@gmail.com on 22 Dec 2012 at 7:24

GoogleCodeExporter commented 9 years ago

Second note to self :
the issue can only happen in 32-bits mode. 
It's not possible for the pointer value to wrap-around address space in 64-bits 
mode.
Which means the test can be restricted to 32-bits mode only.

Original comment by yann.col...@gmail.com on 30 Mar 2013 at 11:20

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Hi, I just disclosed a similar bug in another compression scheme to the Linux 
kernel team. Their implementation of LZ4 is also affected. If you could, please 
bump this to High priority because it can be controlled for precise memory 
corruption. This vulnerability can lead to code execution.

Original comment by d...@securitymouse.com on 20 Jun 2014 at 6:42

GoogleCodeExporter commented 9 years ago

Note that this bug cannot be used within Linux kernel.
It happens only in 32-bits mode, and only with blocks > 8MB.
Such situation is not possible for the version within Linux kernel, since it 
uses the Legacy file compression format, which has a hard block size limit at 8 
MB (the newer format has a hard limit at 4 MB).

For this bug to happen, you need to create a specific/custom format, generating 
compressed blocks > 8MB, in 32-bits mode. It's a very specific combination 
which happens very rarely, if ever. And more importantly, it does not happen 
within the Linux kernel implementation.

Nonetheless, this bug will be corrected within an upcoming version.

Original comment by yann.col...@gmail.com on 20 Jun 2014 at 9:07

GoogleCodeExporter commented 9 years ago

Latest LZ4 release within "dev" branch fixes this issue :
https://github.com/Cyan4973/lz4/tree/dev

Ludwig Strigeus' overflow test is now integrated into the fuzzer test tool,
and passed correctly by both 32-bits & 64-bits versions.

Regards

Original comment by yann.col...@gmail.com on 22 Jun 2014 at 10:46

GoogleCodeExporter commented 9 years ago

Original comment by yann.col...@gmail.com on 22 Jun 2014 at 10:47

Added labels: Priority-High
Removed labels: Priority-minor

GoogleCodeExporter commented 9 years ago

Fixed into r118

Original comment by yann.col...@gmail.com on 26 Jun 2014 at 9:50

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Hi Yann,
Sorry communication went poorly on this issue. I did not see your responses to 
this flaw. Regardless, thank you for fixing the code. I appreciate it. 

Best,
D

Original comment by d...@securitymouse.com on 27 Jun 2014 at 4:57

GoogleCodeExporter commented 9 years ago

hum, yes it did.
A flaw in a largely deployed piece of code is not a "gold fame nugget". It's a 
risk for the community. I can only hope Oberhummer had enough time to design 
and deploy a fix, before the risk was exposed. This *should* be at the core 
value of a security firm.

My other grudge is that the risk regarding LZ4 was overblown in your article, 
even though you were explained it wasn't possible to use it within currently 
known implementations, due to block size limits. It's a good thing to correct 
this risk for future implementations, but it's not a good idea to make the 
world believe it is at risk *right now*, while it's not true. No known 
implementation get even near the block size required to trigger the risk, this 
is applicable to the Linux kernel version too.

Original comment by yann.col...@gmail.com on 27 Jun 2014 at 8:22

GoogleCodeExporter commented 9 years ago

I appreciate your perspective, Yann. But, I think you misunderstand how I've 
looked at this issue. I've worked in infosec for 15 years, I definitely 
understand the risks involved. 

I coordinated with every other team involved in this. I worked with each team 
to build patches (except for Markus, who developed his on his own). We 
coordinated a release, and that timeline was unfortunately broken by a public 
git push. Even if that didn't happen, the Linux kernel team (Linus made this 
very clear) merges security related fixes to mainline within 5 days of 
disclosure, no exceptions. So, they would have gone forward regardless. 

I think you're seeing this from an unfortunate point of view because we were 
unable to coordinate. There are two things that caused this:

1) I found the Linux kernel LZ4 bug when we were performing the LZO triage. In 
looking for other implementations, I found yours. I saw the bug report here and 
realized that this issue had already been set aside as a non issue. Seeing this 
was a year+ old it occurred to me that I may never hear from you, and since I 
already had the Linux team's ear, I went ahead and coordinated with them, 
building a patch for LZ4 with Greg.

2) Google Code requires me to sign in with my Google account to post on this 
bug report site. But, it doesn't alert me (as I presumed it would) when a 
response to one of my posts is made public. I presumed that like every other 
Google product it would alert me when I had a response in an interface. It did 
not, and I poorly presumed that you had ignored my post here. My apologies for 
that. 

So, yes, everyone was given enough time. Patches were all developed before my 
press and blog post. Things were coordinated appropriately, and FWIW even Solar 
Designer said that things went well with almost non-existent issues. 

I know you were upset, but please don't speculate on things you did not have 
insight into. I'm sorry you were left out of the loop, but I am only here to 
help you, not hurt you. 

Regarding LZ4 being overblown, you are right that implementations are not 
affected. If you would have finished reading my original blog post you would 
have noted that I stated this exact thing in my blog by referencing the ZFS 
128k block limitation. That isn't the point. The point is that the algorithm 
doesn't care about a size limit for buffers, and there is no documentation in 
the code to require a limit. This, just as with the LZO implementation, will 
eventually lead to people that misuse the API. That's just a fact of 
engineering. So, yes, it is a vulnerable algorithm and yes, it needed patching. 

My PoC demonstrates how the algorithm itself is vulnerable. It does not matter 
whether it is in kernel or user mode. The demonstration proves that the 
algorithm itself allows for random memory overwrite. Having written kernels for 
multiple architectures, and having written kernel exploits for everything from 
Plan 9, to AVR, to Linux and BSD, I think I get the difference :-) 

Thanks, Yann.
D

Original comment by d...@securitymouse.com on 27 Jun 2014 at 4:42

GoogleCodeExporter commented 9 years ago

By the way, you should give me a mailing address so I can apologize to you 
using my custom Lab Mouse Security Sympathy Card :-) At least we can resolve 
this with a smile and a laugh or two. Hopefully someday I can buy you a beer. 
https://www.etsy.com/listing/193821961/sorry-to-make-you-patch-sympathy-card

Original comment by d...@securitymouse.com on 27 Jun 2014 at 4:45

GoogleCodeExporter commented 9 years ago

OK, this is indeed a much better end.

As you have already guessed, the only elements I could perceive so far :
- someone letting a comment on an already opened issue
- don't give a damn to my answers
- publish just a few days later a big headline-grabbing blog post
- pretending to have found the bug
- and dressing it heartbleed-like (RCE!)

With such elements at disposal, the sole logical conclusion I could come up 
with was an irresponsible fame-grab attempt, even if it would put millions of 
users at risk.
Hence my immediate anger.

However, the scenario you are describing above is much, much better.
I could not guess it up to now, but if correction and deployment was indeed 
coordinated with the relevant parties involved before the article publication, 
then it's a much better standard of conduct, which I do respect.

Original comment by yann.col...@gmail.com on 27 Jun 2014 at 5:59

GoogleCodeExporter commented 9 years ago

Btw, it's more minor, but be informed that I don't agree with the general tone 
of your blog post article. It's not that the elements presented are technically 
wrong, but the weight of risk seems unfairly exaggerated.

It's correct to say there is a vulnerability, which requires some specific 
conditions to be usable. However, it's critical to underline that *no currently 
known program* using LZ4 today matches these conditions, including programs 
within the Linux Kernel.

So yes, I've seen your note about ZFS, but it's clearly not enough. It's almost 
hidden in the middle of your article. Most casual readers haven't reached it, 
nor understood the general consequences.
Just roam the forums and social networks, and you'll see most of them jump to 
conclusions on reading mostly the headline and the nice conclusion table : 
*current* programs using LZ4 or LZO are basically big security holes.

I can't comment on LZO variants, but for registered LZ4 programs, that's not 
true.
Yes, future programs may inadvertently fall into the pit, and as a consequence, 
the vulnerability must be patched (and has been patched btw).
But that's not correct to extend that logic to existing programs. Existing 
programs are safe, because they don't match the conditions to exploit the 
vulnerability.

And that's *extremely* important. This is *the* huge difference with a 
*heartbleed like* bug disclosure.

In conclusion, I love your latest description, which seems fair and correct to 
me :

"you are right that implementations are not affected. That isn't the point. The 
point is that there is no documentation in the code to require a limit. This 
will eventually lead to people that misuse the API. So, yes, it is a vulnerable 
algorithm and yes, it needed patching."

Yes. E-xac-tly.
I would love it to be understood this way by your readers.

Regards

Original comment by yann.col...@gmail.com on 27 Jun 2014 at 6:38

GoogleCodeExporter commented 9 years ago

Yeah, I think the the tone of the post was written from a different perspective 
than you have, which is why we ended up disagreeing again. 

You're looking at it from the point of the current threat surface within known 
implementations. I'm looking at it from the sole point of view of the library 
function's vulnerability, and the potential for abuse in the future. 

The ratings were all given fair levels of importance based on how the functions 
themselves can be exploited, not based on the implementations. This is because 
I do not know - nor does anyone know - how each implementation is used, or 
whether it is used correctly. So, I only spoke from a perspective of the 
information I could verify. This is why the Linux kernel LZO RCE was deemed 
"Impractical" in the blog post and in the bug report. I can prove DoS using the 
bug, but can the library function be instrumented alone for RCE? No. At least, 
not by me. 

I put things into the context that was fair for the release, as I refuse to 
conjecture about what systems may or may not be vulnerable. Also, by describing 
the result of exploiting the function itself, I give each engineer implementing 
a use case of the function the power to know how and what will result if the 
function is used in a way that exposes the application to risk. That is useful 
information that can help significantly reduce the threat surface of a target 
application. 

As far as LZ4 variants, it turns out a lot of A/V utilities automatically 
unpack LZO and LZ4 compressed files. Have you verified that they are all using 
the block sizes correctly? I think that there are probably a lot of 
implementations out there that are not using the APIs correctly because they 
can't. 

A/V is a great example of this because they have to be able to decode LZ4 
payloads that are potentially mangled on purpose to bypass A/V checks. This 
might be a great place to look for potential fixes. 

Regardless, I am very happy to see that we are starting to see eye to eye. 
Thanks for taking the time to talk with me here. I appreciate it a lot :-)

D

Original comment by d...@securitymouse.com on 27 Jun 2014 at 7:14

GoogleCodeExporter commented 9 years ago

If I do understand the situation correctly, A/V unpacking automatically LZ4 
files must use the official documented LZ4 format (otherwise, they would not be 
able to read the file).

This format is safe, because it limits the maximum block size to 4 MB (8 MB if 
legacy format). To be fully safe, the decoder mush also check block size 
condition before loading the block, but this is a necessity, due to memory 
allocation management. And this is btw properly done within the reference code 
lz4io.c.

Of course, I can't guarantee they are using the reference file reader code. Nor 
can I guarantee that they use the reference algorithm lz4.c either. But then, 
potential vulnerabilities are endless.

Original comment by yann.col...@gmail.com on 27 Jun 2014 at 7:34

xiaolu / lz4

Security: LZ4_uncompress can crash on invalid input #2 #52