openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Rewrite common.c: isdec*() #2951

Closed solardiz closed 6 years ago

solardiz commented 6 years ago

On invalid input, the isdec*() functions in common.c may cause signed integer overflow, then try to post-check for it, but the overflow itself is already UB. Those functions should be rewritten to manually work on the strings of digits - just loop over them making sure the characters are in fact digits and are not too numerous, and optionally allow a leading minus sign where needed. If we introduce a limit of at most 9 digits in there, then it would become safe for the callers to call atoi() next (like many currently do).

solardiz commented 6 years ago

Sorry I couldn't resist. Untested, so at least I am leaving that to you.

int isdecu(const char *q)
{
        const char *p = q;
        do {
                if (*p < '0' || *p > '9' || p - q >= 9)
                        return 0;
        } while (*++p);
        return 1;
}

int isdec_negok(const char *q)
{
        if (*q == '-')
                q++;
        return isdecu(q);
}
solardiz commented 6 years ago

Or even:

int isdec_negok(const char *q)
{
        return isdecu(*q == '-' ? q + 1 : q);
}

Oh, and yes my isdecu() relies on NUL being outside of the range of '0' to '9' to detect empty string (this is also why the do/while loop, instead of a while loop), but that's valid.
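To make the edge cases concrete, here is a small self-test sketch. The two functions are copied from the comments above so the block compiles on its own; `isdec_selftest()` is a hypothetical name and is not part of JtR. The final assertion assumes `int` is at least 32 bits wide, which is what makes the 9-digit limit safe for `atoi()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Copied from the comments above so this sketch is self-contained. */
int isdecu(const char *q)
{
        const char *p = q;
        do {
                if (*p < '0' || *p > '9' || p - q >= 9)
                        return 0;
        } while (*++p);
        return 1;
}

int isdec_negok(const char *q)
{
        return isdecu(*q == '-' ? q + 1 : q);
}

/* Hypothetical self-test, not part of JtR. */
void isdec_selftest(void)
{
        assert(!isdecu(""));            /* NUL is below '0': empty is rejected */
        assert(isdecu("0"));
        assert(isdecu("123456789"));    /* 9 digits: accepted */
        assert(!isdecu("1234567890"));  /* 10 digits: rejected */
        assert(!isdecu("12a"));         /* trailing non-digit */
        assert(!isdecu("-5"));          /* no sign allowed here */
        assert(isdec_negok("-5"));
        assert(!isdec_negok("-"));      /* sign with no digits */
        assert(!isdec_negok("--5"));    /* only one leading minus */
        /* Assumes int has at least 32 bits: 9 digits cannot overflow it. */
        assert(atoi("123456789") == 123456789);
}
```

Calling `isdec_selftest()` from a test program exercises the empty string, the 9/10-digit boundary, and the sign handling in one go.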

seekaddo commented 6 years ago

Sorry for the late reply, I was at lectures. @jfoug Thanks for the feedback. I think the prefix increment in the if is not undefined behavior. ++i is the same as i += 1. I am reading ISO JTC 1, and it reads:

6.5.3.1 Prefix increment and decrement operators

Semantics 2 The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).

I know that something like i = ++i is undefined.

@solardiz Thanks for the feedback, much shorter. Should I test this code and submit it for a preview later?

solardiz commented 6 years ago

@seekaddo What you found is no proof your usage of ++i is correct. (It is. But you don't prove it with that specific quote.) You'd need to read up on sequence points - where they are and where they are not, and why this is crucial in cases like yours.

Yes, you may test this code and then include it in a PR. You'll also need to add a #define for isdec (or do we replace all uses with isdecu? Jim, what do you say?) and a comment explaining these currently check for 9 decimal digits and isdec_negok() allows a leading minus sign. All of these additions go into common.h. Thanks.
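For illustration only, the common.h additions described here might look roughly like the fragment below. This is a sketch under assumptions: the thread leaves open whether isdec becomes a #define or is replaced by isdecu everywhere, so the alias shown is just one of the two options, and the comment wording is not taken from the actual header.

```c
/*
 * Sketch of possible common.h additions (wording is an assumption).
 *
 * These currently accept 1 to 9 decimal digits only, so a following
 * atoi() call stays within the range of a 32-bit int.
 * isdec_negok() additionally allows one leading minus sign.
 */
extern int isdecu(const char *q);
extern int isdec_negok(const char *q);

/* Assumption: isdec is kept as an alias rather than replaced at call sites. */
#define isdec(q) isdecu(q)
```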

seekaddo commented 6 years ago

@solardiz OK, thanks for the feedback and the support. I will do that.

solardiz commented 6 years ago

Jim, to address your concerns more specifically:

My reading is that it is undefined to use and make changes to the same variable within the same sequence point.

That's correct. But we don't do that here. || introduces a sequence point, and our check of the ++i expression against 9 is not another use of the same variable - it is use of the subexpression result.

I am not 100% sure adding the parenthesis changes or makes a new sequence point (I do not think it does).

It does not. But we have one due to ||.
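The code being discussed is not quoted in this thread, so here is a hypothetical loop of the same shape, as a sketch. `count_digits_max9` is an invented name. The variable i is modified exactly once in the condition and is not read anywhere else in it; the comparison against 9 uses the result of the ++i subexpression, and each || is a sequence point.

```c
/*
 * Hypothetical digit counter (not the code from the PR).
 * Returns the number of digits (0 to 9), or -1 on a non-digit
 * or on a 10th digit.
 */
int count_digits_max9(const char *s)
{
        int i = 0;

        for (; *s; s++) {
                /* ++i runs only if *s is a digit, because || short-circuits */
                if (*s < '0' || *s > '9' || ++i > 9)
                        return -1;
        }
        return i;
}
```

Contrast this with something like x = ++x, where the same object is modified twice with no sequence point in between.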

jfoug commented 6 years ago

@seekaddo The problem (or lack of a problem, per @solardiz) is that you cannot read and write the same variable between two sequence points. The results are not defined, because nothing specifies when the read happens relative to the write.

The classic example is:

x = 1;
x = ++x;   // note, just as undefined is x = x++;
printf ("Now x = %d\n", x);

OK, what is valid? Is 1 valid? 2? Something else? It really is not defined just WHAT the result is.

Now, if the code was

x=1;
++x;
x = x;
printf("Now x is %d\n", x);

everyone who can read C knows the value is 2. This is because we have added sequence points between the usages. Yes, the assignment shown here seems silly, BUT this is likely what the author of x = ++x; really wanted.

The funny thing is I did find this in code before:

x = ++x;

(It was a much more complicated expression; I just reduced it.) This code worked fine until we upgraded the compiler. Once we upgraded, the expression effectively became x = x; and the increment simply disappeared. That was an ugly bug to work out. The solution, which worked everywhere, was to do this:

x = ( (x+1) ....);  // with the other longer expression.

The original compiler was delaying the increment until after the entire expression had been evaluated and assigned, and then doing the increment. The newer version added some optimization that eliminated that delayed increment. The other problem was that the person who did the original port had turned off warnings. The compiler would have helpfully spotted the problem and told them, but with warnings off they never saw it.

@solardiz Thanks for the information. I was not sure whether the || did this or not. Would any logical boolean operator add a sequence point? How would this work:

if (x > 4 && ++x < 8) { /* do something */ }

I know that if we have 100% assurance of boolean short circuit, then x may or may not be incremented. Also, if the && is a sequence point, then there should be no problems with the expressions having any undefined behavior. But this really is smelling on the fishy side to me ;)

But I did find this:

Are logical operators sequence points?
The standard reads: “Unlike the bitwise binary & operator, the && operator guarantees left-to-right evaluation; there is a sequence point after the evaluation of the first operand. If the first operand compares equal to 0, the second operand is not evaluated.” (There's a similar statement for logical OR.)
Aug 26, 2012
seekaddo commented 6 years ago

@jfoug Thanks for sharing.

seekaddo commented 6 years ago

@jfoug Still waiting for your response on this

You'll also need to add a #define for isdec (or do we replace all uses with isdecu? Jim, what do you say?)

solardiz commented 6 years ago

@jfoug What you found looks correct to me. In your example, ++x would only be reached if x > 4 is true. I too was wary of such constructions when initially writing JtR back in the 90s, which is why you might still find unnecessarily separated nested if statements in my ancient code. But now I happily use those to simplify code - e.g., if (p && func(p)) is a perfectly fine way to only call func(p) if p is non-zero or non-NULL, and treat p being zero or NULL the same as func() returning 0.
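Both patterns mentioned here can be sketched in a few lines. All function names below are hypothetical and exist only for illustration. The first pair shows the `if (p && func(p))` guard; the last function wraps jfoug's `x > 4 && ++x < 8` condition so the effect on x can be observed.

```c
#include <stddef.h>

/* Hypothetical callee that must never be handed NULL. */
static int first_is_digit(const char *p)
{
        return *p >= '0' && *p <= '9';
}

/* Calls first_is_digit() only when p is non-NULL; NULL counts as "no". */
int safe_first_is_digit(const char *p)
{
        return p && first_is_digit(p);
}

/*
 * jfoug's condition: *x is incremented only when *x > 4 held, and the
 * comparison against 8 uses the already incremented value.
 */
int bump_if_in_range(int *x)
{
        if (*x > 4 && ++*x < 8)
                return 1;
        return 0;
}
```

With x starting at 3, the right-hand side is never evaluated and x stays 3. Starting at 5, x becomes 6 and the condition holds. Starting at 7, x becomes 8 and the condition fails, but the increment has still happened.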

(BTW, our external mode language, while being C-like, currently lacks short-circuit evaluation. This is documented in doc/EXTERNAL as a difference from C and as something we might not preserve in future versions.)

solardiz commented 6 years ago

The issue as described in the first comment here is finally fixed. Thanks, @seekaddo!