s-yata / marisa-trie

MARISA: Matching Algorithm with Recursively Implemented StorAge
Other
507 stars 89 forks source link

Bug hunt: Pointer underflow behavior #8

Closed mikepb closed 8 years ago

mikepb commented 8 years ago

Following up from #7 marisa::grimoire::trie::ReverseKey takes advantage of pointer underflow to simplify the code. Unfortunately, it seems that the pointer underflow behavior is undefined. Clang, for example, exhibits a different behavior than GCC.

A quick search suggests that relying on this behavior opens up marisa-trie to compiler-specific bugs and security exploits.

The purpose of this issue is to track these consequences and discuss possible fixes.

From Show at Stack Overflow on the behavior in C++:

The standard specifies in §5.7/4 that:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behaviour is undefined.

A similar quote can be found for subtraction in §5.7/5. So given that overflow and underflow are special cases of pointers that exceed the bounds of object to which they originally point to, the behaviour would simply be undefined.

From rampion at Stack Overflow on the behavior in C99:

From section 6.5.6 of the combined C99 + TC1 + TC2 (pdf):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Some resources:

s-yata commented 8 years ago

The commit cbab26f05f92313e72f4a58262264879bdb37531 fixes this.

mikepb commented 8 years ago

You're too quick :+1: Thank you!