windelbouwman / ppci

A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python
https://ppci.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
337 stars 36 forks source link

ppci-cc: pointer arithmetic should not cast the integer expression to "int" #75

Open tstreiff opened 4 years ago

tstreiff commented 4 years ago

I found this when compiling the "bsearch" function in stdlib.

long li; char *ptr; ptr = ptr + li;

generates the following IR (with int=32bit, ptr & long=64bit)

ptr tmp_load = load alloca_addr_1; ptr value i64 tmp_load_2 = load alloca_addr; li value i32 typecast = cast tmp_load_2; ??? i32 num = 1; sizeof(char) i32 tmp = typecast num; (int32)li 1 ptr typecast_3 = cast tmp; (int64)(int32)li ptr tmp_4 = tmp_load + typecast_3; ptr + (int32)li store tmp_4, alloca_addr_1;

The most significant part of li is ignored. There should not be any i32 conversion involved in this case. "ptr + li" is equivalent to "(void)ptr + li sizeof(1)". Since li is long, the multiplication must be done with "li" size, ie 64bit.

This i32 conversion made me suspect another wrong case:

unsigned u; char *ptr; ptr = ptr + u; generates:

ptr tmp_load_7 = load alloca_addr_1;
u32 tmp_load_8 = load alloca_addr_6;
i32 typecast_9 = cast tmp_load_8;                 ? should not change anything
i32 num_10 = 1;
i32 tmp_11 = typecast_9 * num_10;
ptr typecast_12 = cast tmp_11;                     ??? oups, sign extension
ptr tmp_13 = tmp_load_7 + typecast_12;
store tmp_13, alloca_addr_1;

If u is greater than INT_MAX (anything over 0x7fffffff), it is binary unchanged when casted to int but is considered negative. When it it casted back to ptr, sign extension occurs and the (64bit) result does not point to the expected address. This case is subtle because, since "unsigned" are 32bit in our case, multiplication may stay in 32bit and could "legally" overflow (if sizeof(type) is not 1) before being added to the pointer. That's why it is recommanded to use size_t/ssize_t for indexing.

windelbouwman commented 4 years ago

I think this is now resolved by your work with the pointer arithmatic (which I copied over). Could we create a test case for this?

tstreiff commented 4 years ago

The cast to int is no longer applied. This is much better. But having pointers larger than ints is troubling...

Having 64bit pointers, when indexing an array of very large objects, I do not expect to have 32bit overflow when indexing the array with ordinary ints.

I have just checked the C standard and the solution is there: the value "returned" by sizeof is not "int", it is size_t. size_t is large enough to be able to index any array of any size on the target system. This implies its size is similar to pointer size.

Briefly, size_t (or its signed friend ssize_t) is the integer type I computed in the (pointer - pointer) code. On x86_64 (pointer64, int32), pointerT + intvalue will lead to the evaluation of: pointer to T(64bit) + (intvalue(32bit) * sizeof(T)(64bit)) The multiplication must therefore be 64bit (because at least one operand is 64bit), and there is then no risk of hidden overflow.