Use macros from `limits.h` to prevent signed integer wrap-around warnigns

MisterDA commented 1 month ago

The code is currently correct since we use wrap-around semantics for signed integers (-fwrapv), but:

it's difficult to communicate that fact to static analyzers, which warn when computing the minimum integer with left-shifting 1 to the sign bit position (most-significant bit);
MSVC doesn't support wrap-around semantics, but historically hasn't optimized for this (so no harm), and might innocuously warn.

Using constants from <limits.h> instead allows for self-documenting code and silences these warnings.

Computing the minimum signed integer

From the standard (which I recall doesn't consider wrap-around semantics for signed integers):

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. […] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is can't be represented in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The problem being that the result of 1 << CHAR_BIT * sizeof(int) - 1 to compute the minimum int can't be represented in the result type (it's 2^63, but the maximum is 2^63-1); without wrap-around.

Introduce the INTNAT_MIN macro to avoid independent re-definitions of this value.

Is a change entry needed? This also prevents warnings raised under Windows by clang-cl and improves code quality with MSVC.

(I might have confused undefined behavior with unspecified behavior, oh well)

NickBarnes commented 4 weeks ago

I'll review this.

MisterDA commented 3 weeks ago

This is all good, a clear improvement.

Thanks, I've rebased on trunk and added you as a reviewer.

xavierleroy commented 2 weeks ago

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

NickBarnes commented 2 weeks ago

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

This makes sense to me, and could remove the test for SIZEOF_PTR == SIZEOF_LONG etc in config.h. It does need <stdint.h>, but although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if <stdint.h> were not available.

While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names). Maybe this PR would be a reasonable time to introduce them?

xavierleroy commented 2 weeks ago

although I see that we use HAS_STDINT_H in config.h, I suspect that parts of the runtime wouldn't compile at all if were not available.

Right. is standard since C99, and OCaml 5 requires C11, so we should use unconditionally and remove the configure test for it.

MisterDA commented 2 weeks ago

What about using INTPTR_MIN, INTPTR_MAX and UINTPTR_MAX unconditionally? OCaml's value type is, morally, intptr_t, even though it is not defined as such for historical reasons.

While we're on the subject, it's surprising to me that we don't seem to have, or use, CAML_INT_MAX and CAML_INT_MIN (or similar names).

Two good suggestions. I've changed the definitions to use the {u,}intptr_t limits, and namespaced the macros with the CAML_ prefix. It's technically a breaking change to move from UINTNAT_MAX to CAML_UINTNAT_MAX, but opam grep UINTNAT_MAX doesn't return anything. Would you rather use INTPTR_MIN directly and not have CAML_INTNAT_MIN?

NickBarnes commented 2 weeks ago

What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

MisterDA commented 3 days ago

I've rebased this PR.

What I meant about CAML_INT_MAX and CAML_INT_MIN was the max and min values of OCaml's int type. I find these are in fact currently defined in mlvalues.h as Max_long and Min_long (which I think are confusing names!).

I've introduced CAML_LONG_{MAX,MIN} macros replacing {Max,Min}_long. I think that LONG instead of INT is more consistent with the current naming. Should I retain the former names for compatibility? Are we convinced that this is a good idea?

NickBarnes commented 2 days ago

On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it. Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance). If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.

MisterDA commented 2 days ago

On reflection we shouldn't change Max_long or Min_long in this PR, and I regret suggesting it. Those names have been fixed for decades and there may be a lot of code out there using them (opam grep immediately finds base_bigstring for instance).

My thoughts also, I'll remove that commit.

If we did change them, or offer new alternatives, IMO it should be CAML_INT_MAX and CAML_INT_MIN, because they are the maximum and minimum values of the OCaml type int.

but on 64-bits arches, only Val_long maps to a 63-bit integer, right? not Val_int, which is cast'ed to (int).

ocaml / ocaml

Use macros from `limits.h` to prevent signed integer wrap-around warnigns #13083