Closed pnwamk closed 5 days ago
Are there considerations to support overflow detection?
I'm not aware of specific considerations here as of yet. The tinkering so far has focused on efficiency, avoiding signed integer-related UB we may inherit from C, and just general correctness.
Does anyone know: is there any prior art in this space for the UIntN types? (If no, would such considerations equally apply to both UIntN and IntN types? And, do we want such considerations to be a part of an initial IntN addition/PR, or a part of a follow-up feature/enhancement?)
The usual semantics for unsigned types are to wrap around, without overflow. (However, add with carry/subtract with borrow/extended multiplication are often supported.) The overflow semantics for signed types are quite different; they are well supported in LLVM and as builtins in major C compilers. So support is available but might require compiler support.
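The wraparound and carry behavior described here can be illustrated with a minimal C sketch. The function names u8_wrapping_add and u8_add_carry are hypothetical and chosen only for this illustration; 8-bit operands are used to match the Int8/UInt8 discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned addition reduced modulo 2^8. The operands are promoted to int,
   so the addition itself cannot overflow; the conversion back to uint8_t
   is defined to wrap. */
static uint8_t u8_wrapping_add(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Add with carry-out: the wrapped sum is smaller than an operand exactly
   when the true sum did not fit in 8 bits. */
static uint8_t u8_add_carry(uint8_t a, uint8_t b, bool *carry_out) {
    uint8_t sum = (uint8_t)(a + b);
    *carry_out = sum < a;
    return sum;
}
```

Here the carry flag is extra information returned next to a result that still wraps, which is different from treating the wrap as an error.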
The proposed Int8 implementation linked in the RFC uses the corresponding unsigned operations where possible. This gives them predictable wraparound semantics and avoids any UB from signed integer overflow, e.g. (127 + 1) : Int8 = -128.
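As a rough C illustration of why reusing the unsigned operation yields the wraparound result above, consider the sketch below. It is not the RFC's implementation: the helper names (int8_add_bits, int8_of_bits, int8_to_bits) are hypothetical, and the sketch only assumes that an Int8 is stored as its uint8_t bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: signed addition performed on the stored uint8_t bit
   patterns. The sum is computed in int and reduced modulo 2^8, so no
   signed overflow can occur. */
static uint8_t int8_add_bits(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Reinterpret a bit pattern as a signed value. int8_t is required to be
   two's complement with no padding, so copying the byte is exact. */
static int8_t int8_of_bits(uint8_t bits) {
    int8_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Reinterpret a signed value as its bit pattern. */
static uint8_t int8_to_bits(int8_t value) {
    uint8_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

With these helpers, adding the bit patterns of 127 and 1 gives 0x80, which reads back as -128.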
For the sign-sensitive operations, it specifically checks for and avoids the problematic cases which would trigger signed integer overflow (e.g., div and mod explicitly check the few cases where that can happen, so you get predictable albeit perhaps initially odd results like (-128 / -1) : Int8 = -128 instead of UB).
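The guarded cases can be sketched in C roughly as follows. This is a minimal illustration and not the RFC's lean_int8_div: the names int8_div_sketch and int8_mod_sketch are hypothetical, and the division-by-zero results (0 for div, the dividend for mod) are an assumption modeled on the convention used by Lean's other integer types.

```c
#include <stdint.h>

/* Hypothetical guarded division. */
static int8_t int8_div_sketch(int8_t a, int8_t b) {
    if (b == 0) {
        return 0; /* assumption: x / 0 = 0 */
    }
    if (a == INT8_MIN && b == -1) {
        /* The true quotient 128 does not fit in int8_t. At int width and
           above, the same case is undefined behavior in C, so it is pinned
           to the wrapped result explicitly. */
        return INT8_MIN;
    }
    return (int8_t)(a / b); /* C division truncates toward zero */
}

/* Hypothetical guarded remainder. */
static int8_t int8_mod_sketch(int8_t a, int8_t b) {
    if (b == 0) {
        return a; /* assumption: x % 0 = x */
    }
    if (b == -1) {
        return 0; /* avoids the MIN % -1 case, which is UB at int width */
    }
    return (int8_t)(a % b);
}
```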
Perhaps I misunderstood what you meant by "support overflow detection". The proposed design explicitly avoids signed integer overflow in the C sense and values will wrap around, but it doesn't attempt to dynamically detect specific cases at runtime in order to warn/error for the user.
The runtime detection (with catch support?) is what I'm asking about.
Just to be clear: I'm just asking whether this is currently considered, I don't think this is important for basic signed integer support.
This seems closely related to the Std.BitVec type, which implements a signless approach to bitvectors.
For example, it has Std.BitVec.udiv for unsigned division and Std.BitVec.sdiv for signed division.
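To make the signless idea concrete, here is a small C sketch in which the same 8-bit pattern is divided under an unsigned and a signed reading. The names bits8_udiv and bits8_sdiv are hypothetical and are not the BitVec API; returning 0 for a zero divisor is an assumption made for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Unsigned reading of both bit patterns. */
static uint8_t bits8_udiv(uint8_t a, uint8_t b) {
    return b == 0 ? 0 : (uint8_t)(a / b); /* assumption: x / 0 = 0 */
}

/* Signed (two's complement) reading of the same bit patterns. */
static uint8_t bits8_sdiv(uint8_t a, uint8_t b) {
    int8_t sa, sb;
    memcpy(&sa, &a, sizeof sa);
    memcpy(&sb, &b, sizeof sb);
    if (sb == 0) {
        return 0; /* assumption: x / 0 = 0 */
    }
    int quotient = sa / sb;   /* computed in int, so it cannot overflow */
    return (uint8_t)quotient; /* conversion to unsigned wraps modulo 2^8 */
}
```

For the pattern 0xF0 divided by 0x04, the unsigned reading gives 240 / 4 = 60, while the signed reading gives -16 / 4 = -4, stored as 0xFC.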
Ultimately, though, the BitVec is just a wrapper around a Nat, just like the proposed IntN being a wrapper around UIntN.
Things like overflow detection and such might make more sense to implement on BitVec, and are in fact on my roadmap for them already. The proposed IntN types seem more suited for when you want to squeeze out every last bit of performance, and thus probably don't want to pay the cost of overflow detection.
This seems closely related to the Std.BitVec type
Yes, that code was helpful to reference =) I just added a Related Work section to the RFC, which I should have included initially, acknowledging the related implementations I'm aware of.
Things like overflow detection and such might make more sense to implement on BitVec ... the proposed IntN types seem more suited for when you want to squeeze out every last bit of performance
I am increasingly curious if Lean could reasonably help a user avoid/detect overflow while using these sorts of "primitive" UIntN and IntN types... but I think that may be a topic to dive into separately (at least based on the fact that the UIntN types today also don't feature this). Happy to hear disagreement here though =)
C/C++ make signed integer overflow undefined, but efficient tools for detecting it are generally available.
LLVM support: https://llvm.org/docs/LangRef.html#llvm-sadd-with-overflow-intrinsics
Clang support: https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
GCC support: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
A generic operation would have to trap overflow by checking the inputs.
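Both routes can be sketched in C. The first function checks the inputs before adding, as a generic operation would have to; the second uses __builtin_add_overflow, which GCC and Clang document on the pages linked above. The function names and the choice of int32_t are assumptions made only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable check: decide from the inputs alone whether a + b would leave
   the int32_t range, so the overflowing addition is never executed. */
static bool i32_add_checked(int32_t a, int32_t b, int32_t *out) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return false; /* would overflow; *out is left untouched */
    }
    *out = a + b;
    return true;
}

#if defined(__GNUC__) || defined(__clang__)
/* Compiler-assisted check: the builtin returns true when the mathematical
   result does not fit in *out. */
static bool i32_add_checked_builtin(int32_t a, int32_t b, int32_t *out) {
    return !__builtin_add_overflow(a, b, out);
}
#endif
```

Both functions return false on overflow, so a caller can decide whether to report an error, panic, or fall back to a wider type.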
This is now implemented through #5790, #5885 and #5961. Additionally, UIntX was refactored to use BitVec instead in #5323, so IntX can now make use of the BitVec primitives that we have already put a lot of work into over the course of this year.
To the best of our knowledge we have addressed all UB concerns by both mimicking semantics of BitVec in C and enabling the -fwrapv option in the compiler such that overflow etc. is now defined behavior.
Additional things such as proper proof APIs for both UIntX and IntX as well as things like catching overflows are future work items that are not part of this RFC afaict so I'm closing this as completed.
Awesome!! This is very useful, thank you :smiley:
Proposal
Lean currently supports unsigned fixed-width integers (UInt8, UInt16, UInt32, UInt64).
This proposal is regarding similar support for signed fixed-width integers that provide similar efficiency benefits to the UIntN types (e.g., native C representations).
Design Points for discussion
Semantics for overflow and avoiding UB.
UIntN types. Fin and BitVec in Lean4 today and provide predictable wraparound behavior on overflow (there is a slight bias for these semantics in this RFC at the moment), or ii. choose to detect and panic on such occurrences (perhaps similar to Rust in debug mode?)
C representation
intn_t types could make for the least surprising set of C definitions/FFI APIs
uintn_t types could also be made as they are already used by the compiler for some types
Where would IntN definitions ideally live in the short and long term?
IntN types or similar?
Approaches
Approach 1: UIntN Subtypes
A UIntN is used to represent each IntN type, e.g. here is Int8:
Efficient Representation
This approach already provides an efficient representation in compiled code with no compiler extensions by being defined directly in terms of UInt8. I.e., all compiled definitions operating on these Int8s will use uint8_t directly. Note that this does mean that FFI code will need to manually track which generated C definitions are using uint8_t for Int8 and which are using uint8_t for some other type (e.g., UInt8, Bool, etc).
Operators
Arithmetic operations which are "sign-insensitive" can be defined directly in terms of their corresponding UInt8 operation, e.g. Int8.add:
The remaining few sign-sensitive operations can be given the appropriate implementation in Lean (and backed with a more efficient C representation where needed), e.g., Int8.div:
and the corresponding C definition lean_int8_div:
Here's an initial draft for Int8 for feedback here if we want to use this approach:
Potential pros
Int8 ops simply use the same UInt8 op
UInt8 already provides many appropriate C definitions that are re-used
Potential cons
IntN types will all be in terms of uintn_t types and not intn_t types. The casts between a uintn_t and intn_t type are free and semantically correct. The bookkeeping to track which uintn_t means what in the FFI/C layer, however, could be painful and a common Lean4/FFI footgun.
Approach 2: Fin Subtype
A Fin is used to represent each IntN type, e.g. here is Int8:
Efficient Representation
This approach would require a compiler extension like what exists for UIntN types to use the expected intn_t C type. This would obviously mean more up front work, but the resulting compiled C/FFI code would be in terms of the expected signed integer type, which seems likely to be the least surprising choice for a user to encounter.
Operators
Arithmetic operations which are "sign-insensitive" can be defined directly in terms of their corresponding Fin operation in Lean and given the expected efficient representations in C, e.g. Int8.add:
The remaining few sign-sensitive operations can be given a more complex definition when appropriate in Lean (along with a more efficient C implementation where needed), e.g., Int8.div:
and the corresponding C definition lean_int8_div:
Here's an initial draft for Int8 using Fin for feedback here if we want to use this approach:
Potential pros
Fin may be "more direct" or "more natural" than defining them in terms of unsigned fixed-width integers (debatable of course!)
Potential cons
UIntN
Approach 3: Int Subtypes
Each IntN type could also be defined as a subtype of Int similar to the following:
Zulip discussion steered me away from this and towards the aforementioned UIntN approach, so less experimenting has been done with this approach.
Potential pros
Potential cons
UIntN approach.
Community Feedback
The lack of signed fixed-width integer types has come up in discussion with Lean FRO members and has been discussed some in the lean4 dev community. It was also noted as a pain point in the community when working with external code during Lean Together 2024.
Thank you to those who have helped in discussing this topic so far.
Impact
Add :+1: to issues you consider important. If others benefit from the changes in this proposal being added, please ask them to add :+1: to it.
Related Work
The following also involve supporting signed fixed-width integer types:
Nat
Int64 impl using the same proposed design of wrapping an unsigned int (USize instead of UIntN). The proposed Int8 impl is similar with a few slight differences (e.g., fewer bitwise operations, leverages a few C definitions for efficiency, etc)
Std.Data.BitVec