andrewrk opened 7 months ago
unless I'm misunderstanding, this only covers float -> int, right? std.math.lossyCast does more than that
IMO this crosses the line of "zig reducing the things one must remember" and an @intFromFloat should be required.

E.g. const y3: i32 = @intFromFloat(@round(x)); isn't that much more to type and passes today.
All of these functions would be defined to saturate to the destination integer type, when the operand exceeds the maximum or minimum value.
Would there be a fallback for the case when one doesn't care about operands exceeding the range (for performance reasons), e.g. when one is certain that the range won't be exceeded, and/or one can tolerate "garbage" integer values produced by out-of-range conversions? E.g. in the optimized float mode maybe? Or is one supposed to use asm blocks in such cases?
- @round: Returns the nearest integer to the operand, away from zero.
The wording looks confusing here to me. All other options speak about the direction in which the nearest integer should be looked for. The rounding speaks about the direction in which to resolve the ties.
Additionally it might be desired to have rounding where one doesn't care in which direction ties are resolved (e.g. x86 processors resolve them towards nearest even, IIRC; although this mode is a bit processor-state sensitive, it is still highly desired for performance-critical code). Or again, is one supposed to use asm blocks?
> Would there be a fallback for the case when one doesn't care about operands exceeding the range (for performance reasons), e.g. when one is certain that the range won't be exceeded

@intCast

> , and/or one can tolerate "garbage" integer values produced by out-of-range conversions?

@truncate

> E.g. in the optimized float mode maybe?

no

> Or is one supposed to use asm blocks in such cases?

no
@round et al. returning an int of a RLS-inferred type might make sense in situations where the value is known to be in the range of the integer type, but it's not immediately obvious that it should clamp the value to the range of the integer. It's also not obvious what should happen if the value is a NaN (should it invoke safety-checked UB? Should it be treated as 0?).
I agree with @nektro that something like const y: i32 = @intFromFloat(@round(x)) reads and communicates intent better. But @intFromFloat invokes safety-checked UB if the result doesn't fit in the destination type, so then we would either need to change this behavior or add a separate saturating cast builtin that could be used like @saturate(@round(x)) to achieve the intended result.
Thinking out loud, but on the subject of communicating intent precisely, I would even suggest turning @intFromFloat-ing a float value with a non-zero fractional part into safety-checked UB and requiring the use of @round/@floor/@ceil/@trunc for clarity. @intFromFloat(@trunc(x)) would be equivalent to a bare @intFromFloat today and makes it crystal clear to a reader how the fractional part will be handled, even if they come from a different language that rounds float-to-int casts differently.

This would also form a nice parallel to how coercing a comptime_float to an int type is a compile error when the float value has a non-zero fractional part.

@saturate could be the saturating alternative to @intFromFloat, still requiring the value to have a zero fractional part and thus mandating usages like const y: i32 = @saturate(@trunc(x)) for clarity. Converting a NaN this way could either be safety-checked UB or return 0; I don't know which makes more sense. Infinities feel like they make sense to convert to std.math.min/maxInt(T).

(As a secondary use, a @saturate builtin could perhaps also be used to clamp an int value of a wider integer type to the bounds of a narrower type.)

Maybe I'll think about it some more and formulate a more concrete counter proposal later.
> It's also not obvious what should happen if the value is a NaN (should it invoke safety-checked UB? Should it be treated as 0?).

Treating it as 0 would be incorrect.
> I would even suggest turning @intFromFloat-ing a float value with a non-zero fractional part into safety-checked UB

The Language Reference says:

> If the integer part of the floating point number cannot fit in the destination type, it invokes safety-checked Undefined Behavior.
Edit for clarity: I think your suggestion harmonizes quite well with making these four builtins, which all have to do with choosing how to resolve fractional values during a transformation to integer, become the preferred way to convert floats to integers.
To be clear, this issue is already accepted. You can add arguments to change my mind, but so far nobody has said anything convincing.
> All of these functions would be defined to saturate to the destination integer type, when the operand exceeds the maximum or minimum value.

Why saturate? This is inconsistent with @intCast and @intFromFloat, which define values outside the range to be safety-checked UB.
What about:

> Attempting to convert a number which is out of range of the destination type results in safety-protected Undefined Behavior.

And defining @saturate for numerical types to be:

- -Inf and +Inf clamp when converting float to int.
- NaN triggers safety-protected UB when converting float to int.

When converting a float to an int, there are two interesting questions with regard to behavior that need answering:

1. How should values outside the range of the destination integer type be handled?
2. How should a non-zero fractional part be resolved?
My main point was that the @round family of builtins offers a natural answer to the second question, but not the first. If @round is updated to support returning a value of an int type, what makes clamping to the range of the int type the obvious default behavior? It feels a bit misplaced, like it's trying to cram too much functionality and edge case handling into the same builtin.

Is the rationale that @round meaning "round toward the nearest representable integer" implies clamping 300.4 to 255 (u8) because 255 truly is the nearest representable integer under those rules and in that context? That logic checks out for @round and @trunc ("toward zero"), but if we're being pedantic it doesn't work for @ceil ("toward +inf"), because 255 is not the nearest integer in that direction (in fact there's no representable integer in the +inf direction). A similar case can be made for @floor with negative floats beyond the range of the target type.

If anything, if there's going to be an int type-returning @round, wouldn't the more natural behavior be to invoke safety-checked UB for out-of-range values, just like @intFromFloat does today?
Regarding the argument about the friction of float-to-int conversions, assuming @round is defined to return a RLS-inferred type, this will only trade one type of friction for another, because code like

const x: f32 = 123.45;
const y = @round(x); // error: @round must have a known result type
const z = 1.5 * @ceil(x) + 10; // error: @ceil must have a known result type

will now require explicit result types (quickly devolving into @as hell). So then it becomes a question of which case is more common and more worthy of reduced friction: rounding to the same float type, or converting to an int? I'd wager that rounding floats while having them remain as float types is the more common use case.
I noticed the edits to the main issue description that were briefly there before being reverted, and I don't really agree that const y: i32 = @saturate(@round(x)) would be that much more cumbersome than const y: i32 = @round(x). Both cases still require the user to provide an explicit result type, either via an intermediate variable or with @as, which is by far the most friction-inducing part.

Language semantics wise there doesn't have to be anything special about chaining these builtins (unlike @constCast(@alignCast(@ptrCast(ptr))), which is special-cased as a single logical operation); if @round stays exactly as it works today, then @saturate(@round(x)) is just rounding a float to a whole number (still a float type) followed by a clamping conversion of a float with a known zero fractional part to an int type, as two separate logical operations. Any special casing, optimization or elimination of unneeded safety checks would just be a compiler implementation detail with no impact on semantics.
So, for the purposes of realtime performance-critical code, especially on Intel processors, which have limited support, if any, for the proposed functionality... (Some of the questions below now have proposed answers in the just-posted previous comment by @castholm, but most of the questions still hold, I believe.)
How would one explicitly avoid the saturation and safety checks (in case one is sure all incoming floating point values are in range and/or is willing to accept the potential garbage)? Like this?

const f: f32 = getSomeF32Value();
const i: i32 = @round(@truncate(f));

(so @truncate would need to detect that it's being used in the context of float-to-int rounding). Or is it the opposite:

const f: f32 = getSomeF32Value();
const i: i32 = @truncate(@round(f));

(so @round would need to detect that it's being used inside @truncate)? I wouldn't actually be sure how to read the latter. What is the type of the value returned by @round() in the latter case? Is it still a float? Or is it an unsafely-converted int?

How would one explicitly avoid saturation but keep safety checks (in case one is certain that all incoming floating point values are in range)? Like this?

const f: f32 = getSomeF32Value();
const i: i32 = @intCast(@round(f));

(so technically @round would need to detect that it's being used inside @intCast)
In realtime code it seems imperative to me to have a counterpart of C/C++'s lrint function (and possibly rint) one way or another. AFAIK Intel processors do not support rounding with ties away from zero (as Zig's @round is required to do), only ties to even. Furthermore, depending on the processor generation, even changing the rounding mode can be prohibitively expensive, hence one needs to round "according to the current rounding mode", exactly like lrint does, if one wants to do rounding in one processor instruction or so. Lacking this option, I'm afraid the use of asm blocks for rounding would be the only option in realtime code which has to run on Intel.

To sum up this third point: practically, one can often accept variations in the tie resolution mode, while tradeoffs in rounding performance might be unwanted.
The clamping behavior the proposal suggests for these builtins is non-obvious and does not follow from the names at all. Having four different builtins for the exact same behavior on integers blatantly contradicts the Zen: "Only one obvious way to do things".

The @saturate replacement does not suffer from these problems, although I would prefer naming it @clampCast, because it is consistent and obvious: @clampCast(...) ought to be equivalent to @intCast(clamp(..., minInt(RT), maxInt(RT))).

Andrew seems to be okay with the visual noise introduced by @as and @intFromFloat/@floatFromInt. I imagine he wouldn't be opposed to a similar mechanism with @clampCast.
@ni-vzavalishin, note that @truncate drops the high bits from integers, while @trunc truncates floats towards zero.
> How would one explicitly avoid the saturation and safety checks (in case one is sure all incoming floating point values are in-range and/or is willing to accept the potential garbage)?

I imagine this would still work.

const f: f32 = getSomeF32Value();
const i: i32 = @intFromFloat(f);

> I wouldn't be actually sure how to read the latter.

I believe that should be a compile error in this proposal, since @round doesn't have a result type here.
> @ni-vzavalishin, note that @truncate drops the high bits from integers, while @trunc truncates floats.

I did mean @truncate in my post, as it was about discarding the information (although not exactly the high bits) and not about rounding towards zero.
Why not just make a single builtin that specifies behaviours explicitly, considering that it's basically two separate steps/operations being shoehorned into one?
fn intFromFloat(
    x: anytype,
    comptime underflow: enum {
        closest, // round
        upward, // ceil
        downward, // floor
        inward, // trunc
        outward, // fill
    },
    comptime overflow: enum {
        clamp, // saturate
        wrap, // truncate
    },
) T
> a simple test shows that it still does the safety checks, runtime and/or comptime (comptime checks might be less of an issue in this respect I believe, but runtime ones are)

Where are you getting that from? This doesn't emit any runtime checks in ReleaseFast:

https://godbolt.org/z/eKhTWY81z

If you mean that it emits safety checks in debug mode, that is by design. All illegal behavior that can be checked in safe code should be checked in safe code. In real-time code, in an otherwise safe program, you can use @setRuntimeSafety(false) to explicitly disable safety checks.
> I did mean @truncate in my post, as it was about discarding the information (although not exactly the high bits) and not about rounding towards zero.

I don't understand.

> If you mean that it emits safety checks in debug mode, that is by design

For integers one can bypass safety checks completely, including in debug mode, by using @truncate. It may be necessary/desired in certain edge cases to do the same for various float-to-int conversions, including in debug mode, although I agree such edge cases are more difficult to imagine (but I could imagine some).
> I don't understand.

When I asked how one would completely bypass safety checks in the float-to-int conversions proposed here, Andrew answered "@truncate". This is why I used @truncate in my examples: I wasn't sure how @truncate is supposed to be used for this purpose. I was vaguely guessing that maybe its semantics are supposed to be extended from purely integers to intermediate float values. When you suggested that I might have been confusing @truncate with @trunc, my answer was that I wasn't; @trunc has nothing to do (IIUC) in places where I want rounding rather than truncating behavior.
> For integers one can bypass safety checks completely.

Yes, that's because @truncate is a well-defined operation that is different from @intCast, unlike @intFromFloat. Consider MIPS, where @intCast can save one instruction over @truncate (but doesn't, because LLVM IR doesn't ever contain the assumption that the high bits are zero).

> It may be necessary/desired in certain edge cases to do the same for various float to int conversions

Such as? cvtss2si, for example, generates an exception on failure, or returns all ones if that exception is masked.
The need to bypass the safety checks that I'm referring to doesn't come from intentional programming, but from having to deal with mistakes historically entangled into the code and/or data sets, including ones coming from 3rd parties. In that respect, whatever safety checks might be missing in release builds, one might occasionally need to be able to disable them, on a case-by-case basis, in debug builds, simply to be able to run debug builds at all. I'm not sure what Zig's official standpoint on this is. So yes, e.g. garbage returned by cvtss2si might be a desired scenario.

cvtss2si might not return garbage, and even if it does because you masked the exception, that isn't portable. If you know that a value will be in range, use @intFromFloat. If you can't guarantee that, check for it, as with the proposed solution. If you want guaranteed garbage, use inline assembly.
In the following code:

const foo = @round(x);
return baz(foo);

would foo's type be resolved by the call to baz? Or would foo need the explicit result type?

I believe it's the latter.
I'm splitting my counter proposal up into two parts. The first part directly addresses the original use case of lossily converting a float to an int and can be evaluated and implemented in isolation. The second part is not explicitly related to the original use case, and after thinking a lot about it I'm not sure how I feel about it, but it extends and synergizes with the first part and touches upon rounding.
@saturate for clamping conversion to int

The @saturate builtin is introduced, which can be used to convert an integer or a float to an integer.

@saturate

@saturate(int_or_float: anytype) anytype
Converts an integer or float to the inferred integer result type, clamping the value between the minimum and maximum representable values of the destination type.
For integer arguments, this conversion is always safe.
For float arguments, the integer part of the value is converted. If the value is NaN, safety-checked Undefined Behavior is invoked. Infinities are clamped.
For integer arguments,
const y: T = @saturate(x)
would be equivalent to
const y: T = @intCast(@min(@max(x, @max(std.math.minInt(T), std.math.minInt(@TypeOf(x)))), std.math.maxInt(T)))
For float arguments,
const y: T = @saturate(x)
would be equivalent to
const y: T = @intFromFloat(if (std.math.isNan(x)) x else @min(@max(x, std.math.minInt(T)), std.math.maxInt(T)))
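For concreteness, the float case of those equivalences can be sketched in C for an i8 destination (the helper name is made up, and the assert stands in for the proposal's safety-checked UB on NaN):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Sketch of the proposed @saturate semantics for a float operand and an
 * i8 destination: clamp to [minInt(i8), maxInt(i8)], then convert.
 * Infinities clamp naturally via the comparisons; NaN trips the check. */
static int8_t sat_i8_from_float(float x) {
    assert(!isnan(x));             /* proposal: NaN is safety-checked UB */
    if (x <= -128.0f) return -128; /* also handles -inf */
    if (x >= 127.0f) return 127;   /* also handles +inf */
    return (int8_t)x;              /* in range: truncates toward zero */
}
```

Note that the clamp happens before the truncating conversion, which is why out-of-range values never reach the (otherwise UB-prone) cast.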
For integers, @saturate pairs well with @truncate and forms a nice counterpart to the +| saturating and +% wrapping operators.

For floats, just like @intFromFloat, @saturate would implicitly round any values toward zero (equivalent to @trunc) before converting. Users who desire different rounding behavior should explicitly round the value using @round, @floor or @ceil before passing the result to @saturate.

std.math.lossyCast can be deleted in favor of @floatFromInt, @floatCast, @saturate or regular coercion, depending on destination and source types. The only unsupported case is converting NaN to an int, which users will need to handle explicitly.
test "@saturate int, positive, in bounds" {
var x: u32 = 37;
_ = &x;
const y: u8 = @saturate(x);
try expectEqual(37, y);
}
test "@saturate int, negative, in bounds" {
var x: i32 = -37;
_ = &x;
const y: i8 = @saturate(x);
try expectEqual(-37, y);
}
test "@saturate int, positive, out of bounds" {
var x: u32 = 999;
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(255, yu);
const yi: i8 = @saturate(x);
try expectEqual(127, yi);
}
test "@saturate int, negative, out of bounds" {
var x: i32 = -999;
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(0, yu);
const yi: i8 = @saturate(x);
try expectEqual(-128, yi);
}
test "@saturate float, positive, in bounds" {
var x: f32 = 89.7;
_ = &x;
const y: u8 = @saturate(x);
try expectEqual(89, y);
}
test "@saturate float, negative, in bounds" {
var x: f32 = -89.7;
_ = &x;
const y: i8 = @saturate(x);
try expectEqual(-89, y);
}
test "@saturate float, positive, out of bounds" {
var x: f32 = 1234.56;
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(255, yu);
const yi: i8 = @saturate(x);
try expectEqual(127, yi);
}
test "@saturate float, negative, out of bounds" {
var x: f32 = -1234.56;
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(0, yu);
const yi: i8 = @saturate(x);
try expectEqual(-128, yi);
}
test "@saturate float, positive infinity" {
var x: f32 = std.math.inf(f32);
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(255, yu);
const yi: i8 = @saturate(x);
try expectEqual(127, yi);
}
test "@saturate float, negative infinity" {
var x: f32 = -std.math.inf(f32);
_ = &x;
const yu: u8 = @saturate(x);
try expectEqual(0, yu);
const yi: i8 = @saturate(x);
try expectEqual(-128, yi);
}
If @saturate working for both ints and floats is undesired, it could be split into two builtins: @saturateInt for int-from-int and @saturateIntFromFloat for int-from-float.

Could @truncate be used for wrapping conversion from floats, for feature parity with @saturate?

Maybe. I played around briefly and I think you might be able to implement an int-from-float @truncate as
fn truncateIntFromFloat(comptime T: type, x: anytype) T {
const U = std.meta.Int(.unsigned, @bitSizeOf(T));
const y: T = @bitCast(@as(U, @intFromFloat(@rem(@abs(x), std.math.maxInt(U) + 1))));
return if (x < 0) -%y else y;
}
but I don't know if there would be precision problems with the % remainder operation for very large float values, nor do I know if it would work for non-power-of-2 size integers.
Edit: I wrote a quick program that exhaustively tests all integer values in the i64 range representable by f32, and the math seems to check out for signed and unsigned 7, 8, 9, 11, 12 and 13-bit ints (these were the only ones I tested). The algorithm is functionally equivalent to as if the float value is cast to an infinitely wide int type, then truncated to a smaller destination int type, so it should be suitable for implementing @truncate for conversion from float to int.

It would be logically sound for @truncate(±std.math.inf(f32)) to return 0, because the distance between consecutive representable float values is always a power of 2 which continuously doubles in size as we move toward infinity. The step size will eventually become a multiple of std.math.maxInt(U) + 1 and thus the algorithm always returns 0 beyond a certain point.
WebAssembly defines instructions like i32.trunc_sat_f32_u, which behave exactly as @saturate is defined above, with the exception of NaN, for which WebAssembly defines that saturating conversions should return 0. So given that there is prior art for handling NaN by returning 0, it might be worth at least considering going with the same behavior, even if it feels "incorrect". A benefit of handling NaN this way is that it would make @saturate a completely safe operation for both int and float arguments, making it a super low friction method of converting floats to ints.

Edit: I later discovered that LLVM also defines llvm.fpto[us]i.sat.* to return 0 for NaN. So it's not just WebAssembly that seems to think that returning 0 is sensible.
Making @intFromFloat and @saturate more strict by disallowing arguments with fractional parts, requiring explicit rounding

@intFromFloat and @saturate both convert the integer part of the value, discarding the fractional part. In other words, values are always rounded toward zero before converting, which means that @intFromFloat(x) is equivalent to @intFromFloat(@trunc(x)).
Rounding toward zero when converting a float to an int is the convention in most programming languages, but there are some odd languages that use different rounding methods for int-from-float conversions (the first one to come to mind for me is PowerShell, which rounds toward nearest, ties to even). And IEEE 754 defines five integer rounding operations (nearest ties to even, nearest ties away, toward zero, toward -infinity and toward +infinity), so there is nothing intrinsic about floats that says that rounding toward zero should be the default.
What if @intFromFloat and @saturate were changed to make no assumptions about rounding at all, by making converting floats with fractional parts safety-checked undefined behavior, thus requiring users to explicitly round any non-integer arguments via @round, @trunc, @floor or @ceil?

Recall the following bullet points from zig zen:

@intFromFloat(x) requires the reader to remember that the conversion rounds toward zero. In comparison, @intFromFloat(@trunc(x)) is much more explicit and unambiguous.
Another point in favor of making float arguments with fractional parts UB can be seen if we look at the set of conversion builtins (after implementing part 1):

| builtin | lossy/lossless | dst <- src |
|---|---|---|
| @bitCast | lossless | bits <- bits |
| @addrSpaceCast | lossless | ptr <- ptr |
| @alignCast | lossless | ptr <- ptr |
| @constCast | lossless | ptr <- ptr |
| @ptrCast | lossless | ptr <- ptr |
| @volatileCast | lossless | ptr <- ptr |
| @ptrFromInt | lossless | ptr <- int |
| @errorCast | lossless | error <- error |
| @errorFromInt | lossless | error <- int |
| @enumFromInt | lossless | enum <- int |
| @floatCast | lossy | float <- float |
| @floatFromInt | lossy | float <- int |
| @intCast | lossless | int <- int |
| @intFromBool | lossless | int <- bool |
| @intFromError | lossless | int <- error |
| @intFromEnum | lossless | int <- enum |
| @intFromFloat | lossy | int <- float |
| @intFromPtr | lossless | int <- ptr |
| @truncate | lossy | int <- int (and maybe float?) |
| @saturate | lossy | int <- int/float |
Here, "lossless" means that no actual information is lost when converting the value; you can always reverse the conversion and get the original value back. Disregarding @truncate and @saturate, which are explicitly designed to be lossy, @intFromFloat is the only conversion builtin with a non-float destination type that is lossy, due to discarding the fractional part. Making float values with fractional parts UB would make the conversion (almost*) lossless; after a successful @intFromFloat conversion, it is always possible to get back a result equal to the original value via @floatFromInt.

*Almost, because -0.0 is the sole exception; while -0.0 == 0.0, you will never be able to get the original signed zero back after the first conversion to int.
In summary, I suggest making the following changes to the definitions of @intFromFloat and @saturate:

@intFromFloat

@intFromFloat(float: anytype) anytype

Converts a float to the inferred integer result type.

If the float value has a fractional part, is NaN or is out of range of the destination type, it invokes safety-checked Undefined Behavior.

To convert a float value with a fractional part, use @round, @trunc, @floor or @ceil to round the value to a whole number, then pass the rounded result to @intFromFloat.
@saturate

@saturate(int_or_float: anytype) anytype

Converts an integer or float to the inferred integer result type, clamping the value between the minimum and maximum representable values of the destination type.

For integer arguments, this conversion is always safe.

For float arguments, if the value has a fractional part or is NaN, it invokes safety-checked Undefined Behavior. Infinities are clamped.

To convert a float value with a fractional part, use @round, @trunc, @floor or @ceil to round the value to a whole number, then pass the rounded result to @saturate.
Changing @intFromFloat (and @saturate, if its part 1 iteration is implemented independently before part 2) to perform no implicit rounding and invoke safety-checked undefined behavior for values with fractional parts would break a lot of Zig code.

Therefore, for one full release cycle, @intFromFloat should retain its current behavior of rounding toward zero prior to conversion. The implicit rounding will be clearly documented as deprecated and to become illegal in a future release, and zig fmt will automatically rewrite expressions of the form @intFromFloat(expr), where expr is not @round(...), @trunc(...), @floor(...) or @ceil(...), into @intFromFloat(@trunc(expr)). This should give users more than enough time to update and audit their code.

Then, at the start of the subsequent release cycle, the deprecated implicit rounding is removed.
- When the argument to @intFromFloat and @saturate is one of @round, @trunc, @floor or @ceil, or a value that can be statically known to have no fractional part (by being known to be rounded), the safety check for fractional parts can be eliminated.
- @intFromFloat(@trunc(x)) and equivalent expressions can be lowered as one "convert to integer using truncation" instruction, instead of "round toward zero" followed by "convert to integer using truncation" (i.e. i32.trunc_f32_u instead of f32.trunc followed by i32.trunc_f32_u for WebAssembly). Same for other rounding methods when applicable instructions are available.
- Add @roundHalfEven and rename @round to @roundHalfAway for clarity.
Motivation

- std.math.lossyCast: this functionality should be provided by the language.

Updated Builtins

These four functions have clarified definitions:

- @round: Returns the nearest representable integer to the operand; away from zero in the halfway case.
- @trunc: Returns the nearest representable integer to the operand, towards zero.
- @floor: Returns the nearest representable integer to the operand, towards negative infinity.
- @ceil: Returns the nearest representable integer to the operand, towards positive infinity.

When the result type is non-integer, the behavior is the same as before. If the operand is NaN or infinite, the operand is returned. In this case an integer operand is not allowed except where it would be allowed by @as.

When the result type is integer, these functions have a saturating effect. A value outside the integer range is clamped; NaN is illegal. In this case an integer operand is allowed.
Conformance Tests

Uses of std.math.lossyCast can be upgraded to @round.

Related:

- #3806
- #11234
- #13642