Closed irevoire closed 2 months ago
Thank you for looking into this!
The issue is that when a function with a type annotation returns something, we should convert the returned value to the expected type:
Hm, no. Whether or not a quantity is simplified should definitely not depend on the presence of a type annotation. It shouldn't depend on the type at all. Types in Numbat always mean: physical dimensions (Scalar, Length, Velocity) and not units (—, meter, meter/second).
Here, for example,
floor(time / day)
returns aTime / Time
, which we want to convert to a simple scalar, I guess?
The type Time / Time
is exactly the same type as Scalar
.
With this code, only two tests fail.
I think I would need a bit more information to see what exactly went wrong in those two tests.
Hm, no. Whether or not a quantity is simplified should definitely not depend on the presence of a type annotation. It shouldn't depend on the type at all. Types in Numbat always mean: physical dimensions (Scalar, Length, Velocity) and not units (—, meter, meter/second).
Thanks, I was completely wrong about my approach and was able to fix some stuff because of your comment.
I think the PR is almost ready to merge, but there is still an issue I don't really understand:
% cargo run -- -e "floor(1.5 seconds / 1 days)"
0.0000115741
The floor
function doesn't work, and probably the ceil
, round
etc
Also, on main
is this behavior expected? https://numbat.dev/?q=floor%281.5+second+%2F+day%29%E2%8F%8Efloor%281.5+second+%2F+day+%E2%9E%9E+second+%2F+day%29%E2%8F%8E
That seems strange to me 🤔
I think the PR is almost ready to merge, but there is still an issue I don't really understand:
% cargo run -- -e "floor(1.5 seconds / 1 days)" 0.0000115741
That happens because it gets floor
ed to 1.0 seconds / days
and then converted to a scalar (0.0000115741).
Also, on
main
is this behavior expected? https://numbat.dev/?q=floor%281.5+second+%2F+day%29%E2%8F%8Efloor%281.5+second+%2F+day+%E2%9E%9E+second+%2F+day%29%E2%8F%8E
Sort of.
The problem is not with your PR. The problem is the very existence of those functions. Please see my explanation here and the ticket https://github.com/sharkdp/numbat/issues/336.
I'll try to give a more helpful response (or maybe a fix) in a few days.
Actually. Maybe your PR even solves the problem that I mentioned in that description?! In that case, we should probably provide floor_in
, round_in
etc. and remove floor
, round
etc entirely.
The problem is not with your PR. The problem is the very existence of those functions. Please see my explanation https://github.com/sharkdp/numbat/issues/534#issuecomment-2286914824 and the ticket https://github.com/sharkdp/numbat/issues/336.
Oh yes, you're right; these functions don't really make sense when playing with different units; it's actually a way more complex subject than I expected 😅
In that case, we should probably provide floor_in, round_in etc.
That seems smart, and the right side of the function call could be any value instead of a type right?
So round_in(time, days)
would works, but round_in(143, 13)
would work as well right?
and remove floor, round etc entirely.
💯
Now that I understand the issue, I don't see how it could make sense to round stuff randomly on units we don't have control over, the in
should always be specified.
On a side note, it should also always be the second arguments so we can easily write 80 km / hours |> round(m/s)
.
What should be the exact syntax in our case for simple scalars?
I guess in my case I should write: round_in(time / day, 1)
and internally round_in
should call convertTo
?
The 1
is ugly though 😔
Let's try to fix the notation, so we don't get confused. I would propose we change the type annotation for round
to make it more restrictive (instead of deleting it entirely) and introduce round_in
like so:
fn round(value: Scalar) -> Scalar
fn round_in<D: Dim>(base: D, value: D) -> D = round(value / base) × base
I think this solves all problems:
assert_eq(round(1.234), 1)
assert_eq(1.234 m |> round_in(m), 1 m)
assert_eq(1.234 m |> round_in(cm), 123 cm)
assert_eq(1.234 m |> round_in(mm), 1234 mm)
assert_eq(1.234 m |> round_in(10 m), 0)
assert_eq(1.234 m |> round_in(1 m), 1 m)
assert_eq(1.234 m |> round_in(0.1 m), 1.2 m, 1e-9 m)
assert_eq(1.234 m |> round_in(0.01 m), 1.23 m, 1e-9 m)
assert_eq(1234 |> round_in(1000), 1000)
assert_eq(1234 |> round_in(100), 1200)
assert_eq(1234 |> round_in(10), 1230)
assert_eq(1234 |> round_in(1), 1234)
assert_eq(1234 |> round_in(0.1), 1234)
and the right side of the function call could be any value instead of a type right?
base
can be any value, yes.
So
round_in(time, days)
would works, butround_in(143, 13)
would work as well right?
So with the notation above: round_in(days, time)
would work. It's the same as time |> round_in(days)
. And round_in(13, 143)
would also work, but it would be a bit weird :smile:. But things like round_in(10, 143)
make sense and would yield 140
.
On a side note, it should also always be the second arguments so we can easily write
80 km / hours |> round(m/s)
.
Yes.
What should be the exact syntax in our case for simple scalars?
I think we should keep round
for that after all. It's by far the most common use case to call round
on scalar values, so we should still provide. But restrict it to only work on Scalar
values. If someone attempts to do round(1.234 m)
, that wouldn't work. Ideally, we could show a helpful error message mentioning round_in
.
Oops, I swapped the order of the parameters with |>
in my previous message. My bad 🤦
I’ll look into it later but:
If someone attempts to do round(1.234 m),
Don't we have an issue here? Above you also wrote:
The type Time / Time is exactly the same type as Scalar.
Which was causing the issue initially with a floor(time / day)
I updated the code with a round_in
/floor_in
function, but it doesn't work because:
fn round_in<D: Dim>(base: D, value: D) -> D = round(value / base) × base
In floor(value / base)
in my case returns something in s / day
which is badly rounded and then returns something non-floored.
% cargo run -- -e "floor_in(day, 12420s)"
Compiling numbat-cli v1.13.0 (/Users/irevoire/numbat/numbat-cli)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.93s
Running `target/debug/numbat -e 'floor_in(day, 12420s)'`
0.14375 day
I guess one solution could be to call fullSimplify
before calling floor?
That seems a bit hacky to me and hard to maintain in the VM...
Another solution could be to provide a new function, simplify
, that can only be called on scalar
and then move the complexity to the stdlib of numbat instead of the VM. Still hacky IMO
And the last solution I can think of would be to provide a new special internal type like « number » or something that is a fully simplified scalar. It could only be generated by converting a scalar to it and the VM could automatically generate the fullsimplify instruction.
In
floor(value / base)
in my case returns something ins / day
which is badly rounded and then returns something non-floored.
I think you need to change simple_polymorphic_math_function!(round, round);
to simple_scalar_math_function!(round, round)
in numbat/src/ffi/math.rs
(and similar floor
and all the others). It will use Quantity::as_scalar
which is guaranteed to succeed by the type checker (if we change the signature to Scalar -> Scalar
).
Oooh nice I didn't know we had that! I’ll rebase once your PR lands on main 👌
I’ll rebase once your PR lands on main 👌
This has been merged now. (I really like how this example looks now with the new |>
operator and the new round_in
function).
(I really like how this example looks now with the new |> operator and the new round_in function).
Ahah this one is awesome, I didn't know about it! 😂
I rebased my PR, edited the original message and updated the performance improvements
This is great. What I don't fully instead at the moment: using your can_simplify
trick.. could we even take this further and remove the FullSimplify
instruction alltogether?
I think we can, but I didn't try it because it's hard to know if we can simplify something.
So, as stated in the description, the idea is to not simplify when we don't need to. So now, the only times we need to simplify an expression are:
{expr}
The issue is that if someone wrote an expression that's not simplified, like expr -> km/m
, even if it doesn't make sense, we should output the expression in the type he specified.
This boolean is only false
after processing a ConvertTo
expression, and any operation applied to it will make the boolean true
again.
could we even take this further and remove the
FullSimplify
instruction altogether?
In the end, I believe this is possible, but it requires us to keep track of the usage of each variable while compiling the stmts to asm and, although it doesn't seem hard, that's not something we already do in numbat.
let a = 3m/m -> km/m
let stuff = 47
let bidule = 34 + stuff + a
print(a) # We need to find back the definition of `a` and see if it contained a `ConvertTo` thing
Just thinking out loud now I wrote this example, maybe we have only two cases currently when we should not simplify an expression:
Either we're printing an identifier whose definition ends with a ConvertTo
, or the expression we're currently printing ends with a ConvertTo
.
ConvertTo
Sorry I’m not sure that's super clear but let me know if it isn't 😅
I was thinking of the following:
FullSimplify
)(x -> m) + 2 cm
)full_simplify()
on the value first. values that were previously flagged will not get simplified (because full_simplify
checks the flag).Wouldn't this be enough already?
Yes, I need to try a few things, but it should work and be even faster 🔥
So it turns out this was really easy to implement on top of what you did (I hope I didn't miss something you had thought about). I basically just had to remove some code and add a few .full_simplify()
calls.
It's very nice conceptually that FullSimplify
is now gone.
This PR also fixes two issues we discovered recently. I added regression tests for them.
For the cargo bench
prelude benchmark, I see a 20% improvement (48 ms => 37 ms). And similarly when measuring with hyperfine
:
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
./numbat-master -e "1+1" |
56.1 ± 1.8 | 53.7 | 62.1 | 1.22 ± 0.05 |
./numbat-no-simplify -e "1+1" |
45.9 ± 1.2 | 43.8 | 50.2 | 1.00 |
For the benchmark you suggested in your original post, I still see the 10% improvement. And I see that you have a much faster machine than I have :smile:
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
./numbat-master -e "…" |
2.291 ± 0.030 | 2.258 | 2.352 | 1.09 ± 0.03 |
./numbat-no-simplify -e "…" |
2.098 ± 0.046 | 2.029 | 2.168 | 1.00 |
Thank you very much for your work. This is great. Let me know if I missed something.
Hey, it's awesome! I'm sorry I didn't finish this myself. I started working again and didn't find the time to work on it again, but it's awesome that you finished it!
Thank you
Reduce the number of calls to simplify
Fixes https://github.com/sharkdp/numbat/issues/535
We try to call
FullSimplify
as few as possible. Basically right before outputting something to the end user (calling print, returning a value to the user or composing strings).Performances
On my computer:
range(0, 1_000_000) |> map(sqrt) |> sum
improved by ~12% (from 1.5s to 1.3s)Original issue
Hey, after your answer, I started some work on https://github.com/sharkdp/numbat/issues/535
My initial idea was to call it
full_simplify
only at the end of statements.To solve this part, I added a boolean in every value that tells us if the immediate last operation was a
ConvertTo
.As expected, I got a 10% performance improvement:
The issue
With this code, only two tests fail.
The issue is that when a function with a type annotation returns something, we should convert the returned value to the expected type:
Here, for example,
floor(time / day)
returns aTime / Time
, which we want to convert to a simple scalar, I guess? I don't understand how to do that. Thereturn_type_annotation
of a function is aTypeAnnotation
, and I need to give aUnit
toConvertTo
. I didn't find any way to do that in the code and am struggling to make the conversion myself.Let me know if you think this makes sense and know how to help me.