Why are we calling `FullSimplify` everywhere

Hey,

I was profiling Numbat again on this code: `range(0, 1_000_000) |> map(sqrt) |> sum`, and noticed that ~10% of the time was spent calling the `FullSimplify` opcode.

I tried removing the `self.vm.add_op(Op::FullSimplify);` in `compile_expression_with_simplify`, and I can confirm that the gain is not just theoretical. Execution time went from 2.1s to 1.8s, and it's super consistent (roughly a 15% improvement, if I’m not mistaken).

My question is: what's the purpose of this instruction? From what I understand, it tries to find the best way to represent a value, right? Like if you have 1000m, it should output 1km, or am I completely wrong? And if that’s the case, then we should never call it until we output something to the end user (which is why I think I’m wrong).

And even in the case I’m wrong, I don't see the point of calling it in my example; since it's a scalar without any unit, maybe we could store it somewhere that it's already simplified and doesn't need to be simplified again? 🤔

That is a good question, thank you. I'm aware of the overhead of the `full_simplify` calls. That overhead could become even larger once we add more sophisticated simplification heuristics.

> My question is: what's the purpose of this instruction? From what I understand, it tries to find the best way to represent a value, right?

It's mostly about simplifying the unit of the quantity. 1 (m/s)/s becomes 1 m/s². 1 m·s·m becomes 1 m²·s. 1 cm/m becomes 0.01. 1 Mbit/s * hour becomes 3600 Mbit.
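The examples above can be reproduced with a small sketch. It is only an illustration under stated assumptions, not Numbat's actual implementation: the `Factor` type, the `simplify` function, and the rule "keep the first unit seen for each dimension" are all hypothetical.

```rust
/// One factor of a unit, e.g. `cm` raised to some power.
/// Hypothetical type for illustration; not Numbat's real representation.
#[derive(Debug, Clone, PartialEq)]
struct Factor {
    name: &'static str,      // e.g. "cm"
    dimension: &'static str, // e.g. "length"
    to_base: f64,            // size relative to the base unit of its dimension
    exponent: i32,
}

fn factor(name: &'static str, dimension: &'static str, to_base: f64, exponent: i32) -> Factor {
    Factor { name, dimension, to_base, exponent }
}

/// Merge all factors of the same dimension into the first unit seen for that
/// dimension (an assumption of this sketch), folding the conversion factors
/// into the numeric value.
fn simplify(value: f64, unit: &[Factor]) -> (f64, Vec<Factor>) {
    let mut value = value;
    let mut merged: Vec<Factor> = Vec::new();
    for f in unit {
        if let Some(rep) = merged.iter_mut().find(|m| m.dimension == f.dimension) {
            // Express `f` in terms of the representative unit `rep`.
            value *= (f.to_base / rep.to_base).powi(f.exponent);
            rep.exponent += f.exponent;
        } else {
            merged.push(f.clone());
        }
    }
    // Factors whose exponents cancelled out disappear entirely.
    merged.retain(|m| m.exponent != 0);
    (value, merged)
}
```

With this model, `simplify(1.0, &[factor("cm", "length", 0.01, 1), factor("m", "length", 1.0, -1)])` yields a value close to 0.01 with an empty unit, and `m·s·m` collapses to `m²·s`.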

> Like if you have 1000m, it should output 1km, or am I completely wrong?

That could be part of the simplification procedure, but currently it is not (see #510).

> And if that’s the case, then we should never call it until we output something to the end user (which is why I think I’m wrong).

The problem is that we don't want to run it unconditionally. For example, if someone explicitly requests a conversion to a certain unit (e.g. `10 m² to cm·m`), we would end up with a value of `1000 cm·m`, and we should not run simplification on that quantity. Otherwise, we would be back at `10 m²` or `100_000 cm²`.

So the next best thing would be to say: okay, if the top-level expression is of shape `x to y` (or `x -> y`), then we do not run simplification. But that is not enough. What if we have this:

fn return_area_in_cm_times_m(a: Area) = a -> cm·m

return_area_in_cm_times_m(10 m²)

I want that program to return 1000 cm·m.

And that is why I introduced `FullSimplify` as an operation in the VM. If we find better ways to solve this, I'm all for it! I haven't put much thought into alternative solutions, but it sure feels like there are smarter ways to do this.
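The function example can be modelled with a toy compiler and stack VM. This is a sketch under stated assumptions, not Numbat's code: only the idea of a `FullSimplify` op comes from this thread, while `Expr`, `Unit`, the other ops, and the inlining of single-argument calls are hypothetical simplifications. The point it shows is that the decision to simplify is made per expression at compile time, including inside function bodies, rather than once at the very end.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit { SquareMeter, CmTimesMeter }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Quantity { value: f64, unit: Unit }

// Hypothetical toy AST: functions take one argument that is used exactly once.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(Quantity),
    Arg,                               // the argument of the enclosing function
    ConvertToCmTimesMeter(Box<Expr>),  // `e -> cm·m`
    Double(Box<Expr>),                 // `2 * e`, an ordinary non-conversion expression
    Call { body: Box<Expr>, arg: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Push(Quantity), Convert, Double, FullSimplify }

fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Literal(q) => code.push(Op::Push(*q)),
        Expr::Arg => {} // the argument is already on top of the stack
        Expr::ConvertToCmTimesMeter(e) => { compile(e, code); code.push(Op::Convert); }
        Expr::Double(e) => { compile(e, code); code.push(Op::Double); }
        Expr::Call { body, arg } => {
            compile(arg, code);
            // The function body decides for itself whether to simplify.
            compile_with_simplify(body, code);
        }
    }
}

fn compile_with_simplify(expr: &Expr, code: &mut Vec<Op>) {
    compile(expr, code);
    // Explicit conversions, and calls (whose bodies already decided), are left alone.
    if !matches!(expr, Expr::ConvertToCmTimesMeter(_) | Expr::Call { .. }) {
        code.push(Op::FullSimplify);
    }
}

fn to_cm_times_meter(q: Quantity) -> Quantity {
    match q.unit {
        Unit::SquareMeter => Quantity { value: q.value * 100.0, unit: Unit::CmTimesMeter },
        Unit::CmTimesMeter => q,
    }
}

// Stand-in for simplification: in this toy, it always goes back to m².
fn full_simplify(q: Quantity) -> Quantity {
    match q.unit {
        Unit::CmTimesMeter => Quantity { value: q.value / 100.0, unit: Unit::SquareMeter },
        Unit::SquareMeter => q,
    }
}

fn run(code: &[Op]) -> Quantity {
    let mut stack: Vec<Quantity> = Vec::new();
    for op in code {
        match op {
            Op::Push(q) => stack.push(*q),
            Op::Convert => { let q = stack.pop().expect("operand"); stack.push(to_cm_times_meter(q)); }
            Op::Double => { let q = stack.pop().expect("operand"); stack.push(Quantity { value: 2.0 * q.value, ..q }); }
            Op::FullSimplify => { let q = stack.pop().expect("operand"); stack.push(full_simplify(q)); }
        }
    }
    stack.pop().expect("result")
}
```

Compiling the call `return_area_in_cm_times_m(10 m²)` in this model emits no `FullSimplify`, so the result stays in `cm·m`, whereas doubling that conversion result is an ordinary expression and gets simplified back to `m²`.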

> And even in the case I’m wrong, I don't see the point of calling it in my example; since it's a scalar without any unit

Well, we don't know that during compilation. Thanks to static analysis and Numbat's type system, we know that we are dealing with a quantity of type `Scalar`, but there is no way of knowing whether that is represented as a plain number or as something like `1 cm/m`.

> maybe we could store it somewhere that it's already simplified and doesn't need to be simplified again? 🤔

Maybe?
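One way to picture the caching idea from the question is a flag that is checked at runtime, since the representation is not known at compile time. This is purely a hypothetical sketch, not something Numbat is known to do: the `Quantity` layout, the `simplified` flag, and the name-based merging are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Quantity {
    value: f64,
    /// (unit name, exponent) pairs; an empty list means a plain number.
    unit: Vec<(&'static str, i32)>,
    /// Hypothetical cache flag: true once the unit is known to be simplified.
    simplified: bool,
}

impl Quantity {
    fn new(value: f64, unit: Vec<(&'static str, i32)>) -> Self {
        // A plain number has nothing to simplify.
        let simplified = unit.is_empty();
        Quantity { value, unit, simplified }
    }

    /// Multiplication builds a new unit, so the result starts out unsimplified
    /// unless both operands are plain numbers.
    fn mul(&self, other: &Quantity) -> Quantity {
        let mut unit = self.unit.clone();
        unit.extend(other.unit.iter().copied());
        Quantity::new(self.value * other.value, unit)
    }

    /// Returns true if the slow path actually ran.
    fn full_simplify(&mut self) -> bool {
        if self.simplified {
            return false; // fast path: nothing to do
        }
        // Slow path (stand-in): merge factors that share a name.
        let mut merged: Vec<(&'static str, i32)> = Vec::new();
        for &(name, exp) in &self.unit {
            if let Some((_, e)) = merged.iter_mut().find(|(n, _)| *n == name) {
                *e += exp;
            } else {
                merged.push((name, exp));
            }
        }
        merged.retain(|&(_, e)| e != 0);
        self.unit = merged;
        self.simplified = true;
        true
    }
}
```

In this model, a plain number such as each element of `range(0, 1_000_000) |> map(sqrt)` would take the fast path, while a scalar represented as `1 cm/m` would still go through the slow path once. Whether the flag check is cheap enough to recover the measured overhead would need profiling.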

sharkdp / numbat

Why are we calling `FullSimplify` everywhere #535