This PR has some quick stuff to make some arithmetic work faster on the JIT. There's more work to do, but this is a first step at least.
One change is that using `(max 0 ...)` in the implementation of `Nat.drop` was slow, so using it in a loop cost performance. It's been replaced with something that should be equivalent, but faster.
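For illustration, a sketch of the kind of replacement involved (the name `nat-drop` and the exact code are hypothetical, not the actual implementation in this PR):

```racket
#lang racket

;; Nat.drop m n is m - n, clamped at zero.
;; The slow version went through max:
(define (nat-drop-slow m n)
  (max 0 (- m n)))

;; An equivalent branch avoids the max call in hot loops:
(define (nat-drop m n)
  (if (> n m) 0 (- m n)))
```

Both return the same results for naturals; the branchy version just gives the optimizer a simpler shape to work with.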
I've also changed the way that Unison definitions get 'curried.' It's back to the original strategy of generating `case-lambda` expressions for every definition. My experiments suggest that this optimizes better in various cases. I've also added machinery to selectively apply this behavior, because it makes compilation a lot slower.
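Roughly, the strategy looks like this (a minimal sketch, assuming a two-argument definition; the real generated code handles more arities and details):

```racket
#lang racket

;; Hypothetical: a two-argument Unison definition compiled as a
;; case-lambda, so saturated calls stay direct while partial
;; applications fall back to building a closure.
(define f
  (case-lambda
    [(x y) (+ x y)]                 ; fully saturated call: inline-friendly
    [(x)   (lambda (y) (f x y))]))  ; partial application: return a closure

(f 1 2)    ; saturated
((f 1) 2)  ; curried
```

The saturated clause is what the optimizer can see through, which is why generating these per definition helps, at the cost of compiling more code.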
According to my tests, it shouldn't be necessary for every definition to use this strategy. It's mostly recursive functions that the optimizer refuses to handle well with pre-defined currying functions. But I also couldn't get the optimizer to optimize builtins properly in actual code without them also using this sort of currying. At this point I'm unsure of what the difference between my test cases and the actual code is, so I thought I'd just push this to get the optimization out, and try to figure out how to be more intelligent about it later.
With this, counting up to 1 billion takes around 1.5s on my machine, which matches a loop written directly in Racket. This only tests a couple of operations, though, so there may be other things like the `(max 0 ...)` situation out there that I haven't looked at yet.
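For reference, the direct-Racket baseline I'm comparing against is the obvious tight loop (a sketch; the exact benchmark code isn't in this PR):

```racket
#lang racket

;; Count up to n in a tight tail-recursive loop.
(define (count-up n)
  (let loop ([i 0])
    (if (< i n)
        (loop (+ i 1))
        i)))

(time (count-up 1000000000))
```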