mattwparas / steel

An embedded scheme interpreter in Rust
Apache License 2.0

minor repl improvements #210

Closed · jrvidal closed this 3 weeks ago

jrvidal commented 1 month ago

This is great - would you be able to run some benchmarks and see if there is any performance change due to this? I'd expect it to be negligible, but it's perhaps worth checking

(define (loop n) (if (= n 0) #f (loop (- n 1))))

(define (try-loop) (loop 100000000))

This loop takes close to 7s on my machine, and there's a reliable difference after this change, around 1%-2%. Not sure how acceptable that is.
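
Back-of-the-envelope: 100,000,000 iterations in roughly 7s is about 70ns per iteration, so a 1%-2% regression works out to roughly 0.7-1.4ns of added cost per loop iteration, which seems consistent with an extra check or two in the hot path.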

jrvidal commented 1 month ago

I'm thinking now: perhaps an obvious optimization is to avoid checking the atomic on every instruction, and instead just do it every 1024 instructions or so. I presume the "intermediate counter" can be better optimized into a register.
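
To sketch what I mean (hypothetical names, not the actual VM internals): keep a plain local countdown in the dispatch loop and only touch the atomic when it reaches zero, so the hot path is just an ordinary decrement-and-branch.

use std::sync::atomic::{AtomicBool, Ordering};

const CHECK_INTERVAL: u32 = 1024;

// Illustrative dispatch loop, not Steel's actual one.
fn run(interrupted: &AtomicBool) {
    let mut budget = CHECK_INTERVAL;
    loop {
        // ... fetch and execute one bytecode instruction here ...

        budget -= 1;
        if budget == 0 {
            budget = CHECK_INTERVAL;
            // One relaxed atomic load every 1024 instructions instead of every one.
            if interrupted.load(Ordering::Relaxed) {
                break;
            }
        }
    }
}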

mattwparas commented 1 month ago

It's possible the additional counter and check every 1024 instructions could be more expensive than checking the atomic, I suppose - I'm not opposed to either, whichever one is faster!

Also, we can probably put this behind a feature flag before rolling it out everywhere. I could imagine use cases where you don't need to interrupt and shouldn't have to pay the cost (even though the cost is minimal)
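
Roughly what I have in mind for the compile-time side (the feature name here is just for illustration, not necessarily what we'd ship):

#[cfg(feature = "interrupt")]
fn interrupt_requested(flag: &std::sync::atomic::AtomicBool) -> bool {
    flag.load(std::sync::atomic::Ordering::Relaxed)
}

#[cfg(not(feature = "interrupt"))]
fn interrupt_requested(_flag: &std::sync::atomic::AtomicBool) -> bool {
    // With the feature disabled this is a constant, so the whole check folds away.
    false
}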

jrvidal commented 1 month ago

It's possible the additional counter and check every 1024 instructions could be more expensive than checking the atomic, I suppose

^ My benchmarks seem to indicate that you were right... CPUs are complicated beasts :shrug:

we can probably put this behind a feature flag

I've made it optional, both at runtime and at compile time. Wrapping it in an Option penalizes the REPL a bit more in favor of speeding up running scripts, which I think makes sense.
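
Concretely, the runtime side looks roughly like this (field and type names are illustrative, see the actual diff for the details): the engine holds an Option, so a plain script run that never installs an interrupt handle skips the whole thing, while the REPL pays for the extra branch.

use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

struct Vm {
    // None for plain script runs; Some(..) once the REPL installs a Ctrl-C handler.
    interrupt: Option<Arc<AtomicBool>>,
}

impl Vm {
    #[inline]
    fn should_stop(&self) -> bool {
        match &self.interrupt {
            // This extra branch is what penalizes the REPL slightly.
            Some(flag) => flag.load(Ordering::Relaxed),
            None => false,
        }
    }
}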

jrvidal commented 1 month ago

if that is an alright timeline for you?

Sure, whatever works for you :+1:

mattwparas commented 1 month ago

if that is an alright timeline for you?

Sure, whatever works for you 👍

So I didn't get around to the benchmarks - did you want to merge this as is (including the $1 stuff)? Since it's opt-in, I could live on it for a bit and run some benchmarks separately and see what the overall cost is. I'd imagine about 1% is probably right, which I think I could live with.

jrvidal commented 1 month ago

did you want to merge this as is (including the $1 stuff)?

I'm in no particular hurry to merge anything, please do what you think is best. I guess, worst case scenario, if you find the benchmarks to be less than ideal, we could scrap the interrupt stuff or rethink it.

mattwparas commented 1 month ago

did you want to merge this as is (including the $1 stuff)?

I'm in no particular hurry to merge anything, please do what you think is best. I guess, worst case scenario, if you find the benchmarks to be less than ideal, we could scrap the interrupt stuff or rethink it.

Alright so I did some digging - can you just try running:

cargo bench --package steel-core

And we can compare the results before and after the change? I'll also replicate it on my machine
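
If it helps, assuming the benches are Criterion-based, saving a baseline makes the before/after comparison easier (the baseline name is arbitrary):

# on the base branch, before the change
cargo bench --package steel-core -- --save-baseline before

# on this branch, with the new code
cargo bench --package steel-core -- --baseline before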

mattwparas commented 1 month ago

Looks like I'm getting about a 1-3% performance decrease; however, given that it's optional, I'm inclined to include this change, since I think being able to interrupt and continue working at the REPL is most likely worth it.

jrvidal commented 1 month ago

The numbers I'm getting are kind of all over the place:

ackermann-3-3                   =>  +7.7%
binary-trees                    =>  +1.3%
engine-creation                 =>  +0.7%
fib-28                          =>  -3.4%
filter-big                      =>    +5%
map-big                         =>  +7.3%
multiple-transducers            =>  -4.1%
range-big                       => +24.6%
register-fn                     =>  -6.5%
ten-thousand-iterations         =>  +9.7%
ten-thousand-iterations-letrec  =>  +6.1%
thread-creation                 =>  +0.1%
transducer-map                  =>  +1.9%
trie-sort-without-optimizations =>  -0.5%

This is the diff of the estimated mean for each benchmark, in alphabetical order. Just in case: did you remember to enable the interrupt feature when testing this branch?
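
(For reference, on my end enabling it just means passing the feature through cargo, something along these lines; the exact feature name here is illustrative and may differ from what the PR actually defines:)

cargo bench --package steel-core --features interrupt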

mattwparas commented 1 month ago

did you remember to enable the interrupt feature when testing this branch?

I did enable the interrupt feature, but yeah those are really all over the place. Maybe some of these benchmarks are too noisy?