inducer / pymbolic

A simple package to do symbolic math (focus on code gen and DSLs)
http://mathema.tician.de/software/pymbolic
Other
106 stars 24 forks source link

Hack traversals for speed #110

Closed inducer closed 1 year ago

inducer commented 1 year ago

There's actually a path towards merging this now.

The root of this is this contest set up by @kaushikcfd to show that we should all abandon Python for Rust (or some other AOT statically-typed language).

Here's my fork of @kaushikcfd's gist: https://gist.github.com/inducer/25e3372fd5c82384d437c777ed4153e9

Latest results:

inducer commented 1 year ago

Pypy is 0.90s.

kaushikcfd commented 1 year ago

For posterity, could also report the runtimes for the rust-port on your machine?

cargo new --bin symoxide_contest
cd symoxide_contest
cargo add symoxide
curl https://gist.githubusercontent.com/kaushikcfd/74c442a075557dad466cd3daea9c151f/raw/d593ce7d5a6de6764e541921472456c8528efb3b/symoxide_expr_traversal.rs > src/main.rs
cargo run --release
inducer commented 1 year ago

Just ran this. Somewhere around .28s.

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.66.0-nightly (432abd86f 2022-09-20)
inducer commented 1 year ago

So pypy times seem super variable. Just reran, with

Python 3.8.13 (7.3.9+dfsg-4, Aug 09 2022, 12:51:24)
[PyPy 7.3.9 with GCC 12.1.0]

(nominally unchanged to before) and got 0.63s.

inducer commented 1 year ago

I have an idea for a Python-side AST-to-AST transform that would eliminate the bounces to rec in the bulk of cases, by effectively inlining the common case of rec. IDK whether I'll be desperate enough this weekend to do it, but I think that might do something, because it would cut the number of frames set up in a pymbolic traversal by ~2.

kaushikcfd commented 1 year ago

So pypy times seem super variable. Just reran, with

Maybe earlier the JIT costs were also included? Also, not sure if tracing JIT is mature enough to compete with AOT compilers for such workloads in terms of both -- compilation overheads and quality of generated code.

inducer commented 1 year ago

not sure if tracing JIT is mature enough to compete with AOT compilers

What does "mature" mean to you here? For me, I'm interested in correctness and speed, in that order. :slightly_smiling_face:

kaushikcfd commented 1 year ago

What does "mature" mean to you here? For me, I'm interested in correctness and speed, in that order. slightly_smiling_face

Correctness is the bare minimum [^1]. I was mainly referring about the JIT overheads (#recompliations and compilation costs)

[^1]: not saying it is trivial :)

inducer commented 1 year ago

The "mapper optimizer" is now a thing. It succeeds at removing passed-through *args, **kwargs when those aren't needed, and that's clearly profitable.

It also contains a code path for inlining self.rec and/or the cache retrieval, but those are curiously unprofitable. Not sure I understand why, given that I thought we were bound by frame setup.

inducer commented 1 year ago

The doc failure is https://github.com/sphinx-doc/sphinx/issues/10861.

kaushikcfd commented 1 year ago

Bumped the baseline by a bit, see https://github.com/kaushikcfd/symoxide/pull/3 :).

inducer commented 1 year ago

The final tally here:

There is a loopy failure, but that's just a doctest fail because this turns off flatten-by-default in the IdentityMapper.