frame cloning too expensive

pmachata commented 9 years ago

$ time dwgrep/dwgrep -c -e '
    let A := [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    let B := [A elem 10 mul A elem add];
    let C := [B elem 100 mul B elem add];
    C elem'
10000

real    0m4.888s
user    0m3.907s
sys 0m0.986s

The problem lies (I think) in the fact that elem clones the context for each new result. Cloning context involves cloning the topmost stack frame, which contains A, B, and C, containing lots of values, which are all recursively cloned. Of course that needs to be done, because the two cloned frames might end up binding different values to individual slots, so at least the empty slots need to be per-frame. Perhaps something clever with partial sharing of frames can be invented.

There's a lot of this thing going on. All the values are immutable (except for set_pos), and cloning them shouldn't be strictly necessary. However, simply converting unique pointers to shared ones makes the matters worse (see branch pmachata/shared_ptr), possibly because work with atomic counters is worse than actually cloning the values.

pmachata commented 7 years ago

I'm rerunning some tests. I'm not sure what was going on back then, but now the shared pointer ends up being a very significant win. I updated the shared_ptr branch so that it compiles and fixed an issue. But fundamentally I believe that using shared_ptr is not even necessary, and one could just statically determine where each lifetime ends and use raw pointers.

pmachata commented 7 years ago

Pushed a branch pmachata/frames that should address this. The idea is that as build.cc converts the parse tree to operation tree, it statically determines for each read operation which bind operation will provide that value. It then directly gives the bind operation pointer to the read operation. The read op then calls upstream->next(), which propagates up the chain, eventually reaching the bind in question, which pulls TOS and remembers it. The read then asks the bind for the remembered value.

This prompts a number of changes in how name binding is handled, e.g. unbound names are now determined statically instead of in runtime, and scopes become free, allowing some nice cleanups.

Closures don't work in their current state, so the seven tests that use them fail, but the rest all works. Closures will need to be handled by rebinding read operations within a closure to fake read operations, or maybe replacing the read operations with a literal. Then all the content that the closure needs will be linked to the value itself, and apply can shed the extra frame-setting logic.

At that points, frames as such can be entirely dropped.

pmachata commented 6 years ago

Fixed in 0.3.

pmachata / dwgrep

frame cloning too expensive #25