Improve performance - Githubissues

gregr commented 8 years ago

Fix improper walks by list-split-ground

We were only getting the benefit of set-var-val! vars by not properly passing the state's substitution to walk. This fix reduces latency by between 1/3 and 2/3 on the harder benchmarks (the harder, the more savings, as usual).

gregr commented 8 years ago

So the former 3s benchmark (which was 30s before today) now takes about 2s, and the new former 19s benchmark now takes about 7s.

gregr commented 8 years ago

FYI, I'm also experimenting with optionally limiting search depth in faster-mk. I have a depth-limiting version which about doubles performance again on top of this pull request (2s -> 1s, 7s -> 4s).

Unfortunately I've still yet to see the newest/hardest queries finish without running out of memory, even then.

gregr commented 8 years ago

I finally found a way to synthesize virtually all of append (in under 10 seconds, even):

(define append
  (lambda (l s)
    (if ,A ,B ,C))

But the change needed is disappointing to have to make. I changed the definition of if-primo to expect the condition to always return a boolean. This means that instead of (=/= #f t) for the true case we have (== #t t), which is not the usual Scheme semantics.

The next hardest query (having to guess that if is needed in the first place) still doesn't finish.

webyrd commented 8 years ago

How are you keeping the synthesis from cheating in these examples? With careful tests?

The next hardest query (having to guess that if is needed in the first place) still doesn't finish.

I think it would be reasonable, when running on a machine with many cores, to have at least one core guess that the definition is recursive, and therefore has an if as its body. Of course, speeding up synthesis so that this isn't needed would be nice, but I think for more complex examples we'll need these sorts of heuristics/speculative evaluations.

But the change needed is disappointing to have to make. I changed the definition of if-primo to expect the condition to always return a boolean. This means that instead of (=/= #f t) for the true case we have `(==

t t)`, which is not the usual Scheme semantics.

Once again, given enough parallelism at the hardware level, it seems reasonable to have at least one thread guess that the test of an if returns a Boolean. Also, we should probably have at least one thread guess that the entire definition is well-typed (for some type system closer to that of ML or whatever), and try to perform type inference. If this happens to be the case, synthesis could be far faster.

On Fri, Aug 12, 2016 at 9:22 PM, Greg Rosenblatt notifications@github.com wrote:

I finally found a way to synthesize virtually all of append (in under 10 seconds, even):

(define append (lambda (l s) (if ,A ,B ,C))

But the change needed is disappointing to have to make. I changed the definition of if-primo to expect the condition to always return a boolean. This means that instead of (=/= #f t) for the true case we have (==

t t), which is not the usual Scheme semantics.

The next hardest query (having to guess that if is needed in the first place) still doesn't finish.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-239599134, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThJR-4OjRnyA_Q_oEvzxQkEmC8QF7ks5qfTiSgaJpZM4Jjkzt .

gregr commented 8 years ago

Great! Running multiple versions in parallel should also allow us to make use depth limiting for faster queries in many cases without jeopardizing completeness.

To get this new synthesis result, I haven't had to change the nature of test cases at all, believe it or not. I'm using the same test cases as seen in test-old-interp.scm for the other benchmarks. I just changed the interpreter and it worked everything out by itself!

gregr commented 8 years ago

I just managed to synthesize all of append from test cases alone, with Barliman's starting skeleton! That means it guesses the name append, the parameter structure of the lambda, and that an if statement will work.

(define ,A
  (lambda ,B ,C))

Guess:

(define append
  (lambda (_.0 _.1)
    (if (null? _.0)
      _.1
      (cons (car _.0) (append (cdr _.0) _.1)))))

Side conditions: (sym _.0 _.1)

Still using the same test cases as before, with the boolean-only if semantics, but now I've moved if-primo out of prim-expo and placed it just ahead of variadic lambda application. Synthesis takes about 39 seconds.

gregr commented 8 years ago

I should note that the 39 second timing is with a depth limit. Without the limit, I haven't seen a result even after a few minutes of waiting.

gregr commented 8 years ago

By the way, it turns out you don't even need Barliman's starting skeleton for the full synthesis result. Without the skeleton, you get the same result in the same amount of time.

gregr commented 8 years ago

By providing just a little more than the starting skeleton ...

(define append
  (lambda (l s) ,A))

... performance improves dramatically. This one completes correctly in less than 5 seconds.

gregr commented 8 years ago

Oh, if you're confused as to how this is possible without additional tests to prevent cheating, I forgot to remind you that the depth limit itself also helps to protect against cheating.

To avoid confusion, note that this effect is not caused by cheats being pruned by the depth limit directly: the cheats we've seen are found at shallower depths than real answers, so that cannot explain this effect.

The real explanation seems to be that interesting lines of search retain more search resources as their siblings die off from hitting the limit. Interesting lines of search are disproportionately rewarded by this sibling pruning, compared with cheating lines of search.

This is the same reason that non-gensym test cases with a depth limit are enough to solve some of the problems that require gensym test cases without a limit. We figured this out while trying to understand why run-da-dls can solve hard 9 without gensyms in its test cases a while back: https://github.com/gregr/racket-misc/blob/8ee15173e25521b57bf14507c6bf61422fd6a57e/interp-tests.rkt#L740-L754

gregr commented 8 years ago

I've pushed an experimental branch (don't merge it!) for depth-limited search and tweaks to if semantics for you to play with: https://github.com/webyrd/Barliman/commits/do-not-merge-experimental-dls

You can manually adjust max-search-depth in mk.scm to adjust the depth limit. Let me know if you have any questions.

webyrd commented 8 years ago

If we had ~100 hardware threads, could we just increase the max-search-depth exponentially on several threads? Would that well, do you think?

On Sat, Aug 13, 2016 at 12:16 PM, Greg Rosenblatt notifications@github.com wrote:

I've pushed an experimental branch (don't merge it!) for depth-limited search and tweaks to if semantics for you to play with: https://github.com/webyrd/Barliman/commits/do-not-merge-experimental-dls

You can manually adjust max-search-depth in mk.scm to adjust the depth limit. Let me know if you have any questions.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-239634242, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThBQ9Fr39lEmd1MYEPmmLV0KAwt4Qks5qfgnsgaJpZM4Jjkzt .

gregr commented 8 years ago

Absolutely, that's the exact arrangement I was thinking we should try!

webyrd commented 8 years ago

Okay! It's probably time to think about getting access to a beefy machine, or renting an Amazon X1 with spot pricing. :)

On Sat, Aug 13, 2016 at 12:18 PM, Greg Rosenblatt notifications@github.com wrote:

Absolutely, that's the exact arrangement I was thinking we should try!

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-239634336, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThAjoQJHQxnnFUdKZSkffJhcAvjORks5qfgp3gaJpZM4Jjkzt .

gregr commented 8 years ago

Here's a screenshot of the complete synthesis:

gregr commented 8 years ago

For the simple examples we've been trying, a small depth is always going to work and parallel processes aren't going to help much.

Maybe we should first find more involved examples that can benefit from searching across multiple depth limits simultaneously.

webyrd commented 8 years ago

I was able to use your branch to synthesize all of 'zip' in ~30 seconds, using the three obvious tests with gensyms. I had no luck synthesizing map, perhaps partly because I wasn't sure which tests might make sense.

On Sat, Aug 13, 2016 at 12:24 PM, Greg Rosenblatt notifications@github.com wrote:

For the simple examples we've been trying, a small depth is always going to work and parallel processes aren't going to help much.

Maybe we should first find more involved examples that can benefit from searching across multiple depth limits simultaneously.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-239634614, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThCrkJ5PLqwG4dwyrkyrvUuRXWTaAks5qfgv7gaJpZM4Jjkzt .

gregr commented 8 years ago

It seems to have a lot of trouble figuring out to pass the procedure as an argument. This doesn't seem to come back:

(define map
  (lambda (f xs)
    (if (null? xs)
     '() (cons (f (car xs)) (map ,A (cdr xs))))))

But it doesn't take too long for it to guess everything else once you help it with passing the procedure:

(define map
  (lambda (f xs)
    (,A ,B
     ,C (,D ,E (,F f ,G)))))

I wonder why this is. Maybe backwards information flow in this case is more difficult?

webyrd commented 8 years ago

Seems like unfolding might be very helpful here. We are no where close to using the full information of mostly-ground code. I'm not sure which approach would work best in general, though.

Do you think Mercury would be able to do something clever with

` (define map (lambda (f xs) (if (null? xs) '() (cons (f (car xs)) (map ,A (cdr xs))))))

`

On Tue, Aug 16, 2016 at 9:32 AM, Greg Rosenblatt notifications@github.com wrote:

It seems to have a lot of trouble figuring out to pass the procedure as an argument. This doesn't seem to come back:

(define map (lambda (f xs) (if (null? xs) '() (cons (f (car xs)) (map ,A (cdr xs))))))

But it doesn't take too long for it to guess everything else once you help it with passing the procedure:

(define map (lambda (f xs) (,A ,B ,C (,D ,E (,F f ,G)))))

I wonder why this is. Maybe backwards information flow in this case is more difficult?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-240140047, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThJrLWUyzmzVYkdhRbHTPi_dRI_wJks5qgdgNgaJpZM4Jjkzt .

gregr commented 8 years ago

Not really thinking in terms of Mercury anymore, but I have an idea for using mostly-instantiated code to get to the point faster. I'm going to try adding a conde1 to faster-mk, representing conde with cases that are all mutually exclusive w.r.t. some goal prefix (index), and defer interleaving for as long as we have enough information to make a deterministic choice (pruning all but one branch based on the prefixes).

webyrd commented 8 years ago

That might work!

There has to be a way to take advantage of the determinism, when we have it. I also wonder if this might be a reasonable case for abstract interpretation or partial evaluation.

On Tue, Aug 16, 2016 at 11:08 AM, Greg Rosenblatt notifications@github.com wrote:

Not really thinking in terms of Mercury anymore, but I have an idea for using mostly-instantiated code to get to the point faster. I'm going to try adding a conde1 to faster-mk, representing conde with cases that are all mutually exclusive w.r.t. some goal prefix (index), and defer interleaving for as long as we have enough information to make a deterministic choice (pruning all but one branch based on the prefixes).

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-240169591, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThHLJf1OrFBOLsk65_OO70Fc6AL4rks5qge6GgaJpZM4Jjkzt .

gregr commented 8 years ago

Unfortunately my first attempt with conde1 hasn't improved performance, and may have actually reduced it by activating the garbage collector more often. I'll have to figure this out later.

But I did find a set of tests that makes it easier to find a definition for map by passing primitives as the procedure, probably because they are simpler to evaluate than lambdas.

This query completes in about 20 seconds:

(define map
  (lambda (f xs)
    ,A))

Tests:
(map ',g1 '())
'()

(map car '((,g2 . ,g3)))
`(,g2)

(map cdr '((,g4 . ,g5) (,g6 . ,g7)))
`(,g5 ,g7)

Best Guess:
(define map (lambda (f xs) (if (null? xs) xs (cons (f (car xs)) (map f (cdr xs))))))

gregr commented 8 years ago

The full query takes just under 3 minutes:

,A

Best Guess:
(define map (lambda (_.0 _.1) (if (null? _.1) _.1 (cons (_.0 (car _.1)) (map _.0 (cdr _.1))))))

Side conditions:
(sym _.0 _.1)

webyrd commented 8 years ago

Nice! I particularly like this test:

(map ',g1 '()) '()

Parallelization/templated starting guesses should help a lot, I'd think.

On Tue, Aug 16, 2016 at 2:15 PM, Greg Rosenblatt notifications@github.com wrote:

The full query takes just under 3 minutes:

,A

Best Guess: (define map (lambda (.0 .1) (if (null? .1) .1 (cons (.0 (car .1)) (map .0 (cdr .1))))))

Side conditions: (sym .0 .1)

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/webyrd/Barliman/pull/12#issuecomment-240223682, or mute the thread https://github.com/notifications/unsubscribe-auth/ABHThMTVIEd3NTOH-eV12tRjrVXEXceAks5qghpEgaJpZM4Jjkzt .

webyrd / Barliman

Improve performance #12

t t)`, which is not the usual Scheme semantics.

t t), which is not the usual Scheme semantics.