racket / redex


Cache & profiling slow inference rules #164

Open MichaelBurge opened 6 years ago

MichaelBurge commented 6 years ago

Summary

I have code that repeatedly reduces to the first term returned from apply-reduction-relation until no terms remain. Sometimes a term causes one of the reductions in the list to slow down dramatically, and I tried to write code that automatically determines which reduction is responsible.

However, the cache defeats this by making subsequent timing information unreliable. In the following program, J takes 5 seconds to evaluate the first time and then 0 seconds, because redex caches the computed output of J.

#lang racket

(require redex/reduction-semantics)

;; Returns elapsed wall-clock seconds (note: shadows Racket's built-in time form).
(define (time f)
  (define t0 (current-seconds))
  (f)
  (- (current-seconds) t0))

(define-language L
  [x ::= #t #f]
  )

(define-judgment-form L
  #:mode (J I O)
  [(side-condition ,(sleep 5))
   ----
   (J #t #f)])

(time (λ () (judgment-holds (J #t any)))) ; Outputs 5
(time (λ () (judgment-holds (J #t any)))) ; Outputs 0, because the result of J is cached

Workarounds

There are several workarounds, such as clearing or disabling the cache between timing runs; I haven't yet tested this on the larger program.

One drawback is that the cache will then be empty rather than restored to its original state. If this is an issue, OS-level tools such as taking a core dump could be used to recreate the cache exactly.
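
For concreteness, here is a minimal sketch of that workaround, assuming the caching-enabled? parameter exported by redex/reduction-semantics (it controls Redex's metafunction and judgment-form caching) and the J and time definitions above:

;; Sketch: time J with caching disabled, so the second run pays the full cost
;; instead of hitting the cache.
(parameterize ([caching-enabled? #f])
  (list (time (λ () (judgment-holds (J #t any))))    ; ~5 seconds
        (time (λ () (judgment-holds (J #t any))))))  ; ~5 seconds again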

Possible solutions

rfindler commented 6 years ago

Just to be sure I understand, you are asking for more fine-grained control over the caches for judgment forms and metafunctions (but not the pattern matching caches)?

MichaelBurge commented 6 years ago

I suppose the true ask is "Suppose someone put a very inefficient judgment form or metafunction into a large model. How would you quickly identify the specific rule responsible?"

Exposing the cache might allow me to implement such a tool. Another alternative would be for redex to collect its own timing information, or for judgments to raise an error if they take longer than, say, 2 minutes to evaluate on some input.
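
A rough user-level approximation of that last idea (a sketch using racket/sandbox, not anything built into redex) could look like this:

(require racket/sandbox)

;; Hypothetical helper: run a thunk under a time limit; call-with-limits raises
;; exn:fail:resource if it runs longer than secs seconds (#f = no memory limit).
(define (with-judgment-timeout secs thunk)
  (call-with-limits secs #f thunk))

;; e.g. (with-judgment-timeout 120 (λ () (judgment-holds (J #t any))))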

I haven't quantified each type of cache, but pattern matching seems relevant: the motivating example was an O(n^n) reduction relation caused by overusing an in-hole pattern.

rfindler commented 6 years ago

Unfortunately, I don't have a good answer for the true ask. I would do things slowly, running the program over and over with smaller inputs (on smaller parts of the model) until I figured it out. I think that being able to tweak the caches might actually lead you down wrong paths, finding performance problems that aren't the real one (because the caches are actually helping).

As for the actual performance bug that motivated this issue, can you share a distilled version of it?

MichaelBurge commented 6 years ago

Thanks for taking an interest in this, Robby. This one is O(2^n) rather than O(n^n), but the idea is similar:

#lang racket

(require redex/reduction-semantics)

(define-language L
  [e ::= #t #f (+ e e)]
  [E ::= hole (+ E e) (+ e E)]
  )

;; Rewrites (+ #t e) or (+ e #t) to #t anywhere inside a term, via the context E.
(define simplify
  (reduction-relation
   L
   [==> (+ #t e) #t]
   [==> (+ e #t) #t]
   with
   [(--> (in-hole E a) (in-hole E b))
    (==> a b)]
   ))

(define-syntax-rule (duplicate x)
  (term (+ ,x ,x)))

(length (apply-reduction-relation*
         simplify
         (term ,(duplicate (duplicate (duplicate (duplicate (duplicate (duplicate (duplicate #t))))))))
         ))
;; Runs for a long time before returning 1 entry.

Here simplify is intended to repeatedly rewrite a term until it reaches a normal form. But E has the wrong search strategy: one should instead define a context that descends only into immediate children, rather than into all descendants.

With such a context, it's possible to efficiently define three metafunctions: direct-simplify, transform-simplify, and rewrite-simplify. transform-simplify and rewrite-simplify can then be implemented efficiently as combinators in terms of direct-simplify.
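
A hypothetical sketch of what that pattern could look like for the toy L above (the three names come from this discussion, but the bodies here are only a guess at the intent; this lives in the same module as the distilled example):

;; direct-simplify: one rewrite step at the root, or the term unchanged.
(define-metafunction L
  direct-simplify : e -> e
  [(direct-simplify (+ #t e)) #t]
  [(direct-simplify (+ e #t)) #t]
  [(direct-simplify e) e])

;; transform-simplify: simplify the immediate children, then the root.
(define-metafunction L
  transform-simplify : e -> e
  [(transform-simplify (+ e_1 e_2))
   (direct-simplify (+ (transform-simplify e_1) (transform-simplify e_2)))]
  [(transform-simplify e) e])

;; rewrite-simplify: iterate transform-simplify until a fixed point is reached.
(define-metafunction L
  rewrite-simplify : e -> e
  [(rewrite-simplify e)
   e
   (side-condition (equal? (term e) (term (transform-simplify e))))]
  [(rewrite-simplify e) (rewrite-simplify (transform-simplify e))])

;; e.g. (term (rewrite-simplify (+ (+ #t #t) #t)))  ; => #t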

The E above must be avoided in apply-reduction-relation, redex-match, redex-match?, or judgment-holds whenever the context appears inside a rule that is iterated to a fixed point.

The problem that bit me was that redex-match? is "unsafe" in the sense that it appears to check all 512 = 2^9 nodes in t below, when the depth-first search I had in mind should require only 9 checks.

; redex-match? checks all terms
(define t (duplicate (duplicate (duplicate (duplicate (duplicate (duplicate (duplicate (duplicate (duplicate #t))))))))))

(define T0 (current-seconds))
(define counter 0)
(redex-match? L
              (side-condition (in-hole E #t) (set! counter (+ counter 1)))
              t)
(define T1 (current-seconds))
`(RESULT #:counter ,counter #:seconds ,(- T1 T0))
; Output: '(RESULT #:counter 512 #:seconds 3)
; judgment-holds also checks every term when the rule matches via (in-hole E ...)
(define-judgment-form L
  #:mode (contains-true I)
  #:contract (contains-true e)
  [(contains-true (in-hole E (side-condition #t (set! counter (+ counter 1)))))]
  )
; Output (measured with the same counter/RESULT harness as above): '(RESULT #:counter 512 #:seconds 2)
; But the rules of the judgment form below are equivalent to an E that only checks
; immediate children. (It is an alternative to the definition above; only one
; contains-true can be defined per module.)
(define-judgment-form L
  #:mode (contains-true I)
  #:contract (contains-true e)
  [(contains-true #t)]
  [(contains-true e_1)
   (side-condition ,(set! counter (+ counter 1)))
   ----
   (contains-true (+ e_1 e_2))]
  [(contains-true e_2)
   (side-condition ,(set! counter (+ counter 1)))
   ----
   (contains-true (+ e_1 e_2))]
  )

(judgment-holds (contains-true ,t))
; Output: '(RESULT #:counter 18 #:seconds 0)

The O(n^n) runtime arose when E was used in a rule iterated to a fixed point: this implicitly touched all possible normalization paths, rather than just one. I corrected it using the direct, transform, rewrite pattern above.
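
To tie this back to the distilled example, with the hypothetical metafunctions sketched earlier (so this is only illustrative, not the original fix):

; Normalizing the 512-leaf term t bottom-up touches each node a bounded number
; of times per pass, instead of exploring every normalization path.
(term (rewrite-simplify ,t))  ; => #t, and returns quickly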