racket / redex

Other
93 stars 36 forks source link

Severe space time leak in matcher #236

Closed wilbowma closed 3 years ago

wilbowma commented 3 years ago
I've been using Redex to generate contracts in my compilers course, and have noticed a severe performance problem when running a lot of small compiles in a loop. Click to see a trace with increasing time/memory usage: ``` > (for ([j (in-range 1 100)]) (displayln (current-memory-use)) (time (compile '(module (procedure? eq?))))) 722112416 cpu time: 15 real time: 15 gc time: 2 725143072 cpu time: 16 real time: 16 gc time: 1 719356368 cpu time: 29 real time: 29 gc time: 11 724586656 cpu time: 22 real time: 22 gc time: 1 723731968 cpu time: 26 real time: 26 gc time: 1 725415824 cpu time: 30 real time: 30 gc time: 1 722358080 cpu time: 35 real time: 35 gc time: 2 722489168 cpu time: 37 real time: 38 gc time: 2 724774288 cpu time: 76 real time: 76 gc time: 35 721629056 cpu time: 46 real time: 46 gc time: 2 722193952 cpu time: 50 real time: 50 gc time: 2 725567776 cpu time: 56 real time: 56 gc time: 3 724761680 cpu time: 62 real time: 63 gc time: 5 719263056 cpu time: 65 real time: 66 gc time: 3 726143248 cpu time: 66 real time: 66 gc time: 3 727382480 cpu time: 72 real time: 72 gc time: 5 722920768 cpu time: 75 real time: 75 gc time: 3 722847808 cpu time: 81 real time: 81 gc time: 5 725919056 cpu time: 83 real time: 84 gc time: 4 723725104 cpu time: 95 real time: 95 gc time: 13 724319136 cpu time: 93 real time: 93 gc time: 4 730131200 cpu time: 98 real time: 98 gc time: 5 730923408 cpu time: 101 real time: 102 gc time: 5 726999488 cpu time: 107 real time: 107 gc time: 7 725782048 cpu time: 108 real time: 109 gc time: 6 728470560 cpu time: 119 real time: 119 gc time: 5 726525328 cpu time: 126 real time: 127 gc time: 11 727716864 cpu time: 136 real time: 138 gc time: 8 733089904 cpu time: 133 real time: 134 gc time: 6 732990816 cpu time: 130 real time: 130 gc time: 5 728847600 cpu time: 136 real time: 136 gc time: 7 736354528 cpu time: 146 real time: 147 gc time: 14 738446608 cpu time: 144 real time: 145 gc time: 7 736186736 cpu time: 146 real time: 147 gc time: 7 738137728 cpu time: 159 real time: 159 gc time: 9 734724160 cpu time: 166 real time: 167 gc time: 14 735701136 cpu time: 161 real time: 161 gc time: 6 739562016 cpu time: 171 real time: 173 gc time: 9 739274624 cpu time: 176 real time: 176 gc time: 8 742621520 cpu time: 183 real time: 183 gc time: 14 740356912 cpu time: 179 real time: 179 gc time: 9 741904752 cpu time: 185 real time: 186 gc time: 8 747294704 cpu time: 189 real time: 190 gc time: 9 747461360 cpu time: 202 real time: 202 gc time: 14 742952240 cpu time: 200 real time: 201 gc time: 9 750729280 cpu time: 202 real time: 202 gc time: 9 744976928 cpu time: 211 real time: 212 gc time: 15 750468448 cpu time: 209 real time: 209 gc time: 10 752012928 cpu time: 213 real time: 213 gc time: 9 748830944 cpu time: 218 real time: 218 gc time: 9 757345696 cpu time: 238 real time: 240 gc time: 18 752616640 cpu time: 237 real time: 237 gc time: 12 759107696 cpu time: 234 real time: 234 gc time: 10 761999712 cpu time: 243 real time: 244 gc time: 17 758880720 cpu time: 240 real time: 241 gc time: 10 759883696 cpu time: 251 real time: 252 gc time: 16 765128752 cpu time: 255 real time: 255 gc time: 12 764538096 cpu time: 260 real time: 260 gc time: 13 767477904 cpu time: 267 real time: 267 gc time: 19 766604976 cpu time: 265 real time: 267 gc time: 13 768135152 cpu time: 274 real time: 275 gc time: 13 766458576 cpu time: 288 real time: 289 gc time: 20 767209472 cpu time: 287 real time: 288 gc time: 15 771805072 cpu time: 287 real time: 287 gc time: 17 771857968 cpu time: 292 real time: 293 gc time: 14 777203088 cpu time: 302 real time: 303 gc time: 19 777480464 cpu time: 300 real time: 300 gc time: 14 781248944 ```
Profiling suggests the time is spent in the Redex matcher, particularly match-pattern? .../redex/private/matcher.rkt:593:0. Click here to see profile information: ``` > (profile (compile '(module #t))) Profiling results ----------------- Total cpu time observed: 703ms (out of 737ms) Number of samples taken: 14 (once every 50ms) ============================================================================================================================================================= Caller Idx Total Self Name+src Local% ms(pct) ms(pct) Callee ============================================================================================================================================================= [1] 703(100.0%) 0(0.0%) eventspace-handler-thread-proc ...queue.rkt:370:11 call-with-break-parameterization [2] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- eventspace-handler-thread-proc [1] 100.0% [2] 703(100.0%) 0(0.0%) call-with-break-parameterization ...heme.rkt:148:2 ??? [3] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- call-with-break-parameterization [2] 100.0% [3] 703(100.0%) 0(0.0%) ??? ...lib/mred/private/wx/common/queue.rkt:486:32 ??? [4] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [3] 100.0% [4] 703(100.0%) 0(0.0%) ??? ...-lib/mred/private/wx/common/queue.rkt:435:6 ??? [5] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [4] 100.0% [5] 703(100.0%) 0(0.0%) ??? ...t-dev/racket/collects/racket/repl.rkt:11:26 body of "/home/wilbowma/workspace/compilers2.0/cpsc411-priv/cpsc411-reference-lib/cpsc411/reference/a10-solution.rkt" [6]100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [5] 100.0% [6] 703(100.0%) 0(0.0%) body of "/home/wilbowma/workspace/compilers2.0/cpsc411-priv/cpsc411-reference-lib/cpsc411/reference/a10-solution.rkt" ... ??? [7] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- body of "/home/wilbowma/workspace/compilers2.0/cpsc411-priv/cpsc411-reference-lib/cpsc411/reference/a10-solution.rkt" [6]100.0% [7] 703(100.0%) 0(0.0%) ??? ...acket/share/pkgs/profile-lib/main.rkt:40:10 ??? [8] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [7] 10.4% ??? [8] 89.6% [8] 703(100.0%) 0(0.0%) ??? ...ket/collects/racket/private/list.rkt:363:24 ??? [8] 89.6% ??? [9] 9.0% kw-chaperone [10] 1.4% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [8] 100.0% [9] 652(92.8%) 0(0.0%) ??? ...tract/private/arrow-higher-order.rkt:375:33 ??? [11] 76.9% ??? [13] 23.1% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [8] 100.0% [10] 50(7.2%) 0(0.0%) kw-chaperone ...ects/racket/private/kw.rkt:1945:33 ??? [12] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [11] 29.9% ??? [9] 70.1% [11] 502(71.4%) 0(0.0%) ??? .../private/arrow-higher-order.rkt:375:33 ??? [13] 65.1% ??? [11] 29.9% f173 [32] 5.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- kw-chaperone [10] 100.0% [12] 50(7.2%) 0(0.0%) ??? .../private/arrow-higher-order.rkt:387:44 ??? [13] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [12] 7.7% ??? [9] 23.0% ??? [11] 69.2% [13] 653(92.9%) 0(0.0%) ??? ...ects/racket/contract/private/guts.rkt:771:8 ??? [14] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [13] 100.0% [14] 653(92.9%) 0(0.0%) ??? ...contract/private/arrow-val-first.rkt:486:18 match-pattern? [15] 100.0% ------------------------------------------------------------------------------------------------------------------------------------------------------------- ??? [14] 100.0% [15] 653(92.9%) 0(0.0%) match-pattern? .../redex/private/matcher.rkt:593:0 match-pattern [16] 100.0% ```

I'm trying to isolate a small counter example with the same behaviour, but so far I've failed. Not sure if there's some optimizations that are defeating me in the small cases that can't trigger in my compiler. Here's a small example that is morally the same as my compile function, but doesn't yet cause the same performance problem:

#lang racket

(require
 redex/reduction-semantics
 racket/contract)

(provide (all-defined-out))

(define-language testL
  [e v (e e ...)]
  [v (side-condition any_1 (equal? 'procedure? (term any_1)))
     (side-condition any_1 (equal? 'eq? (term any_1)))
     (side-condition any_1 (equal? 'module (term any_1)))])

(define (testL? x)
  (redex-match? testL e x))

(define/contract (foo x)
  (-> testL? testL?)
  x)

(for ([j (in-range 1 100)])
  (time ((apply compose (list foo foo foo foo)) '(module (procedure? eq?)))))

I'll keep working on a minimal example.

rfindler commented 3 years ago

Redex has caches that probably explain the behavior you're seeing.

wilbowma commented 3 years ago

If I disable caching, then it just takes forever every time. If I limit the cache sizes, the behaviour doesn't seem to change, which is weird.

wilbowma commented 3 years ago

I lie: disabling caching also doesn't affect the behaviour(?):

> (caching-enabled? #f)
> (for ([j (in-range 1 100)])
  (time (compile '(module (procedure? eq?)))))
cpu time: 571 real time: 572 gc time: 6
cpu time: 597 real time: 599 gc time: 5
cpu time: 638 real time: 639 gc time: 6
cpu time: 672 real time: 680 gc time: 6
cpu time: 719 real time: 722 gc time: 6
cpu time: 739 real time: 740 gc time: 7
cpu time: 771 real time: 772 gc time: 7
cpu time: 810 real time: 812 gc time: 7
cpu time: 841 real time: 843 gc time: 8
cpu time: 881 real time: 883 gc time: 8
cpu time: 918 real time: 919 gc time: 9
rfindler commented 3 years ago

I think that cache disables only the metafunction caching. There is also caching happening in the pattern matcher. I found it.

It is this call to memoize.

What's going on is there is a cache that maps patterns (as s-expressions) to functions to do the matching. Then the memoize call above adds that memoized function into the cache so each time we hit a specific non-terminal, they'll all share that same cache.

(I hope I'm not forgetting another place :)

I agree this behavior is reasonably considered buggy, btw. Welcome suggestions on what to do! (My first instinct is to just reset the cache when it gets large via a check in the memoize function, but I'm not sure.)

wilbowma commented 3 years ago

The docs suggests this parameter also control pattern matching caches:

When this parameter is #t (the default), Redex caches the results of pattern matching, metafunction, and judgment-form evaluation ...
wilbowma commented 3 years ago

Given the behaviour I'm seeing, my preference would be the ability to limit and disable that cache completely. It looks like it would be way faster (in my use case) without the cache, since it starts very fast and then quickly takes 10x longer.

Not sure if this would require new parameters. My reading of the docs suggests the existing parameters ought already do this, unless I'm misunderstanding something.

rfindler commented 3 years ago

It would be nice to know if removing that call to memoize actually did solve the problem, tho.

I guess I remembered the spec of that thing wrongly! Sounds reasonable to make the docs true to me, as long as nothing falls over.

wilbowma commented 3 years ago

The memoize implementation seems to respect the caching-enabled? parameter: https://github.com/racket/redex/blob/929e538115b1f981b2801a8a9778941c41053e82/redex-lib/redex/private/matcher.rkt#L1175, so I'm not sure that's the root cause.

I'll try removing the memoize and running some experiments in a bit.

rfindler commented 3 years ago

ah right, thanks.

If you get stuck minimizing, just send me some code.

rfindler commented 3 years ago

Here's a thought: is your code compiling fresh patterns over and over somehow? Adding this:

(printf ">> ~s\n" (hash-count compiled-pattern-cache))

just below the line with (hash-set! compiled-pattern-cache pattern val) would tell us if that's the one that's growing.

rfindler commented 3 years ago

Oh, actually, something like this is probably better, just to clarify if a single one is growing or not in case there is a lot going on:

(printf ">> ~s ~s\n" (eq-hash-code compiled-pattern-cache) (hash-count compiled-pattern-cache))
wilbowma commented 3 years ago

Nope, only print outs when the module is loaded, none when I run compile.

wilbowma commented 3 years ago

Removing memoize also doesn't solve it.

wilbowma commented 3 years ago

https://github.com/racket/redex/blob/929e538115b1f981b2801a8a9778941c41053e82/redex-lib/redex/private/matcher.rkt#L974

Looking more at the profiling info suggests the above line might be the key ``` Profiling results ----------------- Total cpu time observed: 5481ms (out of 5518ms) Number of samples taken: 109 (once every 50ms) (Hiding functions with self<1.0% and local<2.0%: 2 of 74 hidden) =========================================================================================== Caller Idx Total Self Name+src Local% ms(pct) ms(pct) Callee =========================================================================================== [1] 5481(100.0%) 0(0.0%) eventspace-handler-thread-proc ...queue.rkt:370:11 call-with-break-parameterization [3] 100.0% ------------------------------------------------------------------------------------------- match-nt/boolean [71] 100.0% [2] 4778(87.2%) 502(9.2%) call-nt-proc/bool ...ex/private/matcher.rkt:1849:0 ??? [4] 76.3% remove-bindings/filter [6] 5.3% loop [5] 5.3% loop [10] 3.2% ------------------------------------------------------------------------------------------- eventspace-handler-thread-proc [1] 100.0% [3] 5481(100.0%) 0(0.0%) call-with-break-parameterization ...heme.rkt:148:2 ??? [7] 100.0% ------------------------------------------------------------------------------------------- call-nt-proc/bool [2] 100.0% [4] 3722(67.9%) 50(0.9%) ??? .../redex-lib/redex/private/matcher.rkt:974:12 filter [8] 66.2% filter-multiples [9] 23.0% loop [10] 5.4% reverse [29] 2.7% ??? [67] 1.3% ------------------------------------------------------------------------------------------- call-nt-proc/bool [2] 47.4% loop [5] 52.6% [5] 402(7.3%) 100(1.8%) loop ...redex-lib/redex/private/matcher.rkt:1664:6 loop [5] 52.6% ??? [67] 22.5% ------------------------------------------------------------------------------------------- call-nt-proc/bool [2] 100.0% [6] 302(5.5%) 0(0.0%) remove-bindings/filter ...ivate/matcher.rkt:1879:0 filter-multiples [9] 100.0% - ```

This sort of makes sense, as this seems related to processing side-conditions, and my generated languages have lots of side conditions.

wilbowma commented 3 years ago

... okay this may actually be my own fault. I found a bug in my compiler that caused it to basically insert some (dead) code from previous runs in the program it was compiling.

wilbowma commented 3 years ago

My own fault; a bug in my code caused linear (in the number of calls) code duplication during the dynamic extent of the thread, and this caused a linear increase in the size of the input to Redex, and profiling suggested Redex just because contract checking via Redex is the most expensive part of my compiler.

rfindler commented 3 years ago

Glad you figured it out!