Run and assess KPlutus for large test files. - Githubissues

runtimeverification / plutus-core-semantics

BSD 3-Clause "New" or "Revised" License

27 stars 5 forks source link

Run and assess KPlutus for large test files. #152

Open ChristianoBraga opened 2 years ago

ChristianoBraga commented 2 years ago

We are now approaching what could perhaps be called an alpha for KPlutus. We need to make sure that KPlutus behaves "as expected" for the large test files in benchmark-validation-examples and nofib-exe-examples. There is the always pending issue that uplc does not cover the complete language so the need to assess and report all limitations identified by our test harness.

~~This issue should benefit from #161.~~ This issue is related with the the K issue #2755.

SchmErik commented 2 years ago

I've been running performance tests on fibonnaci on uplc and kplc. There's a fibonacci program in tests/textual/uplc-examples and I've run this program multiple times changing the parameter that represents the nth Fibonacci number to compute. I ran it for 0-30 incrementing by 2. uplc's performance is worse but I realized that there's some other takeaways from this data: one of them being uplc scaling better for fibonnaci numbers 0 - 16. Beyond that, the rate of change quickly increases and by the time it computes the 30th fib, the % change is about the same between uplc and kplc. When computing the 30th fibonacci, kplc is 78x faster than uplc but the rate of change between 30th fib and 28th fib is about the same.

nth fib	kplc	kplc % change	uplc	uplc % change	kplc/uplc diff
0	0.13		0.03		4.333333333
2	0.14	7.692307692	0.03	0	4.666666667
4	0.15	7.142857143	0.03	0	5
6	0.15	0	0.03	0	5
8	0.16	6.666666667	0.03	0	5.333333333
10	0.18	12.5	0.03	0	6
12	0.23	27.77777778	0.03	0	7.666666667
14	0.35	52.17391304	0.03	0	11.66666667
16	0.68	94.28571429	0.03	0	22.66666667
18	1.54	126.4705882	0.05	66.66666667	30.8
20	3.77	144.8051948	0.07	40	53.85714286
22	9.62	155.1724138	0.15	114.2857143	64.13333333
24	24.93	159.1476091	0.35	133.3333333	71.22857143
26	64.97	160.6097072	0.86	145.7142857	75.54651163
28	169.92	161.5360936	2.21	156.9767442	76.88687783
30	444.71	161.7172787	5.7	157.918552	78.01929825

SchmErik commented 2 years ago

I've profiled 30th fib on kplc using perf. One suggestion by Everett was to run perf record for 10 seconds and compare this against a longer execution. The perf output between the two didn't show much significant difference in where time was being spent. I saw about a difference of 0.5% in some categories.

SchmErik commented 2 years ago

This image displays where time is being spent. I decided to decode the rules and found that this top rule is associated to this axiom:

  axiom{R} \implies{R} (
    \and{R}(
      \top{R}(),
      \and{R} (
          \in{SortTerm{}, R} (
            X0:SortTerm{},
            \and{SortTerm{}}(Lbl'LPar'builtin'UndsRParUnds'UPLC-SYNTAX'Unds'Term'Unds'BuiltinName{}(VarBn:SortBuiltinName{}),Var'Unds'Gen0:SortTerm{})
          ),
          \top{R} ()
        )),
    \equals{SortTerm{},R} (
      Lbl'Hash'decodeCBORBytestrings'LParUndsRParUnds'UPLC-CBOR-PARSER'Unds'Term'Unds'Term{}(X0:SortTerm{}),
     \and{SortTerm{}} (
       Var'Unds'Gen0:SortTerm{},
        \top{SortTerm{}}())))

This looks like the decodeCBORByteStrings function that's being called during program load. One way to improve performance to decode CBOR data only when it's seen rather than calling this function to traverse the entire program before semantic rules start applying.

** edit - looks like rule 1023 isn't the one I posted... I tried removing any calls to this rule and this still appears in the perf output...

SchmErik commented 2 years ago

Ok... Looks like decode CBOR function is adding TONs of overhead. I removed the call to decodeCBORByteStrings in uplc-configuration.md and the 30th fibonacci completed in 42 seconds. I'm not sure why this is adding so much overhead but we can probably move the call to the eval function so it gets invoked only when we encounter a data constant encoded in CBOR.

SchmErik commented 2 years ago

I'm unable to reproduce the previous comment. Today, removing this call shaves about 60 seconds or 25% for 30th fibonacci...

gtrepta commented 2 years ago

I can't reproduce it either. The 30th fibonacci runs in 494 and 485 seconds, with and without cbor decoding, respectively.

SchmErik commented 1 year ago

Depending on how https://github.com/runtimeverification/plutus-core-semantics/issues/324 is implemented, we may want to consider keeping this first pass to detect unclosed terms.