runtimeverification / plutus-core-semantics

BSD 3-Clause "New" or "Revised" License
27 stars 5 forks source link

Run and assess KPlutus for large test files. #152

Open ChristianoBraga opened 2 years ago

ChristianoBraga commented 2 years ago

We are now approaching what could perhaps be called an alpha for KPlutus. We need to make sure that KPlutus behaves "as expected" for the large test files in benchmark-validation-examples and nofib-exe-examples. There is the always pending issue that uplc does not cover the complete language so the need to assess and report all limitations identified by our test harness.

This issue should benefit from #161. This issue is related with the the K issue #2755.

SchmErik commented 2 years ago

I've been running performance tests on fibonnaci on uplc and kplc. There's a fibonacci program in tests/textual/uplc-examples and I've run this program multiple times changing the parameter that represents the nth Fibonacci number to compute. I ran it for 0-30 incrementing by 2. uplc's performance is worse but I realized that there's some other takeaways from this data: one of them being uplc scaling better for fibonnaci numbers 0 - 16. Beyond that, the rate of change quickly increases and by the time it computes the 30th fib, the % change is about the same between uplc and kplc. When computing the 30th fibonacci, kplc is 78x faster than uplc but the rate of change between 30th fib and 28th fib is about the same.

nth fib kplc kplc % change uplc uplc % change kplc/uplc diff
0 0.13   0.03   4.333333333
2 0.14 7.692307692 0.03 0 4.666666667
4 0.15 7.142857143 0.03 0 5
6 0.15 0 0.03 0 5
8 0.16 6.666666667 0.03 0 5.333333333
10 0.18 12.5 0.03 0 6
12 0.23 27.77777778 0.03 0 7.666666667
14 0.35 52.17391304 0.03 0 11.66666667
16 0.68 94.28571429 0.03 0 22.66666667
18 1.54 126.4705882 0.05 66.66666667 30.8
20 3.77 144.8051948 0.07 40 53.85714286
22 9.62 155.1724138 0.15 114.2857143 64.13333333
24 24.93 159.1476091 0.35 133.3333333 71.22857143
26 64.97 160.6097072 0.86 145.7142857 75.54651163
28 169.92 161.5360936 2.21 156.9767442 76.88687783
30 444.71 161.7172787 5.7 157.918552 78.01929825
SchmErik commented 2 years ago

I've profiled 30th fib on kplc using perf. One suggestion by Everett was to run perf record for 10 seconds and compare this against a longer execution. The perf output between the two didn't show much significant difference in where time was being spent. I saw about a difference of 0.5% in some categories.

SchmErik commented 2 years ago

image

This image displays where time is being spent. I decided to decode the rules and found that this top rule is associated to this axiom:

  axiom{R} \implies{R} (
    \and{R}(
      \top{R}(),
      \and{R} (
          \in{SortTerm{}, R} (
            X0:SortTerm{},
            \and{SortTerm{}}(Lbl'LPar'builtin'UndsRParUnds'UPLC-SYNTAX'Unds'Term'Unds'BuiltinName{}(VarBn:SortBuiltinName{}),Var'Unds'Gen0:SortTerm{})
          ),
          \top{R} ()
        )),
    \equals{SortTerm{},R} (
      Lbl'Hash'decodeCBORBytestrings'LParUndsRParUnds'UPLC-CBOR-PARSER'Unds'Term'Unds'Term{}(X0:SortTerm{}),
     \and{SortTerm{}} (
       Var'Unds'Gen0:SortTerm{},
        \top{SortTerm{}}())))

This looks like the decodeCBORByteStrings function that's being called during program load. One way to improve performance to decode CBOR data only when it's seen rather than calling this function to traverse the entire program before semantic rules start applying.

** edit - looks like rule 1023 isn't the one I posted... I tried removing any calls to this rule and this still appears in the perf output...

SchmErik commented 2 years ago

Ok... Looks like decode CBOR function is adding TONs of overhead. I removed the call to decodeCBORByteStrings in uplc-configuration.md and the 30th fibonacci completed in 42 seconds. I'm not sure why this is adding so much overhead but we can probably move the call to the eval function so it gets invoked only when we encounter a data constant encoded in CBOR.

SchmErik commented 2 years ago

I'm unable to reproduce the previous comment. Today, removing this call shaves about 60 seconds or 25% for 30th fibonacci...

gtrepta commented 2 years ago

I can't reproduce it either. The 30th fibonacci runs in 494 and 485 seconds, with and without cbor decoding, respectively.

SchmErik commented 1 year ago

Depending on how https://github.com/runtimeverification/plutus-core-semantics/issues/324 is implemented, we may want to consider keeping this first pass to detect unclosed terms.