ChristianoBraga opened 2 years ago
I've been running performance tests on Fibonacci with uplc and kplc. There's a Fibonacci program in tests/textual/uplc-examples, and I've run it repeatedly, varying the parameter that selects the nth Fibonacci number to compute from 0 to 30 in steps of 2. kplc's performance is worse overall, but there are some other takeaways from this data. One is that uplc scales better for Fibonacci numbers 0-16; beyond that, its rate of change quickly increases, and by the time it computes the 30th Fibonacci number the % change per step is about the same for uplc and kplc. When computing the 30th Fibonacci number, uplc is about 78x faster than kplc, but the rate of change between the 28th and the 30th is roughly equal for both.
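The near-constant ~161% step-over-step growth at the bottom of the table is what you'd expect from a naive recursive Fibonacci: the number of calls grows by a factor of roughly φ² ≈ 2.618 (i.e. ~161.8%) for each increase of 2 in n. A quick illustration in plain Python (not KPlutus):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def calls(n):
    """Total invocations performed by a naive recursive fib(n)."""
    return 1 if n < 2 else 1 + calls(n - 1) + calls(n - 2)

for n in range(24, 31, 2):
    pct = (calls(n) - calls(n - 2)) / calls(n - 2) * 100
    print(f"fib({n}): {calls(n)} calls, +{pct:.1f}% vs fib({n - 2})")
```

The per-step growth converges to φ² - 1 ≈ 161.8%, which matches the 161.7% and 157.9% figures in the last row of the table.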
nth fib | kplc (s) | kplc % change | uplc (s) | uplc % change | kplc/uplc ratio
---|---|---|---|---|---
0 | 0.13 | — | 0.03 | — | 4.33
2 | 0.14 | 7.69 | 0.03 | 0 | 4.67
4 | 0.15 | 7.14 | 0.03 | 0 | 5.00
6 | 0.15 | 0 | 0.03 | 0 | 5.00
8 | 0.16 | 6.67 | 0.03 | 0 | 5.33
10 | 0.18 | 12.50 | 0.03 | 0 | 6.00
12 | 0.23 | 27.78 | 0.03 | 0 | 7.67
14 | 0.35 | 52.17 | 0.03 | 0 | 11.67
16 | 0.68 | 94.29 | 0.03 | 0 | 22.67
18 | 1.54 | 126.47 | 0.05 | 66.67 | 30.80
20 | 3.77 | 144.81 | 0.07 | 40.00 | 53.86
22 | 9.62 | 155.17 | 0.15 | 114.29 | 64.13
24 | 24.93 | 159.15 | 0.35 | 133.33 | 71.23
26 | 64.97 | 160.61 | 0.86 | 145.71 | 75.55
28 | 169.92 | 161.54 | 2.21 | 156.98 | 76.89
30 | 444.71 | 161.72 | 5.70 | 157.92 | 78.02
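For reference, the derived columns can be recomputed from the raw timings. A small Python sketch, with the timing lists copied from the table above:

```python
# Raw timings in seconds, for n = 0, 2, ..., 30 (copied from the table).
kplc = [0.13, 0.14, 0.15, 0.15, 0.16, 0.18, 0.23, 0.35,
        0.68, 1.54, 3.77, 9.62, 24.93, 64.97, 169.92, 444.71]
uplc = [0.03] * 9 + [0.05, 0.07, 0.15, 0.35, 0.86, 2.21, 5.70]

def pct_change(xs):
    # Percent change relative to the previous row; undefined for the first.
    return [None] + [(b - a) / a * 100 for a, b in zip(xs, xs[1:])]

ratio = [k / u for k, u in zip(kplc, uplc)]  # how many times slower kplc is

print(f"fib(30): kplc {pct_change(kplc)[-1]:.1f}%/step, "
      f"uplc {pct_change(uplc)[-1]:.1f}%/step, ratio {ratio[-1]:.1f}x")
```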
I've profiled the 30th Fibonacci number on kplc using perf. One suggestion from Everett was to run perf record for 10 seconds and compare that against a longer execution. The perf output between the two didn't show any significant difference in where time was being spent; I saw a difference of about 0.5% in some categories.
This image displays where time is being spent. I decided to decode the rules and found that the top rule is associated with this axiom:
```
axiom{R} \implies{R} (
\and{R}(
\top{R}(),
\and{R} (
\in{SortTerm{}, R} (
X0:SortTerm{},
\and{SortTerm{}}(Lbl'LPar'builtin'UndsRParUnds'UPLC-SYNTAX'Unds'Term'Unds'BuiltinName{}(VarBn:SortBuiltinName{}),Var'Unds'Gen0:SortTerm{})
),
\top{R} ()
)),
\equals{SortTerm{},R} (
Lbl'Hash'decodeCBORBytestrings'LParUndsRParUnds'UPLC-CBOR-PARSER'Unds'Term'Unds'Term{}(X0:SortTerm{}),
\and{SortTerm{}} (
Var'Unds'Gen0:SortTerm{},
\top{SortTerm{}}())))
```
This looks like the decodeCBORByteStrings function that's called during program load. One way to improve performance would be to decode CBOR data only when it's encountered, rather than calling this function to traverse the entire program before the semantic rules start applying.
** edit - it looks like rule 1023 isn't the one I posted above. I tried removing any calls to this rule and it still appears in the perf output...
Ok... it looks like the CBOR decode function is adding tons of overhead. I removed the call to decodeCBORByteStrings in uplc-configuration.md and the 30th Fibonacci number completed in 42 seconds. I'm not sure why this adds so much overhead, but we can probably move the call into the eval function so it gets invoked only when we encounter a data constant encoded in CBOR.
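The eager-vs-lazy idea can be sketched as follows. This is illustrative Python, not the K semantics: the term shapes and the decode_cbor helper are hypothetical stand-ins for the real configuration and #decodeCBORBytestrings.

```python
# Illustrative sketch only. decode_cbor stands in for #decodeCBORBytestrings;
# term shapes ("con_cbor", "if") are hypothetical, not the real UPLC syntax.
decode_count = 0

def decode_cbor(raw):
    global decode_count
    decode_count += 1          # count decodes to compare the two strategies
    return ("data", raw)       # pretend decode

def preload(term):
    """Current approach: a full pre-pass that decodes every CBOR constant
    before any semantic rule applies."""
    if term[0] == "con_cbor":
        return decode_cbor(term[1])
    if term[0] == "if":
        return ("if", preload(term[1]), preload(term[2]))
    return term

def eval_lazy(term, cond=True):
    """Proposed approach: decode a constant only when evaluation actually
    reaches it, so a branch that is never taken is never decoded."""
    if term[0] == "con_cbor":
        return decode_cbor(term[1])
    if term[0] == "if":
        return eval_lazy(term[1 if cond else 2])
    return term

prog = ("if", ("con_cbor", b"\x01"), ("con_cbor", b"\x02"))
preload(prog)            # decodes both constants up front
eager = decode_count     # 2
decode_count = 0
eval_lazy(prog)          # decodes only the branch that is taken
lazy = decode_count      # 1
```

The point is only that moving the decode into eval turns a whole-program traversal into pay-as-you-go work, which is where the load-time overhead seems to come from.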
I'm unable to reproduce the previous comment. Today, removing this call shaves off about 60 seconds, or 25%, for the 30th Fibonacci number...
I can't reproduce it either. The 30th Fibonacci number runs in 494 and 485 seconds, with and without CBOR decoding, respectively.
Depending on how https://github.com/runtimeverification/plutus-core-semantics/issues/324 is implemented, we may want to consider keeping this first pass to detect unclosed terms.
We are now approaching what could perhaps be called an alpha for KPlutus. We need to make sure that KPlutus behaves "as expected" for the large test files in `benchmark-validation-examples` and `nofib-exe-examples`. There is also the ever-pending issue that `uplc` does not cover the complete language, so we need to assess and report all limitations identified by our test harness. This issue should benefit from #161. This issue is related to the K issue #2755.