simonmichael / hledger

Robust, fast, intuitive plain text accounting tool with CLI, TUI and web interfaces.
https://hledger.org
GNU General Public License v3.0
3.03k stars 321 forks source link

optimisation: some things seem a bit slower since 1.25 #2122

Open simonmichael opened 11 months ago

simonmichael commented 11 months ago

I tend to see this (register much faster, but print and balance a bit slower) in recent releases:

$ make bench BENCHEXES=hledger-1.25,hledger-1.26,hledger-1.31,hledger-1.32 

run quick performance benchmarks in bench.sh (approximate, can be skewed):
Running 5 tests 1 times with 4 executables at 2023-12-02 10:11:59 HST:

Best times:
+----------------------------------------------------++--------------+--------------+--------------+--------------+
|                                                    || hledger-1.25 | hledger-1.26 | hledger-1.31 | hledger-1.32 |
+====================================================++==============+==============+==============+==============+
| -f examples/10000x1000x10.journal print            ||         0.62 |         0.69 |         0.81 |         0.77 |
| -f examples/10000x1000x10.journal register         ||         9.37 |         3.21 |         3.44 |         2.74 |
| -f examples/10000x1000x10.journal balance          ||         0.47 |         0.50 |         0.73 |         0.74 |
| -f examples/1000x1000x10.journal balance --weekly  ||         0.66 |         0.76 |         1.02 |         0.81 |
| -f examples/10000x1000x10.journal balance --weekly ||         9.31 |         9.04 |        10.34 |         9.63 |
+----------------------------------------------------++--------------+--------------+--------------+--------------+

More notably, right now on this machine I can see ~25k TPS with hledger 1.25 stats, but only ~15k TPS with 1.32. A new round of profiling could be in order.

Related:

simonmichael commented 11 months ago

Are those all compiled with the same GHC version ? Actually no, those are built with 9.0.2, 9.2.2, 9.4.6 and 9.6.3 respectively. So it's not quite a fair test.

They are all ARM binaries, and all optimised builds (I'm fairly certain).

simonmichael commented 8 months ago

Profiling and some performance fixes were done as part of #2153. Summary: 1.32.3 is much faster than 1.29-1.32, but still slower than 1.25 (which is somewhat expected since it is providing more powerful features in the journal format).

$ for v in 1.25 1.26 1.28 1.29 1.30 1.31 1.32 1.32.3; do printf "$v:\t"; hledger-$v -f examples/10ktxns-10kaccts.journal stats | grep throughput; done
1.25:   Run time (throughput)    : 0.93s (10735 txns/s)
1.26:   Run time (throughput)    : 1.18s (8501 txns/s)
1.28:   Run time (throughput)    : 1.16s (8607 txns/s)
1.29:   Run time (throughput)    : 3.73s (2684 txns/s)
1.30:   Run time (throughput)    : 6.13s (1631 txns/s)
1.31:   Run time (throughput)    : 6.77s (1478 txns/s)
1.32:   Run time (throughput)    : 5.95s (1680 txns/s)
1.32.3: Run time (throughput)    : 1.60s (6239 txns/s)
simonmichael commented 8 months ago

See #2153 for some recent benchmarking data. That issue fixed a major performance drop for unusually large numbers of accounts from 1.29 onward. On this one we'll continue discussing the remaining, lesser performance drop since 1.25. This one can be argued as a wish rather than a bug or regression since absence of (reasonable) performance fluctuations has never been promised. I think as part of resolving this issue though we should implement performance regression testing, and try to avoid or at least document any performance drops going forward.

markokocic commented 8 months ago

Well, it's true that performance was never guaranteed, but I'd still consider it a regression, since hledger became slower, not faster since 1.25. With my ledger file, with 17k transaction and 470 accounts, I get similar results as in your last comment, runtime for balance is somewhere around ~1,5 times slower in git head than in 1.25. It may not be much, but it's enough to cross the subjective threshold of annoyance.

I know that there have been a lot of features added to hledger since 1.25 that explain the general slowdown. But then, there is this philosophical question should a feature add its weight in terms of performance even if not used? Could I get a much faster stripped down version with only basic features for a daily use, and have slower execution only when I actually use some advanced or heavier functions?

As far as I'm concerned, I can keep using 1.25 for the next few years until my journal file grows enough for me to reach the annoyance threshold and have to revisit again. Or split the file and get on with the life.

simonmichael commented 8 months ago

Noted @markokocic, thanks for your input as one of the (few?) users running large >10k-txn files. I agree it's arguable, and it's at least a bug in our process that it took so long to discover. But I feel like letting this one escape the dreaded black label on a technicality. There's no bounty involved since I reported it first so I'm assuming it's harmless to do so. Certainly it'll be considered a regression if it happens again, and certainly I hope we'll find ways to bring performance back to 1.25 level and beyond.

simonmichael commented 8 months ago

PS any ideas for simple robust automated performance testing ? I'm thinking about:

simonmichael commented 8 months ago

PS I recently added memory usage to the stats output, that could be interesting to test also.

I experimented with printing CPU time rather than elapsed time but it seems unlikely to be needed.

markokocic commented 8 months ago

I don't think saving results to file would help, since the run results may vary across runs. Maybe have a script that will do benchmark against ledger, hledger-1.25 and previous release which can be run on demand or on a ci.

Btw, I just checked a new stats command. Maybe you want to split memory data to a separate line for a better readability.

Regarding speed, I finally managed to fix my journal to its format can be read directly by toht ledger and hledger, and here's the quick comparison with the latest from git.

marko@fine ~/s/cvstree> time hledger bal > /dev/null

________________________________________________________
Executed in    3.89 secs    fish           external
   usr time    3.60 secs    1.21 millis    3.60 secs
   sys time    0.31 secs    0.00 millis    0.31 secs

marko@fine ~/s/cvstree> time ledger bal > /dev/null

________________________________________________________
Executed in  382.45 millis    fish           external
   usr time  309.59 millis    1.46 millis  308.13 millis
   sys time   72.52 millis    0.49 millis   72.03 millis

I like hledger usability more, but wish it would be possible to approach ledger balance performance as close as possible :)

Also, I don't care if the ticket is a regression or whoever reported it, at least not in a sense of bounty, I'd only wish some performance back.