lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol
https://lbry.com
MIT License
2.57k stars 178 forks source link

lbrycrdd block check performance degrades non-linearly with txset #389

Open hashbender opened 4 years ago

hashbender commented 4 years ago

Background I run Luxor mining pool. Since ~5/21 we've seen larger numbers of transactions on the network, which is a good thing, but we've seen this impact the performance of lbrycrdd significantly.

Issue Blocks should not take >1s to Connect. If they do, this causes a huge problem for miners, especially for short block times like Lbry.

Expectation Performance in the same order of magnitude as BTC -

Bench output from lbrycrdd for block 771809

2020-05-28T16:11:32Z   - Load block from disk: 18.95ms [2.92s]
2020-05-28T16:11:32Z     - Sanity checks: 1.92ms [1.32s (0.88ms/blk)]
2020-05-28T16:11:32Z     - Fork checks: 0.05ms [0.08s (0.05ms/blk)]
2020-05-28T16:11:34Z       - Connect 398 transactions: 1822.49ms (4.579ms/tx, 2.925ms/txin) [327.93s (217.75ms/blk)]
2020-05-28T16:11:34Z     - Verify 623 txins: 1822.81ms (2.926ms/txin) [328.06s (217.84ms/blk)]
2020-05-28T16:11:34Z     - Index writing: 11.27ms [8.90s (5.91ms/blk)]
2020-05-28T16:11:34Z     - Callbacks: 0.05ms [0.09s (0.06ms/blk)]
2020-05-28T16:11:34Z   - Connect total: 1836.48ms [338.82s (224.98ms/blk)]
2020-05-28T16:11:34Z TrieCache size: 30670 nodes on block 771810, batch writes 3671856 bytes.
2020-05-28T16:11:34Z   - Flush: 95.29ms [68.03s (45.17ms/blk)]
2020-05-28T16:11:34Z   - Writing chainstate: 0.06ms [0.09s (0.06ms/blk)]
2020-05-28T16:11:34Z UpdateTip: new best=f46a34be4f5c791cf0b1d8667bae134c7eb10ee3dcfd89f04514d2e44a39cbdf height=771809 version=0x20000000 log2_work=73.716032 txb=398 tx=13122143 date='2020-05-28T15:47:20Z' progress=0.937661 cache=176.6MiB(548393txo)
2020-05-28T16:11:34Z   - Connect postprocess: 0.82ms [2.73s (1.81ms/blk)]
2020-05-28T16:11:34Z - Connect block: 1951.61ms [412.60s (273.97ms/blk)]

Bench output for a similar block from btc (height 632057)

2020-05-28T16:32:32Z   - Load block from disk: 26.86ms [12.35s]
2020-05-28T16:32:32Z     - Sanity checks: 6.81ms [1.82s (2.91ms/blk)]
2020-05-28T16:32:32Z     - Fork checks: 0.10ms [0.04s (0.06ms/blk)]
2020-05-28T16:32:33Z       - Connect 3034 transactions: 517.85ms (0.171ms/tx, 0.091ms/txin) [466.69s (746.70ms/blk)]
2020-05-28T16:32:33Z     - Verify 5688 txins: 518.47ms (0.091ms/txin) [471.33s (754.12ms/blk)]
2020-05-28T16:32:33Z     - Index writing: 18.54ms [11.76s (18.82ms/blk)]
2020-05-28T16:32:33Z     - Callbacks: 0.18ms [0.06s (0.10ms/blk)]
2020-05-28T16:32:33Z   - Connect total: 545.06ms [485.48s (776.77ms/blk)]
2020-05-28T16:32:33Z   - Flush: 14.22ms [8.43s (13.48ms/blk)]
2020-05-28T16:32:33Z   - Writing chainstate: 0.08ms [0.05s (0.07ms/blk)]
2020-05-28T16:32:33Z UpdateTip: new best=00000000000000000009a30ace6dd38643db1798e34c15a8119e067ee0d4eed9 height=632057 version=0x20c00000 log2_work=91.984454 tx=533818995 date='2020-05-28T16:18:15Z' progress=0.999994 cache=384.0MiB(2932736txo) warning='74 of last 100 blocks have unexpected version'
2020-05-28T16:32:33Z   - Connect postprocess: 6.06ms [7.94s (12.70ms/blk)]
2020-05-28T16:32:33Z - Connect block: 592.27ms [514.24s (822.78ms/blk)]

The most concerning problem is the tx validation speed. You'll notice it's 50x slower on LBRY than BTC.

Further, it seems to get worse as more transactions are added. Notice lbry block 771819 only has 55, and processes at 0.493 ms / txin.
Verify 55 txins: 27.14ms (0.493ms/txin)

This leads me to believe there is some non-linear regression

Reproducer I can reliably reproduce this issue by running the lbrycrdd daemon with -debug=bench and see how long the block connect takes.

Version: LBRYcrd Core Daemon version v0.17.3.2-be118de

Machine 2015 Macbook Pro, 2.8 GHz Quad-Core Intel Core i7 16 GB 1600 MHz DDR3

Config Default config.

BrannonKing commented 4 years ago

@nitronick600 , I can look into this. In the meantime will you please run the same timing test with v0.17.4.5 and v0.19.1.2 ? You will have to reindex once when going to 17.4.5 but not when going to v19 from there. Run with -dbcache=1200 on those.

hashbender commented 4 years ago

Alright, I'll run the reindex now for 17.4.5

If I recall correctly, we need to run v0.17.3.2 because of a consensus issue in the later versions?

hashbender commented 4 years ago

This is also causing the getblocktemplate RPC to take multiple seconds to return a value.

BrannonKing commented 4 years ago

The consensus issue in 17.4.x is repaired in 17.4.5. It is marked as pre-release so that we could get some more internal miles on it, increase our confidence in it.

hashbender commented 4 years ago

"CreateNewBlock() packages: 1.56ms (457 packages, 19 updated descendants), validity: 6329.01ms (total 6330.57ms)"

This block generation time is pretty common now.

BrannonKing commented 4 years ago

I want to clarify a few things:

  1. In both codebases the "connect x transactions" time is included in the "verify x txins" on the line after. That's probably a bug.
  2. In the "connect x transactions" phase, lbrycrd does substantially more work than bitcoin. lbrycrd parses out all the claims included in the transactions and adds them to the claim trie (or removes them in the case of a spend). It then does an "increment block" phase on the trie where it updates any claims that are becoming active or expiring, and handles any new takeover height records.
  3. In the "validity" stage of CreateNewBlock, it does all that claim trie work just mentioned for the new block template.
  4. I'm certain that it's possible to optimize the trie code more, but you should know that a lot of work has gone into optimizing it already. I believe that addressing #386 would help.
bvbfan commented 4 years ago

computeNodeHash is most expensive function as well as getMerkleHash according to Valgrind cachegrind. Also if we move connected transactions bench log before merkle hash check, we can notice merkle is a 2 to 10x slower

- Connect 416 transactions: 204.59ms (0.492ms/tx, 0.401ms/txin) [7.63s (95.43ms/blk)]
2020-05-29T08:37:21Z 
- Verify 510 txins: 571.33ms (1.120ms/txin) [16.19s (202.32ms/blk)]

Verify is exactly how long is merkle hash computation, that's on v17_master, sqlite backend. If i recall my attempt to use one query with step/final function, i remember that step function is called in unordered way, despite query is ordered, looks like it's called before actual result ordering, that results in using a map/flat_map to have children ordered then map to vector conversion or anther computation function taking a map. So i left the approach. cachegrind.zip Screenshot_20200529_141049

bvbfan commented 4 years ago
- Connect 1711 transactions: 2210.91ms (1.292ms/tx, 0.762ms/txin) [3.46s (494.53ms/blk)]
- Verify 2902 txins: 5492.32ms (1.893ms/txin) [8.34s (1191.10ms/blk)]

On block 771091 we have 5.5 sec. on merkle hash. I backport my query from new hash PR, where it has a children present check, it saves some time but it still slow. We should improve merkle hash query somehow. debug.log I made a branch of the backport query merkle_improve, @BrannonKing you can check it.

hashbender commented 4 years ago

17.4.5 doesn't seem any better:

2020-05-29T15:12:36Z   - Load block from disk: 0.00ms [344.33s]
2020-05-29T15:12:36Z     - Sanity checks: 0.17ms [78.61s (0.10ms/blk)]
2020-05-29T15:12:36Z     - Fork checks: 0.03ms [119.62s (0.15ms/blk)]
2020-05-29T15:12:49Z       - Connect 193 transactions: 13842.09ms (71.721ms/tx, 26.983ms/txin) [26992.48s (34.95ms/blk)]
2020-05-29T15:12:49Z     - Verify 513 txins: 13842.18ms (26.983ms/txin) [27089.23s (35.07ms/blk)]
2020-05-29T15:12:50Z     - Index writing: 388.68ms [1710.53s (2.21ms/blk)]
2020-05-29T15:12:50Z     - Callbacks: 0.05ms [38.66s (0.05ms/blk)]
2020-05-29T15:12:50Z   - Connect total: 14233.13ms [29289.18s (37.92ms/blk)]
2020-05-29T15:12:50Z   - Flush: 196.57ms [8700.50s (11.27ms/blk)]
2020-05-29T15:12:50Z   - Writing chainstate: 0.12ms [4231.59s (5.48ms/blk)]
2020-05-29T15:12:50Z UpdateTip: new best=2c1b20404b68f240d8cf81254d4bd407a6a5aa229f1a94ef70d7a4286a25ea43 height=772335 version=0x20000000 log2_work=73.717776 txb=193 tx=13233281 date='2020-05-29T15:12:21Z' progress=0.998687 cache=25.3MiB(1079txo)
2020-05-29T15:12:50Z   - Connect postprocess: 11.89ms [81.29s (0.11ms/blk)]
2020-05-29T15:12:50Z - Connect block: 14441.72ms [42646.90s (55.22ms/blk)]

Will try 19.1.2 next

bvbfan commented 4 years ago

@nitronick600 If you can compile (clone branch merkle_improve follow the instructions on readme) and try it.

hashbender commented 4 years ago

19.1.2 isn't any better either:

2020-05-29T15:41:13Z Syncing txindex with block chain from height 749986
2020-05-29T15:41:30Z   - Load block from disk: 0.01ms [0.01s]
2020-05-29T15:41:30Z     - Sanity checks: 0.07ms [0.00s (0.08ms/blk)]
2020-05-29T15:41:30Z     - Fork checks: 0.07ms [0.00s (0.04ms/blk)]
2020-05-29T15:41:34Z       - Connect 117 transactions: 4279.97ms (36.581ms/tx, 27.792ms/txin) [37.46s (2676.07ms/blk)]
2020-05-29T15:41:34Z     - Verify 154 txins: 4280.05ms (27.793ms/txin) [37.47s (2676.22ms/blk)]
2020-05-29T15:41:34Z     - Index writing: 1.75ms [0.03s (1.92ms/blk)]
2020-05-29T15:41:34Z     - Callbacks: 0.04ms [0.00s (0.05ms/blk)]
2020-05-29T15:41:34Z   - Connect total: 4283.35ms [37.50s (2678.91ms/blk)]
2020-05-29T15:41:35Z   - Flush: 119.16ms [2.14s (152.72ms/blk)]
2020-05-29T15:41:35Z   - Writing chainstate: 0.06ms [0.00s (0.06ms/blk)]
2020-05-29T15:41:35Z UpdateTip: new best=ba92d47cf65695fe5d3edef80c27670cf9628c263a1205dbcf03546ff3862b0c height=772349 version=0x20000000 log2_work=73.717818 txb=117 tx=13235608 date='2020-05-29T15:41:21Z' progress=0.999366 cache=0.2MiB(647txo)
2020-05-29T15:41:35Z   - Connect postprocess: 1.14ms [0.97s (69.23ms/blk)]
2020-05-29T15:41:35Z - Connect block: 4403.71ms [40.62s (2901.34ms/blk)]

As an aside, it took almost 15 minutes to startup.

hashbender commented 4 years ago

Have we been able to make any progress on this?

bvbfan commented 4 years ago

We have progress on, #390 was merged to v17_master which makes things faster. You can compile by yourself or wait for release, if you use HDD you may interested to try to increase db cache, -dbcache=8192

bvbfan commented 4 years ago

@nitronick600 you can test new release https://github.com/lbryio/lbrycrd/releases/tag/v0.17.4.6

hashbender commented 4 years ago

We're running the branch now merkle_improve branch now, but I don't have enough info to say if this fixes the problem. It does seem that blocks are much faster to verify; will get back to you on submitblock

bvbfan commented 4 years ago

In the release 0.17.4.6 has another one improvement especially indexing by claim id instead of claim name which gives another boost. So the release is faster than merkle_improve.

hashbender commented 4 years ago

Will this require a reindex?

bvbfan commented 4 years ago

No, just use new executable. Memory usage is lowered by default if you prefer to use higher memory (will be faster) -dbcache=4096 as command line argument.