lightninglabs / neutrino

Privacy-Preserving Bitcoin Light Client
MIT License

Fix memory usage of bbolt block header index #213

Closed · guggero closed 3 years ago

guggero commented 3 years ago

Fixes https://github.com/lightninglabs/neutrino/issues/196.

With this PR we store the block index keys (hash->height) in sub-buckets, using the first two bytes of the hash as the bucket name. Storing a large number of keys in a single bucket has a large impact on bbolt's memory usage when small-ish batch sizes are used (the B+ tree nodes need to be copied on every resize operation). Using sub-buckets is a compromise between memory usage and access time; a 2-byte prefix (= at most 65,536 sub-buckets) turns out to be the sweet spot (roughly -50% memory usage for roughly +30% access time). We take the two bytes from the beginning of the byte-serialized hash because Bitcoin hashes are reverse-serialized when displayed as strings, so the leading zeroes of a block hash actually sit at the end of the byte slice and the first bytes are well distributed.
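
As a rough illustration of the scheme (the bucket name, key layout, and function below are assumptions for the sake of the example, not the PR's actual code), the write path looks roughly like this:

```go
// Sketch only: illustrates the 2-byte sub-bucket layout described above.
// The bucket name and key layout are assumptions, not the PR's exact code.
package headerindex

import (
	"encoding/binary"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"go.etcd.io/bbolt"
)

var indexBucket = []byte("header-index")

// putHeaderEntry stores a hash->height mapping. The first two bytes of the
// byte-serialized hash name the sub-bucket, the remaining 30 bytes form the
// key inside it. Because displayed hashes are reverse-serialized, the
// display-leading zeroes sit at the end of the byte slice, so these first
// two bytes are well distributed over the 65,536 possible sub-buckets.
func putHeaderEntry(tx *bbolt.Tx, hash *chainhash.Hash, height uint32) error {
	root, err := tx.CreateBucketIfNotExists(indexBucket)
	if err != nil {
		return err
	}

	sub, err := root.CreateBucketIfNotExists(hash[0:2])
	if err != nil {
		return err
	}

	var heightBytes [4]byte
	binary.BigEndian.PutUint32(heightBytes[:], height)

	return sub.Put(hash[2:], heightBytes[:])
}
```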

As the benchmarks below show, a 2-byte prefix strikes the best balance between memory usage, access speed, and DB file size.

I also looked at other ways of reducing the memory footprint of the bbolt-based index. The main culprit is the relatively small batch size of roughly 2,000 blocks per update. That number is limited by how many block headers a peer serves in a single message and cannot easily be increased. The only other way to increase the DB write batch size would be to add a second-level cache, but that would require more refactoring and could lead to the index and the actual block file getting out of sync.

Benchmarks

All results are retrieved by running:

go test -v -run=. -bench=. -benchmem -memprofile=mem.out ./headerfs
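
For reference, here is a self-contained benchmark in the same spirit as the lookup benchmark above. It is illustrative only: it does not use the headerfs index, but builds a throwaway bbolt DB with the 2-byte sub-bucket layout and measures hash->height lookups with allocation reporting. It can be run with the same kind of `go test -bench=... -benchmem -memprofile=mem.out` invocation.

```go
// Illustrative, self-contained benchmark; NOT part of the headerfs package.
// It builds a throwaway bbolt DB with the 2-byte sub-bucket layout and
// measures hash->height lookup latency and allocations.
package headerindex_test

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"path/filepath"
	"testing"

	"go.etcd.io/bbolt"
)

func BenchmarkSubBucketLookup(b *testing.B) {
	db, err := bbolt.Open(filepath.Join(b.TempDir(), "index.db"), 0600, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	// Populate the DB: each random "hash" maps to a fake height inside the
	// sub-bucket named after its first two bytes.
	const numHashes = 10000
	hashes := make([][32]byte, numHashes)
	err = db.Update(func(tx *bbolt.Tx) error {
		for i := range hashes {
			if _, err := rand.Read(hashes[i][:]); err != nil {
				return err
			}
			sub, err := tx.CreateBucketIfNotExists(hashes[i][:2])
			if err != nil {
				return err
			}
			var height [4]byte
			binary.BigEndian.PutUint32(height[:], uint32(i))
			if err := sub.Put(hashes[i][2:], height[:]); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		h := &hashes[i%numHashes]
		err := db.View(func(tx *bbolt.Tx) error {
			sub := tx.Bucket(h[:2])
			if sub == nil {
				return fmt.Errorf("missing sub-bucket %x", h[:2])
			}
			if v := sub.Get(h[2:]); v == nil {
				return fmt.Errorf("hash %x not found", h[:])
			}
			return nil
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```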

System:

goos: linux
goarch: amd64
pkg: github.com/lightninglabs/neutrino/headerfs
cpu: AMD Ryzen 9 3900X 12-Core Processor  

Test 1: control, no changes

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 39206912 bytes
BenchmarkWriteHeadersSmallBatch-24             1        15387606278 ns/op       7304628624 B/op  8107418 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46354432 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6071342964 ns/op        5157273568 B/op  5878518 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 63823872 bytes
BenchmarkWriteHeadersLargeBatch-24             1        3368511955 ns/op        2849549856 B/op  5009240 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5394432 bytes
BenchmarkHeightLookupLatency-24           979939              1676 ns/op             640 B/op         12 allocs/op

Max DB file size: ~63 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.80GB, 96.93% of 15.27GB total
Dropped 41 nodes (cum <= 0.08GB)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
    7.44GB 48.72% 48.72%     7.44GB 48.72%  go.etcd.io/bbolt.(*node).put
    4.82GB 31.56% 80.28%     4.82GB 31.56%  go.etcd.io/bbolt.(*node).read
    1.02GB  6.65% 86.93%     1.02GB  6.65%  go.etcd.io/bbolt.Open.func1
    0.40GB  2.64% 89.57%     0.40GB  2.64%  go.etcd.io/bbolt.(*Cursor).search
    0.39GB  2.55% 92.12%    14.70GB 96.26%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.25GB  1.63% 93.75%     5.07GB 33.19%  go.etcd.io/bbolt.(*Bucket).node
    0.21GB  1.36% 95.11%     0.27GB  1.79%  go.etcd.io/bbolt.(*DB).beginTx
    0.12GB   0.8% 95.90%     0.12GB   0.8%  go.etcd.io/bbolt.(*node).dereference
    0.10GB  0.64% 96.55%     1.27GB  8.34%  go.etcd.io/bbolt.(*Tx).allocate
    0.06GB  0.39% 96.93%    12.66GB 82.89%  go.etcd.io/bbolt.(*Bucket).Put

Test 2: 1 byte sub bucket

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 38875136 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17966398972 ns/op       7106311624 B/op 15420393 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 46874624 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6767009109 ns/op        5594956944 B/op  7798011 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 64094208 bytes
BenchmarkWriteHeadersLargeBatch-24             1        3515226806 ns/op        2800528448 B/op  5739020 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 5799936 bytes
BenchmarkHeightLookupLatency-24           829075              1985 ns/op             702 B/op         14 allocs/op

Max DB file size: ~64 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 14.20GB, 96.02% of 14.79GB total
Dropped 37 nodes (cum <= 0.07GB)
Showing top 10 nodes out of 50
      flat  flat%   sum%        cum   cum%
    7.25GB 49.03% 49.03%     7.25GB 49.03%  go.etcd.io/bbolt.(*node).put
    4.25GB 28.71% 77.74%     4.25GB 28.71%  go.etcd.io/bbolt.(*node).read
    1.08GB  7.30% 85.04%     1.08GB  7.30%  go.etcd.io/bbolt.Open.func1
    0.39GB  2.65% 87.69%    14.34GB 96.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
    0.38GB  2.59% 90.29%     0.38GB  2.59%  go.etcd.io/bbolt.(*Cursor).search
    0.32GB  2.17% 92.45%     4.57GB 30.88%  go.etcd.io/bbolt.(*Bucket).node
    0.15GB     1% 93.45%     1.38GB  9.33%  go.etcd.io/bbolt.(*Tx).allocate
    0.14GB  0.96% 94.41%     0.19GB  1.29%  go.etcd.io/bbolt.(*Bucket).openBucket
    0.14GB  0.94% 95.35%     0.20GB  1.33%  go.etcd.io/bbolt.(*DB).beginTx
    0.10GB  0.67% 96.02%     0.10GB  0.67%  go.etcd.io/bbolt.(*node).dereference

Test 3: 2 byte sub bucket

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 43966464 bytes
BenchmarkWriteHeadersSmallBatch-24             1        14826193397 ns/op       4140556080 B/op 16198594 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 51122176 bytes
BenchmarkWriteHeadersMediumBatch-24            1        6134732835 ns/op        2248978232 B/op 14419852 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 69414912 bytes
BenchmarkWriteHeadersLargeBatch-24             1        4380655964 ns/op        1854111016 B/op 14156142 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 9207808 bytes
BenchmarkHeightLookupLatency-24           573646              2156 ns/op             835 B/op         14 allocs/op

Max DB file size: ~69 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 8135.27MB, 91.56% of 8885.17MB total
Dropped 31 nodes (cum <= 44.43MB)
Showing top 10 nodes out of 62
      flat  flat%   sum%        cum   cum%
 2923.08MB 32.90% 32.90%  2923.08MB 32.90%  go.etcd.io/bbolt.(*node).read
 1466.33MB 16.50% 49.40%  1466.33MB 16.50%  go.etcd.io/bbolt.(*node).put
  842.56MB  9.48% 58.88%   842.56MB  9.48%  go.etcd.io/bbolt.(*Cursor).search
  607.21MB  6.83% 65.72%  3530.28MB 39.73%  go.etcd.io/bbolt.(*Bucket).node
  488.10MB  5.49% 71.21%   618.11MB  6.96%  go.etcd.io/bbolt.(*Bucket).openBucket
  472.04MB  5.31% 76.52%   472.04MB  5.31%  go.etcd.io/bbolt.(*node).dereference
  456.66MB  5.14% 81.66%   456.66MB  5.14%  go.etcd.io/bbolt.(*Bucket).write
  383.48MB  4.32% 85.98%  8458.17MB 95.19%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  345.33MB  3.89% 89.87%   345.33MB  3.89%  go.etcd.io/bbolt.Open.func1
  150.49MB  1.69% 91.56%  1101.62MB 12.40%  go.etcd.io/bbolt.(*Bucket).Bucket

Test 4: 3 byte sub bucket

Benchmark output

BenchmarkWriteHeadersSmallBatch
DB file size at cleanup: 75845632 bytes
BenchmarkWriteHeadersSmallBatch-24             1        17389070045 ns/op       6542484944 B/op 17463690 allocs/op
BenchmarkWriteHeadersMediumBatch
DB file size at cleanup: 83427328 bytes
BenchmarkWriteHeadersMediumBatch-24            1        7554943584 ns/op        4277629144 B/op 15690789 allocs/op
BenchmarkWriteHeadersLargeBatch
DB file size at cleanup: 106274816 bytes
BenchmarkWriteHeadersLargeBatch-24             1        5288760033 ns/op        2839629112 B/op 15431008 allocs/op
BenchmarkHeightLookupLatency
DB file size at cleanup: 10883072 bytes
BenchmarkHeightLookupLatency-24           743611              2181 ns/op             822 B/op         14 allocs/op

Max DB file size: ~106 MB

Memory usage (go tool pprof mem.out -> top):

Showing nodes accounting for 13588.32MB, 93.68% of 14504.27MB total
Dropped 29 nodes (cum <= 72.52MB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
 4970.84MB 34.27% 34.27%  4970.84MB 34.27%  go.etcd.io/bbolt.(*node).put
 4373.62MB 30.15% 64.43%  4373.62MB 30.15%  go.etcd.io/bbolt.(*node).read
 1043.04MB  7.19% 71.62%  1043.04MB  7.19%  go.etcd.io/bbolt.Open.func1
  911.56MB  6.28% 77.90%   911.56MB  6.28%  go.etcd.io/bbolt.(*Cursor).search
  710.76MB  4.90% 82.80%  5084.38MB 35.05%  go.etcd.io/bbolt.(*Bucket).node
  506.04MB  3.49% 86.29%   506.04MB  3.49%  go.etcd.io/bbolt.(*node).dereference
  388.40MB  2.68% 88.97% 13916.60MB 95.95%  github.com/lightninglabs/neutrino/headerfs.writeRandomBatch
  310.02MB  2.14% 91.11%   460.03MB  3.17%  go.etcd.io/bbolt.(*Bucket).openBucket
  209.52MB  1.44% 92.55%   209.52MB  1.44%  go.etcd.io/bbolt.(*Bucket).write
  164.54MB  1.13% 93.68%   227.54MB  1.57%  go.etcd.io/bbolt.(*DB).beginTx
coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.3%) to 72.252% when pulling 9ea609b35bab767e8d085a89795916c9344b0636 on guggero:block-header-index-mem-fix into e1978372d15e3b588121bd5d98e1b68afcf39090 on lightninglabs:master.

guggero commented 3 years ago

> If a neutrino node has a database file that is partially synced, this PR breaks it, so would need a migration.

Good point. I added a fallback to the root bucket like @halseth suggested and also added a test for it.
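
For illustration, a lookup with that root-bucket fallback could look roughly like this (names and key layout follow the sketch in the PR description and are assumptions, not the actual code):

```go
// Sketch of a hash->height lookup with the root-bucket fallback mentioned
// above, for DBs that were partially synced with the old flat layout.
// Names and key layout are assumptions matching the earlier sketch, not the
// PR's actual code.
package headerindex

import (
	"encoding/binary"
	"fmt"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"go.etcd.io/bbolt"
)

func heightFromHash(root *bbolt.Bucket, hash *chainhash.Hash) (uint32, error) {
	// New layout: the sub-bucket is named after the first two bytes of the
	// byte-serialized hash.
	if sub := root.Bucket(hash[0:2]); sub != nil {
		if v := sub.Get(hash[2:]); v != nil {
			return binary.BigEndian.Uint32(v), nil
		}
	}

	// Fallback to the old layout: the entry may still live directly in the
	// root bucket, keyed by the full hash.
	if v := root.Get(hash[:]); v != nil {
		return binary.BigEndian.Uint32(v), nil
	}

	return 0, fmt.Errorf("hash %v not found in index", hash)
}
```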

> I think it would pay dividends to implement parallel header download (#71) instead.

Do you mean instead of merging this PR? I think we can still implement the parallel header download and get benefits from both improvements.

Crypt-iQ commented 3 years ago

Both improvements make sense; disregard my earlier comment, will re-review.