oasisprotocol / nexus

Official indexer for the Oasis Network.

Speed up full reindex #179

Open mitjat opened 2 years ago

mitjat commented 2 years ago

Try to speed up reindexing of the whole chain by sharding blocks among multiple analyzers.

First, test if the node scales well to multiple clients; just run 2 or 3 analyzers locally (each with a different block range) and eyeball speed.

Not crucial right now, but needed for a functional prod deploy; we cannot afford for our response to a hypothetical needed hotfix to be "we'll be back online in 3-4 days".
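
As a starting point, a minimal sketch of the sharding itself (a hypothetical helper, not existing indexer code): split the target height range into contiguous sub-ranges, one per analyzer.

```go
package main

import "fmt"

// blockRange is a hypothetical inclusive [From, To] height range for one analyzer.
type blockRange struct {
	From, To int64
}

// shardRange splits [from, to] into n contiguous, roughly equal sub-ranges.
func shardRange(from, to, n int64) []blockRange {
	total := to - from + 1
	shards := make([]blockRange, 0, n)
	for i := int64(0); i < n; i++ {
		lo := from + i*total/n
		hi := from + (i+1)*total/n - 1
		shards = append(shards, blockRange{From: lo, To: hi})
	}
	return shards
}

func main() {
	// E.g. split 8_000_000..8_499_999 among 4 analyzers.
	fmt.Println(shardRange(8_000_000, 8_499_999, 4))
}
```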

mitjat commented 2 years ago

This DB update statement looks order-dependent and prevents us from processing blocks out of order. Maybe we can change the query so that the second operand to +/- is derived from the tx or event only?
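
For illustration, a hedged sketch of the distinction (hypothetical table and column names, not the actual statements in analyzer/queries.go): an update whose delta comes only from the event commutes across blocks, while one whose delta is derived from the row's current value does not.

```go
// Hypothetical illustration only; not the real queries from analyzer/queries.go.
package queries

const (
	// Order-independent: the delta ($2) comes straight from the event, so
	// additions commute and blocks could be applied in any order.
	AddEscrowBalanceUpsert = `
		INSERT INTO chain.accounts (address, escrow_balance_active)
			VALUES ($1, $2)
		ON CONFLICT (address) DO UPDATE
			SET escrow_balance_active = chain.accounts.escrow_balance_active + $2`

	// Order-dependent: the amount removed is derived from the row's *current*
	// value (here, a slash by fraction $2), so the result depends on which
	// blocks have already been applied. This shape blocks out-of-order processing.
	SlashEscrowBalanceProportional = `
		UPDATE chain.accounts
			SET escrow_balance_active = escrow_balance_active - FLOOR(escrow_balance_active * $2)
			WHERE address = $1`
)
```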

aefhm commented 2 years ago

order-dependent

By that, you don't mean within the same height right? https://github.com/oasisprotocol/oasis-indexer/blob/4aa5a529147586a1316c2c5f25dd7ccc56e1ac58/analyzer/queries.go#L209-L210

Maybe we can change the query so that the second operand to +/- is derived from the tx or event only?

I think that is definitely preferred when possible. Hmm, though a balance in general is, I believe, always a fractional ownership (via shares) of the total escrowed or debonding amount.
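
For context, a minimal sketch of share-pool accounting as I understand it (my own illustration, not indexer code): an account's escrow balance is its shares times the pool's balance divided by the pool's total shares, so it is "fractional" in that sense even though all quantities are integers.

```go
package main

import (
	"fmt"
	"math/big"
)

// balanceForShares returns the tokens corresponding to `shares` of a pool
// holding `poolBalance` base units backed by `poolShares` total shares.
// Integer (truncated) division, in line with Quantity being an integer type.
func balanceForShares(shares, poolBalance, poolShares *big.Int) *big.Int {
	if poolShares.Sign() == 0 {
		return new(big.Int) // empty pool -> zero balance
	}
	out := new(big.Int).Mul(shares, poolBalance)
	return out.Quo(out, poolShares)
}

func main() {
	// 100 shares of a pool with 1_000_000 base units and 400 total shares -> 250_000.
	fmt.Println(balanceForShares(big.NewInt(100), big.NewInt(1_000_000), big.NewInt(400)))
}
```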

mitjat commented 2 years ago

Nit: The initial value of escrow_balance_active (when inserting the row in the DB for the first time) comes from NewShares, which is a Quantity (= int), not something fractional (as in a decimal number).

I meant order-dependent in the sense of the order of blocks, yes.

The more I dig into this one query, the more I uncover. Forked the discussion off into #192 so this ticket stays focused on speeding up the reindex.

My current take is that we can probably ignore this one SQL query for now because we never(?)/rarely slash so the query doesn't really get used, and the query seems buggy in its current form anyway. So let's not block indexer parallelization on fixing this one query.

aefhm commented 1 year ago

My current take is that we can probably ignore this one SQL query for now because we never(?)/rarely slash so the query doesn't really get used, and the query seems buggy in its current form anyway. So let's not block indexer parallelization on fixing this one query.

Fair enough.

Nit: The initial value of escrow_balance_active (when inserting the row in the DB for the first time) comes from NewShares, which is a Quantity (= int), not something fractional (as in a decimal number).

Uh, I think it comes from the amount?

https://github.com/oasisprotocol/oasis-indexer/blob/fdd9f4105985ca5b6466dfa787ecd6ddb3da1942/analyzer/consensus/consensus.go#L715-L730

mitjat commented 1 year ago

Parallelization works to some extent, but less well than I hoped it would.

I configured the indexer to run multiple analyzers (took some hacky work; the assumption that there is only ever one analyzer of a given type is baked in in several places); here's what I got:

| analyzers | blocks per minute |
|---|---|
| 1 | 133 |
| 2 | 211 |
| 4 | 119 and 222 (two separate runs) |

All analyzers ran in the 8_000_000 to 8_500_000 block range, consensus only. Each analyzer was assigned a custom 100k range. This was tested on the node via a kubectl port forward; the latter might have brought additional IO issues. (Edit: Definitely. See below.)

There were two reasons that I saw for the slow-down when adding 4+ analyzers:

Fetching individual blocks from the node takes longer the more analyzers there are. The eyeballed median is 0.04s for one analyzer, 0.06s for two and four analyzers. But there is lots of variance; eyeballed p90=0.1s and p99=0.3s. (Note: We fetch a lot more than just the block data for each height)
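
For reference, a hedged sketch of how a per-height fetch can be timed (hypothetical fetcher interface; the real indexer and oasis-core client APIs differ) — roughly what the "block fetched in" timings quoted later in this thread measure:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// blockFetcher stands in for the consensus gRPC client the analyzer uses.
type blockFetcher interface {
	GetBlock(ctx context.Context, height int64) ([]byte, error)
}

// timedGetBlock fetches one block and reports how long the call took.
func timedGetBlock(ctx context.Context, c blockFetcher, height int64) ([]byte, error) {
	start := time.Now()
	blk, err := c.GetBlock(ctx, height)
	fmt.Printf("block fetched in %s (height %d)\n", time.Since(start), height)
	return blk, err
}

// fakeFetcher simulates a node that responds in ~40ms.
type fakeFetcher struct{}

func (fakeFetcher) GetBlock(_ context.Context, _ int64) ([]byte, error) {
	time.Sleep(40 * time.Millisecond)
	return []byte("block"), nil
}

func main() {
	_, _ = timedGetBlock(context.Background(), fakeFetcher{}, 8_000_000)
}
```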

Experiments to follow:

mitjat commented 1 year ago

The only problematic, not-analyzable-out-of-order event (TakeEscrowEvent) from the second comment above has been made analyzable out of order by adding an additional field to the Event in https://github.com/oasisprotocol/oasis-core/pull/5016. That field will not be present for already-generated Events though, so we cannot parallelize the full reindex for now :/

FWIW, here are the numbers on parallelization in a more realistic setup, i.e. in k8s:

| analyzers | blocks per minute |
|---|---|
| 1 | 948 |
| 2 | 1363 |
| 4 | 834 |
| 8 | 416 |

For posterity, the above was obtained with (technically, counts processed blocks in the last minute of logs):

    for n in 1 2 4 8; do
      f=par${n}.3
      # Timestamp one minute before the last log line (field 1 is an RFC-3339 timestamp).
      start="$(TZ=UTC date --date "$(tail "$f" -n1 | cut -d' ' -f1) -1min" --rfc-3339=ns | tr ' ' T)"
      # Count the log lines (= processed blocks) newer than that timestamp.
      cat "$f" | awk '$1 > "'"$start"'"' | wc -l
    done

Observation: oasis-node is not great at handling parallel connections:

| analyzers | time to fetch block data with GetBlock (s) |
|---|---|
| 1 | 0.0139342 |
| 2 | 0.0201103 |
| 4 | 0.0775929 |
| 8 | 0.078754 |

(GetBlock is just one of the gRPC calls; I didn't instrument the others.)

The above was obtained from the logs with (`avg` being a small helper that averages the numbers on stdin):

    less par1.3 | cut -d' ' -f2- | grep 'block fetched in' | cut -d' ' -f4 | avg

Next steps:

mitjat commented 1 year ago

https://app.clickup.com/t/3ufk8j6

mitjat commented 1 year ago

Ran some CPU profiling. Cropped CPU flamegraph: https://user-images.githubusercontent.com/629970/203448819-c5731a14-ebf1-4553-b55b-1907c7c7df24.png

No major surprises there:

- CPU is largely idle.
- Most of the time is spent decoding CBOR (and to a lesser extent, speaking the grpc/http/postgres protocols).
- Minor surprise: About 18% of the total (CPU) time and about 35% of block-analysis time is spent on parsing the registry data. That's a lot. If these percentages transfer over to wall time, it's worth considering the tradeoffs of pulling the registry data wholesale once every N rounds or seconds.

Inspect for yourself: [indexer-cpu.log](https://github.com/oasisprotocol/oasis-indexer/files/10072093/indexer-cpu.log) (see commands below on how to visualize)

Notes:

Code to run CPU profiling, added to the topmost main.go (needs imports "fmt", "os", "os/signal", "runtime", "runtime/pprof", "syscall"):

    // Start writing a CPU profile to a file.
    f, err := os.Create("/tmp/indexer-cpu.log")
    if err != nil {
        fmt.Fprintf(os.Stderr, "Cannot instantiate profiling: %+v", err)
    }
    runtime.SetCPUProfileRate(100)
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile() // never runs because the indexer never finishes; the code below stops profiling on ctrl+C instead

    // Stop the profile (flushing it to disk) when the process is interrupted.
    c := make(chan os.Signal, 2)
    signal.Notify(c, os.Interrupt, syscall.SIGQUIT) // subscribe to system signals
    onKill := func(c chan os.Signal) {
        <-c
        fmt.Fprintf(os.Stderr, "BANZAIIIIIIIIIIIII")
        pprof.StopCPUProfile()
        os.Exit(0)
    }
    go onKill(c)

Ran the indexer as usual, killed it with ctrl+C. Visualized the profiling info with `go tool pprof -http=":9999" ./oasis-indexer /tmp/indexer-cpu.log`.
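
A possible alternative to the signal-handling hack (a suggestion, not what was used here): expose the standard net/http/pprof endpoints and pull a fixed-length profile from the running indexer, e.g. with `go tool pprof -http=":9999" "http://localhost:6060/debug/pprof/profile?seconds=60"`.

```go
// Sketch: serve pprof endpoints from the indexer process (not current indexer code).
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the indexer's normal main loop
}
```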

pro-wh commented 1 year ago
  1. why crop
  2. cbor not showing up anywhere in the screenshot

pro-wh commented 1 year ago

[uncropped CPU flamegraph from the same profile]

reran the above log to post an uncropped pic :skull:

mitjat commented 1 year ago

Each round spends about this much time in IO wait:

This is not wall time, and we parallelize some oasis-node requests, so there's some speedup there. Also, this is on my local laptop with a k8s tunnel to the prod node and a local postgres. In prod, I see postgres latencies closer to 25ms. Still, same ballpark.

If we have to keep pg writes sequential (because of deadlocks when updating e.g. balances), we're looking at a max possible speed of (1000 ms)/(25 ms) = 40 blocks per second. That's very roughly (10e6 blocks) / 40 / 3600 ≈ 70 h for a full reindex, and slightly slower if we include emerald parsing. But also, I hope we can avoid creating competing db writes during the fast full rescan and do much better than that.

Steps to trace each round:

```diff
--- a/analyzer/consensus/consensus.go
+++ b/analyzer/consensus/consensus.go
@@ -160,6 +161,11 @@ func (m *Main) Start() {
 		backoff.Wait()
 		m.logger.Info("attempting block", "height", height)
+		f, perr := os.Create(fmt.Sprintf("/tmp/consensus-%d.trace", height))
+		if perr != nil {
+			panic(perr)
+		}
+		trace.Start(f)
 		if err := m.processBlock(ctx, height); err != nil {
 			if err == analyzer.ErrOutOfRange {
 				m.logger.Info("no data available; will retry",
@@ -175,6 +181,8 @@ func (m *Main) Start() {
 			backoff.Failure()
 			continue
 		}
+		trace.Stop()
+		f.Close()
 		m.logger.Info("processed block", "height", height)
 		backoff.Success()
```

Run the analyzer as usual. View a trace with `go tool trace /tmp/consensus-8092108.trace`, which opens a web page. For network blocking times, see "Network blocking profile". I didn't find much use for the traditional main trace view (first link in the interface) because it's just a bunch of white space (= waiting).

aefhm commented 1 year ago

About 18% of the total (CPU) time and about 35% of block-analysis time is spent on parsing the registry data.

Yikes. Agree that this is probably not worth optimizing right now though.
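
If it ever does become worth it, a minimal sketch of the idea floated above (re-fetch/parse the registry data only once every N heights and serve a cached copy in between; hypothetical types, not indexer code):

```go
package main

import "fmt"

// registryData stands in for whatever parsed registry state the analyzer needs.
type registryData struct {
	Height int64
}

// registryCache refreshes the registry data only once every `every` heights.
type registryCache struct {
	every  int64
	cached *registryData
	asOf   int64
	fetch  func(height int64) (*registryData, error) // the expensive fetch/parse
}

func (c *registryCache) get(height int64) (*registryData, error) {
	if c.cached == nil || height-c.asOf >= c.every {
		d, err := c.fetch(height)
		if err != nil {
			return nil, err
		}
		c.cached, c.asOf = d, height
	}
	return c.cached, nil
}

func main() {
	calls := 0
	c := &registryCache{
		every: 100,
		fetch: func(h int64) (*registryData, error) { calls++; return &registryData{Height: h}, nil },
	}
	for h := int64(8_000_000); h < 8_000_300; h++ {
		if _, err := c.get(h); err != nil {
			panic(err)
		}
	}
	fmt.Println("expensive fetches for 300 heights:", calls) // 3 instead of 300
}
```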

mitjat commented 1 year ago

Plan:

pro-wh commented 1 year ago

For token support, I want to run paratime queries during analysis; this is to get the token symbol and other data.

Edit [Mitja]: Discussed in Slack, we're leaning towards fetching all this data in a separate, out-of-band "analyzer" / data source.

mitjat commented 1 year ago

Some more pointers to myself: During reindex: