mitjat opened 2 years ago
This DB update statement looks order-dependent and prevents us from processing blocks out of order. Maybe we can change the order query so that the second operand to +/- is derived from the tx or event only?
> order-dependent
By that, you don't mean within the same height right? https://github.com/oasisprotocol/oasis-indexer/blob/4aa5a529147586a1316c2c5f25dd7ccc56e1ac58/analyzer/queries.go#L209-L210
> Maybe we can change the order query so that the second operand to +/- is derived from the tx or event only?
I think that is definitely preferred when possible. Hmm, balance in general, I believe, is always a fractional ownership of the shares of the total escrowed or debonding amount.
Nit: The initial value of escrow_balance_active (when inserting the row in the DB for the first time) comes from NewShares, which is a Quantity (= int), not something fractional (as in a decimal number).
I meant order-dependent in the sense of the order of blocks, yes.
The more I dig into this one query, the more I uncover. Forked the discussion off into #192 so this ticket stays focused on speeding up the reindex.
My current take is that we can probably ignore this one SQL query for now because we never(?)/rarely slash so the query doesn't really get used, and the query seems buggy in its current form anyway. So let's not block indexer parallelization on fixing this one query.
> My current take is that we can probably ignore this one SQL query for now because we never(?)/rarely slash so the query doesn't really get used, and the query seems buggy in its current form anyway. So let's not block indexer parallelization on fixing this one query.
Fair enough.
> Nit: The initial value of escrow_balance_active (when inserting the row in the DB for the first time) comes from NewShares, which is a Quantity (= int), not something fractional (as in a decimal number).
Uh, I think it comes from the amount?
Parallelization works to some extent, but less well than I hoped it would.
I configured the indexer to run multiple analyzers (took some hacky work; the assumption that there is only ever one analyzer of a given type is baked in in several places); here's what I got:
| analyzers | blocks per minute |
|---|---|
| 1 | 133 |
| 2 | 211 |
| 4 | 119 and 222 (two separate runs) |
All analyzers ran in the 8_000_000 to 8_500_000 block range, consensus only. Each analyzer was assigned a custom 100k range. This was tested on the node via a kubectl port forward; the latter might have brought additional IO issues. (Edit: Definitely. See below.)
There were two reasons that I saw for the slow-down when adding 4+ analyzers:
- `failed to receive server preface within timeout` messages (each of which freezes an analyzer for 20 seconds, which is the duration of the timeout).
- Fetching individual blocks from the node takes longer the more analyzers there are. The eyeballed median is 0.04s for one analyzer and 0.06s for two and four analyzers. But there is lots of variance; eyeballed p90=0.1s and p99=0.3s. (Note: we fetch a lot more than just the block data for each height.)
Experiments to follow:
The only problematic, not-analyzable-out-of-order event (TakeEscrowEvent) from the second comment above has been made analyzable out of order by adding an additional field to the Event in https://github.com/oasisprotocol/oasis-core/pull/5016.
That field will not be present for already-generated Events though, so we cannot parallelize the full reindex for now :/
FWIW, here are the numbers on parallelization in a more realistic setup, i.e. in k8s:
| analyzers | blocks per minute |
|---|---|
| 1 | 948 |
| 2 | 1363 |
| 4 | 834 |
| 8 | 416 |
For posterity, the above was obtained with the following (technically, it counts processed blocks in the last minute of logs):
for n in 1 2 4 8; do
  f=par${n}.3
  start="$(TZ=UTC date --date "$(tail "$f" -n1 | cut -d' ' -f1) -1min" --rfc-3339=ns | tr ' ' T)"
  awk '$1 > "'"$start"'"' "$f" | wc -l
done
Observation: oasis-node is not great at handling parallel connections:
| analyzers | time to fetch block data with GetBlock (just one of the gRPC calls; I didn't instrument the others) |
|---|---|
| 1 | 0.0139342 |
| 2 | 0.0201103 |
| 4 | 0.0775929 |
| 8 | 0.078754 |
The above was obtained from logs with
less par1.3 | cut -d' ' -f2- | grep 'block fetched in' | cut -d' ' -f4 | avg
Next steps:
Ran some CPU profiling. Cropped CPU flamegraph: https://user-images.githubusercontent.com/629970/203448819-c5731a14-ebf1-4553-b55b-1907c7c7df24.png
No major surprises there:
- CPU is largely idle
- Most of the time is spent decoding CBOR (and to a lesser extent, speaking the grpc/http/postgres protocols)
- Minor surprise: About 18% of the total (CPU) time and about 35% of block-analysis time is spent on parsing the registry data. That's a lot. If these percentages transfer over to wall time, it's worth considering the tradeoffs of pulling the registry data wholesale once every N rounds or seconds.
Inspect for yourself: indexer-cpu.log (see commands below on how to visualize)
Notes: Code to run profiling
CPU profiling: Added to topmost main.go:
// Requires imports: fmt, os, os/signal, runtime, runtime/pprof, syscall
f, err := os.Create("/tmp/indexer-cpu.log")
if err != nil {
	fmt.Fprintf(os.Stderr, "Cannot instantiate profiling: %+v", err)
	os.Exit(1) // bail out; continuing would hand a nil *os.File to pprof
}
runtime.SetCPUProfileRate(100)
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile() // never runs because the indexer never finishes; the handler below stops profiling on ctrl+C instead
c := make(chan os.Signal, 2)
signal.Notify(c, os.Interrupt, syscall.SIGQUIT) // subscribe to system signals
go func() {
	<-c // block until a signal arrives
	fmt.Fprintln(os.Stderr, "BANZAIIIIIIIIIIIII")
	pprof.StopCPUProfile()
	os.Exit(0)
}()
Ran indexer as usual, killed it with ctrl+C. Visualized the profiling info with go tool pprof -http=":9999" ./oasis-indexer /tmp/indexer-cpu.log
Reran the above log to post an uncropped pic :skull:
Each round spends about this much time in IO wait:
This is not wall time, and we parallelize some oasis-node requests, so there's some speedup there. Also, this is on my local laptop with a k8s tunnel to the prod node and a local postgres. In prod, I see postgres latencies closer to 25ms. Still, same ballpark.
If we have to keep pg writes sequential (because of deadlocks when updating e.g. balances), we're looking at a max possible speed of (1000ms)/(25ms) = 40 blocks per second. That's very roughly (10e6 blocks)/40/3600=70h for a full reindex, and slightly slower if we include emerald parsing. But also, I hope we can avoid creating competing db writes during the fast full rescan, and do much better for time.
> About 18% of the total (CPU) time and about 35% of block-analysis time is spent on parsing the registry data.
Yikes. Agree that this is probably not worth optimizing right now though.
Plan:
- GenesisAtHeight in the end: entities, nodes, runtimes, accounts, commissions, allowances, delegations, debonding_delegations, proposals, votes.
- No GenesisAtHeight for paratimes :(. Luckily (and unlike consensus), all updates are commutative, so the only problem is DB lock contention.
- No GenesisAtHeight data for token support; I want to run paratime queries during analysis to get the token symbol and other data.
  - Edit [Mitja]: Discussed in Slack, we're leaning towards fetching all this data in a separate, out-of-band "analyzer" / data source.
Some more pointers to myself: during reindex, `ALTER TABLE ... SET UNLOGGED`.
Try to speed up reindexing of the whole chain by sharding blocks among multiple analyzers.
First, test if the node scales well to multiple clients; just run 2 or 3 analyzers locally (each with a different block range) and eyeball speed.
Not crucial right now, but needed for a functional prod deploy; we cannot afford for our response to a hypothetical needed hotfix to be "we'll be back online in 3-4 days".