ordinals / ord


Performance Tracking Issue #2613

raphjaph commented 1 year ago

Why is ord slow?

There are a few different ways that ord is slow:

  1. Using --index-sats. This is memory and compute intensive. Additionally, committing can be very slow, possibly slow enough to make the RPC client time out. See #2455 (indexing fails every 5000 blocks).
  2. Fetching block headers below the first inscription block height. Block headers for the entire chain are less than 100 MB; however, we fetch and deserialize them individually and sequentially.
  3. Fetching and deserializing blocks after the first inscription height.
  4. Indexing inscriptions above the height at which inscriptions become common.
  5. Backfilling the outpoint-to-value cache (see the sketch below).
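
For context, the outpoint-to-value cache maps each previous output to its value so that input values (and therefore fees and sat ranges) can be computed without refetching transactions; a cache miss forces a fetch of the funding transaction. A minimal sketch of that shape, assuming rust-bitcoin with a plain u64 TxOut value; the ValueCache type and fetch_tx closure are illustrative, not ord's actual code:

```rust
use std::collections::HashMap;

use bitcoin::{OutPoint, Transaction, Txid};

// Hypothetical outpoint-to-value cache: maps an outpoint to the value, in
// sats, of the output it points at.
type ValueCache = HashMap<OutPoint, u64>;

// Look up the value of a spent outpoint, backfilling the cache by fetching
// the funding transaction on a miss. `fetch_tx` stands in for whatever
// RPC/REST/P2P call actually retrieves the transaction.
fn value_of(
  cache: &mut ValueCache,
  outpoint: OutPoint,
  fetch_tx: impl Fn(Txid) -> Transaction,
) -> u64 {
  if let Some(&value) = cache.get(&outpoint) {
    return value;
  }

  // Cache miss: fetch the funding transaction and insert all of its outputs,
  // since sibling outputs are likely to be spent in nearby blocks too.
  let tx = fetch_tx(outpoint.txid);
  for (vout, output) in tx.output.iter().enumerate() {
    cache.insert(
      OutPoint { txid: outpoint.txid, vout: vout as u32 },
      output.value,
    );
  }

  cache[&outpoint]
}
```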

How can we make it faster?

  1. Committing based on memory usage or elapsed time instead of block height (see the sketch after this list). Waiting longer between commits has diminishing marginal returns: it increases the working set size, makes commits, which are uninterruptible, take longer, and loses more work when crashing, so committing as frequently as possible is desirable.
  2. Always requesting full blocks so that no backfilling is required. Predicated on fetching full blocks being faster than requesting transactions to backfill the cache.
  3. Fetching blocks and transactions over the P2P interface.
  4. Fetching blocks and transactions over the REST interface.
  5. Reading blocks directly out of bitcoin core's blockfiles.
  6. Creating a snapshot of the value cache, and backfilling it in one go.
  7. Ordered insertion of entries into the database, which should improve memory locality.
  8. Offloading deserialization onto background threads.
  9. Not clearing the value cache every commit.
  10. Reducing the size of the index. This is not only good on its own, but might increase memory locality and reduce the size of the working set.
  11. Confirming that the block queue is always full.
  12. Trying to reuse read and write transactions.
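
As a concrete illustration of item 1, here is a minimal sketch of a commit policy driven by elapsed time and accumulated entries rather than block count; the thresholds and field names are made up for illustration, not taken from ord:

```rust
use std::time::{Duration, Instant};

// Illustrative commit policy: flush the pending write transaction whenever
// too much time has passed or too many uncommitted entries have accumulated,
// instead of every N blocks.
struct CommitPolicy {
  last_commit: Instant,
  pending_entries: usize,
  max_interval: Duration,
  max_entries: usize,
}

impl CommitPolicy {
  fn new(max_interval: Duration, max_entries: usize) -> Self {
    Self {
      last_commit: Instant::now(),
      pending_entries: 0,
      max_interval,
      max_entries,
    }
  }

  // Record the work done for one block and report whether to commit now.
  fn record(&mut self, entries_written: usize) -> bool {
    self.pending_entries += entries_written;
    self.pending_entries >= self.max_entries
      || self.last_commit.elapsed() >= self.max_interval
  }

  // Reset the counters after a successful commit.
  fn committed(&mut self) {
    self.pending_entries = 0;
    self.last_commit = Instant::now();
  }
}
```

The indexer would call record after each block and commit whenever it returns true, so commit frequency tracks the actual work done rather than a fixed block interval.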

From profiling, I/O, serialization, and deserialization seem to be the main bottlenecks, so switching to P2P or REST seems like the first step. Once that's as fast as possible, other bottlenecks will likely become apparent.

P2P vs REST

P2P:

REST:

Related Issues

casey commented 1 year ago

I've been working on speeding up transaction fetching, and there are a bunch of annoying issues and tradeoffs. We currently use the JSON-RPC API, which is text-based, so transactions are fetched as hex and then decoded. This is slow, although I really should double-check that I was reading the flamegraph correctly and that it is indeed a bottleneck, since I was looking at the flamegraph on macOS, and I'm no longer confident that it's accurate; it might not have been showing off-CPU time.

Anyways, that being said: we already batch transaction requests over JSON-RPC, so even though it's slow and text-based, we might not see a speedup from using the REST interface, since it can only process requests serially. If the REST interface supported HTTP/1.1 pipelining or HTTP/2 multiplexing, we could send requests in parallel over a single connection, but that doesn't appear to be the case.
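
To make the decoding overhead concrete, here is a rough sketch of the two paths, assuming the rust-bitcoin and hex crates; this is not ord's actual fetching code:

```rust
use bitcoin::consensus::encode::deserialize;
use bitcoin::Transaction;

// Over JSON-RPC a transaction arrives as a hex string, so there are two
// passes: hex decode, then consensus deserialization.
fn tx_from_json_rpc(hex_tx: &str) -> Result<Transaction, Box<dyn std::error::Error>> {
  let bytes = hex::decode(hex_tx)?; // text -> raw bytes
  Ok(deserialize::<Transaction>(&bytes)?) // raw bytes -> Transaction
}

// Over a binary interface (REST .bin endpoints or P2P) the same transaction
// can be fetched as raw bytes, skipping the hex pass entirely.
fn tx_from_binary(bytes: &[u8]) -> Result<Transaction, Box<dyn std::error::Error>> {
  Ok(deserialize::<Transaction>(bytes)?)
}
```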

elocremarc commented 1 year ago

I am gonna quote what @Psifour said in the chat from the code hangout. Maybe he can elaborate more if he wants. Could we do multithreading? One thing about multithreading is that people would probably need to set dbcache in their bitcoin.conf, right?

We've dabbled with a few ways to index faster. Scanning the UTXO set FIRST to identify the outputs that haven't moved since before the first known inscription lets you do some optimization. Multi-threading and treating the network graph as a series of pipes that originate in coinbase transactions and terminate in the UTXO set can also lead to some unique optimizations.
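
One possible reading of the "scan the UTXO set first" idea, sketched under the assumption that a UTXO set dump (e.g. from Bitcoin Core's dumptxoutset) records the creation height of each output; the UtxoEntry type and the exact use of the result are illustrative:

```rust
use bitcoin::OutPoint;

// Hypothetical entry from a UTXO set dump: the outpoint, the height at which
// the output was created, and its value in sats.
struct UtxoEntry {
  outpoint: OutPoint,
  height: u32,
  value: u64,
}

// Height of the first inscription on mainnet.
const FIRST_INSCRIPTION_HEIGHT: u32 = 767430;

// Outputs created before the first inscription and still unspent have not
// moved since, so an inscription-focused index can treat them as settled and
// skip walking them through later blocks; only the rest need full tracking.
fn skippable(utxos: &[UtxoEntry]) -> Vec<&UtxoEntry> {
  utxos
    .iter()
    .filter(|entry| entry.height < FIRST_INSCRIPTION_HEIGHT)
    .collect()
}
```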

elocremarc commented 1 year ago

A PR that might be relevant to this discussion is @veryordinally's hash PR: https://github.com/ordinals/ord/pull/2344#issuecomment-1739954448

casey commented 1 year ago

Multithreading is quite hard. Individual transactions can be processed separately, but the coinbase transaction of each block cannot, because it depends on all the transactions in that block. Additionally, I don't think a write transaction can be shared between threads in redb without locking.
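
To illustrate the constraint, here is a rough sketch assuming the rayon crate and a per-transaction indexing step that is genuinely independent, which ord's actual indexing may not be: non-coinbase transactions fan out across threads, while the coinbase has to wait for all of them.

```rust
use bitcoin::{Block, Transaction};
use rayon::prelude::*;

// Per-transaction result of some hypothetical independent indexing step,
// e.g. extracted inscription envelopes.
struct TxResult;

fn index_tx(_tx: &Transaction) -> TxResult {
  TxResult
}

// Non-coinbase transactions could in principle be indexed in parallel, but
// the coinbase has to wait: its fee-derived sats depend on every other
// transaction in the block.
fn index_block(block: &Block) -> Vec<TxResult> {
  let (coinbase, rest) = block
    .txdata
    .split_first()
    .expect("a block always contains a coinbase transaction");

  let mut results: Vec<TxResult> = rest.par_iter().map(index_tx).collect();

  // Only after the rest of the block is done can the coinbase be handled.
  results.push(index_tx(coinbase));

  results
}
```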

Also, I don't think that content hashes are a speedup.

Psifour commented 1 year ago

I am gonna quote what @Psifour said in the chat from the code hangout. Maybe he can elaborate more if he wants. Could we do multithreading? One thing about multithreading is that people would probably need to set dbcache in their bitcoin.conf, right?

Short version is that it depends on which specific set of data you care about. If you are locked into supporting old edge-cases then it becomes much harder (as Casey rightfully points out with coinbase transactions being tricky).

The main challenge is that almost all optimizations come with opinionated indexing. Sat indexing, inscription parsing, and mapping inscription reveals to the current UTXO set all come with different optimization challenges that can only be satisfied if they are insulated from each other to a certain extent (but as I said, that is part of an 'opinionated' solution that I am exploring for our own indexer).

SmarakNayak commented 1 year ago

Would sequence numbers still work with multithreading?