Open raphjaph opened 1 year ago
I've been working on speeding up transaction fetching, and there are a bunch of annoying issues and tradeoffs. We currently use the JSON-RPC API, which is text-based, so transactions are fetched as hex and then decoded. This is slow, although I really should double-check that I was reading the flamegraph correctly and that it is indeed a bottleneck: I was looking at the flamegraph on macOS, I'm no longer confident that it's accurate, and it might not have been showing off-CPU time.
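To make the hex overhead concrete, here is a minimal sketch of the decoding step that a text-based API forces on the client. The function name `decode_hex` is hypothetical and this is not ord's actual code; it only illustrates that every transaction byte arrives as two ASCII characters and needs an extra pass before deserialization can start.

```rust
/// Decode a hex string into raw bytes.
/// Returns None if the length is odd or any character is not a hex digit.
/// (Hypothetical helper for illustration, not taken from ord.)
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 {
        return None;
    }
    hex.as_bytes()
        .chunks(2)
        .map(|pair| {
            // Each byte of the transaction is carried as two hex characters.
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

A binary interface would hand over the bytes directly, which halves the payload size and skips this pass entirely.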
Anyways, that being said: we already batch transaction requests using JSON-RPC, so even though it's slow and text-based, we might not see a speedup from using the REST interface, since it can only process requests serially. If the REST interface supported HTTP/1.1 pipelining or HTTP/2 multiplexing, we could send requests in parallel over a single connection, but that doesn't appear to be the case.
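For reference, a JSON-RPC batch is just a JSON array of request objects sent in one HTTP request. The sketch below builds such a body for `getrawtransaction`. The function name `batch_request_body` is made up for illustration, and it assumes the txids are plain hex strings that need no JSON escaping; ord itself goes through an RPC client library rather than formatting strings by hand.

```rust
/// Build the body of a JSON-RPC batch that requests several raw transactions.
/// Assumes each txid is a plain hex string (no JSON escaping is performed).
/// (Hypothetical helper for illustration.)
fn batch_request_body(txids: &[&str]) -> String {
    let requests: Vec<String> = txids
        .iter()
        .enumerate()
        .map(|(id, txid)| {
            // The id lets the caller match each response to its request.
            format!(
                r#"{{"jsonrpc":"2.0","id":{},"method":"getrawtransaction","params":["{}"]}}"#,
                id, txid
            )
        })
        .collect();
    format!("[{}]", requests.join(","))
}
```

One round trip then carries many requests, which is why a serial REST interface may not beat it despite the hex encoding.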
I am gonna quote what @Psifour said in the chat from the code hangout. Maybe he can elaborate more if he wants.
Could we do multithreading?
One thing about multithreading is people would probably need to set dbcache in their bitcoin conf right?
We've dabbled with a few ways to index faster. Scanning the UTXO set first to identify the outputs that haven't moved since before the first known inscription lets you do some optimization. Multi-threading and treating the network graph as a series of pipes that originate in coinbase and terminate in the UTXO set can also lead to some unique optimizations.
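A rough sketch of the UTXO-first idea, under stated assumptions: the `Utxo` struct, its fields, and `untouched_outpoints` are all hypothetical, and the height of the first known inscription is passed in as a parameter rather than hard-coded. Outputs that were created before that height and are still unspent cannot have taken part in any inscription transaction, so an inscription-only indexer could treat them as known-empty.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical UTXO record.
struct Utxo {
    outpoint: (String, u32), // (txid, vout)
    created_height: u32,     // height of the block that created this output
}

/// Collect outpoints that are still unspent and were created before the
/// first known inscription. These have not moved since, so they can be
/// skipped when looking for inscriptions.
fn untouched_outpoints(
    utxos: &[Utxo],
    first_inscription_height: u32,
) -> HashSet<(String, u32)> {
    utxos
        .iter()
        .filter(|u| u.created_height < first_inscription_height)
        .map(|u| u.outpoint.clone())
        .collect()
}
```

This only helps inscription tracking; sat indexing still has to follow every output back to its coinbase.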
A PR that might be relevant to this discussion is @veryordinally's hash PR: https://github.com/ordinals/ord/pull/2344#issuecomment-1739954448
Multithreading is quite hard. Individual transactions can be processed serially, but the coinbase transaction of each block cannot, because it depends on all the transactions in that block. Additionally, I don't think a write transaction can be shared between threads in redb without locking.
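The coinbase dependency can be sketched as follows. This is a simplified model using amounts rather than sat ranges, and `Tx` and `coinbase_input_value` are hypothetical names: what the coinbase can claim is the block subsidy plus the fees of every other transaction in the block, so it can only be finalized after all of them have been processed.

```rust
/// Simplified, hypothetical transaction: only the fee it pays matters here.
struct Tx {
    fee: u64,
}

/// Process every non-coinbase transaction first, accumulating fees, and only
/// then compute what the coinbase has available: subsidy plus all fees.
fn coinbase_input_value(subsidy: u64, non_coinbase: &[Tx]) -> u64 {
    let mut fees = 0u64;
    for tx in non_coinbase {
        // Per-transaction indexing work would happen here.
        fees += tx.fee;
    }
    // The coinbase depends on the result of the whole loop above.
    subsidy + fees
}
```

Any parallel scheme would need a barrier at this point in every block before the coinbase is handled.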
Also, I don't think that content hashes are a speedup.
I am gonna quote what @Psifour said in the chat from the code hangout. Maybe he can elaborate more if he wants. Could we do multithreading? One thing about multithreading is people would probably need to set dbcache in their bitcoin conf right?
Short version is that it depends on which specific set of data you care about. If you are locked into supporting old edge-cases then it becomes much harder (as Casey rightfully points out with coinbase transactions being tricky).
The main challenge is that almost all optimizations come with opinionated indexing. Sat indexing, inscription parsing, and mapping inscription reveals to the current UTXO set all come with different optimization challenges that can only be satisfied if they are insulated from each other to a certain extent (but as I said, that is part of an 'opinionated' solution that I am exploring for our own indexer).
Would sequence numbers still work with multithreading?
Why is ord slow?
There are a few different ways that ord is slow:
--index-sats. This is memory and compute intensive. Additionally, committing can be very slow, possibly slow enough to make the RPC client time out. See #2455, indexing fails every 5000 blocks.
How can we make it faster?
From profiling, I/O, serializing, and deserializing seem to be the main bottlenecks, so switching to P2P or REST seems like the first step. Once that's as fast as possible, it will likely make other bottlenecks apparent.
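As a point of comparison for the REST option, Bitcoin Core's REST interface can serve blocks and transactions in binary form, selected by the `.bin` suffix on the path. The helpers below only build the request paths; their names are hypothetical and the HTTP client is left out. Fetching whole blocks is probably the better fit for an indexer, since, as far as I know, the transaction endpoint needs `txindex=1` to find confirmed transactions.

```rust
/// Path for fetching a whole block in binary form from the REST interface.
/// (Hypothetical helper; `block_hash` is the hex-encoded block hash.)
fn rest_block_path(block_hash: &str) -> String {
    format!("/rest/block/{}.bin", block_hash)
}

/// Path for fetching a single transaction in binary form.
/// (Hypothetical helper; `txid` is the hex-encoded transaction id.)
fn rest_tx_path(txid: &str) -> String {
    format!("/rest/tx/{}.bin", txid)
}
```

The response body is then the consensus-serialized bytes, with no hex or JSON layer to strip off.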
P2P vs REST
P2P:
REST:
rest=1 to bitcoin's config file
Related Issues