Closed partialord closed 2 months ago
Nice! These changes look good to me.
The new usage of "entry"/"value" doesn't quite seem consistent, though. A UtxoEntry isn't an entry in the sense of the Entry trait; it's an entry in the sense of InscriptionEntry or RuneEntry, whose low-level types are (quite logically) called InscriptionEntryValue and RuneEntryValue.
So if we want to be consistent, I think the types should be UtxoEntry, UtxoEntryValue, and UtxoEntryValueBuf. Or for shorter names, we could call this a Utxo rather than a UtxoEntry, and then the names would be Utxo, UtxoValue, and UtxoValueBuf. I think that might help with variable naming, too -- all the "utxo_entry" variables I had which are now "value" would instead become "utxo", which is a lot more informative.
Yah, good point. Utxo
feels a bit ambiguous though, since there are a bunch of things in the codebase that you might refer to as a "utxo". I'm tempted to go with UtxoEntry
, UtxoEntryValue
, and UtxoEntryBuilder
, Just because UtxoEntryValueBuf
is heinous T_T. What do you think?
Yeah, I agree that Utxo and UtxoValue are ambiguous.
Personally I feel like UtxoEntryValue is already kind of heinous, though. I think the "Value" convention works for the existing types mostly because the Value types are so rarely used: the runes code is entirely based on RuneEntry, and you never see RuneEntryValue except in the table definition. So even though RuneEntryValue is a relatively long and uninformative name, it successfully communicates "this is a variant version of RuneEntry", and that's good enough for its very minor usage.
But UtxoEntry isn't like that at all. The "high-level" UtxoEntry type isn't even usable by itself -- it just stores references into the low-level value! The low-level type is the one you actually use everywhere. And as we optimize ord, other types will probably start to follow this pattern too, because it's more efficient; converting all the data into a new format on every load and store is pretty expensive in terms of memory allocations and copies. In particular, keeping sat ranges in the low-level format is high on my list of future optimizations.
For type names, my general feeling is it's much more important to communicate what data the type contains than it is to describe exactly what format that data is stored in. This is why the "Value" convention works at all: the word "Value" doesn't communicate much, but that's fine, because a RuneEntry and a RuneEntryValue store the same data and we don't really care about the details of the representation.
So I think my preference is still to use UtxoEntry and UtxoEntryBuf for the low-level types. In particular, this successfully communicates that these are the types you're supposed to use: when you load a rune entry from the database, you want to store it as a RuneEntry, and when you load a UTXO entry, you want to store it as a UtxoEntry (or UtxoEntryBuf). The other versions of each type are just transient things you convert through but don't keep around.
I'd love to find a better name than ParsedUtxoEntry for the high-level type, but also I feel like that name isn't particularly important for exactly the same reason that "RuneEntryValue" isn't that important: it doesn't get used much.
Oh yeah, another option that might be worth considering: we could get rid of ParsedUtxoEntry entirely, and just have accessors on the low-level UtxoEntry to retrieve the individual fields. I suspect this is a wash in terms of performance -- it's slightly faster in some cases (because we don't have to parse the whole entry to retrieve a single field) and slightly slower in others (because we have to reparse the beginning to access the middle), but I'm guessing the difference is insignificant in both cases.
This was actually my original plan during development. It makes field access a little nicer for the user, since they don't have to call .parse(), and it avoids the need for separate high-level and low-level types entirely. I switched from this to ParsedUtxoEntry in order to make the parsing code a bit easier to read (because parsing the whole entry at once is simpler than parsing it incrementally), but I think I might actually be able to make an incremental version that's as easy to read as the current code.
Alternatively, if it turns out that the parser really needs to be stateful to be efficient, we could have UtxoEntry, UtxoEntryBuf, and UtxoEntryParser. This avoids the redundant parsing work, and from the user's point of view it would work just like the current code: you call .parse() to get a UtxoEntryParser, and then you can call accessors to get the individual fields. But the naming makes it very clear that the underlying UtxoEntry is the real type, and the parser is just a view into it.
Yah, I think you're right, I find all that convincing. I pushed a commit going back tot he old names. UtxoEntryBuf
and ParsedUtxoEntry
are not great, but at least they're clear!
Replace the OUTPOINT_TO_TXOUT, OUTPOINT_TO_SAT_RANGES, and SATPOINT_TO_SEQUENCE_NUMBER tables with a new OUTPOINT_TO_UTXO_ENTRY table. This saves space by avoiding redundant key storage, and makes updates quicker since we can write all of an outpoint's data at once.
It also simplifies caching. We used to have utxo_cache for OUTPOINT_TO_TXOUT and range_cache for OUTPOINT_TO_SAT_RANGES, while SATPOINT_TO_SEQUENCE_NUMBER was uncached; now, there's a single unified utxo_cache which covers all three. Additionally, since SCRIPT_PUBKEY_TO_OUTPOINT and SEQUENCE_NUMBER_TO_SATPOINT are storing data which is already included in the cache (just with the keys and values swapped), we get to cache those tables too for free.
The combined effect is that an index with --index-sats and --index-addresses is now about 54 GB smaller and twice as fast to build.