rust-bitcoin / rust-bitcoin

Rust Bitcoin library
Creative Commons Zero v1.0 Universal
2.11k stars 681 forks source link

Consider renaming `txid` to `compute_txid` or similar. #2363

Closed Kixunil closed 9 months ago

Kixunil commented 9 months ago

The name txid looks like a simple getter. I remember very well first seeing the API years ago that I was confused and didn't know if it actually is cheap to call or not.

Alternatively we could have something like struct FrozenTransaction(Transaction, Txid) for people who want to cache it but I'm not sure about its viability.

See https://github.com/rust-bitcoin/rust-bitcoin/issues/2358#issuecomment-1900892424

apoelstra commented 9 months ago

I think it'd be confusing and annoying to just rename it, especially for people who don't care about performance or who aren't using obscenely-large transactions.

We could try closing the Transaction structure and then we could cache it.

dpc commented 9 months ago

I remember very well first seeing the API years ago that I was confused and didn't know if it actually is cheap to call or not.

When I first used rust-bitcoin I only discovered it when optimizing performance.

It's obvious when you understand what's happening under the hood, but when you are new rust-bitcoin user or you're are just focused on something else, it's very easy to overlook.

If a function is not a cheap getter, it should just have a verb in a name.

I think it'd be confusing and annoying to just rename it,

To help migration: the old name could be kept at first with a comment, then deprecated with a message to use a new name, then removed completely.

Other than that I don't think couple of letters more make a big difference.

I'm always tempted to name things like this backwards (txid_compute) so the LSP auto-completion does the right thing for the user that intuitively types tx.txid.

We could try closing the Transaction structure and then we could cache it.

That requires an extra field and either:

Kixunil commented 9 months ago

It's annoying but likely the correct type of annoyance.

@dpc IIRC LSP can see doc(alias = "txid")

Regarding caching, we could use a 32-byte array of atomics (or packing it into u64 atomics) and an atomic that txid is set. Notice that txid doesn't change if transaction didn't so multiple threads would write the same data which is fine.

However there's a much bigger problem: invalidation. It'd mean literally no mutable references to the inside except for Witness (an if we're caching wtxid too then not even those). There would have to be a setter for everything and at best we might be able to use proxy objects to make it look a bit better. I think this is unacceptable.

apoelstra commented 9 months ago

It'd mean literally no mutable references to the inside except for Witness (an if we're caching wtxid too then not even those).

Well, any accessor to mutable references could just turn off the "txid is set" bit.

Kixunil commented 9 months ago

Hmm, OK, that could work...

I'm tempted to at least try it and maybe benchmark it.

apoelstra commented 9 months ago

Plausibly we could even avoid the atomics, with a bit of ugliness. We'd have:

The main issue here is that *Ref types are really annoying to work with (e.g. the type returned by Mutex or RefCell. So it is probably better to move recomputation into txid using atomics. (But ofc then there is a perf hit from the use of atomics.

Another solution would be to split the type into an immutable Transaction with all cached data, and a mutable TransactionConstructor which only had an expensive compute_txid method.

apoelstra commented 9 months ago

Also, flagging that we have a similar issue with blockhashes and the Block type, which is a POD type but whose hash requires us to hash all contained transactions.

Kixunil commented 9 months ago

But ofc then there is a perf hit from the use of atomics.

I suspect that recomputing Txid every time you change the transaction regardless of whether you need it would be a much bigger perf hit.

Immutable/mutable split sounds similar to my idea of FrozenTransaction just different names.

apoelstra commented 9 months ago

Immutable/mutable split sounds similar to my idea of FrozenTransaction just different names.

Nice. I think we should give the short name to the frozen variant because transactions rarely change once they're constructed.

Kixunil commented 9 months ago

So move the current thing to be TransactionMut? I'd do it after abstract transactions so that we can deduplicate stuff.

dpc commented 9 months ago

@dpc IIRC LSP can see doc(alias = "txid")

That is great to know. Thanks!

Also, flagging that we have a similar issue with blockhashes and the Block type, which is a POD type but whose hash requires us to hash all contained transactions.

There is a thousands ways one might want organize computation of things like txids, or handling blockchain data. Trying to come up with complications to anticipate them is just a loosing battle, IMO. If you have N versions of e.g. Transactions then there's a need to make them interoperable, introduce traits to glue them and complexity keeps growing. In addition one might want to do zero-copy deserialization (there was already PR like that around), or store things in data-oriented architecture, etc.

It seems more practical to focus on the most common, basic and natural type to be used in most software for interoperability, let the users deal with txid as they see fit (e.g. pass (TxId, Transaction) in the APIs and leave fancy stuff to extra libraries or the future.

The issue here is that txid being CPU heavy and could use a name adjustment to make it more obvious. How did it lead to ... TransactionMut? :D

dpc commented 9 months ago

If anyone just wants to cache txid() they can wrap Transaction in a struct, add field for a cache and deal with it however they need. Not exactly difficult. It's easier to add functionality that is needed, then remove one that one doesn't want.

Kixunil commented 9 months ago

Yeah, I'm not particularly happy about any of the solutions. I mainly discussed hoping that maybe we can figure out something sensible.

yancyribbens commented 9 months ago

Yeah, I'm not particularly happy about any of the solutions. I mainly discussed hoping that maybe we can figure out something sensible.

depricating txid in favor of compute_txid seems like a good start. I agree that txid sounds like a simple cheap operation while compute_txid is a better name for what's going on.

Kixunil commented 9 months ago

I just had an idea that there could be another motivation for FrozenTransaction: supporting more efficient layout similar to that of Witness.

However, I don't believe this is worth changing Transaction for 1.0.

I think that the whole transaction module and its dependencies are quite battle-tested and widely used in multiple crates. Whatever crazy ideas we come up with can be added as another extension crate, especially after supporting abstract transactions. I really want to focus on 1.0 of the leaf crates.

apoelstra commented 9 months ago

@dpc

There is a thousands ways one might want organize computation of things like txids, or handling blockchain data. Trying to come up with complications to anticipate them is just a loosing battle, IMO.

Bitcoin Core has CTransaction and CMutableTransaction and this has worked fine for them for many years, even though C++ has a much weaker type system which can obfuscate what's going on.

In this crate we have Script (now ScriptBuf) as well as script::Builder. This is a very common idiom in Rust. If we also had transaction::Builder and block::Builder this would be nicely consistent.

@Kixunil

However, I don't believe this is worth changing Transaction for 1.0.

You may be right. Renaming txid to compute_txid, providing a deprecation cycle, and using #[doc(alias)] is easy and has little room for bikeshedding (it sounds like we've agreed on the name, at least :)). And it would be an improvement over the status quo.

Moving to a hyper-efficient builder model feels like there's a lot of design space. We could store the flat bytes of the transaction along with offsets, and/or tweak things to improve alignment, and/or exploit system endianness to cast bytes to u32, etc etc. And how would transaction::Builder interact with PSBT? Seems like it should, but damned if I know how. And this can be easily done in a parallel crate and later pulled into 2.0 if this makes sense.