TapTweak API for a single script path spending case

dr-orlovsky commented 1 year ago

Correct me if I'm wrong, but taproot script tree with a for single script path spending condition we will have a merkle root as a hash of that leaf script. In order to construct output key in such cases we can't provide a merkle root value in the existing API in a type-safe way.

https://github.com/rust-bitcoin/rust-bitcoin/blob/60f3a19acd34e1654aed21ec103ef0e34b714e0d/bitcoin/src/util/schnorr.rs#L126

Kixunil commented 1 year ago

You mean we can't convert TapLeafHash to TapBranchHash but we should be able to for single-leaf trees?

So we should have something like fn TapBranchHash::with_single_leaf(leaf_hash: TapLeafHash)?

dr-orlovsky commented 1 year ago

The conversion is possible, but the API is confusing. When I saw that the method doesn't accept leaf hash I had to go to the BIP to double-check how to commit in a taproot tree to a single script path - and it took a while. If each dev will have to do that double-check to ensure that by the conversion he doesn't break consensus rules, I think the API can be called "unefficient".

I propose to change the type to TapNodeHash, which will solve the whole issue.

apoelstra commented 1 year ago

I think TapNodeHash would be more confusing since it would no longer correspond to the names/tag in the BIP.

For the case of single-script taptrees we should add a code example in the docs.

dr-orlovsky commented 1 year ago

Maybe tap_tweak(... merkle_root: Option<impl IntoNodeHash>), otherwise why we have that trait?

sanket1729 commented 1 year ago

The single case is indeed confusing. I like KixUnil's fn TapBranchHash::with_single_leaf(leaf_hash: TapLeafHash the most. I think Into might make it confusing to assume those types are interchangeable, there are not. They are only interchangeable if TapTree has only one leaf.

Having an explicit constructor with that name, along with some examples might be the way.

Kixunil commented 1 year ago

Funnily enough, I need this too. Am I correct that the implementation should be this?

pub fn with_single_leaf(leaf_hash: TapLeafHash) -> TapBranchHash {
    TapBranchHash::from_inner(leaf_hash.into_inner())
}

Kixunil commented 1 year ago

Re-reading this thread again I think I now understand better what @dr-orlovsky meant by IntoNodeHash and TapNodeHash. And I believe he's right.

What's going on here is the way we use TapBranchHash really should be TapNodeHash. What the BIP calls TapBranch is the name of hashing algorithm not the name of resulting type. There are two ways to correctly construct a hash for a node in the merkle tree: hashing single leaf script with TapLeaf tag or hashing two other node hashes with TapBranch tag.

And just because a value can be computed by two different functions it doesn't make it a different type.

Keeping the name TapBranch actually makes it very confusing to me because my brain processes the word "branch" as "part of a tree". But TapBranchHash is also used to represent whole tree. It feels like my brain has a little segfault every time it tries to process that.

/// A node in a Taproot merkle tree
///
/// This represents a node in the tree that is either:
/// * a leaf - directly hashsing leaf script
/// * a hash of two sub-trees which was created by hashing their root nodes
///
/// Thus, this may represent either whole tree, just a part of the tree (branch) or just the leaf. Note that in the cases a tree contains only one leaf this is both the root and the leaf  which are the same in that case.
// Notice this is not hash newtype! There are only two valid ways of constructing it so the usual interface for hashing arbitrary bytes doesn't make sense.
#[]
pub TapNodeHash(sha256::Hash);

impl From<TapScriptHash> for TapNodeHash {}

impl From<TapNodeHash> for sha256::Hash { /* ... */ }

impl TapNodeHash {
    /// Hashes the nodes using the TapBranch tag
    // notice I used the word from BIP in doc so that people will find it
    fn from_nodes(a: impl Into<TapNodeHash>, b: impl Into<TapNodeHash>) -> Self { /* ... */ }
}

// impl AsRef<$byte_type>, Borrow<$byte_type>, From<$byte_type> for Self
// where $byte_type is [u8] and [u8; 32] as appropriate

Adding this to 0.30 because I think we should fix this rather soon.

Forgot to say @dr-orlovsky I've looked at your bitcoin_scripts crate and it looks awesome! I hope we can bring many ideas from it into this one.

Kixunil commented 1 year ago

One more, a bit crazy-looking idea: when we bump MSRV track the depth in a const generic so that if people statically construct TapNodeHash the 128-limit overflow can be statically checked (well, this actually requires nightly but we could put panic in place of it now and add the bound in the future without changing API for correct usages).

dr-orlovsky commented 1 year ago

Thinking about this for the second time: after working a bit with Haskell and Idris I start to think about types as a units of semantics - and not as representation of in-memory layouts or outputs of specific algorithms - like H2O is "water" no matter was it made by 2H2 + O2 -> 2H2O or CH4 + O2 -> H2O + C reactions.

Unfortunately, rust is not that specific in following this idiom: you may have u8 representing a plenty of different things, and frequently a type in rust is just about memory layout or output of specific functions. On my opinion, if a semantic may have two different memory layout representations it is still a single type, just a unit/enum with associated data.

So, coming back to the question of taproot tree hashes. In terms of types, we have: 1) the hash of the leaf; 2) the hash of a two other nodes ("branch"); 3) the root of the tree can be either of them; 4) a hash of something which we do not know what it is for sure ("hidden node"). This also may happen in root.

Type representation of the above can be:

pub struct Depth(u7);

pub struct TapLeafHash(TaggedHash<LEAF_MIDSTATE>);
pub struct TapBranchHash(TaggedHash<BRANCH_MIDSTATE>, Depth);

// this must not implement `hash::Hash` trait and can't be constructed by hashing some data
pub struct TapUnknownHash(sha256::Hash);

pub enum TapRootHash {
    Leaf(TapLeafHash),
    Branch(TapBranchHash),
    Unknown(TapUnknownHash),
}

dr-orlovsky commented 1 year ago

@Kixunil thank you for the good words about bitcoin_scripts, but I am not satisfied with them anymore :)

I am working on making bp library a strongly-typed lib inspired by Idris, where each type will have a semantic meaning. I will use the system I recently developed (called "strict encoding"), which allows for any rust type to generate its own formal representation, have that representation analysed etc; and use it to have things in that lib like "automatically test if the new release doesn't break any consensus-level serialization rules with the previous release - and detect in which parts".

You are always welcome to join that effort!

Kixunil commented 1 year ago

I start to think about types as a units of semantics - and not as representation of in-memory layouts or outputs of specific algorithms - like H2O is "water" no matter was it made by 2H2 + O2 -> 2H2O or CH4 + O2 -> H2O + C reactions.

Indeed, that's my mind-model as well.

Unfortunately, rust is not that specific in following this idiom: you may have u8 representing a plenty of different things, and frequently a type in rust is just about memory layout

You mean things like usize may mean number of any kind of item in an array?

Type representation of the above can be

Is Depth even meaningful in consensus? My understanding is that it is only used to verify whether a tree is too deep and is not used in the signing algorithm.

I am working on making bp library a strongly-typed lib inspired by Idris, where each type will have a semantic meaning.

I'm curious what the differences will be regarding scripts. It seemed bitcoin_scripts is already doing a good job.

dr-orlovsky commented 1 year ago

You mean things like usize may mean number of any kind of item in an array?

usize is more or less ok, since its semantics is "size of a collection". What's bad is u8 in any struct type or within associated enum data.

Is Depth even meaningful in consensus? My understanding is that it is only used to verify whether a tree is too deep and is not used in the signing algorithm.

You can't have a control block with merkle path size longer than Depth

I'm curious what the differences will be regarding scripts. It seemed bitcoin_scripts is already doing a good job.

I do not like that the types there use Script internally. Script type is really bad since there is no "script" in bitcoin. There are ScriptPubkey, SigScript, RedeemScript, WitnessScript, TapScript, LeafScript - since the semantic meaning of the same bytecode in all of them is different (take for instance 0x50 as the first script byte post SegWit).

Kixunil commented 1 year ago

I do not like that the types there use Script internally.

Why? Isn't it just a helper implementation detail?

dr-orlovsky commented 1 year ago

Script is just Vec<u8> or Box<[u8]>. So you do not need another helper implementation detail type since we already have Vec/Box as a helper implementation detail. Script type has no semantic meaning. As a "parser" into opcodes it also doesn't work: try to parse OP_CHECKSIGVERIFY in TapScript context. The opcode parsers must be implemented by specific script types (like TapScript) and not by a Script. I.e. Script type is meaningless.

sanket1729 commented 1 year ago

I am convinced that this should be fixed. I like @Kixunil's suggestion to rename TapBranchHash to TapNodeHash. The algorithm and types are not the same and the name branch is also confusing to me.

Although what @dr-orlovsky says is semantically correct, I believe adding TaprootHash, and UnknownHash will only increase the confusion in an already confusing list of hashes. Everyone will always have to re-learn this every single time they look at the code.

TapLeafHash: Only used in signing algorithms. It is only accidental that TapLeafHash used in the sighash algorithm is the same as the one used in the computing merkle tree hash.
TapNodeHash: Used in the Tree construction algorithm. This uses the tap branch algorithm. I can also stick to the TapBranchHash name if that is convenient. This is same hash that will be used to represent MerkleRoot of tree.

There are other hashes like TapTweakHash and TapSighashHash, but I believe those are not confusing.

Let me suggest another third alternative that I believe might be more user-friendly,

1) Rename TapBranchHash to TapNodeHash. Do not make this a new hash type as per @Kixunil's suggestion. Have methods a) TapNodeHash::from_leaf(Script, LeafVersion): Will compute the hash with the TapLeaf tag b) TapNodeHash::from_nodes(TapNodeHash, TapNodeHash): Applies the TapBranch tag c) and TapNodeHash::from_hidden_node(sha256::Hash). If you are dealing with Trees you only need to worry about one hash TapNodeHash. d) TapNodeHash::with_leaf_hash: Will directly use the TapLeafHash given. This is really optional, and I think we should remove this. Users should use TapNodeHash::from_hidden_node instead.

I like the purpose-driven nature of types here: a) TapLeafHash: used in signing leaves b) TapNodeHash: Used in tree construction and Merkle root computation. c) TapsighashHash: Used in sighash d) TapTweak: used in tweaking

While it might be valuable in some cases to represent that TapNodeHash is really a single leaf or a combination. I think it will only add more confusion down the line. Users that really need to know how the tree structure can store the tree in other ways.

I believe what I am saying is similar to @Kixunil's suggestion, just without the TapScriptHash type.

Kixunil commented 1 year ago

@dr-orlovsky all good points with the current design of the script. However change to unsized script will drastically change this and there it will be much better to have Script<Foo> than replicating the unsized logic (with unsafe conversions between slices, boxes..., AsRef and other things) several times. Especially when some scripts imply others. It'll also reduce duplication of those methods that are actually shared (hashing mainly). So I think you're correct from semantical point considering the current state of things but generics are more practical.

TapTweakHash

Is this one even useful to be public? I feels like it just adds noise to the API but I may be missing something.

another third alternative that I believe might be more user-friendly,

Oh, wow I got pretty much the same idea and was about to suggest this. :D

I think we should remove this.

I'd almost want to remove the whole TapLeafHash type but it acts as a cache if you're signing multiple things with the same leaf hash (I happen to work on a project that does exactly this right now :)) And since we have this already It'd be weird to not have that conversion.

BTW super funny coincidence: I just managed to fix a bug that I only got because I confused leaf script with script_pubkey and since it was causing signature to be invalid it wasn't obvious what's wrong. If rust-bitcoin had script tagging (or if I had known about and used bitcoin-scripts) this wouldn't have happened. Several hours wasted to this ridiculous thing including myself going to sleep at ~5:30 instead of ~4:30. :D

sanket1729 commented 1 year ago

Oh, wow I got pretty much the same idea and was about to suggest this. :D

I will get on implementing this.

BTW super funny coincidence: I just managed to fix a bug that I only got because I confused leaf script with script_pubkey and since it was causing signature to be invalid it wasn't obvious what's wrong. If rust-bitcoin had script tagging (or if I had known about and used bitcoin-scripts) this wouldn't have happened. Several hours wasted to this ridiculous thing including myself going to sleep at ~5:30 instead of ~4:30. :D

Yep, I have also been bit by similar bugs. Adding context to Scripts has really helped us in rust-miniscript, and is something that we should try to replicate here. It might not be that invasive, and maybe I should bite the bullet and start working on it

dr-orlovsky commented 1 year ago

This all makes sense. But do we actually need separate TapTweakHash? Isn't it just the taptree Merkle root, i.e. TapNodeHash? What's the difference between those two types?

I think it would be nice to have two hash types: TapNodeHash, TapSigHash and TapLeafHash with simple/streamlined conversion of TapLeafHash -> TapNodeHash and internal key tweaked with TapNodeHash.

Kixunil commented 1 year ago

But do we actually need separate TapTweakHash? Isn't it just the taptree Merkle root

From the doc:

Produces H_taptweak(P||R) where P is the internal key and R is the merkle root.

IOW output_key = internal_key + G * (internal_key || merkle_root)

So as I say, it's an internal thing. Is it worth it making it public? I highly doubt so. Since it itself need the internal key to be produced it can't be used for some caching or anything like that. I also think it probably doesn't need to be a newtype internally. It doesn't help much with the safety and the newtype with all the impls just produces a bunch of extra work for the compiler.

dr-orlovsky commented 1 year ago

Agree. Seems like no dedicated type (newtype or type alias) is needed at all. What is used is the Merkle root, and the tweak is constructed and applied just inside the function as a scalar constructed out of the sha256 tagged hash of the internal key and merkle root

rust-bitcoin / rust-bitcoin

TapTweak API for a single script path spending case #1393