Closed dr-orlovsky closed 1 year ago
You mean we can't convert TapLeafHash
to TapBranchHash
but we should be able to for single-leaf trees?
So we should have something like fn TapBranchHash::with_single_leaf(leaf_hash: TapLeafHash)
?
The conversion is possible, but the API is confusing. When I saw that the method doesn't accept leaf hash I had to go to the BIP to double-check how to commit in a taproot tree to a single script path - and it took a while. If each dev will have to do that double-check to ensure that by the conversion he doesn't break consensus rules, I think the API can be called "unefficient".
I propose to change the type to TapNodeHash
, which will solve the whole issue.
I think TapNodeHash
would be more confusing since it would no longer correspond to the names/tag in the BIP.
For the case of single-script taptrees we should add a code example in the docs.
Maybe tap_tweak(... merkle_root: Option<impl IntoNodeHash>)
, otherwise why we have that trait?
The single case is indeed confusing. I like KixUnil's fn TapBranchHash::with_single_leaf(leaf_hash: TapLeafHash
the most. I think Into might make it confusing to assume those types are interchangeable, there are not. They are only interchangeable if TapTree has only one leaf.
Having an explicit constructor with that name, along with some examples might be the way.
Funnily enough, I need this too. Am I correct that the implementation should be this?
pub fn with_single_leaf(leaf_hash: TapLeafHash) -> TapBranchHash {
TapBranchHash::from_inner(leaf_hash.into_inner())
}
Re-reading this thread again I think I now understand better what @dr-orlovsky meant by IntoNodeHash
and TapNodeHash
. And I believe he's right.
What's going on here is the way we use TapBranchHash
really should be TapNodeHash
. What the BIP calls TapBranch
is the name of hashing algorithm not the name of resulting type. There are two ways to correctly construct a hash for a node in the merkle tree: hashing single leaf script with TapLeaf
tag or hashing two other node hashes with TapBranch
tag.
And just because a value can be computed by two different functions it doesn't make it a different type.
Keeping the name TapBranch
actually makes it very confusing to me because my brain processes the word "branch" as "part of a tree". But TapBranchHash
is also used to represent whole tree. It feels like my brain has a little segfault every time it tries to process that.
/// A node in a Taproot merkle tree
///
/// This represents a node in the tree that is either:
/// * a leaf - directly hashsing leaf script
/// * a hash of two sub-trees which was created by hashing their root nodes
///
/// Thus, this may represent either whole tree, just a part of the tree (branch) or just the leaf. Note that in the cases a tree contains only one leaf this is both the root and the leaf which are the same in that case.
// Notice this is not hash newtype! There are only two valid ways of constructing it so the usual interface for hashing arbitrary bytes doesn't make sense.
#[]
pub TapNodeHash(sha256::Hash);
impl From<TapScriptHash> for TapNodeHash {}
impl From<TapNodeHash> for sha256::Hash { /* ... */ }
impl TapNodeHash {
/// Hashes the nodes using the TapBranch tag
// notice I used the word from BIP in doc so that people will find it
fn from_nodes(a: impl Into<TapNodeHash>, b: impl Into<TapNodeHash>) -> Self { /* ... */ }
}
// impl AsRef<$byte_type>, Borrow<$byte_type>, From<$byte_type> for Self
// where $byte_type is [u8] and [u8; 32] as appropriate
Adding this to 0.30 because I think we should fix this rather soon.
Forgot to say @dr-orlovsky I've looked at your bitcoin_scripts
crate and it looks awesome! I hope we can bring many ideas from it into this one.
One more, a bit crazy-looking idea: when we bump MSRV track the depth in a const generic so that if people statically construct TapNodeHash
the 128-limit overflow can be statically checked (well, this actually requires nightly but we could put panic in place of it now and add the bound in the future without changing API for correct usages).
Thinking about this for the second time: after working a bit with Haskell and Idris I start to think about types as a units of semantics - and not as representation of in-memory layouts or outputs of specific algorithms - like H2O is "water" no matter was it made by 2H2 + O2 -> 2H2O or CH4 + O2 -> H2O + C reactions.
Unfortunately, rust is not that specific in following this idiom: you may have u8
representing a plenty of different things, and frequently a type in rust is just about memory layout or output of specific functions. On my opinion, if a semantic may have two different memory layout representations it is still a single type, just a unit
/enum
with associated data.
So, coming back to the question of taproot tree hashes. In terms of types, we have: 1) the hash of the leaf; 2) the hash of a two other nodes ("branch"); 3) the root of the tree can be either of them; 4) a hash of something which we do not know what it is for sure ("hidden node"). This also may happen in root.
Type representation of the above can be:
pub struct Depth(u7);
pub struct TapLeafHash(TaggedHash<LEAF_MIDSTATE>);
pub struct TapBranchHash(TaggedHash<BRANCH_MIDSTATE>, Depth);
// this must not implement `hash::Hash` trait and can't be constructed by hashing some data
pub struct TapUnknownHash(sha256::Hash);
pub enum TapRootHash {
Leaf(TapLeafHash),
Branch(TapBranchHash),
Unknown(TapUnknownHash),
}
@Kixunil thank you for the good words about bitcoin_scripts
, but I am not satisfied with them anymore :)
I am working on making bp
library a strongly-typed lib inspired by Idris, where each type will have a semantic meaning. I will use the system I recently developed (called "strict encoding"), which allows for any rust type to generate its own formal representation, have that representation analysed etc; and use it to have things in that lib like "automatically test if the new release doesn't break any consensus-level serialization rules with the previous release - and detect in which parts".
You are always welcome to join that effort!
I start to think about types as a units of semantics - and not as representation of in-memory layouts or outputs of specific algorithms - like H2O is "water" no matter was it made by 2H2 + O2 -> 2H2O or CH4 + O2 -> H2O + C reactions.
Indeed, that's my mind-model as well.
Unfortunately, rust is not that specific in following this idiom: you may have u8 representing a plenty of different things, and frequently a type in rust is just about memory layout
You mean things like usize
may mean number of any kind of item in an array?
Type representation of the above can be
Is Depth
even meaningful in consensus? My understanding is that it is only used to verify whether a tree is too deep and is not used in the signing algorithm.
I am working on making
bp
library a strongly-typed lib inspired by Idris, where each type will have a semantic meaning.
I'm curious what the differences will be regarding scripts. It seemed bitcoin_scripts
is already doing a good job.
You mean things like
usize
may mean number of any kind of item in an array?
usize
is more or less ok, since its semantics is "size of a collection". What's bad is u8
in any struct type or within associated enum data.
Is
Depth
even meaningful in consensus? My understanding is that it is only used to verify whether a tree is too deep and is not used in the signing algorithm.
You can't have a control block with merkle path size longer than Depth
I'm curious what the differences will be regarding scripts. It seemed bitcoin_scripts is already doing a good job.
I do not like that the types there use Script
internally. Script
type is really bad since there is no "script" in bitcoin. There are ScriptPubkey
, SigScript
, RedeemScript
, WitnessScript
, TapScript
, LeafScript
- since the semantic meaning of the same bytecode in all of them is different (take for instance 0x50
as the first script byte post SegWit).
I do not like that the types there use
Script
internally.
Why? Isn't it just a helper implementation detail?
Script
is just Vec<u8>
or Box<[u8]>
. So you do not need another helper implementation detail type since we already have Vec/Box
as a helper implementation detail. Script
type has no semantic meaning. As a "parser" into opcodes it also doesn't work: try to parse OP_CHECKSIGVERIFY
in TapScript
context. The opcode parsers must be implemented by specific script types (like TapScript
) and not by a Script
. I.e. Script
type is meaningless.
I am convinced that this should be fixed. I like @Kixunil's suggestion to rename TapBranchHash
to TapNodeHash
. The algorithm and types are not the same and the name branch
is also confusing to me.
Although what @dr-orlovsky says is semantically correct, I believe adding TaprootHash
, and UnknownHash
will only increase the confusion in an already confusing list of hashes. Everyone will always have to re-learn this every single time they look at the code.
TapLeafHash
used in the sighash algorithm is the same as the one used in the computing merkle tree hash.There are other hashes like TapTweakHash
and TapSighashHash
, but I believe those are not confusing.
Let me suggest another third alternative that I believe might be more user-friendly,
1) Rename TapBranchHash
to TapNodeHash
. Do not make this a new hash type as per @Kixunil's suggestion. Have methods
a) TapNodeHash::from_leaf(Script, LeafVersion)
: Will compute the hash with the TapLeaf tag
b) TapNodeHash::from_nodes(TapNodeHash, TapNodeHash)
: Applies the TapBranch tag
c) and TapNodeHash::from_hidden_node(sha256::Hash)
. If you are dealing with Trees you only need to worry about one hash TapNodeHash
.
d) TapNodeHash::with_leaf_hash
: Will directly use the TapLeafHash
given. This is really optional, and I think we should remove this. Users should use TapNodeHash::from_hidden_node
instead.
I like the purpose-driven nature of types here: a) TapLeafHash: used in signing leaves b) TapNodeHash: Used in tree construction and Merkle root computation. c) TapsighashHash: Used in sighash d) TapTweak: used in tweaking
While it might be valuable in some cases to represent that TapNodeHash
is really a single leaf or a combination. I think it will only add more confusion down the line. Users that really need to know how the tree structure can store the tree in other ways.
I believe what I am saying is similar to @Kixunil's suggestion, just without the TapScriptHash
type.
@dr-orlovsky all good points with the current design of the script. However change to unsized script will drastically change this and there it will be much better to have Script<Foo>
than replicating the unsized logic (with unsafe
conversions between slices, boxes..., AsRef
and other things) several times. Especially when some scripts imply others. It'll also reduce duplication of those methods that are actually shared (hashing mainly). So I think you're correct from semantical point considering the current state of things but generics are more practical.
TapTweakHash
Is this one even useful to be public? I feels like it just adds noise to the API but I may be missing something.
another third alternative that I believe might be more user-friendly,
Oh, wow I got pretty much the same idea and was about to suggest this. :D
I think we should remove this.
I'd almost want to remove the whole TapLeafHash
type but it acts as a cache if you're signing multiple things with the same leaf hash (I happen to work on a project that does exactly this right now :)) And since we have this already It'd be weird to not have that conversion.
BTW super funny coincidence: I just managed to fix a bug that I only got because I confused leaf script with script_pubkey and since it was causing signature to be invalid it wasn't obvious what's wrong. If rust-bitcoin had script tagging (or if I had known about and used bitcoin-scripts) this wouldn't have happened. Several hours wasted to this ridiculous thing including myself going to sleep at ~5:30 instead of ~4:30. :D
Oh, wow I got pretty much the same idea and was about to suggest this. :D
I will get on implementing this.
BTW super funny coincidence: I just managed to fix a bug that I only got because I confused leaf script with script_pubkey and since it was causing signature to be invalid it wasn't obvious what's wrong. If rust-bitcoin had script tagging (or if I had known about and used bitcoin-scripts) this wouldn't have happened. Several hours wasted to this ridiculous thing including myself going to sleep at ~5:30 instead of ~4:30. :D
Yep, I have also been bit by similar bugs. Adding context to Scripts has really helped us in rust-miniscript, and is something that we should try to replicate here. It might not be that invasive, and maybe I should bite the bullet and start working on it
This all makes sense. But do we actually need separate TapTweakHash
? Isn't it just the taptree Merkle root, i.e. TapNodeHash
? What's the difference between those two types?
I think it would be nice to have two hash types: TapNodeHash
, TapSigHash
and TapLeafHash
with simple/streamlined conversion of TapLeafHash -> TapNodeHash
and internal key tweaked with TapNodeHash
.
But do we actually need separate
TapTweakHash
? Isn't it just the taptree Merkle root
From the doc:
Produces
H_taptweak(P||R)
whereP
is the internal key andR
is the merkle root.
IOW output_key = internal_key + G * (internal_key || merkle_root)
So as I say, it's an internal thing. Is it worth it making it public? I highly doubt so. Since it itself need the internal key to be produced it can't be used for some caching or anything like that. I also think it probably doesn't need to be a newtype internally. It doesn't help much with the safety and the newtype with all the impls just produces a bunch of extra work for the compiler.
Agree. Seems like no dedicated type (newtype or type alias) is needed at all. What is used is the Merkle root, and the tweak is constructed and applied just inside the function as a scalar constructed out of the sha256 tagged hash of the internal key and merkle root
Correct me if I'm wrong, but taproot script tree with a for single script path spending condition we will have a merkle root as a hash of that leaf script. In order to construct output key in such cases we can't provide a merkle root value in the existing API in a type-safe way.
https://github.com/rust-bitcoin/rust-bitcoin/blob/60f3a19acd34e1654aed21ec103ef0e34b714e0d/bitcoin/src/util/schnorr.rs#L126