rust-bitcoin / rust-bitcoin

Rust Bitcoin library
Creative Commons Zero v1.0 Universal
2.13k stars 691 forks source link

`Encodable` and `Decodable` should not be implemented for external, general-purpose types #1311

Open Kixunil opened 2 years ago

Kixunil commented 2 years ago

These implementations make assumptions about how all general-purpose types are encoded. These assumptions are not always correct and even if they were, they are not guaranteed to be correct in the future - a new encoding may be added to the protocol at any point.

There are indeed common patterns like LE-encoded integers. For those we should just have macros and usually also newtypes. CompactTarget is a nice example.

apoelstra commented 2 years ago

I think this would make the library both harder to use and less useful.

There are no protocols associated with Bitcoin that are so capricious that "a new encoding may be added in the future" in a breaking way, and us using macros rather than trait functions wouldn't help the situation anyway.

In general I wish Encodable were implemented on more types and that it were easier to implement on external types. The encoding it describes is consistent and has "obvious" implementations for basically all of the data manipulated by Bitcoin protocols: there are numbers (encoded LE), fixed-length bytestrings (encoded as-is) and variable-length bytestrings and arrays (encoded with a varint prefix).

Kixunil commented 2 years ago

in a breaking way

Not in a breaking way, just inconsistent with the previous encodings. E.g. there could be an array that is limited to 255 items so it simply uses u8 instead of compact size. We already have an inconsistency in encoding of the port number which is big endian.

This is also semantic question. What does Encodable/Decodable mean? "Any type that is being transferred between BItcoin nodes" maybe? They don't transfer integers or arrays directly, only inside messages. "Any type that has a single obvious binary encoding" doesn't hold for integers because they have two obvious encodings. But using this definition would allow sensibly implementing the trait for types that aren't used in Bitcoin consensus (e.g. Psbt).

apoelstra commented 2 years ago

"Any type that has a single obvious binary encoding" doesn't hold for integers because they have two obvious encodings.

I think only LE is obvious (and port numbers are a unique historical abberation), but I take your larger point, that it's really only the messages themselves that have a "well-defined encoding" while components like integers are encoded differently in different contexts. Even bytestrings are sometimes length-prefixed and sometimes not, which we can mostly model with arrays vs slices, but this has caused surprising behavior in the past. (In fact IIRC we removed the Encodable impl for slices, or something like that, exactly because it was hard to predict when &/autoref would cause us to serialize a slice instead of an array.)

All that to say,

"Any type that is being transferred between BItcoin nodes" maybe? They don't transfer integers or arrays directly, only inside messages.

I like this definition, and I'm warming up to your idea that we should drop the Encodable impl for primitive types like integers and just provtide macros (or even functions, I think, would be fine) to encode these types explicitly. This might even simplify some idioms, e.g. if we could write number.encode_varint() instead of VarInt(number as u64).encode().

I'm still not totally convinced that this is worth the ergonomic hit, but I agree with you in principle and I'm warming up to your approach.

Encodable and Decodable should not be implemented for external, general-purpose types [issue title]

I think when I first read this I missed the word "general", and thought you were talking about sealing the traits entirely so they could not be impl'd on external types. I'm strongly against that -- both because I feel that Encodable/Decodable should be about more than the p2p protocol, and because even if we restricted to the p2p protocol, external users may still need to impl the traits for their own types, either to cover gaps in the library or to test with proposed-but-not-accepted extensions.

Dropping the impls on u64 etc., I need to chew on, but I think I can get behind it.

Kixunil commented 2 years ago

Yep, I didn't mean to seal it. I was considering it in the past but decided not sealing makes sense. It will put more pressure on us getting those traits right before stabilizing but it seems worth it.

Regarding ergonomic hit I want to find some time to write a PR to demonstrate the difference. I suggested macros mainly to help with newtypes:

pub struct CompactTarget(u32);

impl_consensus_newtype!(CompactTarget, consensus::Int::le); // second argument specifies encoding/decoding strategy.

// impls `Encodable`/`Decodable` for structs by concatenating the encoding of individual fields - assumes they impl the trait.
consensus_sequential! {
    pub struct BlockHeader {
        pub version: BlockVersion,
        pub prev_blockhash: BlockHash,
        pub merkle_root: TxMerkleNode,
        pub time: BlockTime,
        pub bits: CompactTarget,
        pub nonce: BlockNonce,
    }
}

Maybe I'd even ACK proc macros given they could allow generating custom error types and they are pretty common anyway so maybe avoiding large compile times is dead cause.