rust-bitcoin / rust-bech32

Bech32 format encoding and decoding
MIT License
94 stars 49 forks source link

WIP: Add `hrpstring` module #112

Closed tcharding closed 1 year ago

tcharding commented 1 year ago

Throwing this up for you to look at @apoelstra when/if you get time.

Draft because the current implementation fails to parse a valid bech32 address from mainnet but AFAICT the code correctly implements BIP-173 - where is the bug?

Failing test is at the bottom of hrpstring.rs

    #[test]
    fn exclude_strings_that_are_not_valid_bech32_length_0() {
        // This is a real mainnet address so it must be valid.
        // https://blockstream.info/address/bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq
        let addr = "bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq";
        let parsed = Parsed::new(addr).expect("failed to parse address");

        assert_eq!(Ok(()), parsed.validate_checksum::<crate::Bech32>())
    }

It fails with InvalidDataLength from the call to checksum_length_checks:

    /// Helper function that sanity checks the length of an HRP string for a given checksum algorithm.
    ///
    /// Specifically, check that
    ///     * the data is at least long enough to contain a checksum
    ///     * that the length doesn't imply any "useless characters" where no bits are used
    fn checksum_length_checks<Ck: Checksum>(&self) -> Result<(), Error> {
        let len = match self.witness_version {
            Some(_) => self.data_chk.len() + 1,
            None => self.data_chk.len(),
        };

        if len < Ck::CHECKSUM_LENGTH {
            return Err(Error::InvalidChecksumLength);
        }
        // From BIP-173:
        // > Re-arrange those bits into groups of 8 bits. Any incomplete group at the
        // > end MUST be 4 bits or less, MUST be all zeroes, and is discarded.
        if (len - Ck::CHECKSUM_LENGTH) * 5 % 8 > 4 {
            return Err(Error::InvalidDataLength);
        }

        Ok(())
    }

And the relevant part of the bip is in bold:

Decoding

Software interpreting a segwit address:

tcharding commented 1 year ago

Unrelated to the length checks, while working on this I found data_iter to be annoying because of the generic. If we assume that users will do validity checks on the checksum before iterating the data perhaps we should remove the checksum during validation so that iterating does not require the generic (we can save it in another slice field of Parsed) .

apoelstra commented 1 year ago

If we assume that users will do validity checks on the checksum before iterating the data

How are they going to check the checksum without iterating through the data? Are you suggesting that they iterate twice? What API should they use to iterate the first time?

tcharding commented 1 year ago

After constructing the Parsed type we are expecting them to call parsed.validate_checksum before calling data_iter.

    /// When constructing the iterator we do some preliminary checks on the length
    /// of the checksummed data, hence the `Result` return type. But **this function
    /// does not validate the checksum**. Before using it, you should first call
    /// [`Self::validate_checksum`].
apoelstra commented 1 year ago

Ah! I see what you're saying. You need to use the same generic twice, once in validate_checksum and once in data_iter. I agree this is annoying.

The existing API is there so that you don't need to take self by mutable reference. I am tempted to change validate_checksum to take ownership of self, reduce the slice internally, and mark it #[must_use].

We should also probably make Parsed be Copy.

apoelstra commented 1 year ago

What would happen if you forgot to do this, and called data_iter anyway?

tcharding commented 1 year ago

Superseded by #113