payjoin / rust-payjoin

Supercharged payment batching to save you fees and preserve your privacy
https://payjoindevkit.org
85 stars 37 forks source link

Encode unpadded length of payload #375

Closed nothingmuch closed 4 weeks ago

nothingmuch commented 1 month ago

Although BIP 174 PSBTs are self terminating [edit: actually they aren't, see below nvm keylen=0 is syntactically invalid, not just semantically c.f. https://github.com/bitcoin/bips/blob/master/bip-0174.mediawiki#cite_note-1], storing the length in the encrypted payload avoid any behavioral dependency on a PSBT parser ignoring any trailing data (the NULL byte padding).

This is encoded as a BOLT 1 TLV^1 record with type 0 (and therefore also a valid TLV stream), facilitating forward compatibility with BIP 77 extensions that might not necessarily signal receiver capabilities in the BIP 21 URI. This implementation si,mply ignores any trailing data which would contain any subsequent TLV records with a type larger than 0.

Since payloads are at most 7168 bytes (including the overhead), the unpadded plaintext length will always fit in 16 bits. For very small payloads this value is possibly less than 253 (0xfd), resulting in an 8 bit length.

DanGould commented 1 month ago

In another chat, you presented the problem this solves to me as follows:

BIP 174 specifies that the last element, the last kvp in an output map, is terminated by a 0x00 without parsing the PSBT, the unpadded length that includes this terminator can't really be determined

Encrypt message a with plaintext "....\x00" and "...\x00\x00" result in the same ciphertext

and

For implementers, if their psbt parser rejects instead of ignoring trailing nulls, they would currently need to find the end by dropping the additional trailing nulls after the first one

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

I'm going to suggest simpler ways to solve this. Potentially stupid:

Help me understand whether or not adding the complexity of BOLT 1 TLV encoding is a strict necessity or if it's just a nice-to-have to prevent a theoretical problem

nothingmuch commented 1 month ago

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

BIP 174 (and 370) doesn't say anything about ignoring trailing null bytes, only to ignore (but preserve) unknown fields.

Strictly speaking NULL bytes are interpretable, since 4 null bytes are a valid key value pair in BIP 174 consisting of type 0 and empty key and value, then if there's 4n trailing bytes this adds n keypairs, with duplicate (empty) keys, so that's in violation of the spec.

If there's a number of trailing bytes that isn't a multiple of 4, then i'm not sure what happens.

  • determine the unpadded payload length by reading bytes from the end of the buffer to the start until a non-0x00 byte is found. Yes, this depends on a PSBT being the last element of the plaintext payload, but is that a real limitation in the v2 payload?

I don't think this is simpler compared to encoding the length, there are edge cases. For example, if the last keypair in the output map has a value that ends in a null byte, there would need to be two null bytes at the end of the PSBT or it'll be truncated.

  • Should the plaintext payload just be versioned to allow us to depend on behavior like e.g. "PSBT is the last element of the payload"

That's the point of making this a valid TLV, the type can be something other than 0,

Help me understand whether or not adding the complexity of BOLT 1 TLV encoding is a strict necessity or if it's just a nice-to-have to prevent a theoretical problem

The specification I'm suggesting in BIP 77 is: The first byte must be 0, followed by a BOLT-1 BigSize encoded length, followed by that many bytes. Trailing bytes must be ignored

So not actually a TLV stream as far as BIP 77 is concerned, just forwards compatible with TLV stream.

The simpler approach is to specify that it's always prefixed with a signed 16 bit value which must be positive (if it's negative then that's an incompatible version, i.e. top bit is reserved as a type bit effectively).

I prefer the TLV approach, either BOLT-1 or the <key> encoding in BIP 174 (only difference is endianness, I chose big endian/BOLT-1 since it's more clearly documented and I thought I saw other big-endian numerical value encoding but I must be confusing with some other codebase)

DanGould commented 1 month ago

Do we know of any PSBT parsing implementations that suffer this behavior that does not follow the specification? i.e. rejecting instead of ignoring trailing nulls

The answer appears to be no, and it's just a strict interpretation of the BIP174 spec that's caught your attention.

I'm inclined to close this wontfix since it seems like client PSBT parsers could address this on their own.

Have you been bitten by a problem like this before or was it just a careful reading of the spec that brought it up?

DanGould commented 1 month ago

@nothingmuch help me understand

If bitcoin-hpke was modified to retain the underlying in-place interface then this code could be further simplified so that there is only one PADDED_MESSAGE_BYTES length buffer shared by all steps, which would also save a copy step.

Can you add more color to what this modification would look like? Are you talking about serializing the pubkey in certain message types by default so that we wouldn't need to implement that in encrypt_message_a?

We've already got a bitcoin-hpke update in this next release, so the time for breaking changes and updating deps is now

nothingmuch commented 1 month ago

Can you add more color to what this modification would look like? Are you talking about serializing the pubkey in certain message types by default so that we wouldn't need to implement that in encrypt_message_a?

No, simpler: In both encryptmessage{a,b}, first allocate a [u8; PADDED_MESSAGE_BYTES] (s the code in this PR does), then borrow mutable slice for the ciphertext + authentication tag, write the plaintext into that, and then do AEAD encryption inplace instead of returning a separate Vec<u8> that is then copied into the target Vec<u8>.

The only motivation is that it makes the bounds checking a little easier to reason about IMO. From a performance standpoint this is a silly micro-optimization, so not a justification by any means.

FWIW, if you're cACK on length tagging and I should improve this PR, I also just learned that Cursor usage is unnecessary for writing, since &mut [u8] implements the Write trait.


For the public record, I made two mistakes in this PR that Dan pointed out in chat:

  1. PSBTs are self terminating after all, since empty keys are syntactically invalid, I missed that and mistakenly concluded those would need to be rejected as duplicate keys
  2. Null termination and binary safety are irrelevant (I forgot the additional layer of encoding, so the plaintext can be treated as a NULL terminated text string).

Therefore the only remaining justifications are:

  1. making x = decrypt(encrypt(x)), general/vague arguments for length encoded strings as opposed to terminated ones (but even in a memory unsafe language such as C there still would be a fixed maximum length, so arguably this is not a memory safety issue)

  2. forward compatibility with BIP 77 extensions that do not signal the extension explicitly in the BIP 21 URI. this probably doesn't affect the multiparty transaction scenario since that is harder to make backwards compatible and would use explicit signalling and domain separation anyway.

nothingmuch commented 1 month ago

rebased to unbreak the newly merged test

nothingmuch commented 4 weeks ago

Closing this since we agreed that this change is not necessary, will followup with BIP 77 PRs to clarify why it isn't necessary after finishing some other stuff.

Are the changes from Vec<u8> to &[u8; <constant>] still desirable on parameters and return types? I would be happy to redo only that as a separate PR that does not change functionality after I finish the higher priority stuff.