Address size and format

lrettig commented 2 years ago

https://ethereum-magicians.org/t/increasing-address-size-from-20-to-32-bytes/5485

noamnelke commented 2 years ago

Of the 3 motivations that Vitalik provides in the document, the first two are definitely not relevant to Spacemesh at this stage. Increasing the address size at this point will actually make it much harder to get those first two benefits if/when we need them in the future.

Regarding his 3rd motivation, increasing the hash part of the address from 20 to 26 bytes to gain more collision resistance, that's as relevant to Spacemesh as it is to Ethereum, but Vitalik doesn't go into why/if this is needed.

To be extra safe, I tried to look up if someone did the math so we don't have to and I was happy to find this back-of-the-napkin calculation that people at Tendermint did.

My conclusion is this: even under extreme assumptions, a birthday attack on a 20-byte address would likely cost over $1M and take years and a tremendous logistical effort to carry out. Finding a collision wouldn't be an attack in-and-of-itself, it could only be used as part of an attack (examples are given in the link I shared).

I don't oppose to slightly increasing the size of hashes (e.g. to 25 bytes), but I think we'd be fine without it at first. We can increase the size in the future along with other improvements. If it was my decision to make, I would keep using 20 bytes for all IDs.

lrettig commented 2 years ago

Another thought: do we want an explicit checksum built into our addresses? Both Bitcoin and Ethereum lack this. It was bolted on later using uppercase/lowercase in the case of Ethereum, but it's a bit flimsy, limited in the number of bits, and not enforced at the protocol level (and therefore optional in wallets/apps).

noamnelke commented 2 years ago

Storing the checksum on-mesh makes little sense - it takes up precious space with data that can be deduced from the non-checksummed address. So if this is gonna be "formal" it's still gonna be optional, meaning that if I'm writing a wallet and I want to accept addresses without a checksum - I can do it even if the node requires it (by just calculating it myself and adding it to the address).

Why is this important? Because it means that the checksum may be a standard that we define, but it's not really enforced by the protocol.

I don't object to defining a standard where each address is followed by a checksum, but I don't think it should be enforced at the protocol level. Instead, it should be up to the wallet to enforce this. Explorers should display the address with the checksum and wallets should provide addresses for incoming payments including a checksum, but the node should never hear about it, IMO.

If a checksum check fails the UX is up to the wallet. Reasonable behavior IMO is to warn the user, but allow them to continue. There should be a minor warning if the checksum is missing and a major warning if there's a checksum and it's wrong. Perhaps there should be no "continue anyway" button in that case (but users could remove the wrong checksum and then proceed anyway).

lrettig commented 2 years ago

Here's @iddo333's feedback:

It depends on our consensus protocol rules. If the honest full node is required to receive (with ed25519 signature which actually uses 64 byte hash size) and keep the preimage contract_code+params+salt upon the initial deployment of the contract, and disallow deploying any subsequent contract that hashes to the same address, then there isn't a security risk for full nodes even if the address size is less than 20 bytes (the only problem with collisions is that honest users might try to deploy a contact whose address is already taken, but even that won't be a problem because of the salt that we need anyway). For lite clients, it'd be possible to violate their security via collision attack, but the cost of the attack is high and it won't affect full nodes, so it isn't so attractive for an attacker to attack trust-free lite clients. The address size wouldn't be more than our default hash output (which can be 28 bytes or even 24 bytes), and it better not be less than 20 bytes so that we'd be able to support lite clients in the future.

lrettig commented 2 years ago

@noamnelke, I'm with you on not making this a hard "in protocol" requirement--and thus this should have no impact on, e.g., mesh growth or node communication--but we should design a sane standard for the address format that users will see and interact with. EIP-55 (Mixed-case checksum address encoding) gives eth addresses, on average, 15 check bits per address. By contrast, Bitcoin uses four bytes.

This raises the related question: do we want to use hex address encoding (like eth), base58 (like btc), or something else? Base58-encoded addresses have the benefit that, even with the checksum bytes, they're still pretty short.

lrettig commented 1 year ago

Re-syncing against internal thread with @noamnelke:

Yes, but I think that this is so minimal that we can really just decide last minute.

My personal opinion: we add no version byte and make the actual address 24 bytes long. If a version needs to be added we can bump the transaction version and derive the address version from that in supported clients.

The reason I'm not worried about adding a version byte is that if we ever need it, we'll make version 0 be a container for the old version. We'll need to upgrade all those other objects anyway to have a bigger address, so adding a version byte at that point won't be any harder (and transactions - that have to remain backwards compatible - have a version byte anyway).

In other words, old objects will never have to contain newer addresses and newer objects will have a version byte and can contain older addresses.

lrettig commented 1 year ago

Some further considerations here:

I really like the idea of adding a "address space ID if we use a state expiry scheme that requires it" (Vitalik's first point in the linked post above). This isn't critical at genesis and can be added later.
it would be a very good idea to include the network ID in the address, or at the very least, a mainnet/testnet differentiator, to prevent certain classes of user error
base32 is nice because it allows a smaller QR code size (alphanumeric mode encoding), and to avoid mixed case when reading out loud
Bitcoin's bech32 seems to provide all of these things. It's been adopted by many other chains, including avalanche and cosmos (see: https://github.com/cosmos/cosmos-sdk/blob/82e13b16842209b0c2e4cf952bfdcf25b5e303dd/types/address.go) and has battle-tested libraries (e.g., https://pkg.go.dev/github.com/btcsuite/btcutil/bech32). It's just an encoding format (i.e., agnostic to the underlying address), and is variable length.

lrettig commented 1 year ago

Recommendations:

internally, start with a single, fixed address size ~~(20 bytes should be fine for now)~~, but design our state database in such a way that we can use longer addresses in future. (One obvious way to do this is to use longer addresses, say, 32 bytes, allow zero-padding, and compress the zeroes down on disk and on the wire.) (Note that Bitcoin has switched address formats multiple times and maintains backwards compatibility.)
externally, use bech32 encoding with a HRP such as "sm" (mainnet) and "stest" or "testnet" (testnet).
change our API to use addresses for strings rather than bytes (to support bech32).
regarding address size, increasing this in future will break many smart contracts that rely on a smaller fixed address size (e.g., any contract that does accounting of assets such as NFTs or fungible tokens). they would not support larger address sizes without some very painful irregular state transitions (changing deployed code), hacks, or just being recompiled and redeployed. for this reason it feels safer to reserve a few extra bytes now. I propose using 24 bytes (but requiring the highest-order four bytes to be zeroes for now). this introduces an internal, "canonical" address format that will never change, even if the external encoded format changes. note that it also will not increase the length of the encoded addresses, since bech32 does not have a fixed length and we can zero-pad when parsing.

References:

lrettig commented 1 year ago

Dev tasks breakdown:

go-spacemesh

set HRP (human-readable part) for mainnet and testnet. proposed: sm, testnet
- sample code: https://github.com/ava-labs/avalanchego/blob/8f0f175868618d5222f4d32af62d4cf954010f56/utils/constants/network_ids.go
write, or find, or fork a bech32 implementation
change Address type to [24]byte (i.e., change AddressLength)
write a simple address parser (string -> Address) with error-handling code for wrong format, too long, bad HRP, etc.
- publish as a separate library so it can be used elsewhere
update API code to receive address as string and parse it
update addresses in genesis account config and tests

see https://github.com/spacemeshos/go-spacemesh/issues/3315

API

change Address type from bytearray -> string everywhere (https://github.com/spacemeshos/api/issues/172)

clients

make SMREPL, explorer, dash bech32-aware
- using above-mentioned go library, or a javascript library

lrettig commented 1 year ago

Research is done. Reassigning to @pigmej to manage assignment of dev task and inclusion in sprint.

dshulyak commented 1 year ago

why api needs hrp?

lrettig commented 1 year ago

To make sure the client is communicating with the right network - this is a UX feature to prevent people from accidentally submitting a tx on the wrong chain

lrettig commented 1 year ago

Also in future we may want to use this to identify shards, state expiry epochs etc. - it's an integral part of the address

lrettig commented 1 year ago

Design is done, tracking implementation in #140

spacemeshos / pm

Address size and format #106

go-spacemesh

API

clients