near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.31k stars 618 forks source link

Switching from b58 encoding #2100

Open ilblackdragon opened 4 years ago

ilblackdragon commented 4 years ago

@abacabadabacaba suggested to use an improved bs58 scheme that encodes into fixed length representation by splitting sequence into 8 byte chunks.

Additionally to add check sum to verify that the full sequence was provided and wasn't corrupted.

Interesting note that Ethereum didn't have this originally and they added it by using capitalization of letters in base16.

ilblackdragon commented 4 years ago

@frol @evgenykuzyakov just checking in if we want to execute on this before MainNet or will postpone

frol commented 4 years ago

@ilblackdragon Let's start with the problem. I assume the problem is that fixed-length values encode into a non-fixed-length base58 string.

I am against having "clever" custom encodings since it casts away existing tools out there.

Alternatives:

abacabadabacaba commented 4 years ago

Splitting an input into 8-byte chunks and encoding each of them separately (using fixed-length base58) is a simple approach that solves two major problem of base58: its variable length and the quadratic complexity of its encoding and decoding (which can be optimized, but this is not easy).

This approach still has some issues. For example, encoding requires 64-bit divisions, which are difficult to do in some environments (e.g. JavaScript). I have some ideas about a fixed-length encoding that is almost as efficient (length-wise) as base58 and can be efficiently encoded and decoded without using 64-bit arithmetic.

Another approach is to use some variant of base32 encoding. Base32 encoding is about 17% longer than base58, but it only uses letters of a single case. If we want an encoding that can be easily written down by humans, this is a clear advantage. Some variants, such as Crockford's, are specifically designed to be easy to write and pronounce.

frol commented 4 years ago

I want to highlight that we use base58 only to display hashes, i.e. sha256, that is 256 bits (32 bytes) of data, which encodes into:

abacabadabacaba commented 4 years ago

We also use base58 for keys, not just for hashes. Keys that we use are also 32 bytes (but I think that we currently encode Ed25519 private keys as 64 bytes, which is a bug).

In base58, it is 32-44 characters. In base32 (without padding), it is 52 characters.

However, this doesn't include a checksum. With 4-byte checksum, it will be:

frol commented 4 years ago

Would we consider using base64 for the keys?

Pros:

Cons:

abacabadabacaba commented 4 years ago

Public keys are used in URLs.

The motivation for base58 was ease of manual entry. If we don't care about that, we can use URL-safe variant of base64.

frol commented 4 years ago

I wonder if anyone ever entered base58 / uuid / base64 value longer than 20 (yet 44+) chars manually... :smiley:

MaksymZavershynskyi commented 4 years ago

We should not be creating our own encoding scheme, like the suggested modification of b58. This will create a lot fo confusion and will be a major source of bugs for people trying to integrate their tools with our RPC.

There is no point for optimizing for the number of bytes, since we are using JSON and our RPC responses are hundreds or even thousands of characters long even for simple requests due to all the JSON boilerplate and the names of the fields, see https://docs.nearprotocol.com/docs/interaction/rpc

An example of `broadcast_tx_commit` response using b58, making it 1830 characters long { "receipts_outcome": [ { "block_hash": "8xr1UHftKtth1HErsJ5SEHBgVpbHqyWHQVWeXnT78RfV", "id": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8", "outcome": { "gas_burnt": 2389161082329, "logs": [ "Writing the string [ hello ] to the blockchain ..." ], "receipt_ids": [ "Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs" ], "status": { "SuccessValue": "" } }, "proof": [] }, { "block_hash": "11111111111111111111111111111111", "id": "Gus1JVoJgBaccBtzHApDs22k6A5CZ8vEgRWHmEhVfQCs", "outcome": { "gas_burnt": 0, "logs": [], "receipt_ids": [], "status": "Unknown" }, "proof": [] } ], "status": { "SuccessValue": "" }, "transaction": { "actions": [ { "FunctionCall": { "args": "eyJhcGlSZXNwb25zZSI6ImhlbGxvIn0=", "deposit": "0", "gas": 100000000000000, "method_name": "setResponse" } } ], "hash": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi", "nonce": 5, "public_key": "ed25519:4dYj4upxVxeaiQtV1EB5NvUjuK2axytiAaChr6xNCSgp", "receiver_id": "dev-jdvw47f9j", "signature": "ed25519:3UrQcvvjuGSWM2w5sQ5GNZYBVWxRMjFbFp1cPg59Qg1rL4PtWMi5s7HY7cgSdx5PB17aqX1nrjiUiK7xxxtBeu6M", "signer_id": "ajax" }, "transaction_outcome": { "block_hash": "A7NDYejEMJ5RNTdcVuoYDTezruYSh8QKPrVK4TkcUnqy", "id": "7RnMJsHMVMUbArUdKcMaSBPPomjLzQxzhsuvwP3FXGyi", "outcome": { "gas_burnt": 2291572068402, "logs": [], "receipt_ids": [ "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8" ], "status": { "SuccessReceiptId": "Aei6qE1jXnoXA81WvDk69kc8HfPV8kZTRgte38nHV6j8" } }, "proof": [] } }
Same thing but using base16, making it 2108 characters long { "receipts_outcome": [ { "block_hash": "76502aa865bf709e56b5856fb2bb082a97d5d187ad98d6dac1d7fc7057320f10", "id": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f", "outcome": { "gas_burnt": 2389161082329, "logs": [ "Writing the string [ hello ] to the blockchain ..." ], "receipt_ids": [ "ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc" ], "status": { "SuccessValue": "" } }, "proof": [] }, { "block_hash": "0000000000000000000000000000000000000000000000000000000000000000", "id": "ec6a0c0a0c895a2b4b592e34bfd4473c358f8e53b05252f281c68b322905eadc", "outcome": { "gas_burnt": 0, "logs": [], "receipt_ids": [], "status": "Unknown" }, "proof": [] } ], "status": { "SuccessValue": "" }, "transaction": { "actions": [ { "FunctionCall": { "args": "039377254c90bc33963184c3a8167da70438e3", "deposit": "0", "gas": 100000000000000, "method_name": "setResponse" } } ], "hash": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45", "nonce": 5, "public_key": "ed25519:35efd78157db6dda96a24988e60de9e7f5c1fefcaa246e87d930d2beda020cb1", "receiver_id": "dev-jdvw47f9j", "signature": "ed25519:7c0d6861845ae00e29fe50ebf449fa2eab2ae5d2b0959b57ded4024ab8ae82db7b455b744c91f293f24ad559d2301857d804bc5a7dff4530e7580c9d1a47d20e", "signer_id": "ajax" }, "transaction_outcome": { "block_hash": "875aa81349e154b01f53314e4ddabd9e920f0a5c36be2b9ffb291538f1da07b4", "id": "5f7f56aaa51736a77872b9fe291ca320611383d7e8bec68031817e6714e2da45", "outcome": { "gas_burnt": 2291572068402, "logs": [], "receipt_ids": [ "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f" ], "status": { "SuccessReceiptId": "8f622511e512206b3e3b43904ef015281dc983ef971816ab8bbd70839db4757f" } }, "proof": [] } }

It is a 15% increase in size if we are switching to hex. I suggest we pick hex, which is the most common and readable format (we are not displaying full hashes and keys in our UI anyway) and the difference in bytes is negligible. Anything but hex would be a micro optimization.

MaksymZavershynskyi commented 4 years ago

AFAIU currently our RPC can be trivially abused by submitting large b58 encoded keys. I suggest we add a simple threshold cut-off in RPC that checks that b58-encoded key is <45 characters or returns deserialization error. We then switch to hex post-Mainnet since it is would require lots of changes across our Applayer.

MaksymZavershynskyi commented 4 years ago

Unassigning @evgenykuzyakov since I don't think he is working on it. Also changing the title of the issue to reflect the broader scope of the conversation.

abacabadabacaba commented 4 years ago

Now even Bitcoin is moving away from base58. Bitcoin Core 0.20, which was released today, uses Bech32 (a base32-based format) as default address format.

SkidanovAlex commented 4 years ago

What are the objections to postponing switching to another encoding post Phase I?

Do we have to decode/encode base58 today in any environment in which the quadratic complexity matters (e.g. do we do any b58 decoding in EVM)?

abacabadabacaba commented 4 years ago

@SkidanovAlex I'm not saying that we need to do it quickly.

ilblackdragon commented 3 years ago

Is this even possible at this point? :D

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

bowenwang1996 commented 3 years ago

@frol what is the status of this issue?

frol commented 3 years ago

There is no action item here. Feel free to re-open if necessary

abacabadabacaba commented 3 years ago

I think an action item here is to stop using base58 in any new protocols and APIs, and to deprecate (and eventually remove) any protocols and APIs that are using it.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.