paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Substrate BIP-39 generate different seeds between Polkadot-API and Subkey #14631

Open Lohann opened 1 year ago

Lohann commented 1 year ago

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

Description of bug

The implementation of Substrate BIP39 is not consistent between polkadot-api and subkey for numerical junctions that have more than 64bits, example:

Using polkadot-api keyring:

import { Keyring } from '@polkadot/keyring';

const keyring = new Keyring({ type: 'sr25519', ss58Format: 42 });
const account = keyring.createFromUri("//18446744073709551616");
console.log(account.address);

// Output: 5FsxbdTHtNSLE5HCLzzGVB2oEzxZB6QJuakiHSoTNWEXD1qj

Using subkey:

$ subkey inspect --scheme=Sr25519 --network=substrate "//18446744073709551616"

Secret Key URI `//18446744073709551616` is account:
  Network ID:        substrate 
 Secret seed:       0xe95232125504305f665b8e8c891e1e1e87fcbb0b4eff38a9e100862207d2ce97
  Public key (hex):  0x11ea83f6705f1e3bd3b50695227a1c64dccd5c368648f17eb80f093695028daa
  Account ID:        0x11ea83f6705f1e3bd3b50695227a1c64dccd5c368648f17eb80f093695028daa
  Public key (SS58): 5CUCLVQzLh5oxNanu3699rqaxmkiQG4kNNd4WCDfJWiqwuYk
  SS58 Address:      5CUCLVQzLh5oxNanu3699rqaxmkiQG4kNNd4WCDfJWiqwuYk

This section of the documentation says:

purely numeric items are interpreted as integers, non-numeric items as strings.

However it doesn't define a max bitlength for the numerical value, the Javascript implementation consider integers of any size:

const RE_NUMBER = /^\d+$/;

RE_NUMBER.test(code)
        ? new BN(code, 10)
        : code

While the Rust implementation only accepts u64:

let res = if let Ok(n) = str::parse::<u64>(code) {
    // number
    DeriveJunction::soft(n)
} else {
    // something else
    DeriveJunction::soft(code)
};

Which implementation is the right one?

Steps to reproduce

No response

burdges commented 1 year ago

There were older variations in subkey too, so maybe we should document build steps for the important versions of subkey in a wiki somewhere.

Lohann commented 1 year ago

I think we need a formal specification for Substrate-BIP39 that excludes any ambiguity, also include a formal definition for junction derivation for all supported algorithms (ED25519, SR25519 and ecdsa), without this there's the risk of the user lost its funds when trying to migrate its mneumonic pharse to wallets with different implementations.

I'm implemeting a Dart Client for interacting with Polkadot/Substrate, and I found this issue while implemeting Substrate-bip39 in Dart. Without a formal spec I'm not sure what is the correct implementation:

final n = BigInt.tryParse(code, radix: 10);
final Uint8List bytes;
if (n != null &&
    n >= BigInt.zero &&
    n < BigInt.parse('18446744073709551616')) {
  // number
  bytes = U64Codec.codec.encode(n);
} else {
  // something else
  bytes = StrCodec.codec.encode(code);
}

or this:

final n = BigInt.tryParse(code, radix: 10);
final Uint8List bytes;
if (n != null && n >= BigInt.zero) {
  // number
  bytes = toLittleEndianBytes(n);
} else {
  // something else
  bytes = StrCodec.codec.encode(code);
}

Would be so much easier if the standard considered everything as a byte string, but now it would break compatibility.

Also would be nice to enforce some limits and constraints, it would make the implementation safer and more predictable, especially for constrained devices like hardware wallets, ex: 1 - A valid uri cannot have more than 32 junctions, or all complaint devices must support up to 32 junctions. 2 - If the uri starts with 0x it must be a raw hex encoded seed 3 - A valid URI and junction must contain only alphanumeric chars, ex: vessel ladder //$#(* would be an invalid uri 4 - A junction cannot have more than 32 chars in length (either for numeric or string junctions). 5 - A numeric junction must be 256bit unsigned integer LE encoded, otherwise is a string junction SCALE encoded.

Lohann commented 1 year ago

I think that everything that is substrate BIP-39 related must move to substrate-bip39 repo, including junction derivation logic, I had a lot of extra work trying to implement a complaint substrate-bip39 lib because the code is spread between substrate and substrate-bip39 repos. Having everything centralized in one repo also makes much easier for third-party libs do a native binding and support substrate-bip39.

Also if someone want to support a new signature scheme, like NIST Post Quantum ones, we can discuss how junction derivation should be implemented in one place.

davxy commented 1 year ago

If I don't miss something, currently the sources of knowledge about some key derivation details are:

Currently the wiki:

@burdges is there any RFC or "official" doc describing how our HDKD algorithm (not the bitcoin one since is not compatible) should behave for integer junctions with value ≥ 2^26?

cc @lisa-parity what about starting by adding these details to the wiki? Maybe should not be part of the subkey tool description but as a paragraph of https://docs.substrate.io/learn/accounts-addresses-keys/

burdges commented 1 year ago

I'd think the spec team should've written something by now, but maybe not.. @drskalman ?

You'll need the code in https://github.com/w3f/schnorrkel/blob/master/src/derive.rs too probably, but maybe wallets do their own hard derivations, not sure. Yes it should be speced it nobody did it.

We make several mistakes in substrate, some inherented from bitcoin and some new, but mostly impossible to fix.

We should've less logical similarity between hard and soft derivations, so they should somehow be much more separated than in bitcoin & similar.

In other words, the whole "seed phrase / phrase / phrase" thing makes sense for hard derivations. It's kinda a mess for soft derivations though. Ideally we'd deploy a "new fixed" soft derivation that works under an sr25519 key pair or DKG, but doing this is not on anyone's radar. I'm not even sure if it depends upon the DKG right now.

All the above says, there is limited value in the much of the currently deployed scheme, but like I hinted above there is great value in users being able to access their old accounts on niche signers like subkey. Imho the first question should be: Do we have adequate instructions for accessing every possible past variant under which someone maybe holds DOT/KSM? I suppose this is a question for the W3F tech ed team, but I want to stress that this is the part that matters.