Wishlist for BIP32/39/43/44/49 and SLIP44 replacement

Sjors commented 7 years ago

It would be useful to have a list of requirements for a future replacement of the standards around hierarchical deterministic wallets and other uses of deriving keys from a mnemonic. Let me know if this is not the right place.

In order to turn it into a BIP / SLIP it needs more feedback, and I'll need to make it a bit less opinionated :-)

A good place to start is the mnemonic word list. @arachnid suggested several improvements a while ago (see below). Even though an updated word list will have overlap with the BIP39 list, new mnemonics can be generated in such a way as to guarantee incompatibility with existing wallets. This intentional incompatibility provides an opportunity to change other rules.

I do not believe this is urgent, so there's time to do this thoroughly and develop a standard that lasts for a while.

Word list and incompatibility rule

Criteria @arachnid used in his word list generator draft:

all words are 4-8 characters
all 4-character prefixes are unique (very useful for hardware wallets)
no two words have edit distance < 2
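The three criteria are easy to check mechanically. Below is a minimal Python sketch. It assumes the candidate list is a plain list of lowercase strings, and `wordlist_violations` is a hypothetical helper, not part of @arachnid's generator. The pairwise edit-distance check is quadratic, so a 2048-word list takes a while in pure Python.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def wordlist_violations(words: list[str]) -> list[str]:
    """Report words breaking the length, unique-prefix or edit-distance rules."""
    problems = []
    for w in words:
        if not 4 <= len(w) <= 8:
            problems.append(f"length: {w}")
    seen_prefixes = {}
    for w in words:
        prefix = w[:4]
        if prefix in seen_prefixes:
            problems.append(f"prefix: {seen_prefixes[prefix]}/{w}")
        else:
            seen_prefixes[prefix] = w
    for a, b in combinations(words, 2):
        if edit_distance(a, b) < 2:
            problems.append(f"distance: {a}/{b}")
    return problems
```

For example, "bank"/"band" fail the distance rule, and "acid"/"acidic" fail the unique-prefix rule.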

Wallets need to be able to distinguish between the old and new standard, so un-upgraded BIP 39 wallets should consider all new mnemonics invalid. At the same time, some new wallets may not wish to support BIP39. They shouldn't be burdened with storing the old word list.

A solution is to sort the new word list such that reused words appear first. When generating a mnemonic, at least one new word must be present. A wallet only needs to know the index of the last BIP39 overlapping word. They reject a proposed mnemonic if none of the elements use a word with a higher index.
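A sketch of that rule in Python follows. It assumes word indices refer to the new, sorted list. `last_bip39_index`, `is_new_style` and `generate_indices` are hypothetical names. The generator simply redraws until a new word appears, which discards only the all-old-words mnemonics. A real design would also have to fit this around the checksum.

```python
import secrets


def is_new_style(indices: list[int], last_bip39_index: int) -> bool:
    """Accept only mnemonics containing at least one word past the BIP39-overlap region."""
    return any(i > last_bip39_index for i in indices)


def generate_indices(n_words: int, list_size: int, last_bip39_index: int) -> list[int]:
    """Hypothetical generator: draw uniformly, redraw until a new word is present."""
    if last_bip39_index >= list_size - 1:
        raise ValueError("the list must contain at least one non-BIP39 word")
    while True:
        indices = [secrets.randbelow(list_size) for _ in range(n_words)]
        if is_new_style(indices, last_bip39_index):
            return indices
```

A wallet that only implements the new standard needs nothing more than the single number `last_bip39_index`, not the old word list.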

Other coins and versioning

BIP 44 is too detailed. E.g. it doesn't make much sense for non-UTXO coins such as Ethereum. It's also not very flexible, leading to the creation of BIP49 to add SegWit support. BIP43 on the other hand is too permissive. This makes it difficult for wallets to properly advertise their compatibility.

The community for each coin is probably most suited to figure out their own derivation scheme below the coin type level. I propose the following rules for coins to be accepted into the standard and for wallets to be able to claim compatibility.

the coin needs to have a BIP-like process and their derivation must be specified in such a "BIP" (which in turn could be as simple as "same as Bitcoin but with different coin type")
a wallet which claims to support this new SLIP-[X] standard can choose which coins to support
if a wallet claims to support SLIP-[X], then for each coin it supports, it must support this standard, unless specified otherwise
SLIP-[X] starts at version 1.0 and wallets should communicate this version
New coin types can be added without a new version
Once a coin is accepted, it must wait for a new SLIP-[X] before allowing coins to be deposited on an address where existing wallets would not look
wallets which claim to be compatible with version N are assumed to be compatible for all coins they support, unless otherwise specified
coins may not change their standard in such a way that new wallets don't look in places where old wallets may have left coins

See also SLIP-0010 for non-secp256k1 coins.

GAP limit, etc

The rules for which addresses to scan should be coin specific.

Bitcoin

This discussion might be more appropriate for a BIP proposal, but I'm just putting it out there.

In my own experience the current limit of 20 has downsides. It may be a reasonable performance trade-off, but this should be evaluated.

There is often a delay between when a wallet user sends an address and when they receive payment. Sometimes they never receive payment. There are services such as exchanges which require you to give them an address, but you may end up never using it. For privacy reasons a wallet should not reuse such an address anywhere else.

As with BIP 44, change addresses don't need a GAP limit, unless someone objects.

It would be a better user experience if empty accounts were allowed, e.g. a maximum of 3 (again, assuming there's no unacceptable performance issue).
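Both the address gap limit and the "allow a few empty accounts" idea reduce to the same loop: keep scanning until N consecutive indices show no activity. This is a minimal Python sketch. `has_activity` is a hypothetical callback standing in for whatever blockchain lookup a wallet uses.

```python
from typing import Callable


def scan_until_gap(has_activity: Callable[[int], bool], gap_limit: int) -> list[int]:
    """Return the used indices, stopping after `gap_limit` consecutive unused ones."""
    used = []
    index = 0
    consecutive_empty = 0
    while consecutive_empty < gap_limit:
        if has_activity(index):
            used.append(index)
            consecutive_empty = 0
        else:
            consecutive_empty += 1
        index += 1
    return used
```

With `gap_limit=20` this is the BIP44 address rule. BIP44 account discovery stops at the first unused account, which corresponds to `gap_limit=1`. Tolerating up to 3 empty accounts in a row would be `gap_limit=4`. Each extra unit of gap costs one more lookup per chain, which is the performance trade-off mentioned above.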

Perhaps by the time this standard goes live, all wallets default to SegWit. But if not, I suggest that when a wallet scans:

the receive chain: for each index, check the P2PKH address first. If nothing is found, check the P2SH-P2WPKH address. Once it finds coins on a P2SH-P2WPKH address, it should only check the P2SH-P2WPKH address moving forward
a separate bech32 receive chain. Since a wallet user needs to interact with older wallets, having a separate chain might be more practical than checking P2SH-P2WPKH and bech32 variants for each index
one change chain: for each index, check the P2PKH address first, then the P2SH-P2WPKH address, then the bech32 address. Stop checking P2PKH once you find a P2SH-P2WPKH address, and stop checking that once you find a bech32 address

Ethereum

Again, more of an EIP discussion, but just one thought: consider hardened derivation for each independent "account". Private keys can be exported and this is often useful when different wallets have strongly differentiated features and development is in flux.

Mnemonic to a seed derivation

Can this be improved? @sipa might have some ideas regarding error correction. Representing the words as integer values rather than literal strings might add more flexibility. I like how bech32 allows a wallet to pinpoint the location of a typo. Similarly, it would be nice if it could pinpoint which word is wrong and suggest the right one. For 12-word mnemonics it's surprisingly easy to type a wrong word and still get a valid mnemonic (but an empty wallet).
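The reason it is so easy: a 12-word BIP39 mnemonic carries 132 bits, of which only 4 are checksum (the first 4 bits of SHA-256 of the 128-bit entropy). Roughly 1 in 16 wrong words therefore still passes. For the last word this is exact: 7 of its 11 bits are entropy, so 128 of the 2048 possible last words are valid. The sketch below is a minimal Python check operating on word indices (0 to 2047), so it needs no word list.

```python
import hashlib


def indices_to_bits(indices: list[int]) -> str:
    return "".join(format(i, "011b") for i in indices)


def bip39_checksum_ok(indices: list[int]) -> bool:
    """BIP39: total bits = ENT + ENT/32, and the checksum is the top ENT/32 bits of SHA-256."""
    bits = indices_to_bits(indices)
    cs_len = len(bits) // 33
    ent_len = len(bits) - cs_len
    entropy = int(bits[:ent_len], 2).to_bytes(ent_len // 8, "big")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return bits[ent_len:] == digest_bits[:cs_len]


def count_valid_last_words(first_words: list[int]) -> int:
    """How many of the 2048 candidate last words give a valid mnemonic."""
    return sum(bip39_checksum_ok(first_words + [w]) for w in range(2048))
```

Index 3 is "about", so `[0]*11 + [3]` is the familiar "abandon ... about" test mnemonic. For 24 words the checksum grows to 8 bits, and only 8 last words are valid.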

The minimum number of words could also be reconsidered, but there is a trade-off regarding the likelihood that someone actually writes it down.

Encryption

12-24 word mnemonics are great for new users, but they're not great if someone gets their hands on your piece of paper. It would be nice if the seed could also be exported in a BIP38-like encrypted fashion, perhaps printed as a QR code. More generally, it should be possible to take advantage of hierarchical deterministic wallets without having to use the mnemonic.

Account / address (hardened) derivation

Can this be improved?

I vaguely remember some Bitcoin Core developers having doubts. @luke-jr do you remember who / why?

Duress passphrase

Personally I'm skeptical about this feature and I think it just confuses people. For duress, wouldn't it be better for software to suggest a slight variant of the mnemonic that's easy enough to remember?

Removing that feature would allow more flexibility in the derivation algorithm.

Other languages

I don't think it's a good idea to map word lists in other languages directly to the seed. This could create accidental vendor lock-in if only one wallet supports a certain language. I suggest mapping each word to English or directly to an integer value. It doesn't have to be the same meaning.

If a foreign language mnemonic supporting wallet ever becomes abandoned, the community can create a printable sheet with the mapping of each foreign word to the corresponding English word (again, meanings don't have to match at all).

In addition to a list of universal criteria, it may be useful to have an approval process for each new language. For example, some sort of testimony from a linguist, or from a native speaker with significant experience in bitcoin. Every language has its quirks, which lead to things to avoid (e.g. tons of homonyms in Mandarin) and things to embrace (e.g. many two-character words in Mandarin).

Other purposes

E.g.:

password generation
pointing to data on a distributed filesystem (hash of public key points to a resource, private key decrypts it)
etc

There should be a way to plug these new applications in. Perhaps through redefining "coin type" as "coin or application type"?

Name

I would suggest giving this standard a name that's as easy to recognize as USB. BIP44 caught on a little bit within the bitcoin tech savvy community, but it's not great to have a name tied to a specific BIP/SLIP number, even with versioning.

Other issues?

What else should be considered?

prusnak commented 7 years ago

BIP39: I am already thinking about creating a standard that will supersede BIP39. I want to support Shamir Secret Sharing Scheme (M out of N), where old mnemonic is just special case when M=1 and N=1.
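To make the "M=1, N=1 is the old mnemonic" point concrete, here is textbook Shamir secret sharing applied byte by byte over GF(256). The field is reduced by the AES polynomial x^8 + x^4 + x^3 + x + 1. This is a generic sketch, not the SLIP-0039 construction, which adds its own share format, checksums and digest handling. `split` and `combine` are hypothetical names. With threshold 1 each polynomial is constant, so every share is simply the secret itself.

```python
import secrets


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element: a^254, since a^255 == 1."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result


def split(secret: bytes, m: int, n: int) -> list[tuple[int, bytes]]:
    """Split `secret` into n shares (x = 1..n); any m of them recover it."""
    if not 1 <= m <= n <= 255:
        raise ValueError("need 1 <= m <= n <= 255")
    # One random degree-(m-1) polynomial per byte, constant term = the secret byte.
    polys = [[s] + [secrets.randbelow(256) for _ in range(m - 1)] for s in secret]
    shares = []
    for x in range(1, n + 1):
        ys = []
        for coeffs in polys:
            y = 0
            for c in reversed(coeffs):  # Horner's rule
                y = gf_mul(y, x) ^ c
            ys.append(y)
        shares.append((x, bytes(ys)))
    return shares


def combine(shares: list[tuple[int, bytes]]) -> bytes:
    """Lagrange interpolation at x = 0; in GF(2^8), addition and subtraction are XOR."""
    out = bytearray(len(shares[0][1]))
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = gf_mul(num, xj)
                den = gf_mul(den, xi ^ xj)
        coeff = gf_mul(num, gf_inv(den))
        for k, byte in enumerate(yi):
            out[k] ^= gf_mul(byte, coeff)
    return bytes(out)
```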

BIP 44 is too detailed.

That is the main feature of it. We wanted "BIP44-compatible" to actually mean something. If you had one standard that did normal addresses, segwit addresses, etc., you would need to distinguish between various variants of BIP44. I think it is better to say "this is a BIP44+BIP49 compatible wallet" than "BIP44-normal" and "BIP44-segwit".

In my own experience the current limit of 20 has downsides.

Trust me on this: increasing this limit or removing it completely is suicide.

But if not, I suggest that when a wallet scans ...

I don't agree. I think we should treat P2PKH, P2SH-P2WPKH and Bech32 as separate chains, because they ARE separate address chains. You are introducing a lot of logic and hopefully most of this won't be necessary in the future once we migrate all coins to Bech32 addresses. If we did your way, we'd need to keep this logic forever.

Account / address (hardened) derivation: Can this be improved?

I don't think so. You would not be able to use XPUB for account.
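The reason is that BIP32 public (xpub) derivation only works for non-hardened indices, those below 2^31. Hardening the address level would mean that only the private key can produce addresses. Below is a small Python sketch that parses paths in the usual `m/44'/0'/0'` notation. `parse_path` and `xpub_can_derive` are hypothetical helpers.

```python
HARDENED = 0x80000000  # BIP32: child indices >= 2^31 are hardened


def parse_path(path: str) -> list[int]:
    """Parse "m/44'/0'/0'/0/5" (or 'h' as the hardened marker) into BIP32 indices."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start with 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'") or part.endswith("h")
        number = int(part.rstrip("'h"))
        if not 0 <= number < HARDENED:
            raise ValueError(f"index out of range: {part}")
        indices.append(number + HARDENED if hardened else number)
    return indices


def xpub_can_derive(relative_path: list[int]) -> bool:
    """An extended public key can only derive non-hardened children."""
    return all(i < HARDENED for i in relative_path)
```

With the account at `m/44'/0'/0'`, the remaining `0/5` is non-hardened, so an account xpub can derive it. A hardened `0/5'` would not work.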

Duress passphrase

Agree that this is bad. A plausibly deniable passphrase is much better and is already implemented.

Other languages

I was against foreign-language wordlists in BIP39 and still am. The new mnemonic standard should have English-only words.

saleemrashid commented 7 years ago

Haven't read this all yet but I wouldn't mind something where we could encode further information. e.g. A SSSS scheme where the mnemonics have the first word as shamir and the second word encoding the m for the m-of-n.

EDIT: Didn't even read the start of @prusnak's response, ignore me.

Sjors commented 7 years ago

@prusnak regarding separate receive chains: makes sense. Question: can P2SH-P2WPKH be derived from bech32?

Shamir Secret Sharing Scheme (M out of N) sounds really useful. That can be done on the existing BIP39 word set as well as a new set I assume? Does that get easier if mnemonic words are mapped to integer values instead of the literal strings they are now?

Arachnid commented 7 years ago

Agreed on pretty much everything in the initial post. I'd also suggest that new wordlist should also ensure all words have unique metaphone codes.

I personally quite like the idea of deriving the seed from the sequence numbers of the words rather than their text; this makes it possible to express the same mnemonic in different languages/dictionaries.

A while ago I wrote up a spec on 'extended mnemonics' that can encode additional data; an adaptation of this may be useful for a future BIP39 replacement.

saleemrashid commented 7 years ago

/cc @ecdsa

saleemrashid commented 7 years ago

@Sjors A BIP39 mnemonic is the encoding of (number of words × 11 ÷ 8) bytes (which includes a checksum of 4 to 8 bits). That is what you would want to encode for SSSS (as having (m - 1) parts would be dangerous if you encode the mnemonic itself)
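For reference, the BIP39 encoding works like this. The entropy ENT (128 to 256 bits) gets ENT/32 checksum bits appended, taken from the start of SHA-256(entropy). The result is cut into 11-bit word indices. So 12 words carry 128 + 4 bits and 24 words carry 256 + 8 bits. A minimal Python sketch that returns indices rather than words:

```python
import hashlib


def entropy_to_indices(entropy: bytes) -> list[int]:
    """BIP39: append ENT/32 checksum bits from SHA-256(entropy), split into 11-bit indices."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128-256 bits in steps of 32")
    cs_bits = ent_bits // 32  # at most 8, so the first digest byte suffices
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [(combined >> (11 * (n_words - 1 - k))) & 0x7FF for k in range(n_words)]
```

All-zero 128-bit entropy gives indices `[0]*11 + [3]`, i.e. "abandon ... about". For SSSS it is the `entropy` input here, not the word string, that would be split.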

saleemrashid commented 7 years ago

I think we should definitely version the mnemonic and make it incompatible with BIP39 and implementations should check that they support that version, else refuse to import it. Then we can add newer features with reduced risk of people not being able to import it in, e.g. 5 years, because they can't remember which piece of broken software they used.

Also, Electrum doesn't check the checksum for BIP39 mnemonics which is very problematic (because even if we use a new word list, Electrum would accept it as BIP39)

ecdsa commented 7 years ago

@saleemrashid Electrum does check the bip39 checksum (in git master).

@Sjors Electrum has versioned mnemonic seeds. I tried to propose this idea to the trezor team years ago, but they rejected it because it was going to slow down the commercialization of their product. If you want versioned seeds, you should use the Electrum standard instead of creating a new one.

prusnak commented 7 years ago

@ecdsa Could you point me to the BIP where your seed is documented?

ecdsa commented 7 years ago

@prusnak http://docs.electrum.org/en/latest/seedphrase.html There is no BIP at this point, but we can create one.

prusnak commented 7 years ago

Thank you for the link. There's no need to create a BIP, though, because I do think we still need to create a new standard, which will take the best of both worlds and also add M-of-N SSSS into the mix.

Also I don't like the fact that the version stored in the mnemonic defines the derivation scheme. IMO the mnemonic should only encode entropy (or entropies) and a way to generate the master private key from it, not the derivation scheme.

I think this is the main philosophical difference between your approach and ours. I see the benefits of your solution (you don't have to try several schemes), but at the same time I really like that our seed is "upgradable". I think this feature outweighs having to try several schemes (which you only do once, during the restore procedure).

Let's take the current SegWit situation. If you created an Electrum seed 2 years ago and buried it in the garden 500 km from your house, you would need to go to that place again today to store the new SegWit-enabled seed. What about next year, when native SegWit and Bech32 are widely used? What if you want to use the seed to generate U2F or SSH keys? My example is far-fetched (maybe your garden is just 25 km from your house), but I think it illustrates my point.

I'd like to invite you to draft a new standard with me, if we find a way to make the seed upgradable (or if you decide that not storing a derivation scheme in the seed is a good idea).

saleemrashid commented 7 years ago

Also I don't like the fact the version stored in mnemonic defines the derivation scheme.

What do you mean by derivation scheme? Are you talking about BIP-0032 or BIP-0044/49?

prusnak commented 7 years ago

I'm talking about this: http://docs.electrum.org/en/latest/seedphrase.html#list-of-reserved-numbers
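Here is our reading of the linked page. An Electrum seed's version is not a separate field. It is a prefix of HMAC-SHA512(key="Seed version", message=normalized phrase), shown in hex. The prefixes below ("01" standard, "100" segwit, "101" two-factor) are taken from that page and should be double-checked there. The sketch's normalization is simplified: Electrum also strips accents and CJK spacing. `electrum_seed_type` is a hypothetical name.

```python
import hashlib
import hmac
import unicodedata
from typing import Optional

# Assumed from Electrum's "list of reserved numbers"; verify against the docs.
SEED_PREFIXES = {"01": "standard", "100": "segwit", "101": "2fa"}


def normalize(phrase: str) -> str:
    """Simplified normalization: NFKD, lowercase, single spaces."""
    return " ".join(unicodedata.normalize("NFKD", phrase).lower().split())


def electrum_seed_type(phrase: str) -> Optional[str]:
    digest = hmac.new(b"Seed version", normalize(phrase).encode("utf-8"),
                      hashlib.sha512).hexdigest()
    for prefix, kind in SEED_PREFIXES.items():
        if digest.startswith(prefix):
            return kind
    return None
```

This is exactly what @prusnak objects to. Because the hash prefix selects the derivation, a seed created as "standard" stays "standard", and a new script type needs a new seed.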

Not sure if Electrum follows BIP-32/BIP-44/BIP-49 for these particular cases.

saleemrashid commented 7 years ago

Totally agree, there's no reason why you need to encode anything more than the entropy in the mnemonic.

prusnak commented 7 years ago

I am talking about the fact that once mnemonic encodes something more than the entropy (because it defines WHAT to do next with that entropy in order to derive keys to be used), you need to generate a new one every time you want to use it for something new.

Arachnid commented 7 years ago

also adds M-of-N SSSS into the mix

Please let's not hardcode just that, though - personally I'd much prefer a system like the one I linked, that permits different types of mnemonic, with different means of deriving the secret data.

I am talking about the fact that once mnemonic encodes something more than the entropy (because it defines WHAT to do next with that entropy in order to derive keys to be used), you need to generate a new one every time you want to use it for something new.

I agree, but I would like to be super clear that every (mnemonic type, network) tuple should have a single canonical derivation defined by an extension proposal. We're currently suffering from a glut of different derivation paths in Ethereum, and I wouldn't wish it on anyone.

saleemrashid commented 7 years ago

@Arachnid The way I see it, Bitcoin can be an "application" for this new derivation scheme and Ethereum can be another "application". Then we can have something like m/bitcoin/<coin_type>/<...> for Bitcoin, Testnet, Litecoin, etc. and m/ethereum/<chain_id>/<...> for Ethereum. (Of course, we won't use strings for it). Sounds good?
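A sketch of the idea in Python: every (application, network) pair maps to one canonical, hardened path prefix, and everything below it is defined by that application's own proposal. The numbers in `APPLICATIONS` are placeholders, not assigned values. `application_path` and `format_path` are hypothetical helpers.

```python
HARDENED = 0x80000000

# Hypothetical registry; these integers are placeholders, not assigned values.
APPLICATIONS = {"bitcoin": 0, "ethereum": 1}


def application_path(app: str, network: int, *rest: int) -> list[int]:
    """m/<app>'/<network>'/<rest...>: one canonical prefix per (application, network)."""
    return [APPLICATIONS[app] + HARDENED, network + HARDENED, *rest]


def format_path(indices: list[int]) -> str:
    return "m" + "".join(f"/{i - HARDENED}'" if i >= HARDENED else f"/{i}"
                         for i in indices)
```

For example, `format_path(application_path("ethereum", 1, 0))` gives `m/1'/1'/0`, with 1 as Ethereum mainnet's chain ID.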

Arachnid commented 7 years ago

Yes, I agree - I'm just saying that we should do everything we can to make sure the derivation path(s) can be known based on the code and the context it's used in; any ambiguity there will lead to multiple competing derivation paths.

saleemrashid commented 7 years ago

@Arachnid Agreed. I've always thought it was a mistake to use BIP-0044 for Ethereum (and other non Bitcoin-like coins).

Sjors commented 7 years ago

@saleemrashid wrote:

A BIP39 mnemonic is the encoding of (number of words × 11 ÷ 8) bytes (which includes a checksum of 4 to 8 bits)

Just to clarify, my concern is with this sequence: N × 11 ÷ 8 bytes -> N words -> pbkdf2(words + passphrase) -> key space. This makes the key space depend on the specific language, which creates a compatibility risk for non-english languages (and I find it inelegant, but that's not a strong argument).

I would like to see the following sequence instead: N words -> M bytes -> key space, where I don't have any preference as to whether M = N × 11 ÷ 8 or some other scheme. This allows for different ways to create the M bytes, e.g. words in another language, SSSS, something like BIP38 or a combination.

If people really want a passphrase, that could go either in the step N words -> M bytes or in the step M bytes -> key space.
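The difference between the two pipelines shows up quickly in code. BIP39 runs PBKDF2-HMAC-SHA512 (2048 rounds, salt "mnemonic" + passphrase) over the NFKD-normalized text. The same word indices written in two languages therefore give different seeds. Packing the indices into bytes first, as proposed here, makes the result language-independent. `indices_to_bytes` is a hypothetical sketch of the "N words -> M bytes" step, and the two four-word lists are toy lists for the demo only.

```python
import hashlib
import unicodedata


def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP39: PBKDF2-HMAC-SHA512 over the normalized text, 2048 rounds, 64-byte output."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)


def indices_to_bytes(indices: list[int], bits_per_word: int = 11) -> bytes:
    """Hypothetical 'N words -> M bytes' step: pack word indices, ignoring the text."""
    value = 0
    for i in indices:
        value = (value << bits_per_word) | i
    total_bits = len(indices) * bits_per_word
    return value.to_bytes((total_bits + 7) // 8, "big")


english = ["abandon", "ability", "able", "about"]
other = ["ábaco", "abdomen", "abeja", "abierto"]
idx = [0, 1, 2, 3]
text_en = " ".join(english[i] for i in idx)
text_other = " ".join(other[i] for i in idx)
# Same indices, different text: BIP39 seeds differ, packed bytes are identical.
```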

Arachnid commented 7 years ago

@Sjors I agree entirely.

Sjors commented 6 years ago

@prusnak wrote:

In my own experience the current limit of 20 has downsides.

Trust me on that increasing this limit or removing it completely is a suicide.

The ValueShuffle proposal seems to require never using an address if it was revealed in a failed mixing attempt.

There are probably more use cases like this.

roconnor-blockstream commented 6 years ago

I would like a BIP-39 replacement to have the property that it be plausible (although not necessarily easy) to derive a master seed by hand, by rolling dice, flipping coins, etc. For BIP-39 there are many reasonable ways of transforming dice rolls or coin flips into a uniform selection of words, and I don't think any particular method need be prescribed. However, the SHA-256 based checksum in BIP-39 is what kills the ability to generate a master seed by hand.

A BCH code, such as the one used for Bech32 in BIP-173 provides a checksum that I believe can be plausibly computed by hand, though I suppose I need to validate this by trying it.

For Bech32, the pencil and paper algorithm for generating a checksum works by operating on a string of 6 characters from the bech32 alphabet, starting with the expanded HRP data (which is a fixed value and can be safely precomputed and published). The checksum computation works by appending one data character to the end of this 6 character buffer and removing a character from the other end of the 6 character buffer. Using a precomputed printed lookup table of 32 entries (Table 1), one for each Bech32 character, one finds the corresponding entry for the removed character and it will have an associated 6 character (from the bech32 alphabet) entry. One needs to combine this value, with the current 6 character buffer using a second table (Table 2). Table 2 contains a 32x32 "addition" table for Bech32 characters. Working character-wise one "adds" each character from the current buffer to the corresponding character from the entry picked out from Table 1. This "sum" becomes the new 6 character buffer.

This process repeats until all the data characters have been processed. After that, there is a bit of post-processing work where you process 6 more 'q' characters in the same way, and then you need to change the last character according to a third table, and you are done. The resulting 6 characters are the Bech32 checksum.

My point of writing out the above is to illustrate that it is plausible to do such computation by hand and still get a powerful error-correcting checksum like the one Bech32 uses. Creating a checksum by hand isn't too dangerous. The failure scenario is that the checksum computed is incorrect. One would be expected to test one's master seed in the hardware device one is using before committing to it, or otherwise repeat the computation independently 2 or 3 times. In any case, no one would be forcing anyone to generate master seeds this way; I only want it to be possible for sophisticated users to be able to do this sort of calculation. The process doesn't even need to be documented as far as I'm concerned.
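The hand procedure above is the BIP173 reference checksum. Below is a Python sketch based on the reference `polymod`, with the five generator constants folded into a 32-entry table. This makes the steps line up with the description. `TABLE1[c]` is the 6-character value for the character c that falls off the buffer. "Table 2" addition is XOR of 5-bit values, done character-wise on the whole 30-bit buffer at once. The six appended `0`s are the six 'q' characters. The final `^ 1` flips the low bit of the last character ("change the last character").

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]

# "Table 1": for each 5-bit character leaving the buffer, the 30-bit value to add back.
TABLE1 = []
for c in range(32):
    entry = 0
    for i in range(5):
        if (c >> i) & 1:
            entry ^= GEN[i]
    TABLE1.append(entry)


def polymod(values: list[int]) -> int:
    buf = 1
    for v in values:
        removed = buf >> 25                  # character leaving the 6-character buffer
        buf = ((buf & 0x1FFFFFF) << 5) ^ v   # shift in the next data character
        buf ^= TABLE1[removed]               # character-wise GF(32) addition is XOR
    return buf


def hrp_expand(hrp: str) -> list[int]:
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp: str, data: list[int]) -> list[int]:
    value = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1  # six 'q's, then tweak last char
    return [(value >> 5 * (5 - i)) & 31 for i in range(6)]


def encode(hrp: str, data: list[int]) -> str:
    return hrp + "1" + "".join(CHARSET[d] for d in data + create_checksum(hrp, data))


def verify(bech: str) -> bool:
    hrp, _, rest = bech.rpartition("1")
    return polymod(hrp_expand(hrp) + [CHARSET.index(c) for c in rest]) == 1
```

`encode("a", [])` reproduces the BIP173 test vector `a12uel5l`. Since the expanded HRP is fixed, its contribution to the buffer can be precomputed once, as described above.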

From what you have written above, it sounds like you were leaning towards using such a checksum anyways, which is great!

saleemrashid commented 6 years ago

@roconnor-blockstream Take a look at https://github.com/satoshilabs/slips/blob/master/slip-0039.md for the current status of this.

roconnor-blockstream commented 6 years ago

Thanks. Where is an appropriate place for me to comment on Slip-39?

Given the 10-bit word list, I think it would be nice to simply replace GF(256) with GF(32) and make everything a multiple of 5 bits except for the result of the key derivation function. (We would want to round the master seed size to 130 bits / 255 bits (or maybe 230 bits).) Then you could drop in Bech32's 30-bit checksum, since it too works over GF(32) (though I do feel like having a checksum on the master seed itself is a bit overkill). Everything could plausibly be computed by hand (while I wouldn't expect anyone to do SSSS by hand for any threshold other than M=1, I do think SSSS could be done by hand this way without more effort than a typical World War 2 spy would expend on a hand cipher of that era).

With 10-bit words, there is a canonical mapping between words and pairs of bech32 characters, allowing one to have a choice between compressed encoding (bech32) and word-list encodings of the share data, which is a nice property of your word list.
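The canonical mapping is just splitting a 10-bit index into two 5-bit halves, each of which is a Bech32 character. Here is a minimal Python sketch; the function names are hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def word_index_to_chars(index: int) -> str:
    """10-bit word index -> two Bech32 characters (high 5 bits, low 5 bits)."""
    if not 0 <= index < 1024:
        raise ValueError("expected a 10-bit word index")
    return CHARSET[index >> 5] + CHARSET[index & 31]


def chars_to_word_index(pair: str) -> int:
    return (CHARSET.index(pair[0]) << 5) | CHARSET.index(pair[1])
```

A share of k words is then the same data as a 2k-character Bech32 string, and vice versa.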

I think having the option of constructing master seeds without the use of modern digital computer is important, but not everyone may agree on this point.

saleemrashid commented 6 years ago

@roconnor-blockstream Bear in mind that it can be used for secret data that isn't a BIP32 master secret, so we don't want to put unnecessary restrictions on the length of the secret.

I agree with your last point, but I don't think you need to be able to generate the recovery seed. Being able to provide the entropy would be sufficient and you can always verify that the entropy gives the expected recovery seed. If you cannot do that for whatever reason, generating the recovery seed without a computer is equally useless because you cannot verify that private keys are being derived correctly.

roconnor-blockstream commented 6 years ago

I don't think anything I suggested would put any unnecessary restrictions on the length of secret data (though I'm unclear if by "secret" you mean the master secret or the resulting seed).

I agree with your last paragraph. But just to emphasize, there is a huge difference between trusting a device such as a Trezor with correctly computing deterministic private keys and deterministic signatures versus trusting such a device to produce genuine random data.

But yes, a colleague suggested it would be sufficient to write an app, for whatever hardware device will be holding the secret anyway, that computes checksums from data. As far as I'm aware there are no such apps for the Trezor, etc. Perhaps the onus is on me to create these apps if I think it is so important. There are other schemes where you could enter random data and get a certificate that proves that the given data was incorporated into the device's generation of the random seed, which would also work. Ideally, though, those certificates would need to be such that one could validate them by hand, and I don't know if that is the case.

Still, I'm a big fan of Bech32. I think error correcting codes are a great way of storing share data, and why not use an ECC that you'll already be needing anyway?

saleemrashid commented 6 years ago

computing deterministic private keys and deterministic signatures versus trusting such a device to produce genuine random data

I think you misunderstood what I was saying.

To be clear, you can currently verify that the entropy the connected computer provided (either from the CSPRNG or your own entropy) was combined with the TREZOR's internal entropy, because the device displays the internal entropy it used.

But I'm talking about a new feature that would allow you to securely provide all of the entropy. The TREZOR would compute the checksum and output a recovery seed as usual, but you could verify that the entropy you provided was used by the TREZOR.

As far as I'm aware there are no such apps for the Trezor, etc.

There are a huge number of BIP39 implementations and any of those should allow you to generate (the same) mnemonic from the entropy.

Still, I'm a big fan of Bech32. I think error correcting codes are a great way of storing share data, and why not use an ECC that you'll already be needing anyway?

Totally agree but, even if we switch to ECC, the usage of PBKDF2 and AES would still cause an issue for you.

jb55 commented 6 years ago

I think having the option of constructing master seeds without the use of modern digital computer is important, but not everyone may agree on this point.

+1, especially considering the latest cpu hardware exploit horror show

prusnak commented 6 years ago

I think it would be nice to simply replace GF(256) with G(32)

There are already nice implementations of GF(256), so I would rather stick to these, not go with GF(32).

Bech32's 30-bit checksum

This is just too much, effectively adding 3 extra words to the share.

I think having the option of constructing master seeds without the use of modern digital computer is important, but not everyone may agree on this point.

You can create a seed using dice or any other means. It will not be a valid BIP39 seed, but you can still use it. I don't think you want to create SLIP39 mnemonics by hand.

why not use an ECC?

For mnemonics you don't want to use ECC. Mnemonic generated from a wordlist is itself already an error correcting code (if you see a word acadrrnic - it's most probably the word academic). Also you don't want to use ECC, because it would help an attacker to reconstruct the seed if they don't have the full information, which is something you don't want.
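The "wordlist as error correction" argument amounts to a nearest-neighbour search under edit distance. This is a minimal Python sketch over a tiny hypothetical subset of a word list. With a minimum pairwise distance of 2 and unique 4-letter prefixes, as proposed at the top of the thread, many single typos resolve to exactly one word.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct_word(typed: str, wordlist: list[str]) -> str:
    """Return the word list entry closest to what was typed (first one on ties)."""
    return min(wordlist, key=lambda w: edit_distance(typed, w))


sample = ["academic", "acid", "acoustic", "across"]  # hypothetical subset
```

`correct_word("acadrrnic", sample)` returns "academic" (distance 3).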

Also as Saleem indicated above, in case of TREZOR - two sources of entropy are mixed, so it is very hard for an attacker to meaningfully exploit both processors at the same time.

saleemrashid commented 6 years ago

Mnemonic generated from a wordlist is itself already an error correcting code

Good point!

roconnor-blockstream commented 6 years ago

Bech32's 30-bit checksum

This is just too much, effectively adding 3 extra words to the share.

I'm suggesting getting rid of the master secret checksum and only having share checksums. That is replacing your two 16-bit checksums with one 30-bit checksum.

There are a huge number of BIP39 implementations and any of those should allow you to generate (the same) mnemonic from the entropy.

No way am I going to run any of those implementations on a modern digital computer. The only digital device that is ever allowed to touch my mnemonic is my hardware wallet. Every other digital device is considered compromised for this purpose. Hence I need a hardware wallet app to help, or preferably I want to have the ability to generate SLIP39 mnemonics by hand.

If I can construct the entire mnemonic by hand I am (almost) completely assured there is no funny business going on in the seed creation. If I can construct the mnemonic prefix by hand and have the hardware wallet generate the checksum for me, then it is very easy to validate that there is no funny business going on because I can validate that the mnemonic prefix is identical to the words I generated with my dice or coins.

The TREZOR's display of internal+external entropy is inadequate because it gets the external entropy from a computer (AFAIU you cannot enter the external entropy using the buttons on the Trezor, but correct me if I'm wrong here). You are asking me either to trust that I have an uncompromised computer to plug my TREZOR device into to input my dice entropy, and I don't trust any of my digital computers for something this important, or to trust that the internal entropy generated by the TREZOR is secure, which I also don't want to have to trust because it is basically impossible to audit. I do trust my dice and my coin tosses (at least more than the alternatives).

But I'm talking about a new feature that would allow you to securely provide all of the entropy. The TREZOR would compute the checksum and output a recovery seed as usual, but you could verify that the entropy you provided was used by the TREZOR.

Yes that would be super. As long as it is designed so that it is plausible that I can verify the master seed was generated by my input entropy by hand. (For example, if we consider BIP 39, I would input a set of random words, and the feature would replace the last one with one with a correct checksum, and then I can by hand double check that the last word is in the same 256 word bucket as the last word I entered. This is plausible to do by hand.) I do not want to have to use a modern digital computer to do the validation because I cannot safely enter my master seed into such a device.
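The "same 256-word bucket" check works because the last BIP39 word mixes your own entropy bits with checksum bits. The device may only change the checksum part. The checksum is words × 11 ÷ 33 bits, so the bucket is 2^8 = 256 words for a 24-word mnemonic and 2^4 = 16 words for a 12-word one. Here is a minimal Python sketch with hypothetical helper names.

```python
def checksum_bits(n_words: int) -> int:
    """BIP39: n_words * 11 = ENT + ENT/32, so the checksum is n_words * 11 / 33 bits."""
    return n_words * 11 // 33


def same_bucket(entered_last: int, replaced_last: int, n_words: int) -> bool:
    """True if the device's replacement last word kept the entropy bits you chose."""
    cs = checksum_bits(n_words)
    return entered_last >> cs == replaced_last >> cs
```

By hand, this means checking that both last words fall in the same block of 256 (or 16) consecutive entries of the sorted word list.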

Totally agree but, even if we switch to ECC, the usage of PBKDF2 and AES would still cause an issue for you.

I don't see a problem. The master seed comes before the PBKDF2 function, so everything is fine. I'm willing to trust that my hardware wallet is properly deriving private keys from my master seed that I type into the device. I'm just not willing to trust the hardware wallet's random number generator.

Also you don't want to use ECC, because it would help an attacker to reconstruct the seed if they don't have the full information, which is something you don't want.

All checksums are in a sense error correcting codes. The difference between a proper ECC such as BCH and a truncated SHA256 sum is that with BCH you have some understanding of the minimum distances between valid codes in the space of all codes, and you have fast algorithms to reconstruct values given errors. With a truncated SHA256 sum you have randomly distributed valid codes with no known structure, so you have no understanding of how easy or hard it will be for an attacker to reconstruct values from partial information, and from which partial information. (Error correction is simply finding the closest valid code to a given code according to some metric such as edit distance. Error correction can always be done using any checksum just by doing a search.)

Let's try to quantify these properties: IIUC (and someone should check my math here), Bech32 can correct any 2-letter error (within a 71-letter message) with a very fast algorithm. Words from the SLIP39 word list correspond, entropy-wise, to a pair of Bech32 letters. So this means that a Bech32 code can quickly correct any single-word error. From the attacker's perspective, he can quickly reconstruct the correct mnemonic even when any one of the words is missing or wrong.

Let's try to prevent this attack using a SHA-256 checksum instead. The attacker is given the master seed except that one of the words is incorrect, and the attacker doesn't know which one. Given a 30-word mnemonic, the attacker must perform 30*1024 SHA-256 operations to generate a list of valid mnemonics within that edit distance. The attacker then can try every one of those valid mnemonics (of which there will only be a small handful, and very likely only one) to see if they have funds. Yes, 30,000 SHA-256 operations slow the attacker. Does it provide significant protection? I don't think it does. Is it worth the loss of easy error correction? In my opinion, no.

If you don't want attackers to reconstruct the seed from partial data, you need to remove the checksum entirely. If you want a checksum, then use a proper checksum with fast error correction and guarantees instead of one with slightly slower error correction and no known guarantees.
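The search described here is a short loop. The sketch below tries every position and every alternative word, keeping the variants that pass the checksum. For illustration it uses the BIP39 12-word checksum rather than a SLIP39 one. With only 4 checksum bits, about 1 in 16 of the 12 × 2047 variants survives. A longer checksum shrinks the list to the "small handful" described above. `single_word_corrections` is a hypothetical name.

```python
import hashlib


def bip39_checksum_ok(indices: list[int]) -> bool:
    """BIP39 checksum over 11-bit word indices (ENT/32 bits of SHA-256)."""
    bits = "".join(format(i, "011b") for i in indices)
    cs_len = len(bits) // 33
    ent_len = len(bits) - cs_len
    entropy = int(bits[:ent_len], 2).to_bytes(ent_len // 8, "big")
    digest_bits = format(int.from_bytes(hashlib.sha256(entropy).digest(), "big"), "0256b")
    return bits[ent_len:] == digest_bits[:cs_len]


def single_word_corrections(indices, is_valid, list_size=2048):
    """All checksum-valid mnemonics that differ from `indices` in exactly one word."""
    found = []
    for pos in range(len(indices)):
        for w in range(list_size):
            if w == indices[pos]:
                continue
            candidate = indices[:pos] + [w] + indices[pos + 1:]
            if is_valid(candidate):
                found.append(candidate)
    return found


original = [0] * 11 + [3]  # "abandon ... about"
corrupted = list(original)
corrupted[5] = 700         # one wrong word at an unknown position
```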

roconnor-blockstream commented 6 years ago

Anyhow. My important comment is that I would like a hardware wallet that lets me input entropy directly into the device and lets me verify that the generated master secret used that entropy without using a computer. Everything else I've commented on is less important.

I think the design of a BIP 39 replacement could facilitate this property, universally for every application using it, by making it plausible to generate the mnemonic, with checksum, by hand. I believe BCH checksums are amenable to hand computation, and they are strictly superior to SHA-256 checksums even without the hand-computation property. Bech32 doesn't have to be used, though it is convenient: a pair of letters corresponds to a SLIP39 word, it is ready-made, and it would provide an alternative Bech32-style encoding of the master secret for people who want something compact. It seems likely that a GF(1024) BCH checksum would have more suitable error-correction properties for shares (while more irritating to hand-compute than Bech32's BCH checksum, I think it would still be possible).
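The Bech32 BCH checksum mentioned here is the one from BIP173. The sketch below follows the BIP173 reference Python code. The checksum is a polynomial remainder over GF(32), computed with shifts and XORs on 5-bit symbols, which is the sort of table-driven arithmetic the comment suggests could be done by hand. It produces six symbols (30 bits). Error location and correction are not shown, and the human-readable part `"ms"` is an arbitrary choice for the example.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def bech32_polymod(values: list[int]) -> int:
    """Remainder modulo the Bech32 BCH generator (BIP173 reference algorithm)."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def hrp_expand(hrp: str) -> list[int]:
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp: str, data: list[int]) -> list[int]:
    """Six 5-bit checksum symbols (30 bits) for the given 5-bit data symbols."""
    pm = bech32_polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(pm >> 5 * (5 - i)) & 31 for i in range(6)]


def verify_checksum(hrp: str, data_with_checksum: list[int]) -> bool:
    return bech32_polymod(hrp_expand(hrp) + data_with_checksum) == 1


data = [CHARSET.index(c) for c in "qpzry9x8"]
encoded = data + create_checksum("ms", data)
print(verify_checksum("ms", encoded))  # True
corrupted = list(encoded)
corrupted[3] ^= 1  # single-symbol error
print(verify_checksum("ms", corrupted))  # False
```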

prusnak commented 6 years ago

I'm suggesting getting rid of the master secret checksum and only having share checksums.

We do want to have master secret checksum, so the user knows they combined the shares correctly (i.e. there was no share from a different set) and don't freak out when something like this happens.

BCH checksums are strictly superior to SHA-256 checksums

I had a discussion with Daan at 34c4, and he suggested replacing CRC checksums with SHA, because CRC/BCH checksums have much less entropy than a SHA checksum. His gut feeling was that mixing high-entropy data (the master secret) with low-entropy data (the checksum) was not a good idea for the SSS algorithm. Since you don't want a checksum for the master secret at all, you probably haven't thought about this, but I really do want one for better UX.

My important comment is that I would like a hardware wallet that lets me input entropy directly into the device

This is somewhere down in the roadmap.

and lets me verify that the generated master secret used that entropy without using a computer.

That's more tricky.

saleemrashid commented 6 years ago

That's more tricky.

It's also semi-useless as far as I understand it. It doesn't matter if the recovery seed is correct if the device isn't generating the correct receiving addresses.

If you don't trust the wallet, you must verify the addresses. If you're verifying the addresses, you can verify the entropy was used in the recovery seed.

joshmh commented 6 years ago

Ultimately it would be useful to have an encrypted/signed data transfer protocol standard for sharing important information between hardware devices. The data could be transferred through online computers (e.g. via USB) but it wouldn't matter because it would be end-to-end authenticated and encrypted.

Allowing a user to verify entropy, addresses, public keys, and tx info on different devices from different manufacturers would help a lot.

saleemrashid commented 6 years ago

Allowing a user to verify entropy, addresses, public keys, and tx info on different devices from different manufacturers would help a lot.

If all manufacturers add support for importing custom entropy, then users could import the entropy into multiple devices, verify the recovery seed generated is identical and verify the addresses the device generates.

joshmh commented 6 years ago

If all manufacturers add support for importing custom entropy, then users could import the entropy into multiple devices, verify the recovery seed generated is identical and verify the addresses the device generates.

Right; however, securely entering things like entropy and public keys on multiple hardware wallets is inconvenient for users, so automating this would improve security because more people would actually do it.

saleemrashid commented 6 years ago

Right, however securely entering things like entropy and public keys on multiple hardware wallets is inconvenient for users

But the hardware wallet would then have to expose the entropy, and all decent hardware wallets will never give up the private keys after the initial recovery seed generation.

Once you have verified the entropy on at least one other hardware wallet, it is safe to assume the recovery seed was actually generated from your entropy. From that point, you can use the recovery seed to initialize other wallets.

roconnor-blockstream commented 6 years ago

I had a discussion with Daan at 34c4 and he suggested to replace CRC checksums with SHA, because CRC/BCH checksums have much less entropy than SHA checksum. His gut feeling was that mixing a high-entropy data (master secret) and low-entropy data (checksum) was not a good idea for SSS algo.

Interesting; my intuition is the opposite. I expect that we can prove that (asymptotically?) all CRC/BCH checksum values occur equally often (due to their uniform algebraic layout), and therefore have maximum entropy (a claim that should be verified). For SHA-256, this maximum-entropy property is derivable from the random oracle assumption, but it isn't something we know how to prove about SHA-256. The key property of SHA-256 is that it is supposed to be infeasible to find collisions and preimages, which isn't relevant to this application at all.
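One way to sanity-check the uniformity intuition is an exhaustive count on a toy CRC. The sketch below uses a plain bitwise CRC-8 with generator polynomial 0x07 and a zero initial value, chosen only for illustration and not as a proposed checksum. It computes the checksum of every 16-bit input and counts how often each of the 256 values appears. The map is linear, and it is onto because the generator's constant term is 1, so every value should appear exactly 2^16 / 2^8 = 256 times. This demonstrates the property only for this toy; extending it to the BCH codes discussed here is the proof the comment has in mind.

```python
from collections import Counter


def crc8(data: int, nbits: int, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (no reflection, zero init): remainder of data(x)*x^8 mod g(x)."""
    crc = 0
    for i in reversed(range(nbits)):
        feedback = ((crc >> 7) & 1) ^ ((data >> i) & 1)
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= poly
    return crc


counts = Counter(crc8(m, 16) for m in range(1 << 16))
print(len(counts), set(counts.values()))  # 256 {256}
```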

(Edit: I also think this isn't relevant to SSSS, since SSSS hides information perfectly.)

tayvano commented 6 years ago

Is there any possibility of still getting rid of the "plausible deniability" passwords or am I alone in this line of thinking?

While I appreciate the value of plausible deniability in concept, I have spent far more hours (probably a few hundred at this point) working with users to find coins across derivation paths AND passphrases and ultimately, especially as it concerns usability, this form of password does more harm than good.

Additionally, I have yet to encounter someone who was hit over the head with a wrench and used their "plausible deniability password" to escape that dire situation. 😉

From a usability standpoint, a number of wallets and applications hide these passwords under "advanced" settings for exactly this reason. We personally won't be encouraging users to use passwords with their phrases up front, as we are not confident in new users' ability to grasp that concept along with everything else we need to teach them during the onboarding process.

If the goal is to have as many people having password-protected phrases as possible, plausible deniability isn't the way to go.

Steve132 commented 6 years ago

12-24 word mnemonics are great for new users, but they're not great if someone gets their hand on your piece of paper.

This is what the 'password' part of bip39 is for in the 'master seed derivation' section. Because of the way pbkdf2 works, a given mnemonic is essentially encrypted with a secure stream cipher during generation if you use the password provisions of bip39.
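For concreteness, this is how BIP39 turns a mnemonic and an optional passphrase into the 512-bit seed. It runs PBKDF2-HMAC-SHA512 for 2048 iterations, with the NFKD-normalized mnemonic as the password and `"mnemonic"` + passphrase as the salt. The sketch below uses only Python's standard library, and the function name is illustrative. Every passphrase yields a different but equally valid-looking seed, which is what the plausible-deniability use of the password relies on.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP39 seed: PBKDF2-HMAC-SHA512, 2048 rounds, salt = "mnemonic" + passphrase."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = ("mnemonic" + unicodedata.normalize("NFKD", passphrase)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)


words = " ".join(["abandon"] * 11 + ["about"])
plain = mnemonic_to_seed(words)
with_pass = mnemonic_to_seed(words, "TREZOR")
print(len(plain), plain != with_pass)  # 64 True
```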

It would be nice if the seed can also be exported in a BIP38-like encrypted fashion, perhaps printed as a QR code. More generally, it should be possible to take advantage of hierarchical deterministic wallets without having to use the mnemonic.

It is possible to take advantage of HD wallets without having to use the mnemonic. You can use the master seed directly (à la BIP32), which is entropy that stands alone and is not derived from a mnemonic. Most wallets don't support this, and they should, but the standard already does.
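When a seed is used directly, as BIP32 describes, the master key is derived with HMAC-SHA512 keyed by the string `"Bitcoin seed"`. The left 32 bytes become the master private key and the right 32 bytes the chain code. The key is rejected if it is zero or not below the secp256k1 group order. The sketch below is minimal: the function name is hypothetical, and child derivation and xprv serialization are omitted.

```python
import hashlib
import hmac

# Order of the secp256k1 group.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def master_key_from_seed(seed: bytes) -> tuple[bytes, bytes]:
    """Return (master private key, chain code) following BIP32 master key generation."""
    if not 16 <= len(seed) <= 64:
        raise ValueError("BIP32 seeds are 128 to 512 bits")
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key, chain_code = digest[:32], digest[32:]
    k = int.from_bytes(key, "big")
    if k == 0 or k >= SECP256K1_N:
        raise ValueError("invalid master key; BIP32 says to use a different seed")
    return key, chain_code


# Any 16-64 byte seed works here, whether it came from a mnemonic or not.
key, chain_code = master_key_from_seed(bytes(range(16)))
```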

prusnak commented 6 years ago

We made some substantial improvements to our standard and feel it's moving in the right direction. Feel free to comment: https://github.com/satoshilabs/slips/blob/master/slip-0039.md

luke-jr commented 6 years ago

That looks like it should be a BIP, not a SLIP?

prusnak commented 6 years ago

That looks like it should be a BIP, not a SLIP?

There is a certain opposition to not using version and birthday fields in the mnemonics, and I'd rather not spend my time defending my decision not to include them again and again. I think we've incorporated all meaningful feedback from the bitcoin-dev mailing list, and I can perfectly live with this document being only a SLIP, not a BIP.

roconnor-blockstream commented 6 years ago

Thanks for the update. It does seem like a huge improvement. I'll have to study the proposal more carefully, but there is one thing that immediately jumps out at me.

In normal SSS you can prove information-theoretic security by showing that for any N-1 shares and any message M there exists an Nth share such that those shares encode M. You can (and should) construct a program to generate this Nth share from M and the N-1 shares. However, the way you use PBKDF2 in this proposal precludes this possibility. For the same reason, it also precludes splitting any existing secret with this proposal. I don't know how I feel about this part of the design.
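The "construct the Nth share" program is short for plain Shamir sharing. This Python sketch works over the prime field GF(2^127 − 1), chosen only for illustration; SLIP-39 itself uses different arithmetic and the PBKDF2 step discussed above. The program takes any threshold − 1 shares and an arbitrary message, interpolates the polynomial through (0, message) and those shares, and evaluates it at a fresh x. All function names are hypothetical.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; toy field chosen for illustration


def eval_poly(coeffs: list[int], x: int) -> int:
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def make_shares(secret: int, threshold: int, count: int) -> list[tuple[int, int]]:
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(threshold - 1)]
    return [(x, eval_poly(coeffs, x)) for x in range(1, count + 1)]


def interpolate_at(points: list[tuple[int, int]], x0: int) -> int:
    """Lagrange interpolation: value at x0 of the unique polynomial through `points`."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def forge_share(known: list[tuple[int, int]], message: int, x_new: int) -> tuple[int, int]:
    """Given threshold-1 shares, build the share that makes all of them encode `message`."""
    return (x_new, interpolate_at([(0, message)] + known, x_new))


shares = make_shares(secret=1234, threshold=3, count=5)
known = shares[:2]
print(interpolate_at(shares[:3], 0))  # 1234
forged = forge_share(known, message=999, x_new=3)
print(interpolate_at(known + [forged], 0))  # 999
```

Since such a share exists for every possible message, the N-1 shares alone reveal nothing about the secret.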

I'm also worried that the Bech32 30-bit checksum is too short. A longer checksum would allow for more error correction (see https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-June/016112.html and https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-June/016152.html). Perhaps you will argue that the use of a word list already provides sufficient additional redundancy. I'd have to think more about it. I'd kinda like to see the word list scrapped in favour of a Bech32 string with a longer checksum. This would make the amount of redundancy and error correction plain as day.

But let me stress how much of an improvement I think this new proposal is. Yes, I appreciate that I'm the one who advocated for Bech32 above and am now arguing for a longer checksum. However, I believe using Bech32 is much better than the old truncated SHA-256.

Nit: Converting the mnemonic shares to master secret should mention verification of the checksum as a first step.

P.S. I have been mulling my own SSS proposal in my free time. I should post about it in order to make some comparisons.

prusnak commented 6 years ago

Thanks for the update. It does seem like a huge improvement.

Thank you!

In normal SSS you can prove information theoretic security by showing that for any N-1 shares and any message M there exists an Nth share such that those shares encode M.

Is there any good reason why we should care about this for this particular application? We use a KDF to extract the shares to eliminate the possibility of the following scenario: an attacker knows the first half of the first share and the second half of the second share. Because it's highly impractical to use GF(2^256), this attack is possible. Stretching the shares with a KDF eliminates it.

It precludes splitting any existing secret with this proposal. I don't know how I feel about this part of the design.

This is part of the design. Gmaxwell criticised BIP-39 as "a thinly disguised brainwallet" and for making it very easy to encode existing (and thus possibly insecure) entropy into a valid BIP-39 mnemonic by brute-forcing the weak checksum. Also, it's not possible to migrate existing BIP-39 seeds to SLIP-39 ones if the original used a passphrase (i.e. the passphrase feature is not migrated), which kind of invalidates our desire to support splitting existing entropy into shares in our design.
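The weakness is easy to see for a 12-word BIP39 mnemonic. The last word carries 7 entropy bits and a 4-bit checksum, so for any 11 words someone picks, exactly 128 of the 2048 possible last words produce a valid mnemonic. The Python sketch below performs that search over word indices. The word-list lookup is omitted and the function name is hypothetical.

```python
import hashlib


def valid_last_words(first_eleven: list[int]) -> list[int]:
    """All last-word indices that make a valid 12-word BIP39 mnemonic."""
    if len(first_eleven) != 11 or not all(0 <= w < 2048 for w in first_eleven):
        raise ValueError("expected 11 word indices in 0..2047")
    prefix = 0
    for w in first_eleven:
        prefix = (prefix << 11) | w  # 121 entropy bits
    valid = []
    for last in range(2048):
        entropy = (prefix << 7) | (last >> 4)  # 7 more entropy bits -> 128 total
        checksum = hashlib.sha256(entropy.to_bytes(16, "big")).digest()[0] >> 4
        if (last & 0xF) == checksum:
            valid.append(last)
    return valid


print(len(valid_last_words([0] * 11)))  # 128
```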

I'm also worried that the Bech32 30-bit checksum is too short.

It's already a massive improvement over BIP-39 (8 bits) and even over the original SLIP-39 proposal (16 bits). I am quite happy with 30 bits of checksum.

But let me stress how much of an improvement I think this new proposal is.

Thanks again!

Nit: Converting the mnemonic shares to master secret should mention verification of the checksum as a first step.

Sure, will do.

luke-jr commented 6 years ago

@prusnak BIPs don't require defending...

roconnor-blockstream commented 6 years ago

Is this scheme limited to M-of-M SSS? That seems like quite a restriction.

prusnak commented 6 years ago

Is this scheme limited to M-of-M SSS?

No, it's not.
