[PROPOSAL] Encode restore height as 26th word of the mnemonic seed

dEBRUYNE-1 commented 4 years ago

The restore height is currently a value that has to be entered manually for wallets that are restored from either the keys or the mnemonic seed. The wallet will essentially ignore blocks (only pulling block hashes) before the restore height and start scanning (looking for transactions that belong to the wallet) from the restore height block.

User experience is degraded if the user accidentally sets a restore height that is too 'high' (i.e. after the first transaction to the wallet), as the wallet will 'miss' certain or all transactions, thereby causing an improper balance (as well as transaction history) to be displayed.

In order to improve user experience, we could encode an approximate restore height as additional word of the mnemonic seed. The restore height would then be set automatically upon restoring the wallet, thereby ensuring users will not inadvertently set an erroneous restore height.

I personally do not see many drawbacks of this proposal. Guides will have to be updated to reflect the new format and users need to be informed. Users further, initially, may be slightly confused due to two different seed formats being present. However, I think ultimately the proposal is net beneficial to user experience.

knaccc commented 4 years ago

I'm now in favor of the proposal, having thought about it more.

I just did a wallet restore with restore height=0, using the node.supportxmr.com remote node.

It took 1 hour 30 mins, and required a download from the remote node of 6.5 GB.

Since remote nodes provide by far the fastest way to get people up and running, this proposal would cut down the bandwidth requirements for remote nodes by 90% assuming the "eternal september" I referred to earlier. That's too great an advantage for me to ignore.

fluffypony commented 4 years ago

@knaccc welcome to the pro-26th-word club :)

ghost commented 4 years ago

I like philogoly's idea to encode the restore height in the checksum word. Round it down to the nearest xK blocks or month, whatever fits. This way the concept of the restore height is abstracted away.

@tobtoht I was only suggesting that the restore height also be checksummed not that it somehow be combined with the checksum word. If you were to combine the restore height with the checksum somehow let's say by XOR it would basically make the checksum useless since it would only lead to a corrupted restore height if any word is wrong. If that's what you were suggesting.

I have a radical idea for mnemonic seeds: while Monero doesn't have an easy way to store arbitrary data on the blockchain like other cryptocurrencies it could still be done by storing data in payment ids. Instead of the mnemonic phrase storing the entire private spend key with additional information such as a version and restore height it stores the decryption key and location of the private view key on the blockchain, this would require far fewer words but the cost would be users would have to pay a small fee in Monero to store the data. In the future if things were to be added one could encrypt that data as well for the newer wallet versions but the mnemonic seed phrase would have the exact same length.

Cactii1 commented 4 years ago

How would that work if you don't have access to the wallet that allows you to decrypt the encrypted Payment ID ?

fluffypony commented 4 years ago

@philogy apart from breaking the principle of "every transaction should appear indistinguishable" you also have the problem of having to scan the whole blockchain to find that transaction with your encrypted data, thus negating its value:-P

ghost commented 4 years ago

@fluffypony good point. I was also suggesting that you'd also store the location in the blockchain in the mnemonic phrase:

stores the decryption key and location of the private view key on the blockchain

How would that work if you don't have access to the wallet that allows you to decrypt the encrypted Payment ID ?

Can't you publicly see payment ids anyway?

ghost commented 4 years ago

Also the data would be encrypted before being stored in a payment ID so wouldn't the transactions still be indistinguishable?

tevador commented 4 years ago

If we are going to change the mnemonic seed format, we should consider replacing the 256-bit Ed25519 scalar with a 128-bit seed (that can be passed to a KDF to get the private key). The elliptic curve provides only 128-bit level of security anyways, so there are not many reasons to have 256 bits of entropy in the mnemonic seed.

The electrum wordlist has 2048 words (11 bits/word), so we could have a 14-word seed and encode:

4-bit version
11-bit restore height
128-bit private key seed
11-bit checksum

11-bit restore height is good for 170 years with a 1-month resolution.

knaccc commented 4 years ago

The elliptic curve provides only 128-bit level of security anyways

@tevador I asked about the necessity of 256-bit keys here: https://bitcoin.stackexchange.com/questions/72612/bip32-recommends-a-256-bit-seed-why-do-most-bitcoin-wallets-only-use-a-128-bit

(see the comment by Pieter Wuille)

It's not as satisfying an answer as I would like, but it was enough to convince me to err on the side of 252 bit keys.

tevador commented 4 years ago

@knaccc I read the stackexchange comments and didn't find a convincing reason to use more than 128 bits of entropy. The Pollard's Rho algorithm can calculate the private key from the public key in O(√n) time and O(1) space. It's absolutely a real attack [1], not just theoretical as Pieter Wuille's comment suggests.

With a 128-bit seed, there would be 2 possible attacks, both requiring the order of 2¹²⁸ steps. The KDF could be chosen in such way that the constant for the seed search attack is larger than the constant for the Pollar's Rho search, thereby not having any negative impact on the security of the private key.

rbrunner7 commented 4 years ago

If we are going to change the mnemonic seed format, we should consider replacing the 256-bit Ed25519 scalar with a 128-bit seed (that can be passed to a KDF to get the private key).

Just to fully understand: Would this mean that you can't ask an "old" wallet for a "new" short seed? If yes, I would count this as a point against it because it could make the transition harder.

knaccc commented 4 years ago

@tevador Just to be clear: are you disagreeing with Pieter, and saying that the Pollard's Rho attack does NOT require significantly more memory/CPU resources than a straight 2^128 exhaustive search?

tobtoht commented 4 years ago

The electrum wordlist has 2048 words (11 bits/word), so we could have a 14-word seed

I would love to to see this. Questions that stem from confusion about the restore height make up a decent chunk of the Monero-related support work I do, even though I make an effort to explain what it is in the guides I write. I also often see complaints about the length of the current mnemonic seed, because it is cumbersome to write down and verify 25 words.

For comparison: a Monero seed took me 2m 05s to write down and verify. A 12 word Electrum seed only 40 seconds. For two reasons: obviously length, and the fact that the Electrum word list contains easier English words. Compare the two and see for yourself: monero, electrum.

A shorter seed with embedded restore height would help in reducing the number of support queries related to seeds and is a great improvement in UX.

tevador commented 4 years ago

@rbrunner7 That's correct. The only way to "convert" an old seed to the new one would be to send all funds to the new address. It's a disadvantage, but I think it's more than compensated by the big UX improvement of a much shorter seed. We would need to support the old style seed anyways.

@knaccc I don't think Pieter was referring to the Pollard's Rho attack. There are other ways to compute discrete logarithms which require more memory. Pollard's Rho has O(1) space complexity (see the paper I linked above).

My point was that by choosing an appropriate KDF, the 2¹²⁸ exhaustive search could be made more expensive than simply breaking the public key. In the Pollard's Rho algorithm, each step requires a point addition and a cheap hash function call, which together take under 1 ms. The KDF could be made to take e.g. 1 second per seed, which would make the exhaustive search on average 1000x more costly.

knaccc commented 4 years ago

@tevador Ah, interesting, I didn't realize you were suggesting key stretching. That sounds appealing.

Talking of Pieter Wuille, there is an interesting talk he gave about how CRC-32 is appropriate for single bit errors, but that Bose–Chaudhuri–Hocquenghem codes are appropriate for symbol errors.

tevador commented 4 years ago

CRC-32 is appropriate for single bit errors, but that Bose–Chaudhuri–Hocquenghem codes are appropriate for symbol errors

I've been also thinking about this. These codes only work in GF(qⁿ), where q is a prime, so they wouldn't work with the 1626-word list, but would work with the the 2048-word electrum list. We could then use the much simpler Reed-Solomon codes because our seed is shorter than the alphabet size (Bech32 couldn't use Reed-Solomon because it only works for up to 31 symbols when using base32).

There would be these two options:

1-word checksum (11-bits), allowing any 1-word error to be detected.
2-word checksum (22-bits), allowing any 2-word error to be detected and a 1-word error to be corrected (e.g. if one word is unreadable or omitted).

The second option is probably unnecessary.

rbrunner7 commented 4 years ago

The electrum wordlist has 2048 words (11 bits/word), so we could have a 14-word seed

For what it's worth our friends over at Ryo Currency switched to 14 word seeds about 2 years ago, see e.g. this announcement, "with a proper CRC-12 checksum".

I was not able to find the corresponding code for this on their GitHub to check whether they went beyond switching the number of words and a better checksum algorithm, like we discuss here with version and height info encoded as well.

tevador commented 4 years ago

@rbrunner7 Interesting coincidence. I checked the source code and they don't seem to encode a version or restore height in their 14-word seed.

This is how they use the 154 bits in their mnemonic seed:

128-bit private key seed
12-bit CRC
1 bit: long/short address flag
13 bits wasted

I'd call it suboptimal.

tobtoht commented 4 years ago

1 bit: long/short address flag

If we are going to switch to a different mnemonic seed format we should reserve one or two bits for this as well. I know this is a discussion for a different time, but Monero's long addresses are another major UX hurdle because of how poorly they are displayed in just about every user interface, especially on mobile and on websites. Ideally the address should fit on a mobile screen without linebreaks. Having spare bits available in the new seed format would decrease the friction to implement such scheme at a later point in time.

tevador commented 4 years ago

Proof of concept: https://github.com/tevador/monero-seed

rbrunner7 commented 4 years ago

Reddit discussion thread specifically for @tevador 's proof-of-concept.

Probably also notewhorthy that the earlier Reddit thread about the proposal to add a checksum in general got only few downvotes and few posts speaking out against changing the seed scheme.

SChernykh commented 4 years ago

@tevador I'll duplicate my reddit post here: are you sure hardware wallets can handle 256 MB Argon2id to restore the wallet?

tevador commented 4 years ago

I checked the specs of Ledger Nano S and it seems to have only 10 KiB of RAM. In that case, the KDF would have to be downgraded to something like PBKDF2 with around 1000-5000 iterations, which should be suitable for hardware wallets. I think 1000 hashes should already exceed the costs of an EC point addition, which should make the Pollard's Rho search easier than brute force attack on the seed.

rbrunner7 commented 4 years ago

I am a little unsure how hardware wallets would enter the picture here. Would this be for restoring directly on a hardware wallet using a Monero seed?

ghost commented 4 years ago

Can we just use bip-39 and add an extra 11-bit integer, a word, either before or after the checksum? I know it might look stupid, but will it not work? I know you need two keys, but the second can be derived from the first no?

It's way easier for others who try to work with your software, if you can keep it close to a standard.

Last time I tried to implement your seed conversion algorithm was a failure, just speaking from my own experience.

fluffypony commented 4 years ago

@fuwa0529 we already derive both keys from a seed.

knaccc commented 4 years ago

Note that if we switch to the electrum wordlist, then when entering a seed, we'll need to enter the first 4 characters of each word to uniquely identify it.

The existing Monero wordlist only requires 3 characters of each word to be entered.

I understand that this is unavoidable so that the wordlist can be a power of 2 for Reed Solomon. So this comment is just an observation that the number of characters that need to be entered to restore a wallet will only drop from 75 to 56.

rbrunner7 commented 4 years ago

The existing Monero wordlist only requires 3 characters of each word to be entered.

For English 3 characters are enough. The other "European" languages need 4 characters already now :)

TheCharlatan commented 4 years ago

@tevador nice proof of concept! To weigh in quickly on the KDF memory issue: The hardware wallets at the moment anyway don't make use of the monero entropy to key scheme. They derive their keys from within a BIP32 key that in return is derived from some BIP39 entropy. However even Scrypt is too memory intensive for hardware wallets, so using PBKDF2 with a high amount of rounds is probably the best solution on low memory devices currently.

The new monero seed system could make proper use of the version byte and use it for encoding the type of KDF used by enumerating for example Argon, Scrypt and PBKDF2. Then a future low memory device could still generate a valid monero seed.

SChernykh commented 4 years ago

I am a little unsure how hardware wallets would enter the picture here. Would this be for restoring directly on a hardware wallet using a Monero seed?

Both restoring wallet from a seed and generating new wallet (new random seed) require calculating private key from the seed. If it's 256MB Argon2id then hardware wallets will have problems.

knaccc commented 4 years ago

However even Scrypt is too memory intensive for hardware wallets

If it's true that Hs(128-bit seed) provides a security level that is equivalent to a 252-bit seed, then why not either skip the key stretching, or just do 1024 rounds of keccak to add 10 bits of key stretching, and leave it at that?

ghost commented 4 years ago

With keccak 256, you probably only need one round.

https://en.wikipedia.org/wiki/HMAC#Design_principles

tevador commented 4 years ago

FYI, I changed Argon2id to PBKDF2 with 4096 iterations in my PoC. Should be fine even for low-end hardware.

If it's true that Hs(128-bit seed) provides a security level that is equivalent to a 252-bit seed

They are equivalent except for a constant factor. I still recommend to use some key stretching since it's cheap and can make a bruteforce attack much slower.

TheCharlatan commented 4 years ago

Another feature that I would like to at least have discussed is ciphering the entropy similar to what aezeed does (though not necessarily using the aez cipher). This way a user can choose a single additional password that is required in combination with the words. If the user exposes the password he can safely change it and subsequently the words without changing the underlying entropy and keys. I am not sure if that is desirable though, since it is mutually exclusive with BIP39 style '25th word' seed phrases.

tevador commented 4 years ago

I made some improvements to my PoC:

embedded network type (mainnet/stagenet/testnet) to prevent accidental misuse of the seed on a different network
built-in way to make seeds incompatible between different coins, e.g. a seed for Aeon cannot be accidentally used to restore a Monero wallet; this can become important if other coins adopt the same seed format
3 reserved bits instead of 2
removed explicit version field (version is implicit from reserved values)

One of the reserved bits could be used in the future to support some form of seed encryption as suggested by @TheCharlatan.

https://github.com/tevador/monero-seed

rbrunner7 commented 4 years ago

Nailing the seed to a particular coin is done in a clever way, definitely an improvement.

version is implicit from reserved values

I am not sure how you mean that. If finally you will give meaning to all 3 of these now-reserved bits, they can all be either 0 or 1. How would you reliably read a version out of this? And I note that already now we have at least candidate uses for all 3 of those bits, thus probably not a good idea to start to use the first one as a version bit - and "loose" it that way - sometime in the near future.

Encoding the network type is certainly a good idea, but I am really not sure whether it's worth to sacrifice an explicit version for this. It will be probably mostly devs and experienced people that will deal with stagenet and testnet wallets which will also know what they do. And restoring a wallet with the same keys on a different network might come handy for testing purposes. (Not impossible of course if you build in some "override" mechanism when restoring the seed on another network, but still.)

tevador commented 4 years ago

Encoding the network type is certainly a good idea, but I am really not sure whether it's worth to sacrifice an explicit version for this. It will be probably mostly devs and experienced people that will deal with stagenet and testnet wallets which will also know what they do.

You may be right. In that case, we could mark all 5 bits as reserved.

I am not sure how you mean that. If finally you will give meaning to all 3 of these now-reserved bits, they can all be either 0 or 1. How would you reliably read a version out of this? And I note that already now we have at least candidate uses for all 3 of those bits, thus probably not a good idea to start to use the first one as a version bit - and "loose" it that way - sometime in the near future.

We have just 5 bits to use for versioning/features. That's 32 possible values. If we used e.g. 2 bits as an explicit "version" field, we would instantly waste 8 out of the 32 possible values because the 3 reserved bits have no meaning when version = 00.

A better option is to use all 5 bits as a "version" field and assign functionality to each of the 32 versions.

IMO, an even better solution is to forgo the explicit version field and use just 5 "reserved" bits. This can work under the following two conditions:

Reserved bits are required to be 0. The software should return an error otherwise.
When defining a new feature bit, 0 should be the previous behavior.

Example:

In the second version of the seed format, we decide to introduce 3 additional key derivation functions (KDFs), e.g. bcrypt, and 2 variants of Argon2. We can allocate 2 feature bits for this as follows:

00 = the original KDF (PBKDF2-HMAC-SHA256)
01 = bcrypt
10 = Argon2 variant 1
11 = Argon2 variant 2

The remaining 3 bits stay reserved and must be set to 000.

As you can see, this update doesn't cause any issues:

v1 wallet receiving a v2 seed with Argon2 KDF will return an error (as per condition 1 above) instead of restoring the wrong key
v1 seed will work in v2 wallets because 00 is the original KDF from the first version (as per condition 2 above)

rbrunner7 commented 4 years ago

IMO, an even better solution is to forgo the explicit version field and use just 5 "reserved" bits.

After reading and thinking a bit this seems the best approach to me. No version, not explicit nor implicit, just feature bits, and there are 5 of those because 5 bits are left and not more.

Those feature bits come of course first, so we could change the whole "layout" of the seed, e.g. by defining a feature bit where a value of 1 means a better checksum is present that needs a few bits more, say 13 instead of 11, and we would get the 2 additional bits needed for that from making the restore height more coarse, reducing it from 10 bits to 8 bits.

And about that network story: If it becomes a real problem we could still use one of the feature bits where a value of 1 means "not usable for mainnet" or "not a mainnet seed" to guard against the most problematic case.

Adreik commented 4 years ago

If introducing a new seed system anyway, why not also introduce a 49/50 word seed standard and have the private view key generated non-deterministically if using that seed type?

fluffypony commented 4 years ago

@Adreik there's no real benefit from that, it's not like you can practically crack the spend key if you have the view key. Plus you can always generate the two keys non-deterministically right now using the CLI wallet, and back them up however you want. Someone is welcome to write a Javascript mnemonic encoder for such a task for the 2 people that will use it.

knaccc commented 4 years ago

@tevador It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words. This will make them easier to distinguish from each other, and also has the side effect of not falsely creating the impression that Monero seeds are supposed to always start with a particular word.

knaccc commented 4 years ago

@tevador I've just created a javascript implementation of your code. The test file shows how to use it. Run node test/test to run the tests.

I'm unable to parse your example test mnemonic though, so there must be a small difference between our implementations. It could be because we are using different Reed Solomon implementations.

According to your implementations, are these supposed to be valid RS encodings? 0,0,0,0,0,0,0,0,0,0,0,0,0,0 and 0,1,2,3,4,5,6,7,8,9,10,11,12,52

A quick way to test the RS JS implementation I'm using is to run this code:

const reedSolomon = require('reedsolomon');
const reedSolomonEncoder = new reedSolomon.ReedSolomonEncoder(new reedSolomon.GenericGF(2053, 2048, 1));
const mnemonicWordsLen = 14;
const mnemonicErrorCorrectionWordsLen = 1;
function reedSolomonEncode(unflaggedDataInt32Array) {
  let r = unflaggedDataInt32Array.slice();
  reedSolomonEncoder.encode(r, mnemonicErrorCorrectionWordsLen);
  return r;
}
var a = new Uint8Array(mnemonicWordsLen);
console.log(reedSolomonEncode(a)+'');
for(let i=0; i<a.length-1; i++) a[i] = i;
console.log(reedSolomonEncode(a)+'');

When you're applying the coin flag, you're doing that to the second mnemonic word, right?

rbrunner7 commented 4 years ago

@tevador It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words.

You mean if a program gets an order to produce 10 seeds, has produced 9 already, and then goes on to produce the 10th one, it would throw away candidates that are too similar to any of the previous ones and keep generating randomly until it get a really "different" seed?

Right now I can't see how, for single seeds produced independently, you can do better than just using a good source of randomness and hope for different first words.

tevador commented 4 years ago

@knaccc

It occurred to me that it would be useful if when a user creates a couple of new wallet seeds, those new seeds are likely to have different first words

If you test my implementation, you will see that the first word changes for different seeds. That's because the first word is actually the checksum. The second word encodes the flags and the high bits of the wallet birthday, so it will stay mostly the same for seeds created in the same year.

My implementation orders the coefficients in ascending order, i.e. the constant term is first, then the linear term etc. This may also explain why you are getting different results. Try reversing the order of the words.

When you're applying the coin flag, you're doing that to the second mnemonic word, right?

Correct.

knaccc commented 4 years ago

Right now I can't see how, for single seeds produced independently, you can do better than just using a good source of randomness and hope for different first words.

I had incorrectly assumed that the seed data would start with the 00000 reserved bits, followed by the 0000000000 birthday bits. This would have meant that all seeds created in the same month would have always started with the same first word.

Now I see that tevador's implementation is ordering the checksum word such that the first word would in fact be evenly distributed.

knaccc commented 4 years ago

@tevador I'm not a C coder, so I'm struggling to test your RS library. Please could you tell me:

If I start with the data 0,1,2,3,4,5,6,7,8,9,10,11,12, then after it has been RS encoded, what should the array look like? Currently mine looks like 0,1,2,3,4,5,6,7,8,9,10,11,12,52.

I've tried altering my code so that it looks like either 52,0,1,2,3,4,5,6,7,8,9,10,11,12 or 52,12,11,10,9,8,7,6,5,4,3,2,1,0 but that does not seem to fix the issue.

tevador commented 4 years ago

I'm getting {358,0,1,2,3,4,5,6,7,8,9,10,11,12}. My generator polynomial is {2,1}.

Have you tested 12,11,10,9,8,7,6,5,4,3,2,1,0,358?

knaccc commented 4 years ago

@tevador I'm out of my depth here with understanding the Reed Solomon implementation enough to know what my 'generator polynomial' is.

According to https://www.mathworks.com/help/comm/ref/rsgenpoly.html the default primitive polynomial for a Galois Field GF(2^11) is D^11 + D^2 + 1, which has an 'integer representation' of 2053.

I'd initialized my RS encoder with GenericGF(2053, 2048, 1), meaning primitive=2053, size=2048, generatorBase=1. The size appears to affect the size of the expTable and logTable.

I'm not sure if I am passing the correct values to GenericGF(). Can you suggest values what I should be using please?

This is the implementation I'm using: https://github.com/cho45/reedsolomon.js/blob/master/reedsolomon.js

Note that the implementation goes into an endless loop if I attempt to encode using GenericGF(2053, 11, 2) or GenericGF(7, 11, 2).

tevador commented 4 years ago

2053 is the primitive of the Galois Field. I was talking about the generator polynomial of the RS code, which is initialized here in the javascript implementation.

I managed to reproduce my result with the following code:

var rs = require('./reedsolomon.js');

var encoder = new rs.ReedSolomonEncoder(new rs.GenericGF(2053, 2048, 1));

const messageLength = 14;
const dataLength = 13;
var message = new Int32Array(messageLength);
for (var i = 0; i < dataLength; i++) message[i] = dataLength - 1 - i;

console.log('original');
console.log(Array.prototype.join.call(message));

encoder.encode(message, messageLength - dataLength);

console.log('rs coded');
console.log(Array.prototype.join.call(message));

output:

original
12,11,10,9,8,7,6,5,4,3,2,1,0,0
rs coded
12,11,10,9,8,7,6,5,4,3,2,1,0,358

Note that the coefficients are reversed compared to my implementation, but the values match.

knaccc commented 4 years ago

@tevador Thanks, this has helped me ensure my RS encoding matches yours.

I have another problem: I'm using your test seed test park taste security oxygen decorate essence ridge ship fish vehicle dream fluid pattern.

The first data word is park, which is word index 1282. I then unflag this word by adding 0x539 mod 2048, which gives me 1282+1337-2048=571. In binary, 571 is 01000111011. This means the reserved bits are 01000... but the reserved bits should all be zero, right? Am I missing something? The bits following the first 5 reserved bits are 111011, which are the first bits of the correct quantized timestamp 1110111101, so it looks like I'm almost getting the correct result, but not quite.

In my code, I get the entire 143 bits of the data (after unflagging) as: 01000111011110111100011100001011010011110010001110010000100110101010111001100110001100000101011110111110001111010000101000101100110110100001010 or 23bde385a791c84d5733182bdf1e85166d0a in hex.

Does that look right to you?

monero-project / monero

[PROPOSAL] Encode restore height as 26th word of the mnemonic seed #6639