Allow recovering stored passwords

palant commented 6 years ago

We should support paper backups for stored passwords as well. All Passwords page should list a "recovery code" for stored passwords. This recovery code should contain the following data: salt to derive encryption key from master password, IV, encrypted password (around 40 bytes of data). Encryption should still depend on password name and site, so that the recovery code shouldn't be usable elsewhere. It should be presented in some way that is somewhat easy for humans to type in. When users click "Add stored password" they should have the option to enter a recovery code.

palant commented 6 years ago

RFC1751 is one solution to this problem. Its downsides are low efficiency (40 bytes would be encoded as 120 characters) and merely two parity bits (it's meant to encode merely 8 bytes of data, so this could be addressed by encoding each 8 bytes chunk separately). Also, English words might not be a good match for people who don't have English as their native language.

palant commented 6 years ago

Never mind, there is https://www.google.ch/patents/US5892470 - a patent on the very thing that RFC1751 is about. And even if the RFC is clearly older than the patent, using this approach in a project is too risky even if it were a good match. Unfortunately, I cannot find any other standards that would apply here, meaning most likely that we have to roll our own.

ghost commented 6 years ago

Do you mean legacy passwords? I think they can be simply printed in cleartext. It's not easypasswords main role to handle them especially when it involves some voodoo magic to keep it secure.

Anyway backups should be kept in safe place, not next to your monitor.

palant commented 6 years ago

There has been a terminology change, "legacy passwords" are now just "stored passwords." And we might get into situations where there are lots of them, e.g. on a change of password generation algorithm (what we've done just now, once the transitional period is over) or if users change their master password (still to be implemented). Even if not, there are always some that cannot be avoided (I have two, used to be more). The point of this issue is creating a safe paper backup for stored passwords as well. Nothing wrong with a safe place, but a paper backup is very hard to corrupt.

ghost commented 6 years ago

I still think it's better to print them in clear. Paper backups are for emergency. Printing long strings of random bytes which then you have to type somewhere isn't much user friendly.

palant commented 6 years ago

Yes, paper backups are for an emergency - like your computer with all its data being unavailable for some reason (exploded and took all the backups with it). Having a piece of paper around that would help you out in that kind of emergency isn't a bad thing, having passwords printed on it in the clear on the other hand is an option we support, but it definitely isn't everybody's thing.

ghost commented 6 years ago

Yeah but we're talking only about legacy passwords which aren't created with easypasswords algorithm. I'm not advocating for always printing normal passwords because recovering them is obvious and easy.

In case of non-deterministic password managers like keepass,pass,1password etc. the only way of storing paper backups is to print passwords in clear. So easypasswords can easily give you at least same security in case of legacy passwords and much more in case of normal passwords.

Ability to print recoverable data instead of legacy passwords in clear will be very unique and superior feature for easypasswords but as it's technically challenging and unavailable in alternative managers I doubt this should be near the top priorities.

palant commented 6 years ago

It isn't challenging at all, we have all the mechanisms necessary already. The only somewhat tricky part is making sure that entering 40 bytes of binary data is doable for a human. Which in particular means splitting it up in smaller blocks and checksumming each of them so that the program can indicate typos early in the process. Still not going to be a nice and easy process but it shouldn't be something that needs to be done often.

As to the actual encoding, Base58 is a binary-to-text encoding that is meant to be good enough for human entry. The efficiency is slightly lower than Base64, but maybe avoiding non-alphanumeric characters is worth it (typing letters is generally easier). Splitting the data into small blocks, adding checksums and presenting it nicely (groups of four letters I guess) should be doable.

Another option is Base32 which is easier to implement and isn't case-sensitive, but is even less compact.

For reference, 40 bytes are 54 characters in Base64, around 55 characters in Base32 and 64 characters in Base32. RFC1751 isn't meant to be compact and would require around 120 characters.

palant commented 6 years ago

I am strongly leaning towards Base32, not having to press Shift while entering this data seems to be more important than the shorter representation. Wikipedia lists several Base32 variants:

RFC 4648: avoids using digits 0, 1 and 8 due to similarities with letters O, I and B. These letters still exist in the alphabet however, so people might still mistake O for 0 because they don't know that 0 isn't used. Automatically changing any 0 into O on entry is possible but the confusion is still suboptimal.
z-base-32: is meant to be transcribed by hand, which is why it avoids using not merely 0 and l, but also v and 2. The feature isn't really relevant for us, the data will always be printed. It also reorders the alphabet so that the last letter is easier to type - not really relevant either if we have 64 or 80 letters in total.
Crockford's Base32: excludes letters I, L and O due to similarities and U to avoid obscenities (IMHO not too successful, people who want to see obscenities don't need vowels). Like with RFC 4648, the similar digits remain in the alphabet so that the confusion remains. It also proposes adding a single checksum symbol to the data, but the checksum symbol might be one of four punctuation characters - that somewhat defeats the purpose of Base32.

It seems that we need to use our own alphabet. We'll print recovery codes in upper-case letters because these are easier to read. So we need to exclude the letters I and O as well as the digits 1 and 0: ABCDEFGHJKLMNPQRSTUVWXYZ23456789."

I miscalculated the size of the raw data originally. What we really have is:

1 byte to encode a version number (users won't need to indicate it manually if we change the format later)
16 bytes salt to derive encryption key
12 bytes initialization vector
16 bytes AES-GCM authentication tag
typically 8 bytes for the actual encrypted password

That's 53 bytes. We can split these into blocks of at most 9 bytes. Add one checksum byte per block and we get 10 bytes that can be encoded as 16 Base32 characters - four groups of four characters, one line. It seems that we will typically have six such lines (yes, that's 96 characters which is quite a lot).

As to the checksum, we don't want to use CRC because it is geared towards bit flipping due to transmission errors. We don't want a check digit either because users won't be entering binary data directly but rather a Base32 encoding of it - hard to introduce transposition errors that way. So a non-cryptographic hash function seems to be the best choice here, Pearson hashing will do. A permutation table for it is easily generated deterministically (I chose two small prime numbers randomly from the list):

for (let i = 0; i < permutations.length; i++)
    permutations[i] = ((i + 379) * 467) & 0xFF;

The checksum shouldn't be calculated on the last 9 bytes only but rather on all the data preceding it, this will allow catching transposed rows as well.

palant commented 6 years ago

I tested the visuals and it seems that increasing block size from 9 bytes to 14 bytes looks better. We get four rows with 24 characters this way. For comparison:

6 rows with 16 characters 4 rows with 24 characters

We have fewer checksum bytes here meaning that up to 56 bytes will fit into these four rows (was 54 bytes with 16 charachters per row and six rows). I'll test recovery code entry once I have everything together and make a final decision then.

palant commented 6 years ago

One aspect that I initially didn't recognize as important: encoding the data as proposed would leak information about the password length. I solved this by zero-padding the password before encrypting so that the last row is always full. As a result, any passwords with 11 or less characters will result in the same amount of data, and passwords with 12 to 25 characters won't be distinguishable either.

recovery codes screenshot

palant / pfp

Allow recovering stored passwords #64