[feature request] Deterministic mode

mrdomino commented 1 month ago

This is a really cool project. I really like the regex-like language for specifying password schemas.

The main use case I have for a tool like this is as a deterministic password generator (like lesspass or passacre, or see also ssh-keydgen.) Basically I want to be able to use a master password to generate all of my site-specific passwords, so I only need to remember the one master password and don't need to save anything except the schemas, which can be public.

The general idea is to use (the moral equivalent of) bcrypt(master-password + site-url + reset-counter) as an entropy seed, feed that into (the moral equivalent of) a Keccak sponge, and take bits out of that instead of using /dev/urandom or what-have-you. site-url can optionally contain a username, and reset-counter gets incremented every time you need to change a password for one site.
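One detail worth pinning down early is how the three inputs are combined before hashing. Plain concatenation is ambiguous ("ab" + "c" and "a" + "bc" give the same bytes), so a minimal sketch, assuming a length-prefixed encoding, could look like this. The names `put_field` and `seed_input_encode` are hypothetical and not part of passgen; the result would be what gets fed to bcrypt or whichever KDF is chosen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one field as a 4-byte big-endian length followed by the bytes.
 * Returns the new write offset, or 0 if the field does not fit. */
static size_t put_field(uint8_t *out, size_t cap, size_t off,
                        const void *data, size_t len) {
    if (off > cap || len > 0xffffffffu) return 0;
    if (cap - off < 4 || cap - off - 4 < len) return 0;
    out[off + 0] = (uint8_t)(len >> 24);
    out[off + 1] = (uint8_t)(len >> 16);
    out[off + 2] = (uint8_t)(len >> 8);
    out[off + 3] = (uint8_t)len;
    memcpy(out + off + 4, data, len);
    return off + 4 + len;
}

/* Hypothetical helper: serialize (master, site, counter) into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t seed_input_encode(uint8_t *out, size_t cap, const char *master,
                         const char *site, uint32_t counter) {
    uint8_t ctr[4] = {(uint8_t)(counter >> 24), (uint8_t)(counter >> 16),
                      (uint8_t)(counter >> 8), (uint8_t)counter};
    size_t off = put_field(out, cap, 0, master, strlen(master));
    if (off == 0) return 0;
    off = put_field(out, cap, off, site, strlen(site));
    if (off == 0) return 0;
    return put_field(out, cap, off, ctr, sizeof ctr);
}
```

With this encoding, inputs that differ only in where the master password ends and the site begins produce different byte strings, so they cannot collide before the KDF even runs.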

If this is of interest / not undesirable, then I might be able to take a crack at implementing it.

xfbs commented 1 month ago

Hey @mrdomino, I appreciate your feedback!

I can give you some pointers for this, if you want to take a crack at it.

Currently, the source of randomness is already abstracted away. We have that passgen_random struct in include/passgen/util/random.h:

/// Randomness source.
typedef struct {
    /// Ring buffer to hold random data in.
    uint8_t buffer[PASSGEN_RANDOM_BUFFER_LENGTH];

    /// Current position in the ring buffer.
    size_t pos;

    /// Device to read random data from.
    void *data;

    /// Function used to read more random data.
    passgen_random_read_func *read;

    /// Function used to close randomness source.
    passgen_random_close_func *close;
} passgen_random;

The closest thing to this at the moment would be to use the (very insecure) xorshift PRNG, which you can give a seed for deterministic password output:

$ passgen -r xorshift:1234 -p apple1
T2y-cwY-yWq-1gS

This means that the same seed always yields the same password (at least, unless there are major changes in passgen). However, I mainly implemented this PRNG for deterministic unit tests; it should not be used to generate strong passwords. What you could do is implement your own source of randomness, which takes the master-password, site-url and reset-counter as input and implements the scheme you have proposed (bcrypt + Keccak).
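To illustrate why a seeded xorshift generator is deterministic, here is a minimal sketch of Marsaglia's xorshift64 with the (13, 7, 17) shift triple. This is an assumption for illustration: passgen's own xorshift variant may use a different state size or different shifts, and the `xorshift64_*` names are hypothetical.

```c
#include <stdint.h>

/* Generator state. It must never be zero, or every output is zero. */
typedef struct {
    uint64_t state;
} xorshift64;

/* Seed the generator, mapping the invalid zero seed to 1. */
void xorshift64_seed(xorshift64 *g, uint64_t seed) {
    g->state = seed ? seed : 1;
}

/* Advance the state and return it as the next 64-bit output. */
uint64_t xorshift64_next(xorshift64 *g) {
    uint64_t x = g->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    g->state = x;
    return x;
}
```

Two generators seeded with the same value produce identical sequences, which is what makes the output reproducible. It is also why this is unsuitable for real passwords: the state is tiny and each output reveals it completely.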

You would do this in src/util/random.c. These randomness utilities work by filling a buffer with random data; passgen pulls bytes out of it and asks the implementation to refill the buffer when needed. It should be fairly straightforward: all you need to implement is the initial seeding logic and a passgen_random_read_func.
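The fill-and-refill pattern can be sketched in isolation as follows. This is a self-contained illustration, not passgen's actual code: the `demo_` names, the buffer length and the callback signature are all assumptions (the real passgen_random_read_func signature is not shown above), and the counter-based filler is a toy that only makes the refill behaviour visible.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUFFER_LENGTH 32

/* Hypothetical callback: write `size` bytes to `dest`, return bytes written. */
typedef size_t demo_read_func(void *data, void *dest, size_t size);

typedef struct {
    uint8_t buffer[DEMO_BUFFER_LENGTH];
    size_t pos;           /* index of the next unread byte */
    void *data;           /* state owned by the read callback */
    demo_read_func *read; /* called whenever the buffer runs dry */
} demo_random;

/* Toy filler: emits an incrementing byte counter. Deterministic, NOT random. */
size_t demo_counter_read(void *data, void *dest, size_t size) {
    uint8_t *counter = data;
    uint8_t *out = dest;
    for (size_t i = 0; i < size; i++) out[i] = (*counter)++;
    return size;
}

/* Initialise the source and fill the buffer once. Returns 0 on success. */
int demo_random_open(demo_random *r, demo_read_func *read, void *data) {
    r->read = read;
    r->data = data;
    r->pos = 0;
    size_t got = r->read(r->data, r->buffer, sizeof r->buffer);
    return got == sizeof r->buffer ? 0 : -1;
}

/* Pull one byte, refilling the buffer when it is exhausted. */
int demo_random_u8(demo_random *r, uint8_t *out) {
    if (r->pos == sizeof r->buffer) {
        size_t got = r->read(r->data, r->buffer, sizeof r->buffer);
        if (got != sizeof r->buffer) return -1;
        r->pos = 0;
    }
    *out = r->buffer[r->pos++];
    return 0;
}
```

A deterministic mode would swap the toy filler for a callback that produces keystream from the seeded construction; the consumer side stays unchanged.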

You'd probably also want to think about how to activate this. Currently, there is a command-line argument -r to set the source of randomness. If you add a parser to passgen_random_open_parse for your scheme, you could have it enabled like this:

$ passgen -r master:mymasterpass:github.com:0 "..."
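A parser for such a spec string might look roughly like the sketch below. The `master_spec` struct and `master_spec_parse` function are hypothetical; how passgen_random_open_parse actually dispatches on the prefix is not shown here, so this only covers splitting the string itself.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing "master:<passphrase>:<domain>:<counter>". */
typedef struct {
    char master[128];
    char domain[128];
    unsigned long counter;
} master_spec;

/* Returns 0 on success, -1 if the spec is malformed or a field is too long. */
int master_spec_parse(const char *spec, master_spec *out) {
    static const char prefix[] = "master:";
    if (strncmp(spec, prefix, sizeof prefix - 1) != 0) return -1;

    const char *pass = spec + sizeof prefix - 1;
    const char *sep1 = strchr(pass, ':');
    if (sep1 == NULL) return -1;
    const char *domain = sep1 + 1;
    const char *sep2 = strchr(domain, ':');
    if (sep2 == NULL) return -1;

    size_t pass_len = (size_t)(sep1 - pass);
    size_t domain_len = (size_t)(sep2 - domain);
    if (pass_len >= sizeof out->master) return -1;
    if (domain_len >= sizeof out->domain) return -1;

    /* Require a plain decimal counter: strtoul alone would accept "-1". */
    const char *num = sep2 + 1;
    if (*num < '0' || *num > '9') return -1;
    char *end;
    unsigned long counter = strtoul(num, &end, 10);
    if (*end != '\0') return -1;

    memcpy(out->master, pass, pass_len);
    out->master[pass_len] = '\0';
    memcpy(out->domain, domain, domain_len);
    out->domain[domain_len] = '\0';
    out->counter = counter;
    return 0;
}
```

Note that with a colon-separated format the passphrase itself cannot contain a colon, which is one more reason this form is awkward.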

However, this might be somewhat awkward to use. It should also be possible to make this more of a first-class feature and add custom command-line arguments for it, so that you can enable it as such:

$ passgen -m mymasterpassword -d github.com -r 0 "..."

To be honest, I think this would be a neat feature and I think it would make sense to have first-class support for it.

mrdomino commented 1 month ago

Great! Exciting that you're interested.

I would probably not want to support giving the master password on the command line, since that makes it too easy for it to get into a shell history file. Not having given too much thought to it yet, something like psql's -W flag (where you pass the flag without an argument and it prompts for a password) might be appropriate. Then a flag for site URL (if not passed, that is also prompted for), username (unset if not passed), and reset counter (0 if not passed). Also maybe a flag to prompt for confirmation, though you can mostly get the same result by just running the command twice and seeing if the output matches. Possibly also store those things in a dotfile (I haven't looked yet at what passgen does on that front).
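For the prompting part, a minimal POSIX sketch could look like this. It assumes termios is available (so it would need a different path on Windows), and the `read_secret` name is hypothetical. Echo is only disabled when the input is actually a terminal, so piping the passphrase in still works.

```c
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Read one line from `in` into `buf`, without the trailing newline.
 * If `in` is a terminal, print `prompt` to stderr and disable echo while
 * reading. Lines longer than size - 1 bytes are truncated.
 * Returns 0 on success, -1 on EOF or read error. */
int read_secret(FILE *in, const char *prompt, char *buf, size_t size) {
    int fd = fileno(in);
    struct termios saved;
    int restore = 0;

    if (fd >= 0 && isatty(fd) && tcgetattr(fd, &saved) == 0) {
        struct termios quiet = saved;
        quiet.c_lflag &= ~(tcflag_t)ECHO;
        fputs(prompt, stderr);
        fflush(stderr);
        /* Only restore later if echo was really switched off. */
        restore = tcsetattr(fd, TCSAFLUSH, &quiet) == 0;
    }

    char *line = fgets(buf, (int)size, in);

    if (restore) {
        tcsetattr(fd, TCSAFLUSH, &saved);
        fputc('\n', stderr); /* the user's Enter key was not echoed */
    }
    if (line == NULL) return -1;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 0;
}
```

A confirmation flag could simply call this twice and compare the two buffers before deriving anything.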

The point of bcrypt is basically just a work factor to prevent people from brute-forcing the master password given enough site-specific passwords; what passacre does is just feed 10 MB or so of /dev/zero into the Keccak sponge (Keccak being the basis of SHA-3). If we want to minimize dependencies and if that provides acceptable security, then that's probably the thing to do; otherwise I might look into using argon2d.

xfbs commented 1 month ago

Makes sense! One thing that might also make sense is storing the (bcrypt/argon2d)-hashed version of the master passphrase in some config file. When you use passgen, it'll prompt you for it:

passgen -m github.com "..."
Master passphrase: *************

All that does is verify that you typed the master passphrase correctly, so that a typo doesn't silently generate a password you cannot recover.
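The check itself is small once the hash exists. A minimal sketch, assuming the KDF output is already available as raw bytes and that the config file stores it as lowercase hex (both assumptions; the names `hex_encode` and `master_check_matches` are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hex-encode `len` bytes into `out`, which must hold 2 * len + 1 chars. */
void hex_encode(const uint8_t *in, size_t len, char *out) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
    out[2 * len] = '\0';
}

/* Returns 1 if the freshly derived hash equals the stored hex string,
 * 0 otherwise (including when the hash is longer than 64 bytes). */
int master_check_matches(const uint8_t *derived, size_t len,
                         const char *stored_hex) {
    char hex[2 * 64 + 1];
    if (len > 64) return 0;
    hex_encode(derived, len, hex);
    return strcmp(hex, stored_hex) == 0;
}
```

On a mismatch the tool would re-prompt instead of generating a password from the mistyped passphrase.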

Besides that, I think the next steps to get this rolling are:

Implement a custom master-password-based randomness source. For that, figure out whether there are simple implementations of bcrypt/Keccak/argon2d that can be vendored, or whether they should be included as git submodules or CMake external dependencies (perhaps behind a feature flag).
Implement the command-line arguments to make it usable
Add some unit tests to verify that it works as expected (output should be stable).

mrdomino commented 1 month ago

> Makes sense! One thing that might also make sense is storing the (bcrypt/argon2d)-hashed version of the master passphrase in some config file.

The one wrinkle with that is the salt parameter. You would be committing to always being able to retrieve that, on pain of losing all of your passwords. But that might actually be fine in a lot of cases since you're presumably already storing your list of per-site schemas somewhere.

Yeah I agree re next steps. My inclination is to try to vendor everything as much as possible, but I'll see what makes sense. I will get on that.

mrdomino commented 1 month ago

Oh wait, no, I see what you're saying. Store the hash with salt, but take it on the command line to fill the sponge with. Yep, great idea. I like it. :)

mrdomino commented 1 month ago

How would you feel about passgen incurring a dependency on openssl? It seems like openssl has both Keccak and Argon2 these days, so it would be fairly expedient from that perspective relative to trying to depend on separate Keccak and Argon2 libraries. But it's openssl...

(openssl is also already packaged for cosmopolitan libc, which is congruent with an ulterior motive that I have to see a cosmopolitan build of passgen.)

xfbs commented 1 month ago

I would also like to see a cosmopolitan build of passgen! I think that is a great idea.

As for openssl, my current aim is to try to not have any dependencies, for portability. I like having statically-linked binaries. But this is not a hard requirement, I’m happy to depend on it if it gives a benefit. Perhaps that should be an optional feature, so that it is still possible to build static binaries.

One thing I have done in the past is vendor crypto things from PolarSSL/mbedTLS. Their implementations are fairly easy to pull out and are permissively licensed. You don't get the most performance, but you get good portability. If we only need a few algorithms (Keccak, argon2) then that might be an idea. It does create a bit of maintenance (having to update it). Timing side-channel attacks are not a concern for passgen, so we don't have to be too worried about that; what matters is that the implementation is correct and reproducible.

If you want to play with it, I’m happy to accept patches either way. I think new features that are useful are always welcome 😊


mrdomino commented 1 month ago

Yeah, looks like mbedTLS has Keccak in it, though I can't tell yet how accessible the raw sponge is. (The extendable-output functions SHAKE128 and SHAKE256 are apparently still sitting in an open PR.)

I would be fine with doing it that way, cosmopolitan libc actually seems to vendor an older version of mbedtls already fwiw.

mrdomino commented 1 month ago

All right, I have a hokey minimal toy example that implements all of the relevant cryptographic primitives and nothing else:

https://github.com/mrdomino/cosmopolitan/blob/pw/examples/pw.c

Notably that's all just out of base cosmopolitan libc, with no external dependencies.

Apparently BLAKE2 is a better alternative to Keccak; BLAKE2X is an XOF constructed out of it, that is similar but not identical to what that file does. (That file uses an argon2 hash instead of a BLAKE2 hash as H₀, and uses 32- rather than 64-byte digests because BLAKE2B256 is what cosmo provides.)

xfbs commented 1 month ago

I've started playing around a little bit. My idea was basically use this as a source of randomness (it is a stream cipher):

XChaCha20(key = Argon2id(password), iv = Blake2b(site || counter))

I'm not sure what you think of this, my thinking was basically that I like XChaCha20 as a stream cipher, and it was easy to make a randomness source out of it :D

So far, I've implemented Salsa20 (the predecessor of ChaCha20, which XChaCha20 extends with a longer nonce) as a randomness source for Passgen. I used DJB's public-domain implementation for it. I've also found an implementation of Argon2id that is similarly licensed. I am not too worried about performance or side-channel attacks, so it does not need the most optimized implementation, I think.
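To make the "stream cipher as randomness source" idea concrete, here is a minimal sketch of the plain ChaCha20 block function in the RFC 8439 layout (32-byte key, 32-bit block counter, 12-byte nonce) plus a keystream filler. This is not XChaCha20 and not the code used in passgen; in the proposed scheme the key would come from Argon2id and the nonce from BLAKE2b, whereas here both are simply parameters. The `chacha20_fill` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ChaCha quarter round on four words of the working state. */
#define QR(a, b, c, d)                     \
    do {                                   \
        a += b; d ^= a; d = rotl32(d, 16); \
        c += d; b ^= c; b = rotl32(b, 12); \
        a += b; d ^= a; d = rotl32(d, 8);  \
        c += d; b ^= c; b = rotl32(b, 7);  \
    } while (0)

/* Compute one 64-byte keystream block. */
void chacha20_block(const uint8_t key[32], uint32_t counter,
                    const uint8_t nonce[12], uint8_t out[64]) {
    uint32_t s[16], w[16];
    s[0] = 0x61707865; s[1] = 0x3320646e; /* "expand 32-byte k" */
    s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) s[4 + i] = load32_le(key + 4 * i);
    s[12] = counter;
    for (int i = 0; i < 3; i++) s[13 + i] = load32_le(nonce + 4 * i);
    memcpy(w, s, sizeof w);
    for (int i = 0; i < 10; i++) { /* 10 double rounds = 20 rounds */
        QR(w[0], w[4], w[8], w[12]);  QR(w[1], w[5], w[9], w[13]);
        QR(w[2], w[6], w[10], w[14]); QR(w[3], w[7], w[11], w[15]);
        QR(w[0], w[5], w[10], w[15]); QR(w[1], w[6], w[11], w[12]);
        QR(w[2], w[7], w[8], w[13]);  QR(w[3], w[4], w[9], w[14]);
    }
    for (int i = 0; i < 16; i++) { /* feed-forward, then little-endian */
        uint32_t v = w[i] + s[i];
        out[4 * i] = (uint8_t)v;
        out[4 * i + 1] = (uint8_t)(v >> 8);
        out[4 * i + 2] = (uint8_t)(v >> 16);
        out[4 * i + 3] = (uint8_t)(v >> 24);
    }
}

/* Fill `dest` with keystream, advancing the block counter. Unused bytes of
 * the last block are discarded, so callers must request data in the same
 * sizes to reproduce the same stream. */
void chacha20_fill(const uint8_t key[32], const uint8_t nonce[12],
                   uint32_t *counter, uint8_t *dest, size_t size) {
    uint8_t block[64];
    while (size > 0) {
        size_t n = size < sizeof block ? size : sizeof block;
        chacha20_block(key, (*counter)++, nonce, block);
        memcpy(dest, block, n);
        dest += n;
        size -= n;
    }
}
```

Because the keystream depends only on key, nonce and counter, the same master passphrase and site always yield the same bytes. XChaCha20 adds a 24-byte nonce and a subkey-derivation step on top of this, which a vendored library would handle.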

Next steps would be:

Implement raw XChaCha20 as randomness source
Get Argon2id + BLAKE2 hash functions
Implement the randomness source (using masterpass + domain + counter as inputs)
Implement command-line arguments for it
Write some tests for it to make sure the output is fixed

Do you think this scheme is reasonable? I thought it was more "simple" than (ab)using Keccak as a stream cipher (if I understood that correctly).

mrdomino commented 1 month ago

> I'm not sure what you think of this, my thinking was basically that I like XChaCha20 as a stream cipher, and it was easy to make a randomness source out of it :D

Yep, totally. Any good stream cipher with a known key ought to do as a source of pseudorandom bits.

> XChaCha20(key = Argon2id(password), iv = Blake2b(site || counter))

Why argon2id rather than just argon2d? It doesn't seem that there's any need to concern ourselves with side channel attacks in this case, as you say.

I'm not all that smart about what to stick where as far as IVs and salts and all go; my thinking was to just use everything relevant to the site as the "password" input to argon2d, maybe even including the schema.

I guess the main question is just what needs to be covered by the work factor in order to resist brute-force attacks, and whether the password with a salt of zero is good enough. I think it is - if you're going to use a password that could show up in a rainbow table with this scheme, then you're already doing it very wrong. But if we want to try to save those people from themselves, then including the URL, reset counter, and schema alongside the master password as a pseudo-salt is a marginal step up from salt of zero.

> I am not too worried about performance or side-channel attacks, so it does not need the most optimized implementation, I think.

I agree. Fwiw the cosmopolitan implementations of argon2 and blake2 are basically optimized versions of the references, and are licensed permissively; you would just want to include the copyright notice in any distributions somehow. (With cosmopolitan this is already handled for you - like it sticks the copyright notices into the output binary for you, which few other libraries do.)

> Do you think this scheme is reasonable? I thought it was more "simple" than (ab)using Keccak as a stream cipher (if I understood that correctly).

Yeah it seems good to me. The blake2 XOF is basically just CTR mode for a block cipher, and CTR mode is basically just a way to turn a block cipher into a stream cipher at the end of the day.

I agree re next steps.

mrdomino commented 1 month ago

Ok well, I just started a new job - while you might think that would impinge on my ability to do this patch, what it actually means is that I now have a use case (generating my corp password) for this, so I will probably be putting some more effort into it.

xfbs commented 1 month ago

Just today I made some progress on this. I might be able to make an MR later today. I found a very lightweight crypto library that is permissively licensed, it gives us blake2, argon2 and xchacha20. Should be relatively straightforward to implement this mode with these primitives. I will keep you updated.

Cheers, Patrick

mrdomino commented 1 month ago

Awesome! Eager to see it.

xfbs commented 1 month ago

I just pushed some stuff, I've added the Monocypher library and using that, I implemented the passgen_random_chacha20_argon2 randomness source. You can check it out here: https://github.com/xfbs/passgen/blob/master/src/random/deterministic.c#L117

You can also use it. I don't yet have it nicely exported, but this works:

passgen -r chacha20-argon2:yourmasterpass:example.com:1 -p apple2
DLlKDC-OaA5X8-nxPT40

It works, but I'm not fully confident in the code yet; I think I need to write some more tests to make sure that everything is sane. There are some obvious stack overflows in there. But I think it achieves what we want, and it was not even that much code to write.

There are three parameters that chacha20-argon2 takes:

master passphrase
domain
token

Both domain and token are optional (actually, all three are optional, but if you don't supply a master passphrase, it is a bit useless). That means you can use the token as a counter, or you can use it to disambiguate between multiple logins. For example:

passgen -r chacha20-argon2:mymasterpass:example.com:user1 -p apple2
zkCTUp-UFVBvL-2Uulj3

passgen -r chacha20-argon2:mymasterpass:example.com:user2 -p apple2
p9Rz1Q-sWMFAO-PvRncl

Try it and see if this makes sense! If you think this implementation is good enough, the next step would be to implement some command-line switches to make it more usable.
