Option to show pattern info when generating a token

mk-fg commented 1 year ago

Hi,

When manually generating a token according to some selected pattern in the shell, I find that it's tempting to try tweaking the pattern to see how its properties would change from e.g. adding one more word or removing one. The problem, though, is that z-tokens does not display such information about the pattern when generating the token.

Maybe adding some -v/--verbose option to the "generate" command would help with that?

I.e. so that manual tinkering like this:

% z-tokens g -p eff-short:6
stop twins disco slain spool puma
% z-tokens g -p eff-short:8
elude elope diner hazel swirl tiara straw shirt
% z-tokens g -p eff-short:4
tint graph sepia salon

Can look something like this instead:

% z-tokens g -vp eff-short:6
stop twins disco slain spool puma
[ pattern=eff-short:6 length=33c entropy=62.0b ]
% z-tokens g -vp eff-short:8
elude elope diner hazel swirl tiara straw shirt
[ pattern=eff-short:8 length=44c entropy=82.7b ]
% z-tokens g -vp eff-short:4
tint graph sepia salon
[ pattern=eff-short:4 length=23c entropy=41.3b ]
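For reference, the entropy figure in the mock-up above can be approximated as word count times log2 of the wordlist size. The sketch below is only an illustration: it assumes that eff-short picks words uniformly from a 1296-word list (the size of the EFF short wordlists), and `info_line` is a hypothetical helper that mimics the requested format, not anything z-tokens provides.

```python
import math

# Assumption: "eff-short" draws each word uniformly from a 1296-word list
# (the size of the EFF short wordlists); z-tokens may compute this differently.
WORDLIST_SIZE = 1296


def info_line(pattern, words, token):
    """Build an info line in the style of the mock-up (hypothetical format)."""
    # Each independently chosen word contributes log2(list size) bits.
    entropy_bits = words * math.log2(WORDLIST_SIZE)
    return f"[ pattern={pattern}:{words} length={len(token)}c entropy={entropy_bits:.1f}b ]"


print(info_line("eff-short", 6, "stop twins disco slain spool puma"))
```

Under that assumption, this prints the same line as the first mock-up entry: [ pattern=eff-short:6 length=33c entropy=62.0b ].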

Don't think I'll make a PR for it myself, unfortunately. Thanks.

cipriancraciun commented 1 year ago

This is an interesting feature request, and one that is easy to implement.

I'll have to think a bit about what information to include, because besides the length and entropy, I could also say if it contains letters/numbers/specials/etc., but also what it's suitable for (authentication/short-term-encryption/archival/etc.).

About the tweaking, note that the patterns subcommand allows you to list patterns that fit certain characteristics / use-cases. For example, z-tokens patterns --for-authentication --entropy-min 64 --shortest will show you all the patterns (with a usable example) that have at least 64 bits of entropy and are recommended for online authentication.

Also note that the z-tokens g subcommand accepts the pattern directly, as in z-tokens g eff-short:6 (i.e. without the -p), and any other options you might want can be put afterwards (for example z-tokens g eff-short:6 -c 10). (The generate subcommand does require the -p flag.)

mk-fg commented 1 year ago

About the tweaking, note that the patterns subcommand allows you to list patterns that fit certain characteristics / use-cases

Yeah, it might be just me, but once you pick a pattern and generate an auth-token, it's only then you wonder how making it shorter/longer will impact entropy estimation and overall ease of typing.

Opening "patterns" again for that is definitely an option, but changing the number in the last command seems a lot easier with a typical shell/terminal interface, at least for me, and the first thing I reached for was adding -h to see if there was some kind of verbosity toggle listed at the top or bottom of the option list.

all the patterns (with a usable example)

It wasn't immediately obvious to me that it is a usable example, and not e.g. a hardcoded one. Though I'd probably still end up going to the generator option to make a couple patterns, to pick one with the most interesting/suitable words for the use-case (which likely lowers entropy in practice - ideally you'd not add bias like that).

z-tokens patterns --for-authentication --entropy-min 64 --shortest

I'd reached for z-tokens p --for-authentication | g eff-short in this case, after tweaking the parameter a couple of times in the command and checking for an option to print extra info there. That isn't difficult to do either, though it feels like it shouldn't be necessary to run a different command for this, and a bundled all-in-one output would be simpler and less error-prone too.

But again, might be just me thinking this particular way.

(The generate subcommand does require the -p flag.)

Yeah, I noticed that, iirc from some example usage, but then when I noticed that the option exists, I started adding it, with the logic going something like this: "idk how the argument works, it doesn't seem to be documented, and the option is there, so better use it, in case the argument does something similar but different".

cipriancraciun commented 10 months ago

OK, after a long time, I've implemented this like so:

>> z-tokens g cvs:4 --describe

**  ~~~~~~~~  cvs-lower:4
\_  aliases:  cvs:4
\_  labels:   cvs-lower cvs cv letters password pronounceable memorable
\_  bits:     50.5754
\_  length:   16  (with spaces)
\_  length:   16  (without spaces)
\_  characters:
    \_  letters:  16
    \_  l. upper: 0
    \_  l. lower: 16
    \_  digits:   0
    \_  symbols:  0
    \_  no space: 16
\_  usable for:
    \_  cryptography         !! NO !!      with    -77.42  bits of margin
    \_  authentication          OK         with    +18.58  bits of margin
    \_  archival storage     !! NO !!      with    -49.72  bits of margin
    \_  long term storage    !! NO !!      with    -26.26  bits of margin
    \_  short term storage   !! NO !!      with    -19.42  bits of margin
\_  bruteforce time:
    \_  MD4                     --         now
    \_  MD5                     --    0.1  seconds
    \_  SHA1                    --    0.3  seconds
    \_  SHA2-256                --    0.8  seconds
    \_  SHA3-256                --    3.3  seconds
    \_  PBKDF2-HMAC-MD5         --    6.1  minutes
    \_  PBKDF2-HMAC-SHA1        --   14.6  minutes
    \_  PBKDF2-HMAC-SHA256      --   31.5  minutes
    \_  PBKDF2-HMAC-SHA512      --    1.5  hours
    \_  scrypt                  --   27.2  days
    \_  GPG                     --   10.4  minutes
    \_  AES-128                 --    0.8  seconds

fonuhitocejidojo

If one wants less verbose output, the z-tokens p ... command has more fine-grained options.

@mk-fg is this output suitable for your use-case?

mk-fg commented 10 months ago

Yes, it looks awesome, thanks!

That's indeed what I'd always want to use manually, with a couple of patterns that I remember and/or have in shell history, to remind me what those were for, if nothing else.

Maybe also worth adding actual target values to "usable for" categories like "cryptography", so that one somewhat familiar with those, can sanity-check if they're up-to-date and up to their definitions of these use-cases at a glance, for example:

\_  usable for:
    \_  cryptography (128b)  !! NO !!      with    -77.42  bits of margin
    \_  authentication (32b)    OK         with    +18.58  bits of margin
...

I'd also worry about the "bruteforce time" section not having enough context:

Which year is it roughly according to?
Which hardware/tool is used? - a single Nvidia 4080 GPU from 2023? An ex-crypto-mining room full of those? Some ASIC?
How many iterations for *KDF algos, which scrypt parameters, etc.?

And afaik bruteforcing credentials is a bit of a cottage-industry craft too, with other less obvious variables in there, but idk, also easy to ignore of course, just a bit worrying that it might be quite misleading, especially couple years down the line.
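To illustrate why that context matters, here is a rough sketch of how such a time estimate depends on the assumed guess rate. The rates below are made-up placeholders, not measurements and not the values z-tokens uses; the only formula involved is keyspace size (2 to the power of the entropy bits) divided by guesses per second.

```python
def bruteforce_seconds(entropy_bits, guesses_per_second):
    """Worst-case time to try every candidate; the average case is half of this."""
    return 2.0 ** entropy_bits / guesses_per_second


# Hypothetical attacker speeds, for illustration only (not benchmarks).
HYPOTHETICAL_RATES = {
    "fast hash, one GPU": 1e10,
    "fast hash, GPU farm": 1e13,
    "slow KDF, one GPU": 1e4,
}

for label, rate in HYPOTHETICAL_RATES.items():
    seconds = bruteforce_seconds(50.5754, rate)  # bits reported for cvs:4
    print(f"{label:<22} {seconds / 86400:14.2f} days")
```

The same token moves across many orders of magnitude depending on the hardware and algorithm parameters plugged in, which is why a bare time value is hard to interpret without them.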

EDIT: it's not "no context" as I initially wrote, somehow forgetting the actual output while writing the comment, but I think still not quite enough of it to understand those values for me.

mk-fg commented 10 months ago

Also, a note on the list of bruteforce times for common password-storage algorithms above:

Afaik a bunch of Linux distros settled on "yescrypt" algo for storing passwords in /etc/shadow - https://www.openwall.com/yescrypt/ (as implemented in libxcrypt used by Linux PAM, scrypt-like, with 1-11 cost factor, 5 being default in pam_unix).
Linux go-to full-disk-encryption tools use argon2id KDF in LUKS2 device headers (via cryptsetup/libcryptsetup).

Default parameters in Arch today seem to be "Iteration time: 2000, Memory required: 1048576kB, Parallel threads: 4" - which on a somewhat quaint Ryzen 5600X translates to 10 iterations (cryptsetup benchmark --pbkdf argon2id).
OpenSSH keyfile-encryption uses bcrypt (16 rounds default in OpenSSH_9.6p1).

Maybe those are also worth adding there? They seem to be practically useful to know about, albeit in Linux-specific use-cases.

Though guess at some point maybe such list of algos (esp. with diff parameters) would be way too long to display, not sure if it kinda is already. In that case, maybe it'd be useful to have an idea behind the list, i.e. something like "likely worst-case $1M-hardware scenario in [year-this-was-last-updated]" as (clearly stated) context for time values, and "top-3 algorithms most commonly used for each of [X], [Y] and [Z]" (e.g. FDE, OS password storage, web pw storage, etc) for picking algos on the list, to limit the latter from growing indefinitely in multiple pareto-dimensions.

cipriancraciun commented 10 months ago

Maybe also worth adding actual target values to "usable for" categories like "cryptography", so that one somewhat familiar with those, can sanity-check if they're up-to-date and up to their definitions of these use-cases at a glance, for example:
\_  usable for:
    \_  cryptography (128b)  !! NO !!      with    -77.42  bits of margin
    \_  authentication (32b)    OK         with    +18.58  bits of margin
...

It's a good idea, I'll try to tackle these when I have some more time. (BTW, one can currently deduce the target bits by subtracting the reported margin, sign included, from the reported entropy bits.)
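As a quick sanity check of that deduction, the sketch below recovers the targets from the cvs:4 output earlier in the thread, treating the margin as entropy minus target. The numbers are copied from that output; `target_bits` is just an illustrative helper.

```python
# Values copied from the `z-tokens g cvs:4 --describe` output in this thread.
ENTROPY_BITS = 50.5754
MARGINS = {
    "cryptography": -77.42,
    "authentication": +18.58,
    "archival storage": -49.72,
    "long term storage": -26.26,
    "short term storage": -19.42,
}


def target_bits(entropy_bits, margin_bits):
    """Margin is entropy minus target, so target is entropy minus margin."""
    return entropy_bits - margin_bits


for use_case, margin in MARGINS.items():
    print(f"{use_case:<20} ~{target_bits(ENTROPY_BITS, margin):.1f} bits")
```

For cryptography and authentication this gives roughly 128 and 32 bits, matching the values in the annotated example quoted above.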

With regard to bruteforce, I've started a discussion thread at #24.

volution/z-tokens, issue #18