Option to sort all patterns by entropy

mk-fg commented 1 year ago

Hi,

z-tokens patterns tends to produce a long list of different tokens, and as a user, my goal is to pick a Pareto-optimal token by multiple criteria in that table. Namely, something that will be sufficiently readable, not too long/hard to type, not too confusing-looking, and with high-enough entropy.

These goals are in conflict, so --entropy-min/--entropy-max or similar options are not very useful, since regardless of what I decide to pick as --entropy-min, it's quite possible that none of the options presented there will meet the other criteria.

So ideally, I'd like to have a list sorted by a non-subjective criterion like "entropy", where e.g. all options around 80 bits are right next to each other, so that I can easily visually compare subjective parameters like length/readability/visual-confusion/etc between those, to pick the best combination.

But at least with the current z-tokens 0.3.0 release, there is no option to do that - patterns are presented in alphabetical order, which is not useful for the purpose above. It is of course possible to sort the list using e.g. the unix "sort" tool, but it's a complicated table with many fields, so I think maybe some simple option like z-tokens patterns --sort-by entropy would be a nice addition?

Don't think I'll make a PR for it myself, unfortunately. Thanks.

mk-fg commented 1 year ago

Current workaround that also shows the output that I meant above, using sort from coreutils: z-tokens patterns --for-authentication | sort -nk 4,7 | less
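
The same ordering can also be done without depending on how sort splits fields. What follows is only a minimal Python sketch, assuming each row has the layout ":: name : bits b : chars c :: example" as in the z-tokens output quoted in this thread. The regular expression and the names parse_rows and sort_by_entropy are illustrative and not part of z-tokens.

```python
import re

# Assumed row layout, based on the output quoted in this thread:
#   ::  cv-lower:3   :   40.2 b :   14 c ::    liho zipe wemu
# Exact entropy values are printed as "32  =b", hence the optional "=".
ROW = re.compile(
    r"^::\s+(?P<name>\S+)\s+:\s+(?P<bits>[\d.]+)\s*=?b\s+:\s+"
    r"(?P<chars>\d+)\s+c\s+::\s+(?P<example>.*)$"
)


def parse_rows(lines):
    """Return (bits, chars, name, example) tuples for lines that look like rows."""
    rows = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            rows.append((
                float(match["bits"]),
                int(match["chars"]),
                match["name"],
                match["example"].strip(),
            ))
    return rows


def sort_by_entropy(lines):
    """Order rows by entropy first, then by length in characters."""
    return sorted(parse_rows(lines), key=lambda row: (row[0], row[1]))


# A few rows copied from the output quoted in this thread.
SAMPLE = [
    "::  cv-lower:3              :   40.2 b :   14 c ::    liho zipe wemu",
    "::  proquint-lower:2        :   32  =b :   11 c ::    dutum bopos",
    "::  mnemonic:3              :   32.0 b :   19 c ::    saint marvin choice",
    "::  eff-large:3             :   38.7 b :   27 c ::    hatchery regulate willpower",
]

for bits, chars, name, example in sort_by_entropy(SAMPLE):
    print(f"{bits:6.1f} b  {chars:3d} c  {name:<20} {example}")
```

To use it on real output, the lines printed by z-tokens patterns could be passed to sort_by_entropy instead of SAMPLE. Rows that do not match the assumed layout are silently skipped.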

cipriancraciun commented 1 year ago

OK, sorting by entropy shouldn't be hard, and in fact I could add other sorting options. Any suggestions? I'm thinking at least:

by entropy;
by length (in characters);
??

Also, please note that each pattern has a few "labels" attached, which are shown by z-tokens patterns --show-all (search for the "labels" line).

Then by using --label password or --label passphrase or --label memorable you can select to show only those patterns that match the given label. (Which can then be combined with --for-authentication, etc.)

For example:

>> z-tokens patterns --for-authentication --label memorable --shortest

::  cv-lower:3              :   40.2 b :   14 c ::    liho zipe wemu
::  cv-plus-a:3             :   40.1 b :   14 c ::    bamu zuqo 0166
::  cv-plus-b:3             :   40.2 b :   14 c ::    xila xupe HB79
::  cv-plus-c:3             :   41.8 b :   14 c ::    wage hali HO1#
::  proquint-lower:2        :   32  =b :   11 c ::    dutum bopos
::  koremutake-a:3          :   42  =b :   14 c ::    deni mune biga
::  koremutake-b:2          :   42  =b :   15 c ::    rafrygy kybruna
::  mnemonic:3              :   32.0 b :   19 c ::    saint marvin choice
::  bip0039:3               :   33  =b :   19 c ::    minimum farm praise
::  skey:3                  :   33  =b :   13 c ::    nip want name
::  pgp:2                   :   32  =b :   35 c ::    flytrap hamburger glucose informant
::  eff-large:3             :   38.7 b :   27 c ::    hatchery regulate willpower
::  eff-short:4             :   41.3 b :   18 c ::    tray tug card etch
::  eff-unique:4            :   41.3 b :   30 c ::    dagger utmost suave coffeecake

mk-fg commented 1 year ago

other sorting options. Any suggestions?

Some kind of "human readability criteria" maybe?

It's more complicated than simple entropy/length, and definitely subjective, but I'd probably add metadata tags with fixed subjective-weight to each existing pattern, like these:

language-words=10 words=5 groups=3
letters-only=3 letters=2 no-similar-glyphs=4
mixed-case=-1 numbers=-1 symbols=-3

And then sort by sum of those for that readability metric.

("words" as in human-language-like consonant-vowel intermingling, "no-similar-glyphs" as in no "0"/"O", "l"/"I", and other easy-to-mistake-for-something-else characters in the pattern, like in Crockford's Base32)

I think when generating a password for manual use (and not for a password manager), it might be way more useful than entropy ordering, because you basically don't care about any patterns that'd be less than halfway down the list, as words and cons-vowel patterns have an overwhelming advantage for this use-case.
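
To make the summed-weights idea concrete, here is a minimal Python sketch. The weights are the ones listed in this comment; the tag assignments per pattern are purely hypothetical guesses for illustration (z-tokens does not define them), and "base64" is a stand-in name for an unreadable pattern.

```python
# Subjective weights, taken from the list in this comment.
WEIGHTS = {
    "language-words": 10, "words": 5, "groups": 3,
    "letters-only": 3, "letters": 2, "no-similar-glyphs": 4,
    "mixed-case": -1, "numbers": -1, "symbols": -3,
}

# Hypothetical tag assignments, for illustration only.
PATTERN_TAGS = {
    "eff-short": ["language-words", "groups", "letters-only"],
    "cv-lower": ["words", "groups", "letters-only"],
    "cv-plus-c": ["words", "groups", "letters", "mixed-case", "numbers", "symbols"],
    "base64": ["letters", "mixed-case", "numbers", "symbols"],
}


def readability(tags):
    """Sum the fixed weight of every tag attached to a pattern."""
    return sum(WEIGHTS[tag] for tag in tags)


# Highest score first, so the most readable patterns end up at the top.
ranked = sorted(
    PATTERN_TAGS,
    key=lambda name: readability(PATTERN_TAGS[name]),
    reverse=True,
)

for name in ranked:
    print(f"{readability(PATTERN_TAGS[name]):4d}  {name}")
```

With these assumed tags, the word-like patterns sort ahead of the mixed-symbol ones, which is all the metric is meant to achieve.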

mk-fg commented 1 year ago

please note that each pattern has a few "labels" attached, which are shown with z-tokens patterns --show-all and search for the "labels" line.

Oh, that already does the same thing as the "tags" idea described above. Sorting by those with some ad-hoc weights will already do the trick, and I think would be easier for a general "I don't know/remember all the labels, just show me readable stuff" use-case.

I'd probably keep a "memorable" sort criterion separate from "readable", making length a major ordering factor there as well, while leaving "readable" as "label weights, then pattern" (with different-length patterns of similar readability grouped together).
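
One possible reading of these two orderings, as a minimal Python sketch: both are composite sort keys over the same rows. The lengths come from the table quoted earlier in this thread; the score values are hypothetical label-weight sums, and the Row type and key functions are illustrative names only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    pattern: str
    score: int  # hypothetical summed label weight (higher = more readable)
    chars: int  # token length in characters, from the quoted table


ROWS = [
    Row("cv-lower:3", 11, 14),
    Row("eff-short:4", 16, 18),
    Row("eff-large:3", 16, 27),
    Row("proquint-lower:2", 11, 11),
]


def readable_key(row):
    # Label weights first, then pattern name, so similarly readable
    # patterns stay grouped regardless of their length.
    return (-row.score, row.pattern, row.chars)


def memorable_key(row):
    # Length is the major factor here; label weights only break ties.
    return (row.chars, -row.score, row.pattern)


readable = [row.pattern for row in sorted(ROWS, key=readable_key)]
memorable = [row.pattern for row in sorted(ROWS, key=memorable_key)]

print("readable: ", readable)
print("memorable:", memorable)
```

Whether length should dominate outright or just be weighted heavily is a judgment call; the tuple order in memorable_key is only one way to express it.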

mk-fg commented 1 year ago

I think the general difficulty for me with the current interface is that it seems to be designed for precision, where you have to know and specify most stuff in advance (labels, entropy and length limits, etc).

Which is great for automated use, and that's probably the main goal behind the tool, but when using it manually, it's not very useful to learn the names and precise values of all this stuff - just show me examples in roughly right order, and I'll figure it all out visually within seconds!

cipriancraciun commented 1 year ago

Indeed, the main problem of the password generator tool is that one has to know in advance what he wants. (That's why I've implemented so many patterns, that from a security point of view are completely equivalent, given the same entropy in bits.)

However, as you've said, any ordering or labeling is highly subjective.

Computing the "weighted metric" you've proposed is, I think, highly problematic; not that it's a technical problem, but that when implemented I think the order would look quite wrong and random. I can't say this for sure without trying (I'll probably do so over the weekend), but I'm basing this on my experience with fzf and the way it scores each match; most of the time I find it completely counter-intuitive, but regardless of how hard I try, I can't seem to find a proper algorithm that gets everything right...

Regarding the actual score computation, there are two options to merge the weights:

summing, as you seem to have suggested; however, I don't think it's easy to properly scale the weights to get a proper order;
multiplying, which makes more sense from a "theoretical" point of view, but which I also bet doesn't fare well with regard to ordering;
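
To illustrate how the two merge strategies can disagree, here is a minimal Python sketch. The additive weights are a subset of those proposed earlier in this thread; the multiplicative factors and the three candidate tag sets are hypothetical values chosen only to show the effect.

```python
import math

# Additive weights: a subset of the ones proposed earlier in this thread.
ADD = {"words": 5, "groups": 3, "letters-only": 3, "symbols": -3, "numbers": -1}

# Hypothetical multiplicative factors: above 1 boosts, below 1 penalizes.
MUL = {"words": 2.0, "groups": 1.5, "letters-only": 1.5, "symbols": 0.5, "numbers": 0.8}

# Hypothetical candidates, named only for this example.
CANDIDATES = {
    "cv-words": ["words", "groups", "letters-only"],
    "words-with-symbols": ["words", "groups", "symbols", "numbers"],
    "plain-letters": ["letters-only"],
}


def score_sum(tags):
    """Merge weights by adding them up."""
    return sum(ADD[tag] for tag in tags)


def score_product(tags):
    """Merge weights by multiplying the per-tag factors."""
    return math.prod(MUL[tag] for tag in tags)


def rank(score):
    """Candidate names ordered from highest to lowest score."""
    return sorted(CANDIDATES, key=lambda name: score(CANDIDATES[name]), reverse=True)


by_sum = rank(score_sum)
by_product = rank(score_product)

print("sum:    ", by_sum)
print("product:", by_product)
```

With these particular numbers the two orderings agree on the top entry but swap the other two, which shows how sensitive the result is to the way the weights are scaled.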

Perhaps a better approach would be an interactive system that lets one browse or fine-tune what he needs?

mk-fg commented 1 year ago

not that it's a technical problem, but that when implemented I think the order would be quite wrong and random looking

Yeah, but imo it's perfectly fine for a visual table, as long as unreadable base64 gobbledygook ends up after word-separated consonant-vowel patterns. Ordering between roughly similar stuff might as well be entirely random.

Dunno if fine-tuning or overthinking it in other ways is worth bothering with, when the idea is to mostly "let the user sort it out visually, without having to code their own sorting algorithm, knowing all the parameters up-front". Again, imo some really basic ordering that makes sense to anyone (even if not everyone) will work for such purpose, be it with addition or multiplication :)

volution / z-tokens

Option to sort all patterns by entropy #17