nbutton23 / zxcvbn-go

zxcvbn password complexity algorithm in golang
MIT License
379 stars 49 forks source link

Use bigint Binomial for Choose function #29

Open jpgoldberg opened 6 years ago

jpgoldberg commented 6 years ago

A comment in the original said,

I am surprised that I have to define these. . . Maybe i just didn't look hard enough for a lib.

The choose function is also known as the binomial coefficient, and this exists in math/big for Ints.

What

This PR replaces the contents of NChoseK() with a call to math/big's Binomial(). It does some conversions so that no changes need to be made in how NChoseK is called.

Why?

Why make this change?

  1. Mostly because I saw the comment quoted above
  2. The particular implementation in the original is inefficient for large numbers. Binomial coeffients are better implemented using the log gamma function instead of calculating factorials the slow way. Though perhaps using a big integer library loose some of that.
  3. It may be cleaner to use integer arguments instead of float64 at this point, and this change now will make that easier to do sometime in the future.

What this doesn't do

  1. It doesn't correct the spelling of "NChoseK" (which should be "NChooseK". "Choose" is a really strange word and tempting to misspell). I wanted to make a small change, only touching the one function.
  2. It does not adjust the calling code to make use of integers. I do not understand why float64 is used so much in this package, and so it would be silly for me to tinker with that before understanding the reasons for the current choices.

Possible objections

  1. The move to using big Ints may be a performance hit. (I don't think it will be, but I have not benchmarked the two on the sorts of numbers used.
  2. It requires yet another import: math/big

Although I do acknowledge those as legitimate objections, I wouldn't be submitting this PR if I didn't think the benefits outweigh them.

nbutton23 commented 6 years ago

I didn't even notice that I spelled choose wrong. I have some time coming up later this week and wanted to spend some time on this library, so i will take a deeper look at this (and other PRs) then.

for why floats are used. . . I can't remember. I wrote this when I was first learning Go and it has many many issues that I need to go back and fix.

nbutton23 commented 6 years ago

So other than the benchmarks it looks fine

func Benchmark_NChoseK(b *testing.B) {
    for n := 0; n < b.N; n++ {
        NChoseK(100,2)
    } 
}

func Benchmark_NChoseKOld(b *testing.B) {
    for n := 0; n < b.N; n++ {
        NChoseKOld(100,2)
    } 
}
Benchmark_NChoseK-4              5000000               368 ns/op
Benchmark_NChoseKOld-4          300000000                4.42 ns/op

To see how this would impact the who algorithm i created these benchmarks.

func Benchmark_1(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("zxcvbn", nil)
    } 
}

func Benchmark_2(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("Tr0ub4dour&3", nil)
    } 
}

func Benchmark_3(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("neverforget13/3/1997", nil)
    } 
}

func Benchmark_4(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("briansmith4mayor", nil)
    } 
}

func Benchmark_5(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("Ba9ZyWABu99[BK#6MBgbH88Tofv)vs$", nil)
    } 
}

func Benchmark_6(b *testing.B) {
    for n := 0; n < b.N; n++ {
        GoPasswordStrength("rWibMFACxAUGZmxhVncy", nil)
    } 
}

With Old

Benchmark_1-4               2000            657936 ns/op
Benchmark_2-4               2000            711202 ns/op
Benchmark_3-4               2000            941439 ns/op
Benchmark_4-4              10000            142220 ns/op
Benchmark_5-4               2000           1073096 ns/op
Benchmark_6-4              10000            147925 ns/op

With New NChoseK

Benchmark_1-4               2000            659010 ns/op
Benchmark_2-4               2000            724277 ns/op
Benchmark_3-4               2000            965038 ns/op
Benchmark_4-4              10000            147744 ns/op
Benchmark_5-4               1000           1107561 ns/op
Benchmark_6-4              10000            152362 ns/op

@jpgoldberg What do you think?