sqids / sqids-go

Official Go port of Sqids. Generate short unique IDs from numbers.
https://sqids.org/go
MIT License
492 stars 8 forks source link

Fuzz testing of the library #6

Closed peterhellberg closed 10 months ago

peterhellberg commented 11 months ago

The Go toolchain has built in support for fuzzing, so we should at least add a test of sqids.Encode and possibly also sqids.Decode

(I intend to open a PR for this in a bit)

peterhellberg commented 11 months ago

A quick experiment shows that a corpus input such as:

go test fuzz v1
string("³102")
int(1)
string("77\xb17\x927\x9b777777\xf5\x9b7\xf077\xc4\xff\x82\xe17\xed\xe3\xe9\xce7\xc677")
uint64(94)

Would hang or terminate unexpectedly.

4kimov commented 11 months ago

This is a great catch. I'm not sure I understand how those integers play a role, but decoding a string properly is always tricky and if a bug is caught, we should propagate it up to the spec too so other libs can fix.

peterhellberg commented 11 months ago

I'll open a PR for this as mentioned, but probably not tonight :)

peterhellberg commented 11 months ago

(Just keeping some notes)

A fuzz test such as this:

func FuzzNewCustomSqidsEncodeDecode(f *testing.F) {
    f.Add(DefaultAlphabet, DefaultMinLength, strings.Join(Blocklist(), ","), uint64(1))

    f.Fuzz(func(t *testing.T, alphabet string, minLength int, b string, u uint64) {
        blocklist := strings.Split(b, ",")

        o := Options{
            Alphabet:  &alphabet,
            MinLength: &minLength,
            Blocklist: &blocklist,
        }

        s, err := NewCustom(o)
        if err != nil {
            return
        }

        id, err := s.Encode([]uint64{u})
        if err == nil {
            s.Decode(id)
        }
    })
}

When run like this: go test -run=FuzzNewCustomSqidsEncodeDecode -fuzz=FuzzNewCustomSqidsEncodeDecode

Will currently trigger a panic (runtime error: slice bounds out of range [-1:]) with the following (minimized) input:

go test fuzz v1
string("0127µ\xc5")
int(0)
string("0")
uint64(0)
peterhellberg commented 10 months ago

Using v0.3.1 and running this results in an id that decodes into another uint64 than what was encoded. (92 instead of 95)

package main

import (
    "fmt"

    "github.com/sqids/sqids-go"
)

func main() {
    alphabet := "0\x841278"

    s, err := sqids.New(sqids.Options{
        Alphabet: alphabet,
    })
    if err != nil {
        panic(err)
    }

    id, err := s.Encode([]uint64{95})
    if err != nil {
        panic(err)
    }

    d := s.Decode(id)
    if len(d) != 1 {
        panic("expected length of slice to be 1")
    }

    id2, err := s.Encode([]uint64{d[0]})
    if err != nil {
        panic(err)
    }

    fmt.Printf("%d %q %q\n", d, id, id2)
}

$ go run main.go 
[92] "78088\x84" "01\x841\x847"
peterhellberg commented 10 months ago

Encoding 99 using the alphabet ë1092 results in 029\xc399 which can not be decoded with v0.3.1 (same issue with for example v0.2.0)

The playground encodes into 900ëëë00 (which can also be decoded back into 99)

peterhellberg commented 10 months ago

It should be noted that len("abcd€") in Go does not return 5 but 7, since that string is 7 bytes long, but 5 runes.

4kimov commented 10 months ago

Looks like ë might be a multibyte char of 0xc3 0xab? That's something we don't support.

Edit: Actually this is a good reminder that perhaps it's worth introducing a check for multibyte chars in alphabet within the constructor to warn the user we don't support those.

peterhellberg commented 10 months ago

@4kimov Yes, good to know that we only need to support single byte characters in alphabets. Will add a check for this in the constructor.

(Will probably continue to write down any findings in this issue as I find them via fuzzing)

peterhellberg commented 10 months ago

@4kimov So the "Bars" example alphabet on the playground should not be valid then? Since ▁▂▃▄▅▆▇█ are all multibyte characters.

4kimov commented 10 months ago

Yes, I think so. It will be. Right now just removed it for the new algo: https://sqids.org/playground/simpler