Closed peterhellberg closed 10 months ago
A quick experiment shows that a corpus input such as:
go test fuzz v1
string("³102")
int(1)
string("77\xb17\x927\x9b777777\xf5\x9b7\xf077\xc4\xff\x82\xe17\xed\xe3\xe9\xce7\xc677")
uint64(94)
Would hang or terminate unexpectedly.
This is a great catch. I'm not sure I understand how those integers play a role, but decoding a string properly is always tricky and if a bug is caught, we should propagate it up to the spec too so other libs can fix.
I'll open a PR for this as mentioned, but probably not tonight :)
(Just keeping some notes)
A fuzz test such as this:
func FuzzNewCustomSqidsEncodeDecode(f *testing.F) {
f.Add(DefaultAlphabet, DefaultMinLength, strings.Join(Blocklist(), ","), uint64(1))
f.Fuzz(func(t *testing.T, alphabet string, minLength int, b string, u uint64) {
blocklist := strings.Split(b, ",")
o := Options{
Alphabet: &alphabet,
MinLength: &minLength,
Blocklist: &blocklist,
}
s, err := NewCustom(o)
if err != nil {
return
}
id, err := s.Encode([]uint64{u})
if err == nil {
s.Decode(id)
}
})
}
When run like this: go test -run=FuzzNewCustomSqidsEncodeDecode -fuzz=FuzzNewCustomSqidsEncodeDecode
Will currently trigger a panic (runtime error: slice bounds out of range [-1:]
) with the following (minimized) input:
go test fuzz v1
string("0127µ\xc5")
int(0)
string("0")
uint64(0)
Using v0.3.1
and running this results in an id that decodes into another uint64 than what was encoded. (92 instead of 95)
package main
import (
"fmt"
"github.com/sqids/sqids-go"
)
func main() {
alphabet := "0\x841278"
s, err := sqids.New(sqids.Options{
Alphabet: alphabet,
})
if err != nil {
panic(err)
}
id, err := s.Encode([]uint64{95})
if err != nil {
panic(err)
}
d := s.Decode(id)
if len(d) != 1 {
panic("expected length of slice to be 1")
}
id2, err := s.Encode([]uint64{d[0]})
if err != nil {
panic(err)
}
fmt.Printf("%d %q %q\n", d, id, id2)
}
$ go run main.go
[92] "78088\x84" "01\x841\x847"
Encoding 99
using the alphabet ë1092
results in 029\xc399
which can not be decoded with v0.3.1
(same issue with for example v0.2.0
)
The playground encodes into 900ëëë00
(which can also be decoded back into 99
)
It should be noted that len("abcd€")
in Go does not return 5
but 7
, since that string is 7
bytes long, but 5 runes.
Looks like ë
might be a multibyte char of 0xc3 0xab
? That's something we don't support.
Edit: Actually this is a good reminder that perhaps it's worth introducing a check for multibyte chars in alphabet within the constructor to warn the user we don't support those.
@4kimov Yes, good to know that we only need to support single byte characters in alphabets. Will add a check for this in the constructor.
(Will probably continue to write down any findings in this issue as I find them via fuzzing)
@4kimov So the "Bars" example alphabet on the playground should not be valid then? Since ▁▂▃▄▅▆▇█
are all multibyte characters.
Yes, I think so. It will be. Right now just removed it for the new algo: https://sqids.org/playground/simpler
The Go toolchain has built in support for fuzzing, so we should at least add a test of
sqids.Encode
and possibly alsosqids.Decode
(I intend to open a PR for this in a bit)