minio / simdjson-go

Golang port of simdjson: parsing gigabytes of JSON per second
Apache License 2.0
1.81k stars 86 forks source link

overflowing uint64 being parsed as float64 #30

Closed karlmcguire closed 3 years ago

karlmcguire commented 3 years ago

I'm using 3d975b7 as the last commit and here's how to reproduce the issue:

package main

import (
    "fmt"

    "github.com/minio/simdjson-go"
)

func main() {
    data := []byte(`{
        "number": 27670116110564327426
    }`)

    parsed, err := simdjson.Parse(data, nil)
    if err != nil {
        panic(err)
    }

    iter := parsed.Iter()
    iter.Advance()

    tmp := &simdjson.Iter{}
    obj := &simdjson.Object{}

    _, tmp, err = iter.Root(tmp)
    if err != nil {
        panic(err)
    }

    obj, err = tmp.Object(obj)
    if err != nil {
        panic(err)
    }

    // convert to map[string]interface{}
    m, err := obj.Map(nil)
    if err != nil {
        panic(err)
    }

    // prints: 2.7670116110564327e+19
    //
    // should overflow uint64
    fmt.Println(m["number"])
}

Interestingly, I found an article from 2018 where Dgraph was making the same mistake.

klauspost commented 3 years ago

@karlmcguire That is by design. Numbers are untyped in JSON, so we find the closest working format.

If we rejected this input we would reject perfectly valid JSON, as your example, eg: https://play.golang.org/p/3x0KdffDwTY

If you expect certain fields to only have integers you must add that validation to your code.

karlmcguire commented 3 years ago

I see. I guess what I need is dec.UseNumber() from encoding/json but for simdjson-go.

klauspost commented 3 years ago

@karlmcguire

Technically the tape format could be modified to include a reference to the original ascii value in the input. That would be the only reasonable way I see to provide access to that data. But that is a rather large change.

klauspost commented 3 years ago

@karlmcguire Thinking about it, we do have extra bits available on the tape for float values. We can't provide the original number, but we can provide a flag "this is a float converted from integer notation, because it overflowed".

klauspost commented 3 years ago

@karlmcguire See #31 as a proposal.

karlmcguire commented 3 years ago

That's exactly what I need. Thank you!