tidwall / gjson

Get JSON values quickly - JSON parser for Go
MIT License
14.1k stars 846 forks

gjson.Get() seems to truncate all letters in front of 1st digit letter #256

Open shiatcb opened 2 years ago

shiatcb commented 2 years ago

This issue happens when we perform a validation task: https://github.com/coinbase/rosetta-cli/issues/267.

Briefly, when we pass gjson.Get() a json object whose value is an unquoted string containing digits, e.g. {"a":a6ed68a1cc964b430e1e40254347367f08e4eb5eeaf0852d5648022873b50c07}, the function appears to walk the value, find the digit '6', set num = true, and ignore the letters before it: https://github.com/tidwall/gjson/blob/master/gjson.go#L2183. Here, say json = {"a":a6ed68a1cc964b430e1e40254347367f08e4eb5eeaf0852d5648022873b50c07} and key = "a"; we call gjson.Get(json, key) to get the output.

I've tried a few more inputs such as "abc6d", and they all show the same symptom, i.e. "abc6d" yields "6d" when we call gjson.Get().

It looks to me like the logic that marks a value as a number may need to change: https://github.com/tidwall/gjson/blob/master/gjson.go#L2183. Can someone please advise? Many thanks!

Shi

tidwall commented 2 years ago

Gjson returns predictable values on valid json. Your json is not valid:

{"a":a6ed68a1cc964b430e1e40254347367f08e4eb5eeaf0852d5648022873b50c07}

I recommend that you either A. check your json before calling gjson.Get by using gjson.Valid, or B. use the @valid modifier with gjson.Get, such as gjson.Get(json, "@valid.a").

shiatcb commented 2 years ago

Thank you @tidwall. Yes, we also realized the issue is due to the invalid json format, and if we quote the "a6e..." value in the json, it works as expected.

However, this invalid json comes from the return value of a function call: sjson.SetRaw(). Here is a simplified version of how we use sjson and gjson to set/get the value:

package main

import "log"
import "github.com/tidwall/gjson"
import "github.com/tidwall/sjson"

func main() {
    output, _ := sjson.SetRaw("", "a", "abc6d")
    res := gjson.Get(output, "a")
    log.Println("output is: " + output + ", value is: " + res.Raw)
}

The terminal prints out:

2021/12/24 12:16:10 output is: {"a":abc6d}, value is: 6d

As far as we can see, SetRaw returns json without double-quoting the value (I guess that makes some sense, because the value could be a bool, an integer, etc., which we don't want quoted as a string), and passing that json to gjson for processing causes the problem.

I am thinking maybe this is not the correct way to use sjson/gjson to set/get a string value, or there might be a workaround. Do you have any suggestions for avoiding this issue with the existing sjson/gjson API, without writing our own string processing logic?

Many thanks!

tidwall commented 2 years ago

I recommend using sjson.Set instead of SetRaw. Set takes care of converting the input; in the case of a string, it adds quotation marks. SetRaw is for setting "raw" json.

shiatcb commented 2 years ago

Thank you @tidwall and happy new year! Sorry for the late response, I was on vacation last week.

I've tested locally and Set indeed adds double quotes for strings. I am wondering, though, whether there is any case that SetRaw can handle but Set cannot. Also, I am not sure what the difference is between 'raw json' and 'json'.

Would you mind advising?

tidwall commented 2 years ago

The SetRaw is useful for quickly embedding a block of raw json, usually an object or array.

{
    "id": 98120398,
    "info": "something"
}
jsondoc, _ = sjson.SetRaw(jsondoc, "info", `{ "name": { "first": "Tom", "last": "Anderson" } }`)
{
    "id": 98120398,
    "info": { "name": { "first": "Tom", "last": "Anderson" } }
}

This can be useful if you want to just take one json document and plop it into a field of another one.

data, _ := os.ReadFile("info.json")
jsondoc, _ = sjson.SetRaw(jsondoc, "info", string(data))

Or for mixing with other json encoders.

type Info struct {
    Name struct {
        First string `json:"first"`
        Last string `json:"last"`
    } `json:"name"`
}
var info Info
info.Name.First = "Tom"
info.Name.Last = "Anderson"
data, _ := json.Marshal(info)
jsondoc, _ = sjson.SetRaw(jsondoc, "info", string(data))

shiatcb commented 2 years ago

Thank you @tidwall , that makes sense!

I did a quick experiment using Set and SetRaw to set a raw json value. Unfortunately, we can't just replace SetRaw with Set, because in some cases we need to 'unmarshal' the json object, meaning convert it into another type (e.g. map[string]interface{}). In that case, if we use Set for raw json, the conversion fails.

This is the code to explain the idea:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "github.com/tidwall/gjson"
    "github.com/tidwall/sjson"
    "log"
)

// UnmarshalInput Following scenario does not work for Set()
// ../../Workspace/rosetta-sdk-go/constructor/job/job.go
//  value := gjson.Get(j.State, variable)
//        if !value.Exists() {
//                return ErrVariableNotFound
//        }
//
//        return UnmarshalInput([]byte(value.Raw), output)
func UnmarshalInput(input []byte, output interface{}) error {
    // To prevent silent erroring, we explicitly
    // reject any unknown fields.
    dec := json.NewDecoder(bytes.NewReader(input))
    dec.DisallowUnknownFields()

    if err := dec.Decode(&output); err != nil {
        return fmt.Errorf("%w: unable to unmarshal", err)
    }

    return nil
}

func main() {
    json1 := ""
    json1, _ = sjson.SetRaw(json1, "info", `{ "name": { "first": "Tom", "last": "Anderson" }`)
    v1 := gjson.Get(json1, "info").Raw

    log.Println("SetRaw - value: " + v1)
    data1 := make(map[string]interface{})
    UnmarshalInput([]byte(v1), &data1)
    log.Println(data1)

    json2 := ""
    json2, _ = sjson.Set(json2, "info", `{ "name": { "first": "Tom", "last": "Anderson" }`)
    v2 := gjson.Get(json2, "info").Raw

    log.Println("Set - value: " + v2)
    data2 := make(map[string]interface{})
    UnmarshalInput([]byte(v2), &data2)
    log.Println(data2)
}

The output is:

$ go run main.go
2022/01/04 20:47:24 SetRaw - value: { "name": { "first": "Tom", "last": "Anderson" }}
2022/01/04 20:47:24 map[name:map[first:Tom last:Anderson]]
2022/01/04 20:47:24 Set - value: "{ \"name\": { \"first\": \"Tom\", \"last\": \"Anderson\" }"
2022/01/04 20:47:24 map[]

As you can see, if we use Set, the map is empty: Set stores the input as a quoted json string rather than as an object, so it cannot be decoded into a map.

A hacky way to resolve this is to check the input and use Set only for string values (not raw json, booleans, integers, etc.). But I wonder if there is a better solution. Do you have any comments?

tidwall commented 2 years ago

Not sure if this helps but you can use a map type with Set. Set is pretty flexible with value types.

var m map[string]interface{}
json.Unmarshal([]byte(`{ "name": { "first": "Tom", "last": "Anderson" } }`), &m)
jsondoc, _ := sjson.Set("", "info", m)
println(jsondoc)