tidwall / gjson

Get JSON values quickly - JSON parser for Go
MIT License
14.31k stars 854 forks source link

Handling decimal numbers better #189

Open rickb777 opened 4 years ago

rickb777 commented 4 years ago

It's well known that base-2 IEEE754 floats are not appropriate for truly-decimal values such as money. There is an excellent Decimal package (github.com/shopspring/decimal) that handles decimal numbers (including integers).

It would be possible to reduce how often JSON numbers are converted internally. By using decimal.Decimal as the intermediate value, i.e. Result.Num would be a decimal.Decimal, it would be possible to obtain the integer value, float64 value, string value, or of course the decimal itself. The main advantage of this would be reduced risk of rounding / base conversion issues.

An extra method would provide access:

func (t Result) Decimal() decimal.Decimal

For JSON Numbers, this would return the value unchanged. The other conversion methods would need amending to use the Result.Num.

There might occasionally be a small performance difference, sometimes a gain or sometimes a loss.

The alternative approach would be for users wanting Decimals to write adapter functions like the String(), Float() etc methods but this would be a lot more clumsy.

rickb777 commented 4 years ago

I've realised that this would be a breaking change for any code that uses Result.Num and relies on its float64 type.

Maybe something for v2?

tidwall commented 4 years ago

Perhaps for v2.

But for now it should be possible to create an custom external function, such as:

func toDecimal(r gjson.Result) decimal.Decimal {
    n, _ := decimal.NewFromString(r.String())
    return n
}
rickb777 commented 4 years ago

It would also be possible to add a method without a breaking change, such as

func (t Result) Decimal() decimal.Decimal {
    switch t.Type {
    default:
        return decimal.Decimal{}
    case True:
        return decimal.NewFromInt(1)
    case String:
        n, _ := decimal.NewFromString(t.Str)
        return n
    case Number:
        return decimal.RequireFromString(t.Raw)
    }
}

This relies on t.Raw being defined whenever the type is a Number (which appears to be true).

The disadvantage of the adapter function you suggested is that the number is converted from input digits to a float and then to a string and then to a decimal. The above doesn't do as many conversions and can't lose precision (errr... I think)

Would you like a PR?

rickb777 commented 4 years ago

An improved adapter function is of course possible

func toDecimal(t Result) decimal.Decimal {
    switch t.Type {
    default:
        return decimal.Decimal{}
    case True:
        return decimal.NewFromInt(1)
    case String:
        n, _ := decimal.NewFromString(t.Str)
        return n
    case Number:
        return decimal.RequireFromString(t.Raw)
    }
}

but the new method would be more elegant, I think.

rickb777 commented 4 years ago

It has occurred to me that this should only be a v2 change and that you could go further: for v2 you don't need the Result.Num field.

Removing Num would offer several benefits, mostly due to simplifications. In terms of performance, some cases would benefit from this lazy evaluation (especially in large documents containing many numbers that are ignored) and other cases might suffer (e.g. repeated access to the same number nodes).

The Float() float64 method covers this case. It would now be changed to include the lazy float conversion

    ...
    case Number:
        n, _ := strconv.ParseFloat(t.Raw, 64)
        return n
    }

There would of course be other simplifications, notably to the tonum function (it doesn't need to use strconv.ParseFloat).

There is already quite a lot of lazy evaluation in terms of AST traversal (arrayOrMap method and the arrayOrMapResult). This is interesting; for comparison the Java Jackson API allows tree-based manipulation more explicitly but I think your approach is a good one.

Having a more significant Float() method would perhaps mean adding the Decimal() method we discussed above would be the logical next step.