twpayne / chezmoi

Manage your dotfiles across multiple diverse machines, securely.
https://www.chezmoi.io/
MIT License
13.36k stars 493 forks source link

JSON numbers are being converted to decimals #3325

Closed mcexit closed 11 months ago

mcexit commented 1 year ago

Describe the bug

This is sort of my fault for switching my data to JSONC. However, I noticed when converting my data variables to other formats like INI or TOML, that it is adding decimals to my numbers. This is reproducible with JSON data variables as well.

To reproduce

Create a basic JSON or JSONC data file like so:

example.json

{
    "example": {
        "number": 1
    }
}

Run the following:

chezmoi execute-template '{{ toIni .example }}'

The result is: number = 1.000000

You can also do:

chezmoi execute-template '{{ toToml .example }}'

The result is: number = 1.0

Expected behavior

I'm hopeful that it can represent the number as is, since some of my configs require it.

Like: number = 1

Output of chezmoi doctor

```console $ chezmoi doctor RESULT CHECK MESSAGE warning version v2.40.4, built at 2023-10-30T07:09:38Z ok latest-version v2.40.4 ok os-arch linux/amd64 (Arch Linux) ok uname Linux rooW9wai 6.5.9-zen2-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:51:53 +0000 x86_64 GNU/Linux ok go-version go1.21.3 (gc) ok executable /usr/bin/chezmoi ok config-file no config file found ok source-dir ~/.local/share/chezmoi is a directory ok suspicious-entries no suspicious entries ok working-tree ~/.local/share/chezmoi is a directory ok dest-dir ~ is a directory ok umask 022 ok cd-command found /bin/bash ok cd-args /bin/bash info diff-command not set ok edit-command found /usr/bin/vim ok edit-args /usr/bin/vim ok git-command found /usr/bin/git, version 2.42.1 ok merge-command found /usr/bin/vimdiff ok shell-command found /bin/bash ok shell-args /bin/bash info age-command age not found in $PATH ok gpg-command found /usr/bin/gpg, version 2.2.41 info pinentry-command not set info 1password-command op not found in $PATH info bitwarden-command bw not found in $PATH info bitwarden-secrets-command bws not found in $PATH info dashlane-command dcli not found in $PATH info doppler-command doppler not found in $PATH info gopass-command gopass not found in $PATH info keepassxc-command keepassxc-cli not found in $PATH info keepassxc-db not set info keeper-command keeper not found in $PATH info lastpass-command lpass not found in $PATH ok pass-command found /usr/bin/pass, version 1.7.4 info passhole-command ph not found in $PATH info rbw-command rbw not found in $PATH info vault-command vault not found in $PATH info vlt-command vlt not found in $PATH info secret-command not set ```

Additional context

It looks like Go's decoder stores JSON numbers as float64 but that it may have an option to unmarshal a number into an interface{} as a Number instead of as a float64:

https://pkg.go.dev/encoding/json#Decoder.UseNumber

I'm not sure if that will fix the problem, or if the option will even be exposed through the package imports being used. In the meantime I can store these specific data files as TOML or YAML, or convert to YAML & back before converting to INI.

halostatue commented 1 year ago

JSON does not define an integer type, only IEEE754 double precision floats (like JavaScript itself). decoder.UseNumber will keep the value as a string (which is technically an interface{}), but if you need to do math with it, then you are likely to have issues.

Consider whether you need the value to be a number or not, and you can simply store it as a string if you don't.

I suspect that for most cases, using decoder.UseNumber, there would be no negative impact…but I think that this change would be a compatibility break.

twpayne commented 1 year ago

Yeah, as @halostatue says, JSON itself doesn't support integers, only doubles.

If you need an exact integer value then you can either use a string (after all, the value will be substituted into a template in the end, where it will be converted to a string anyway), or use a format that supports integers, e.g. TOML or YAML.

mcexit commented 1 year ago

Thanks and totally understand that this is the expected behavior. The only other request I have is if you'd consider making this configurable. Whether that be a PR that I submit, or someone else.

I may end up switching to another data format later on where this will no longer be an issue, but for now it is nothing more than a slight annoyance of needing to use a different data file format for specific variables.

twpayne commented 1 year ago

I would accept a tidy PR that makes this configurable, but I think it will be very difficult to create such a PR for the following reasons:

  1. There's a chicken-and-egg problem. chezmoi's own config file can use JSON, so you have to parse JSON before you know how to interpret doubles-that-can-be-represented-as-ints.
  2. chezmoi uses a centralized set of format encoders/decoders so changing it to use json.Decoder.UseNumber will likely break a lot of things. If you move away from the centralized JSON encoder/decoder, then you then need to decide which encoder/decoder to use every time you encode/decode a JSON file. This will likely be extremely messy to configure.

An alternative might be to use the int template function to convert a value that you know should always be an int to an actual int.

Ultimately, though, JSON is just not a good choice if you want to represent ints. What's particularly insidious with JSON is that it silently corrupts your ints once they get large enough.

twpayne commented 12 months ago

Hopefully this is now resolved. Please re-open if needed.

mcexit commented 11 months ago

I meant to say thanks sooner for your detailed responses and now better understand the issues. I've tried to learn some Go over the last few days. I'm still a beginner all things considered but hope to learn more as I can.

I made an attempt to create a DecodeWithNumbers method that implements a custom decoder. However, I'm not sure if it is any more efficient than just using decoder.Decode(value) and then ranging through and converting the numbers after. Please let me know your thoughts if you have time.

func (formatJSON) DecodeWithNumbers(data []byte, value any) error {
    decoder := json.NewDecoder(bytes.NewReader(data))
    decoder.UseNumber()

    var tokenize func(*json.Decoder, any) error
    tokenize = func(decoder *json.Decoder, value any) error {
        token, err := decoder.Token()
        if err != nil {
            return fmt.Errorf("error reading JSON token: %w", err)
        }

        switch token := token.(type) {
        case json.Delim:
            switch token {
            case '{':
                object := make(map[string]interface{})
                for decoder.More() {
                    token, err := decoder.Token()
                    if err != nil {
                        return fmt.Errorf("error reading JSON object key: %w", err)
                    }
                    key, ok := token.(string)
                    if !ok {
                        return fmt.Errorf("expected string for JSON object key, got unexpected type: %T", token)
                    }
                    var innerValue interface{}
                    if err := tokenize(decoder, &innerValue); err != nil {
                        return fmt.Errorf("error decoding value for key '%s': %w", key, err)
                    }
                    object[key] = innerValue
                }
                if _, err := decoder.Token(); err != nil {
                    return fmt.Errorf("error reading closing delimiter of JSON object: %w", err)
                }
                switch val := value.(type) {
                case *map[string]interface{}:
                    *val = object
                case *interface{}:
                    *val = object
                }

            case '[':
                var array []interface{}
                for decoder.More() {
                    var innerValue interface{}
                    if err := tokenize(decoder, &innerValue); err != nil {
                        return fmt.Errorf("error decoding array element: %w", err)
                    }
                    array = append(array, innerValue)
                }
                if _, err := decoder.Token(); err != nil {
                    return fmt.Errorf("error reading closing delimiter of JSON array: %w", err)
                }
                switch val := value.(type) {
                case *[]interface{}:
                    *val = array
                case *interface{}:
                    *val = array
                }
            }

        case json.Number:
            if i64, err := token.Int64(); err == nil {
                *value.(*interface{}) = i64
            } else if f64, err := token.Float64(); err == nil {
                *value.(*interface{}) = f64
            } else {
                return fmt.Errorf("failed to convert json.Number '%v' to int64 or float64: %v", token, err)
            }

        case string, bool, nil:
            *value.(*interface{}) = token
        default:
            return fmt.Errorf("unexpected JSON token type: %T", token)
        }

        return nil
    }

    return tokenize(decoder, value)
}
twpayne commented 11 months ago

I meant to say thanks sooner for your detailed responses and now better understand the issues. I've tried to learn some Go over the last few days. I'm still a beginner all things considered but hope to learn more as I can.

This is way beyond beginner Go code!

I made an attempt to create a DecodeWithNumbers method that implements a custom decoder. However, I'm not sure if it is any more efficient than just using decoder.Decode(value) and then ranging through and converting the numbers after. Please let me know your thoughts if you have time.

Given that performance is unlikely to be a concern, I think your idea of ranging through and converting the numbers after is the best one. I'll open a PR for your review.