pelletier / go-toml

Go library for the TOML file format
https://github.com/pelletier/go-toml

default values for objects in variable length arrays #857

Open cfal opened 1 year ago

cfal commented 1 year ago

Hi there, I realize that v2 doesn't support default values, but I'm wondering what the best way to do this is, or whether it's possible at all, even in combination with go-defaults or other custom code.

I am trying to decode a TOML document that contains a variable length array of objects - and I'd like to prefill the field values as recommended in the readme.

E.g., in this example, I'd like C to be set to -1 when the key is not present:

package main

import (
    "bytes"
    "fmt"

    "github.com/pelletier/go-toml/v2"
)

func main() {
    type varData struct {
        A int
        B int
        C int // wanted: -1 when the key is absent
    }
    type data struct {
        VarData []varData
    }

    var d data
    tomlBlob := []byte(`
[[VarData]]
A = 1
B = 2

[[VarData]]
A = 2
B = 1
`)

    // Check the decode error instead of silently dropping it.
    if err := toml.NewDecoder(bytes.NewReader(tomlBlob)).Decode(&d); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", d)
}

It seems that with a library like go-defaults, you'd still need to call a SetDefaults() method on each element, and there isn't a good way to do this in advance for the objects in a variable-length array.

One option might be encoding.TextUnmarshaler? I was thinking of something like this:

func (v *VarData) UnmarshalText(text []byte) error {
  v.SetDefaults()
  // NewDecoder takes an io.Reader, and v is already a pointer.
  return toml.NewDecoder(bytes.NewReader(text)).Decode(v)
}

But I was also curious: with the TextUnmarshaler approach (if it works at all), how would you know that the text []byte parameter holds TOML and comes from a go-toml call, rather than from decoding some other format? I.e., this would error out if we were trying to decode a JSON string instead.

Thank you!

cfal commented 1 year ago

I stumbled upon #484 and its test, and noticed that UnmarshalText in fact does not get called when the items are defined as array tables, as here:

package main

import (
    "fmt"
    "github.com/pelletier/go-toml/v2"
    "strconv"
)

type Integer struct {
    Value int
}

func (i Integer) MarshalText() ([]byte, error) {
    return []byte(strconv.Itoa(i.Value)), nil
}
func (i *Integer) UnmarshalText(data []byte) error {
    fmt.Println("NEVER CALLED")
    conv, err := strconv.Atoi(string(data))
    if err != nil {
        return err
    }
    i.Value = conv
    return nil
}

type Config struct {
    Integers []Integer
}

func main() {
    raw := []byte(`
[[Integers]]
Value = 3

[[Integers]]
Value = 4
`)
    var cfg Config
    fmt.Println(toml.Unmarshal(raw, &cfg))
    fmt.Printf("%#v", cfg)
}

results in:

<nil>
main.Config{Integers:[]main.Integer{main.Integer{Value:3}, main.Integer{Value:4}}}

pelletier commented 1 year ago

Sorry for taking so long to respond!

The encoding.TextUnmarshaler interface is only used to decode a specific type from a TOML string (same behavior as encoding/json). To achieve what you want, we probably need to bring back a toml.Unmarshaler interface, that would behave like the first example you gave:

func (v *VarData) UnmarshalTOML(text []byte) error {
  v.SetDefaults()
  // NewDecoder takes an io.Reader, and v is already a pointer.
  return toml.NewDecoder(bytes.NewReader(text)).Decode(v)
}

I'll think about it. I dropped support for it when switching to v2 because it wasn't really used and was poorly defined at the time, but it may be worth rebuilding this feature in the new codebase.

cfal commented 1 year ago

Thanks for replying! That makes sense; I realized that this wouldn't work as I expected. I saw that BurntSushi/toml also has an UnmarshalTOML(interface{}) function that allows for this behavior, but it does require quite a bit of extra boilerplate.
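To give a feel for that boilerplate, here is a minimal sketch of a method with the same shape as BurntSushi/toml's `UnmarshalTOML(interface{}) error`. It assumes that each `[[VarData]]` table is handed to the method as a `map[string]interface{}` with integers as `int64`; the `VarData` type and the default of -1 come from the example at the top of the thread. The sketch calls the method by hand rather than through a decoder, so it only needs the standard library.

```go
package main

import "fmt"

type VarData struct {
	A int
	B int
	C int
}

// UnmarshalTOML has the shape of BurntSushi/toml's Unmarshaler method.
// Assumption: data is one decoded table, i.e. a map[string]interface{}
// whose integer values are int64.
func (v *VarData) UnmarshalTOML(data interface{}) error {
	table, ok := data.(map[string]interface{})
	if !ok {
		return fmt.Errorf("VarData: expected a table, got %T", data)
	}

	// Start from the defaults, then overwrite whatever the table provides.
	*v = VarData{C: -1}
	fields := map[string]*int{"A": &v.A, "B": &v.B, "C": &v.C}
	for key, dst := range fields {
		raw, present := table[key]
		if !present {
			continue // key absent: keep the default
		}
		n, ok := raw.(int64)
		if !ok {
			return fmt.Errorf("VarData.%s: expected an integer, got %T", key, raw)
		}
		*dst = int(n)
	}
	return nil
}

func main() {
	var v VarData
	// Hand-built stand-in for a table that sets A and B but not C.
	err := v.UnmarshalTOML(map[string]interface{}{"A": int64(1), "B": int64(2)})
	fmt.Printf("%+v %v\n", v, err)
}
```

Every field needs its own presence check and type assertion, which is where the extra code comes from.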

It seems like even with UnmarshalTOML in your example, we wouldn't be able to set the defaults properly for variable-length arrays, since we have no idea how long the array will end up being.

One solution would be if it were somehow possible to define how new objects are constructed (e.g. being able to instruct the decoder to call a NewVarData function), but I'm not familiar with Go reflection and not sure if that's possible.

Another hacky idea is to parse twice:

func main() {
    raw := []byte(`
[[Integers]]
Value = 3

[[Integers]]
Value = 4
`)
    var cfg Config
    fmt.Println(toml.Unmarshal(raw, &cfg))

    // At this point we know the length of cfg.Integers,
    // so apply the defaults to every element.
    // SetDefaults is a hypothetical method on Integer.
    for i := range cfg.Integers {
        cfg.Integers[i].SetDefaults()
    }

    // Now parse again.
    fmt.Println(toml.Unmarshal(raw, &cfg))
    fmt.Printf("%#v", cfg)
}

haven't yet tested if this works.. 🙂

pelletier commented 1 year ago

Since we are talking about hacks, how about using a "serialization field" that is only used to know whether there is an actual value in the document, and set the default or value after unmarshaling?

package main

import (
    "fmt"
    "github.com/pelletier/go-toml/v2"
)

type Integer struct {
    OptValue *int `toml:"Value"`
    V        int
}

type Config struct {
    Integers []Integer
}

func main() {
    raw := []byte(`
[[Integers]]
Value = 3

[[Integers]]
Value = 4

[[Integers]] # should have a default
`)
    var cfg Config
    fmt.Println(toml.Unmarshal(raw, &cfg))

    for i, x := range cfg.Integers {
        if x.OptValue == nil {
            cfg.Integers[i].V = -1
        } else {
            cfg.Integers[i].V = *x.OptValue
        }
    }
    fmt.Printf("%#v", cfg)
}

# main.Config{Integers:[]main.Integer{main.Integer{OptValue:(*int)(0xc0000b6048), V:3}, main.Integer{OptValue:(*int)(0xc0000b6050), V:4}, main.Integer{OptValue:(*int)(nil), V:-1}}}

https://play.golang.com/p/vzo_qqbQKAK

I'm curious how people do it with encoding/json, since I'd like to keep emulating the behavior of stdlib.

cfal commented 1 year ago

> https://play.golang.com/p/vzo_qqbQKAK

ah, that also works well.

The reason I started investigating this is that we were doing something similar in a codebase I work on, and I was wondering if there's a simpler way. Instead of having it in the same struct, though, we had two different structs: one with all pointers and one with no pointers. We'd deserialize into the pointer struct, then check each field for nil to know when to set a default while copying the values over into the non-pointer struct :sweat_smile:
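A minimal sketch of that two-struct approach, for readers who want to see it spelled out. The names `rawVarData`, `resolve`, `orDefault` and `intPtr` are made up for illustration, and the decode step is replaced by a hand-built slice standing in for what `toml.Unmarshal` would fill in, so the example runs with the standard library alone.

```go
package main

import "fmt"

// rawVarData mirrors the document. Every field is a pointer, so a key that
// is absent from the input stays nil after decoding.
type rawVarData struct {
	A *int
	B *int
	C *int
}

// VarData is the pointer-free type the rest of the program would use.
type VarData struct {
	A int
	B int
	C int
}

// orDefault returns *p when the key was present and def otherwise.
func orDefault(p *int, def int) int {
	if p == nil {
		return def
	}
	return *p
}

// resolve copies the decoded values into VarData, applying defaults.
// The defaults (0, 0, -1) are illustrative.
func resolve(raw []rawVarData) []VarData {
	out := make([]VarData, 0, len(raw))
	for _, r := range raw {
		out = append(out, VarData{
			A: orDefault(r.A, 0),
			B: orDefault(r.B, 0),
			C: orDefault(r.C, -1),
		})
	}
	return out
}

func intPtr(n int) *int { return &n }

func main() {
	// Stand-in for two decoded [[VarData]] tables; the second sets C = 0.
	raw := []rawVarData{
		{A: intPtr(1), B: intPtr(2)},
		{A: intPtr(2), B: intPtr(1), C: intPtr(0)},
	}
	fmt.Printf("%+v\n", resolve(raw)) // [{A:1 B:2 C:-1} {A:2 B:1 C:0}]
}
```

The pointer is what distinguishes "key absent" from "key explicitly set to zero", which a plain `int` field cannot do.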

> I'm curious how people do it with encoding/json, since I'd like to keep emulating the behavior of stdlib.

I think for encoding/json, https://pkg.go.dev/encoding/json#RawMessage.UnmarshalJSON works well: unlike TOML, there's no "tables" concept and there aren't multiple ways to define an array. You'll always be passed an array like [{"Value": 3}, {"Value": 4}] as the []byte in UnmarshalJSON, vs. the TOML [[Integers]] tables.

Because of this, I think that if UnmarshalTOML(text []byte) were implemented, it wouldn't be very clear what text would contain, since it could be either [3, 4] or

[[Integers]]
Value = 3

[[Integers]]
Value = 4

... which seems inconsistent, since the latter has the Integers label while the former only has the values, à la UnmarshalJSON. It wouldn't be possible to pass both of these cases to toml.Unmarshal without some preprocessing.

I wonder if using an interface like BurntSushi/toml would make more sense: https://github.com/BurntSushi/toml/blob/master/example_test.go#L284

pelletier commented 1 year ago

Hm, at that point why not unmarshal into an interface{} or map[string]interface{}? Is it to avoid a post-processing phase?
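A rough sketch of what that post-processing phase could look like. It assumes the generic tree represents an array of tables as a `[]interface{}` of `map[string]interface{}` values with integers as `int64`; the tree here is built by hand rather than decoded, and `applyDefaults` is a hypothetical helper, not part of go-toml.

```go
package main

import "fmt"

// applyDefaults fills in every key from defaults that is missing in a table.
// Elements that are not tables are left untouched.
func applyDefaults(tables []interface{}, defaults map[string]interface{}) {
	for _, t := range tables {
		table, ok := t.(map[string]interface{})
		if !ok {
			continue
		}
		for key, def := range defaults {
			if _, present := table[key]; !present {
				table[key] = def
			}
		}
	}
}

func main() {
	// Hand-built stand-in for a document with two [[Integers]] tables,
	// the second of which has no Value key.
	doc := map[string]interface{}{
		"Integers": []interface{}{
			map[string]interface{}{"Value": int64(3)},
			map[string]interface{}{},
		},
	}

	tables, _ := doc["Integers"].([]interface{})
	applyDefaults(tables, map[string]interface{}{"Value": int64(-1)})
	fmt.Println(doc)
}
```

The cost is that the result stays untyped, so a second step is still needed to copy the values into concrete structs.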