ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]
MIT License
1.83k stars 294 forks source link

Handling of map[interface{}]interface{} with Canonical=true #360

Closed mxmauro closed 3 years ago

mxmauro commented 3 years ago

Hi Ugorji, little story

I had to deal with a scenario where I have to decode a map inside a msgpack, remove one item, and reencode it. But the map had nested maps, some using string and others using uint64 as the key types.

Because of this, I was forced to use map[interface{}]interface{} but then I was unable to encode it again using the canonical sort because the code does not handle interface{} keys.

I ended doing this piece of code that might help anyone:



package main

import (
    "bytes"
    "go/format"
    "io/ioutil"
    "log"
    "os"
    "strings"
    "text/template"
)

// -----------------------------------------------------------------------------

type TemplateConfig struct {
    Types  []string
}

// -----------------------------------------------------------------------------

func main() {
    funcMap := template.FuncMap{
        "capitalize": func(s string) string {
            // NOTE: no need to extract runes because always ansi characters are used
            return strings.ToUpper(s[:1]) + s[1:]
        },
    }

    // Generator template
    tmpl := template.Must(template.New("").Funcs(funcMap).Parse(`package codec

// Code generated by go generate; DO NOT EDIT.

import (
    "errors"
    "reflect"
)

// -----------------------------------------------------------------------------

// RecreateMap tries to identify the key type for each map (root or nested) and builds a new map using the proper key type.
func RecreateMap(src interface{}) (dest interface{}, err error) {
    defer func() {
        if r := recover(); r != nil {
            err = r.(error)
        }
    }()
    dest, err = recreateMapInternal(src)
    return
}

func recreateMapInternal(src interface{}) (interface{}, error) {
    var ok bool

    if reflect.ValueOf(src).Kind() != reflect.Map {
        return src, nil
    }

{{range $type := .Types}}
    _, ok = src.(map[{{$type}}]interface{})
    if ok {
        return recreateMap{{capitalize $type}}(src.(map[{{$type}}]interface{}))
    }
{{end}}

    var srcMapAsInterface map[interface{}]interface{}
    srcMapAsInterface, ok = src.(map[interface{}]interface{})
    if !ok {
        return nil, errors.New("unsupported map type")
    }
    for key := range srcMapAsInterface {
        switch key.(type) {
{{range $type := .Types}}
        case {{$type}}:
            return recreateGenericMap{{capitalize $type}}(srcMapAsInterface)
{{end}}
        }
        break
    }

    return nil, errors.New("unsupported map type")
}

{{range $type := .Types}}
func recreateMap{{capitalize $type}}(src map[{{$type}}]interface{}) (interface{}, error) {
    var err error

    dest := make(map[{{$type}}]interface{})

    for key, value := range src {
        dest[key], err = recreateMapInternal(value)
        if err != nil {
            return nil, err
        }
    }
    return dest, nil
}

func recreateGenericMap{{capitalize $type}}(src map[interface{}]interface{}) (interface{}, error) {
    var err error

    dest := make(map[{{$type}}]interface{})

    for key, value := range src {
        keyName, ok := key.({{$type}})
        if !ok {
            return nil, errors.New("mixed key types on map")
        }

        dest[keyName], err = recreateMapInternal(value)
        if err != nil {
            return nil, err
        }
    }
    return dest, nil
}
{{end}}`))

    tmplConfig := TemplateConfig{
        Types:  []string{
            "string",
            "uint8",
            "uint16",
            "uint32",
            "uint64",
            "int8",
            "int16",
            "int32",
            "int64",
        },
    }

    output := &bytes.Buffer{}
    err := tmpl.Execute(output, tmplConfig)
    if err != nil {
        log.Fatalf("Error executing template [err=%v]", err)
    }

    var data []byte

    data, err = format.Source(output.Bytes())
    if err != nil {
        log.Fatalf("Error formatting generated code [err=%v]", err)
    }

    err = ioutil.WriteFile("generated_map_rebuilder.go", data, os.ModePerm)
    if err != nil {
        log.Fatalf("Error writing generated file [err=%v]", err)
    }
}```

Kind regards,
Mauro.
ugorji commented 3 years ago

I'm not following.

Canonical should work for interface{} keys, and you should not have to do anything special.

Please attach a reproducer so I can reproduce your problem, or point out how to fix it (if not a bug).

Ping me on this, using @ugorji and I will follow-up (including re-opening the issue if it is a bug).

mxmauro commented 3 years ago

Hi @ugorji I created an example:

sample.zip

The code decodes and reencodes a msgpack-based block of the Algorand's blockchain and the result is different than the expected.

Things to take into account:

  1. You cannot use map[string]interface{} (which works almost all the time) for decoding because this particular block has an internal struct of type map[uint64]...
  2. The source is canonical but the reencoded data isn't. The main struct has two inner structs, block and cert. When reencoding, the cert is added first because the code is unable to sort items.
  3. Some context: Packing of the original block is generated with this code https://github.com/algorand/go-algorand/blob/master/protocol/codec.go . Despite it is a fork, using your updated master leads to the same result.

Regards, Mauro.

ugorji commented 1 year ago

@mxmauro

Thanks for the reproducer.

That's a tough test.

For pragmatic reasons, we haven't been disciplined in maintaining the format over bug fixes, by nature of bug fixes e.g. canonical previously would treat all numeric types by looking only at the kind, ignoring the fact that a user might have created extensions for it, or implemented binaryMarshaler or codecSelfer or otherwise. We fixed that bug. Consequently, on reencoding, the stream may be more correct.

Canonical is not used by most users - meaning it didn't get the majority of users using the support and finding issues, especially across the different paths of reflection and code-generation. I think it's in a much better place now, and as bug fixes in it reduce, the encoded stream given a set of EncodeOptions should stay consistent.

Consequently, a more complete test would be to validate that, on decoding, it's the same input object, and on re-encoding, its the same byte stream. i.e.

Hope this helps.

BTW, I'm interested in your notes above

  1. Are you saying that the value is a map, whose key is also a map or a struct that contains a map i.e. map[uint64]XXX? That's will be a challenge for canonical. Is there a reason why canonical is important?
mxmauro commented 1 year ago

Hi @ugorji you are welcome.

The reason for canonical is because, you generate a hash of the encoded data and that becomes the ID of the block, transaction, whatever, and that cannot change.

About the map of the map, see these links:

https://github.com/algorand/go-algorand/blob/0111cb102c870eb5c107479e13c34cb6d62cb32c/data/transactions/teal.go#L36 https://github.com/algorand/go-algorand/blob/0111cb102c870eb5c107479e13c34cb6d62cb32c/data/basics/teal.go#L73

The comments above each marked line contains the description of the key.

ugorji commented 1 year ago

Argh ... tough. Completely agree ... that CANNOT change.

Two things I updated in the code:

If you didn't implement extensions and you didn't those interfaces, then canonical shouldn't change due to the bug fixes. However, your omitemptyarray is automatically honored by omitempty.

Question for you: Is there a reason why you added a new flag (omitemptyarray) as opposed to just extending the support of omitempty?