ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]
MIT License
1.86k stars 295 forks source link

Question: For CBOR are Maps with keys of Major type 2 possible? #286

Closed eikeon closed 5 years ago

eikeon commented 5 years ago

I've not been able to work out if it's possible to codec maps with key of major type 2 re: CBOR. I was converting my keys from []byte into string so that I can use them as keys in my map. But they are then getting tagged as major type 3.

ugorji commented 5 years ago

Thanks.

There's a quick hack fix, but I don't like it.

I instead want to take a second stab at how we handle golang string type during encoding, and also how we decode in corresponding context.

This may cause the fix to take a week or so. I will be in touch once fixed.

eikeon commented 5 years ago

Thank you!

I will look forward to the enhancement and can test it once it’s ready for testing.

ugorji commented 5 years ago

@eikeon I couldn't wait to try out my fix.

Please test out and let me know that it solves your concerns.

eikeon commented 5 years ago

Yes, with the EncodeOptions.StringToRaw = true the map keys are now turning into major type 2.

Question: If my map values are also string should the StringToRaw option also apply to them? The resulting CBOR I'm now getting is a map where the keys are encoded with type 2 and the values with type 3. Which in my current use case is fine, I think, as they are 'proper' strings. But want to understand how the determination is made for the map values?

Seems like Go's type system is letting us down a bit here :(

ugorji commented 5 years ago

It's should be working as expected. But I can only be sure with a reproducer.

Send me a reproducer (code I can run with go test or go run), and I can explain what is happening, or fix if something isn't working right.

ugorji commented 5 years ago

@eikeon ^^

ugorji commented 5 years ago

@eikeon seems there is a path within the code which doesn't take the StringToRaw flag into consideration. Fixing now.

eikeon commented 5 years ago

@ugorji, The clarification in the doc is very helpful. I am now seeing StringToRaw being consistently applied.

I am not able to determine how I can use encoding.TextMarshaler so that my type gets encoded as a UTF-8 string. Here's my explainer:

package main

import (
    "bytes"
    "encoding/hex"
    "log"

    "github.com/ugorji/go/codec"
)

type Path string

type PathT string

func (p PathT) MarshalText() (text []byte, err error) {
    return []byte(p), nil
}

type Hash string

type Manifest map[Hash]interface{}

func (m Manifest) Bytes() []byte {
    var ch codec.CborHandle
    var b []byte
    ch.BasicHandle.EncodeOptions.Canonical = true
    ch.BasicHandle.EncodeOptions.StringToRaw = true
    enc := codec.NewEncoderBytes(&b, &ch)
    err := enc.Encode(m)
    if err != nil {
        log.Fatal(err)
    }
    return b
}

func main() {
    bufA, _ := hex.DecodeString("da8a33")
    bufB, _ := hex.DecodeString("5ea3f0")

    m := Manifest{Hash(bufB): Path("b"), Hash(bufA): Path("a")}
    mm := Manifest{Hash(bufB): PathT("b"), Hash(bufA): PathT("a")}

    if bytes.Compare(m.Bytes(), mm.Bytes()) == 0 {
        log.Println("not different")
    }
}
ugorji commented 5 years ago

see https://godoc.org/github.com/ugorji/go/codec#hdr-Custom_Encoding_and_Decoding

Type must implement both sides of the symmetry i.e. MarshalText and UnmarshalText

ugorji commented 5 years ago

@eikeon

BTW, for performance and ergonomic reasons, I would write the code snippet as below (see https://godoc.org/github.com/ugorji/go/codec#hdr-Usage ):

type Manifest map[Hash]interface{}

var ch codec.CborHandle // handles maintain cache of things on first use - make shared, see docs above
func init() {
    ch.Canonical = true // leveraging ergonomic embedding rules of go
    ch.StringToRaw = true // leveraging ergonomic embedding rules of go
}
func (m Manifest) Bytes() (b []byte) {
    enc := codec.NewEncoderBytes(&b, &ch)
    err := enc.Encode(m)
    if err != nil {
        log.Fatal(err)
    }
    return
}