polydawn / refmt

Object mapping for golang.
MIT License
48 stars 14 forks source link

cbor output is not canonical #16

Closed whyrusleeping closed 6 years ago

whyrusleeping commented 6 years ago

Keys in maps need to be sorted first by length, then lexicographically within that, ref: https://tools.ietf.org/html/rfc7049#section-3.9

If I can get a pointer on where to look to add this, i'll take a hack at it.

warpfork commented 6 years ago

If I'm not mistake, you should be able to get canonical CBOR out, but if working on structs, you may have to be fairly explicit about it, and yes, helper methods for that are in order.

Internally, things are a linear stream of Token objects, so mostly what you're asking is about stuff under the obj/ package.

Maps are sorted already.

Structs are iterated in the order defined by in the AtlasEntry, so if you build the AtlasEntry in canonical order, you should get canonical order overall. This could use some helper methods though (in particularly since autogenerate does struct field definition order, and there's currently no way to change that!).

We also could (should?) add a mode to the cbor encoder and decoder to sanity check that things are in canonical ordering. I dunno about doing a sort on that end though. It's definitely more efficient to do things in the right order on the token source side when possible, because that saves another intermediate memory allocation for the sorting.

whyrusleeping commented 6 years ago

@warpfork what would the best way to make this change an option be?

why@manwe ~/g/s/g/p/refmt> git diff
diff --git a/obj/marshalMapWildcard.go b/obj/marshalMapWildcard.go
index 8eba1b7..1dd2fd4 100644
--- a/obj/marshalMapWildcard.go
+++ b/obj/marshalMapWildcard.go
@@ -44,7 +44,7 @@ func (mach *marshalMachineMapWildcard) Reset(slab *marshalSlab, rv reflect.Value
                mach.keys[i].rv = v
                mach.keys[i].s = v.String()
        }
-       sort.Sort(wildcardMapStringyKey_byString(mach.keys))
+       sort.Sort(wildcardMapStringyKey_cborCanonical(mach.keys))

        mach.index = -1
        return nil
@@ -96,3 +96,15 @@ type wildcardMapStringyKey_byString []wildcardMapStringyKey
 func (x wildcardMapStringyKey_byString) Len() int           { return len(x) }
 func (x wildcardMapStringyKey_byString) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }
 func (x wildcardMapStringyKey_byString) Less(i, j int) bool { return x[i].s < x[j].s }
+
+type wildcardMapStringyKey_cborCanonical []wildcardMapStringyKey
+
+func (x wildcardMapStringyKey_cborCanonical) Len() int      { return len(x) }
+func (x wildcardMapStringyKey_cborCanonical) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
+func (x wildcardMapStringyKey_cborCanonical) Less(i, j int) bool {
+       li, lj := len(x[i].s), len(x[j].s)
+       if li == lj {
+               return x[i].s < x[j].s
+       }
+       return li < lj
+}
warpfork commented 6 years ago

Holy smokes. I just blazed by the section on sorting in the RFC and assumed it did what I would expect.

But yep: "If two keys have different lengths, the shorter one sorts earlier". Uffdah. And the results are quite different; for example:

[Alpha Bravo Delta Ga Go Gopher Grin]   vs
[Ga Go Grin Alpha Bravo Delta Gopher]

I spent some time pondering whether we should just make this sorting the default; but no, I think not. I'm not even sure I agree with this definition of "canonical"; and since the spec also says "Those protocols are free to define what they mean by a canonical format", FWIW, I think most of my applications will choose a definition that does string sort on strings. :woman_shrugging: It's much less confusing to the eye and my feeling is that string sort makes more consistent sense for other serial formats as well (the CBOR RFC's suggestion for sorting with lengths like this is apparently driven by desire to support maps with mixed-type keys, but doing so in a way that entangles sorting definitions with serial form of e.g. integers anyway... ew).

... so! Yeah, how to make it an option.


I think we touch a new file under obj/atlas/ and put a builder in there, similar to the ones already in that package for struct mappings and transforms. Then we hang another method to get one of those builders off of *BuilderCore for a consistent usage pattern. And then we should be off to the races; we can put an enum for sorting styles into the builder state. Does that make sense?

Some of the type names in that package could maybe do with a refactor while we're there. type StructMap is a bit of a mouthful and rather confusing since it has nothing to do with maps, for example.

warpfork commented 6 years ago

This should now work on master; tests added in 656b0c9.

The syntax is admittedly verbose, as builder patterns are wont to be, but here it is:

atlas.BuildEntry(map[string]string{}).MapMorphism().
    SetKeySortMode(atlas.KeySortMode_RFC7049).
    Complete()
whyrusleeping commented 6 years ago

Does that also work for map[string]interface things?

On Tue, Dec 19, 2017, 3:21 PM Eric Myhre notifications@github.com wrote:

This should now work on master; tests added in 656b0c9 https://github.com/polydawn/refmt/commit/656b0c9361d610707bd3d20bdfef8bb02e86c2a4 .

The syntax is admittedly verbose, as builder patterns are wont to be, but here it is:

atlas.BuildEntry(map[string]string{}).MapMorphism(). SetKeySortMode(atlas.KeySortMode_RFC7049).Complete()

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/polydawn/refmt/issues/16#issuecomment-352916894, or mute the thread https://github.com/notifications/unsubscribe-auth/ABL4HIUJ3heXFS80OjqwyP19YmqSsFb-ks5tCETsgaJpZM4Q-UnM .

whyrusleeping commented 6 years ago

Basically, any time theres a map[whatever]ANYTHING i need it to use the right key sorter.

warpfork commented 6 years ago

Re map[whatever]anything, non-string whatevers actually do need a little more code to support, but, I'd like to punt that to a later issue/feature. Maps do have configurable sorting support now as of 656b0c9, and now we've got structs too as of #19 -- so I'm gonna say "PROGREZ!" and close this :)