ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]
MIT License
1.86k stars 295 forks source link

omitempty handling for arrays disagrees with the standard encoding/json #386

Closed dop251 closed 1 year ago

dop251 commented 1 year ago

As of this commit: https://github.com/ugorji/go/commit/ebaaab4fb661237a0e4f376a751ae0eb26445b3d arrays with zero values ar treated as empty. Whether this is correct or not is up for debate (see https://github.com/golang/go/issues/29310), however it makes the current behaviour inconsistent with the standard json.Marshal() which makes it more difficult to use ugorji as a replacement.

I personally would argue that array of zero values is not empty. It's true that for Go arrays there is no distinction (meaning a Go array cannot be empty), however when doing un-typed unmarshalling the difference is between an undefined/missing value and an array of zeros.

ugorji commented 1 year ago

@dop251

Thanks so much for this issue.

It's a challenge, and my answer is a bit sad.

The intent for omitempty should have been simple to reason:

omitempty means we omit an empty value. Go has very good definition of an empty value that leads itself to easy optimization and reasoning. It's the native zero value of a type (basically the value represented as a contiguous sequence of zero bytes in memory), custom zero value of a type (e.g. time.Time with an IsZero method), or a sequence/mapping of length 0 (slice, string, map, chan).

unfortunately, encoding/json has been "frozen" since before go 1 was released, and thus, bug-for-bug compatibility has to be maintained unless the documentation is changed. It's telling that these issues keep coming up where not-so-great workarounds of omitzero, omitgonul, omitemptyarray, etc have continued to be raised.

Note that the struct tags mostly make sense for go's use, and the developer has the option to place or remove the 'omitempty' from the struct tag.

Finally, go/codec library was designed to try to replicate most of the same usage model and functionality as encoding/json, but not to be limited by conservative decisions in the library. See https://pkg.go.dev/github.com/ugorji/go/codec#JsonHandle for different features we support above and beyond encoding/json.

Our goal is that omitempty does what makes sense for most people. The test for us is that a bi-directional encode/decode sequence works as is i.e. take A of type T, encode into a stream of bytes B, decode B into a zero value of type T called C, compare A and C and they should be equal.

When we omit a zero array value or write it out (e.g. in json as [] or as null), we will always be able to read it back as a zero value. For users of this library, this is completing a gap that always existed there.

In closing, This request was made by folks doing Algorand, where they needed to omit zero'ed values of a [16]byte array. omitempty surprisingly missed this, and the size reduction they expected was not materializing. I empathize with their need, and it was time.

I wish I could comfortably have a different answer, but I can't in good faith.

dop251 commented 1 year ago

Thanks for the response. It's not really such a big deal, but I think it should be highlighted in the documentation (unless it already is and I'm missing it).

Or maybe it's worth considering making it an option, in the end an empty value is not the same as a zero value.