kisielk / og-rek

ogórek is a Go library for encoding and decoding pickles.
MIT License
60 stars 16 forks source link

Encoder 2 #52

Closed navytux closed 6 years ago

navytux commented 6 years ago

More encode fixes wrt protocol versions + (I appologize) fixup for fuzz breakage.

This drains up the queue except for adding support for Python bytes.

On that front I have something working, but since initially I started connecting py bytes to Go []byte it showed up as a problem: in Python bytes, as strings, are immutable, and thus can be used as dict keys. In Go []byte cannot be used as map keys. And usage of bytes as map keys is not artificial.

For this reason I'm currently tending to add type ogórek.Bytes string and connect that to Python bytes. And []byte then should be mapped <-> bytearray in Python.

I will think about this a bit more, but just in case, if you have feedback on this topic, please let me know.

Thanks beforehand, again, Kirill

kisielk commented 6 years ago

Maybe @dgryski has some thoughts about the byte / string issue.

dgryski commented 6 years ago

My projects have switched to protobuf since we've replaced Python entirely.

However, from a serialization point of view, given that we want to be able to roundtrip data structures it seems reasonable to have our own Bytes type to handle the case described. It could be controlled by an encoder and/or decoder option.

navytux commented 6 years ago

@dgryski thanks for feedback. Indeed for 100% backward compatibility an option could be used (it should be only for Encoder, since Decoder currently cannot produce []byte at all nor handle incoming python bytes).

However if practically possible I would like to avoid option here, or at least make it the default for encoder to handle all kind of "bytes" properly and have an optional way to peserve current behaviour where []byte is encoded as string.

I think it should be reasonable since there should be (imho) not many ogórek users, and for open-source kind I went through all of them via "Uses" of NewEncoder on godoc:

https://sourcegraph.com/github.com/kisielk/og-rek/-/blob/encode.go#L48:6=&tab=references:external

From what I can see there none of them are passing []byte to encoder.

For the reference my environment is ZODB/Zope-based projects (https://www.erp5.com/) which has pickled data in the database and python logic part is likely going to stay. That's why I need robust data interoperability for my Go parts there.

navytux commented 6 years ago

( @kisielk, there is nothing related to bytes in this merge-request -- only uncontroversial fixes. Please consider merging. Thanks )

navytux commented 6 years ago

ping @kisielk. One of the patches here is needed for follow-up patches that fix #48. Please consider merging hereby pull-request so that I could continue and send the next bits.

Thanks beforehand, Kirill

navytux commented 6 years ago

Thanks for merging.