proto2 pblite vs object boolean serialization

GoogleCodeExporter commented 9 years ago

goog.proto2.PbLiteSerializer serializes booleans in numeric form (i.e. 1 for 
true, 0 for false) [1].

goog.proto2.ObjectSerializer serializes as javascript booleans (i.e. true, 
false) [2].

Using the numeric form saves bytes during network transmission. For consistency 
would you be willing to use the numeric form in ObjectSerializer [3] as well?
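
A minimal sketch of the difference, in plain JavaScript. The two helper functions are hypothetical stand-ins for the two encodings, not the actual goog.proto2 code, and the output is run through JSON.stringify only to make the byte counts visible:

```javascript
// Hypothetical stand-ins for the two boolean encodings. These are NOT the
// goog.proto2 implementations, only an illustration of the wire difference.
function encodeBooleanNumeric(value) {
  return value ? 1 : 0;
}

function encodeBooleanNative(value) {
  return Boolean(value);
}

// A message with two boolean fields, keyed by tag number.
const numeric = JSON.stringify({
  1: encodeBooleanNumeric(true),
  2: encodeBooleanNumeric(false)
});
const native = JSON.stringify({
  1: encodeBooleanNative(true),
  2: encodeBooleanNative(false)
});

console.log(numeric); // {"1":1,"2":0}
console.log(native);  // {"1":true,"2":false}

// "true" costs 3 more characters than "1", "false" costs 4 more than "0".
console.log(native.length - numeric.length); // 7
```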

[1] See getSerializedValue and getDeserializedValue in pbliteserializer.js

http://closure-library.googlecode.com/svn/trunk/closure/goog/proto2/pbliteserializer.js

[2] See getSerializedValue and getDeserializedValue in serializer.js. 
ObjectSerializer uses the default implementation from goog.proto2.Serializer.

http://closure-library.googlecode.com/svn/trunk/closure/goog/proto2/serializer.js

[3] code review: http://codereview.appspot.com/5024045/

Original issue reported on code.google.com by ahochh...@samegoal.com on 15 Sep 2011 at 2:05

GoogleCodeExporter commented 9 years ago

For reference, please see this cld thread:

http://groups.google.com/group/closure-library-discuss/browse_thread/thread/3e9bb70bbd5889fd

Original comment by ahochh...@samegoal.com on 15 Sep 2011 at 5:50

GoogleCodeExporter commented 9 years ago

Original comment by pall...@google.com on 13 Oct 2011 at 8:04

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Peter, you mentioned you wanted to fix this in the thread. Are you still 
interested in fixing it?

Original comment by chrishe...@google.com on 9 May 2012 at 11:10

GoogleCodeExporter commented 9 years ago

I'm a bit worried about fragmentation. Changing the JS implementation alone is 
not enough: the Python, C++ and Java parsers also have to be updated to 
understand 0 and 1.
Saving 3 or 4 bytes on the wire doesn't compensate for the hassle.

If you need a more compact wire format, PbLiteSerializer is the best solution.

Original comment by pall...@google.com on 10 May 2012 at 9:13

Added labels: Priority-Low
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

Hi Peter, Thanks for your comments.

I agree that changing the JS alone is not enough (for most use cases). If 
Google were to open source their plugins/customized protoc I would happily 
provide patches for Py/C++/Java. Without access to that code, I would add 
support to my open source C++ (de)serialization plugins 
(http://code.google.com/p/protobuf-plugin-closure/).

One way to not fragment the formats in the long term:
   1) Update all of the deserializers (JS/Py/C++/Java) to support both boolean encoding formats (on the fly -- without application level config necessary)
   2) Wait N months until all code using ObjectSerializer to deserialize in all languages has been re-compiled and deployed
   3) Update all of the serializers (JS/Py/C++/Java) to use the numeric boolean encoding
   4) Wait N months until all code using ObjectSerializer to serialize in all languages has been re-compiled and deployed
   5) Update all of the deserializers to only understand the numeric boolean encoding

Steps 4 and 5 are optional depending on how clean a deployment is desired. 
Alternatively, the change could be done in a single step by updating the 
serializers to add an off-by-default option to use the numeric encoding and 
updating the deserializers to understand both formats. Then individual projects 
could opt in to the numeric boolean encoding as they saw fit (once they knew 
that all components of their system supported the numeric format). If you are 
willing to consider any of these deployment plans, I will update this change 
per your direction.
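
A minimal sketch of the single-step alternative, in plain JavaScript. The function names and the numericBooleans option are hypothetical and not part of goog.proto2. The deserializer accepts both encodings, while the serializer emits the numeric form only when a project opts in:

```javascript
// Hypothetical helpers illustrating the opt-in plan. Neither function nor
// the `numericBooleans` option exists in goog.proto2.

// Deserializer side: accept both encodings without any configuration.
function deserializeBoolean(raw) {
  if (raw === true || raw === 1) {
    return true;
  }
  if (raw === false || raw === 0) {
    return false;
  }
  throw new TypeError('Unexpected boolean encoding: ' + String(raw));
}

// Serializer side: numeric encoding is off by default and enabled per project.
function serializeBoolean(value, options) {
  const useNumeric = Boolean(options && options.numericBooleans);
  if (useNumeric) {
    return value ? 1 : 0;
  }
  return Boolean(value);
}

console.log(serializeBoolean(true));                            // true
console.log(serializeBoolean(true, {numericBooleans: true}));   // 1
console.log(deserializeBoolean(serializeBoolean(false, {numericBooleans: true}))); // false
```

Because the deserializer never needs to know which option the sender used, old and new serializers can coexist during a rollout.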

In terms of using ObjectSerializer vs. PbLiteSerializer, I think the most 
important factor to consider is whether your messages contain a large number of 
sparsely populated fields. For example, {100:true} is more efficient than [,,,, 
... ,,,,1]. However, the savings from numeric boolean encoding could apply to 
both formats and, depending on your use case, could be material. Granted, this 
is a contrived case, but I think the core of the argument still holds. 
Additionally, applications can always work around this limitation by using an 
int encoding where they really want booleans, but that really isn't ideal 
either.
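
A rough size comparison for the sparse case, in plain JavaScript. It assumes the array index equals the field number, that unset array entries are elided as in the example above, and that object keys are quoted as in JSON; the helper names are hypothetical:

```javascript
// Single boolean field with tag number 100; every other field is unset.
const fieldNumber = 100;

// Object form: one key/value pair, key quoted as in JSON (an assumption).
function objectForm(encodedValue) {
  return '{"' + fieldNumber + '":' + encodedValue + '}';
}

// Array form with elided entries: index 100 needs 100 commas before the value.
function pbLiteForm(encodedValue) {
  return '[' + ','.repeat(fieldNumber) + encodedValue + ']';
}

console.log(objectForm('true').length); // 12
console.log(objectForm('1').length);    // 9
console.log(pbLiteForm('1').length);    // 103
```

Under these assumptions the object form is far smaller for a sparse message, and the numeric encoding trims it by a further 3 characters per true field.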

From my perspective, it seems like a series of decisions that add 3-4 bytes per 
field could stack up to have a material impact (at least for some use cases). 
(Given how PbLiteSerializer is written I think someone else at Google might 
agree with this viewpoint.) Since this is the library level, it is a chance to 
get that savings for all applications without them needing to worry about the 
wire encoding.

At the end of the day, I can always just subclass ObjectSerializer and add 
support to my (de)serialization plugin, so this isn't a show stopper for me. 
Feel free to mark as "Won't Fix" if the cost does not justify the deployment 
expense in your opinion.

Thanks!
-Andy

Original comment by ahochh...@samegoal.com on 10 May 2012 at 4:46
