#to_protobuf return Bytes

didactic-drunk commented 5 years ago

The default #to_protobuf method introduces subtle bugs.

test = Test.new
test.lots_of_data = ...

test2 = Test.from_protobuf test2.to_protobuf
# test2 is empty

#rewind on the returned IO::Memory is always necessary to make use of the value and it's easy to forget potentially causing data loss.

test = Test.new
test.lots_of_data = ...
mio = test.to_protobuf
bytesize = mio.bytesize
# Write frame header.
fileio.write_bytes bytesize
# Copies nothing.
mio.copy_to fileio

When combining this with HMAC or compression IO's where you may pass over the data more than once you need to rewind each time otherwise you read 0 bytes. Each pass is a possible bug.

Most serializers #to_* methods return String or Bytes. It's also the safer alternative.

Would a PR be accepted to change the default type to something less error prone and more in line with crystal norms?

jeromegn commented 5 years ago

I'd be happy to consider such a PR. This is very old stuff and things have changed a lot in the Crystal ecosystem.

I think keeping a method that returns the IO would be good though (like to_io or something).

jgaskins commented 5 years ago

I 100% understand why #to_protobuf was chosen as a name but I believe it's actually a misnomer. It's already a Protobuf. Conventions exist for converting objects to slices and strings with #to_slice and #to_s(io).

I don't know if anyone's using this shard in production but, if so, it might make sense to add @Deprecated to #to_protobuf instead of changing or removing it.

didactic-drunk commented 5 years ago

#to_json, #to_yaml, #to_msgpack. All of them return String or Bytes. I have multiple cases where I need to make more than one pass at the returned object. I can't use to_protobuf(IO)

#to_slice wouldn't let you choose between different serialization formats.

Example:

# Public API uses JSON.
test = Test.from_json network_socket
# Internal API is passed and parsed multiple times using protobuf for speed.
test.to_protobuf message_bus

jgaskins commented 5 years ago

#to_json, #to_yaml, #to_msgpack. All of them return String or Bytes.

The use case for those methods is to serialize arbitrary objects into those formats. The use case for protobufs is to serialize objects whose entire purpose for existing is the serialization format. A Protobuf message isn't a domain model.

The equivalent use case to those methods would be something like this:

# An example domain model
class Product
  # This mixin would provide methods to deserialize and serialize the domain model
  # via the specified `Protobuf::Message` type.
  include Protobuf::Serializable(MyProtobufNamespace::Product)
end

Then you'd call Product#to_protobuf and that would make perfect sense, but even then, the that leaves the question, should the return value there be a MyProtobufNamespace::Product or the over-the-wire serialized bytes?

didactic-drunk commented 5 years ago

Converting between JSON and protobuf seems common enough. I don't see a problem sharing the same serialized class or struct. See #26 for an example.

jgaskins commented 5 years ago

We shouldn't encourage messages to share code with anything else because they're typically generated from .proto files, not written directly in the target language.

Either way, there's no confusion about what Protobuf::Message#to_s(io) and Protobuf::Message#to_slice would do based on their names. They're more accurate and expressive than Protobuf::Message#to_protobuf. I also think changing return values of existing methods to incompatible types should be done sparingly as not to break existing code.

didactic-drunk commented 5 years ago

I think I agree. @jeromegn thoughts?

jeromegn commented 5 years ago

My Crystal is very far away in my mind. I've mostly been writing Rust for over a year!

I will rely on the 2 of you. If you agree, sounds good enough for me. From reading the discussion, I think I agree too!

jeromegn / protobuf.cr

#to_protobuf return Bytes #29