Closed jimblandy closed 6 years ago
Thanks for the issue, you are right. I'll try to answer most of the points raised.
`Vec`). Using the maximum size to decide where to break cycles would be an improvement, but it still seems a little obscure.
Google's `protoc` generates C++ that boxes every message that is contained in another message, whether optional, repeated or required. If `pb-rs` were to inline all required messages, and box all optional messages, that would be a step up from `protoc`, and still provide a simple relationship between protobuf types and Rust types.
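The "simple relationship between protobuf types and Rust types" suggested above could look roughly like the following. This is only a sketch of the proposed rule, not code from `pb-rs`: the `Label` enum and the `rust_field_type` function are hypothetical names made up for illustration, and the sketch only covers message-typed fields.

```rust
/// Field labels as written in a proto2 file (hypothetical model, not pb-rs's own type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Label {
    Required,
    Optional,
    Repeated,
}

/// Rust type for a message-typed field under the proposed rule.
/// `msg` is the Rust name of the nested message type.
fn rust_field_type(label: Label, msg: &str) -> String {
    match label {
        // Required sub-messages are stored inline.
        Label::Required => msg.to_string(),
        // Optional sub-messages are always boxed.
        Label::Optional => format!("Option<Box<{}>>", msg),
        // Repeated sub-messages already live behind the Vec's heap buffer.
        Label::Repeated => format!("Vec<{}>", msg),
    }
}
```

Because the result depends only on the field's own label, reordering declarations in the `.proto` file cannot change it.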
Yes, `repeated` and `Vec` go naturally together - especially since a zero-capacity Rust `Vec` doesn't actually allocate any heap storage.
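A small check of that point, using a made-up `Node` type as a stand-in for a generated message with a repeated field that refers back to its own type (the names are illustrative assumptions, not `pb-rs` output):

```rust
/// Stand-in for a generated message whose repeated field contains the same message type.
#[derive(Debug, Default)]
struct Node {
    children: Vec<Node>,
}

/// Builds a message with no children; the empty Vec has not allocated anything yet.
fn empty_node() -> Node {
    Node::default()
}
```

The recursive definition compiles without any `Box` because the `Vec` already stores its elements behind a pointer, and an empty `Vec` reports a capacity of zero until the first element is pushed.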
Fine. I am not a big fan of losing performance, in particular in proto3 files where fields are optional.
On the other hand, guaranteeing stable generated code is probably more important (questionable here too, as you probably don't want to expose the internal deserialization details).
Anyway, in the absence of real proof, I guess following what other libraries are doing is probably wise, and in general one should try to avoid cyclic messages if possible.
I have revamped (again) the cycle-breaking algorithm: it now identifies all strongly connected components first, then boxes all their optional fields.
close #119
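For readers unfamiliar with the approach, here is a rough sketch of "find strongly connected components, then box their optional fields". It is not the actual `pb-rs` implementation: the `Message`, `Field` and `Label` types and both function names are hypothetical. It uses Tarjan's algorithm, and it assumes (this is a guess, not something stated in the thread) that repeated fields are left out of the graph because a `Vec` already adds indirection.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Label { Required, Optional, Repeated }

/// A message-typed field; `target` indexes into the message list (hypothetical model).
struct Field { label: Label, target: usize, boxed: bool }
struct Message { name: &'static str, fields: Vec<Field> }

fn field(label: Label, target: usize) -> Field {
    Field { label, target, boxed: false }
}

/// Bookkeeping for Tarjan's strongly-connected-components algorithm.
struct Tarjan<'a> {
    msgs: &'a [Message],
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    comp: Vec<usize>,
    n_comps: usize,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next);
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        let msgs = self.msgs; // copy the shared slice so `self` stays free for mutation
        for f in &msgs[v].fields {
            if f.label == Label::Repeated { continue; } // assumption: Vec already breaks the cycle
            let w = f.target;
            let idx_w = self.index[w];
            match idx_w {
                None => {
                    self.visit(w);
                    self.low[v] = self.low[v].min(self.low[w]);
                }
                Some(iw) if self.on_stack[w] => self.low[v] = self.low[v].min(iw),
                Some(_) => {}
            }
        }
        if Some(self.low[v]) == self.index[v] {
            // `v` is the root of a component: pop its members off the stack.
            loop {
                let w = self.stack.pop().expect("stack contains v");
                self.on_stack[w] = false;
                self.comp[w] = self.n_comps;
                if w == v { break; }
            }
            self.n_comps += 1;
        }
    }
}

/// Returns the component id of every message.
fn components(msgs: &[Message]) -> Vec<usize> {
    let n = msgs.len();
    let mut t = Tarjan {
        msgs, index: vec![None; n], low: vec![0; n], on_stack: vec![false; n],
        stack: Vec::new(), next: 0, comp: vec![0; n], n_comps: 0,
    };
    for v in 0..n {
        if t.index[v].is_none() { t.visit(v); }
    }
    t.comp
}

/// Marks as boxed every optional field that points back into its own component.
fn box_cyclic_optionals(msgs: &mut [Message]) {
    let comp = components(msgs);
    for (v, m) in msgs.iter_mut().enumerate() {
        for f in &mut m.fields {
            if f.label == Label::Optional && comp[v] == comp[f.target] {
                f.boxed = true;
            }
        }
    }
}
```

Since component membership is a property of the graph rather than of the traversal order, the set of boxed fields in this sketch does not depend on the order in which messages are declared. A cycle made only of required fields is left unboxed here; a real generator would presumably want to report that as an error.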
The `pb-rs` code generator lacks a stable rule for deciding whether a field of a Rust structure representing a protobuf message should be boxed or not. Reordering the declarations in a `.proto` file, or upgrading the version of `pb-rs`, could change the generated Rust types in a way that would break the code that uses them.

There is a simple and stable rule that the generator could follow, instead of its present algorithm, that would let users easily anticipate where the generator will box messages and where it will not.
In the current code, the `FileDescriptor::break_cycles` function traverses all the message types in the specification, looking for points where a message type might contain instances of itself, and introducing `Box` types to avoid defining types that are infinite in size. However, this algorithm is pretty subtle, and introduces boxes based on where it happens to detect cycles. If the algorithm were to change, the placement of boxes might change as well.

For example, on the following input:

`pb-rs` generates the following struct types:

But changing the order of the two `message` declarations in the `.proto` file:

changes the generated Rust types to:
The type of `B::g` changes from `Option<Box<A>>` to `A`.

However, any cycle of message inclusion in a `.proto` file must include at least one field that is `repeated` or `optional`. Otherwise, well-formed messages would need to be infinite in length. If `pb-rs` were to always box `optional` fields, and simply report an error whenever a cycle exists that was not broken by a `repeated` or `optional` field, then users would be better able to predict Rust types, and the code generator could be simplified.

Boxing optional fields seems desirable in general, simply because doing so lets the Rust values avoid spending memory on sub-messages that are absent. The size of a Rust `Option<T>` is at least that of `T`.
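The memory argument can be checked with `std::mem::size_of`. The `Big`, `Inline` and `Boxed` types below are made-up stand-ins for generated messages, and the 256-byte payload is an arbitrary choice for illustration:

```rust
use std::mem::size_of;

/// Stand-in for a large sub-message (hypothetical; the payload size is arbitrary).
struct Big {
    payload: [u8; 256],
}

/// Parent that stores the optional sub-message inline.
struct Inline {
    child: Option<Big>,
}

/// Parent that stores the optional sub-message behind a Box.
struct Boxed {
    child: Option<Box<Big>>,
}

/// Bytes used by each parent layout, whether or not the child is present.
fn parent_sizes() -> (usize, usize) {
    (size_of::<Inline>(), size_of::<Boxed>())
}
```

With the inline layout every parent pays for the full sub-message even when it is absent, whereas `Option<Box<Big>>` is the size of a single pointer, since `None` is represented as the null pointer.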