Closed SimonSapin closed 4 years ago
@SimonSapin I've created a patch (pyfisch/keyboard-types#5) that uses FromStr
instead of derive to implement deserialization. Can you check if the code size is reduced by this?
This doesn’t seem ideal in terms of run-time cost
This doesn’t seem ideal in terms of run-time cost
Which additional runtime cost do you foresee?
@dtolnay suggested a way for both deserialization and serialization to avoid allocating another string.
FromStr
likely searches linearly for the matching string (I didn't check the asm) but this can be improved by applying binary search.
Probably the string comparisons and the impact on serialized size. Bincode would otherwise serialize only a small integer for the unit variants.
Right, that’s what I meant.
@dtolnay The offending function is a method of the Visitor
trait, on a type which (looking at the code generated by derive with --pretty=expanded
) is passed to https://docs.rs/serde/1.0.111/serde/trait.Deserializer.html#tymethod.deserialize_enum which is documented as a "hint". Is it possible to give fewer hints to serde and have the serialization still based on integers? (Even if slightly less efficient)
I opened https://github.com/pyfisch/keyboard-types/pull/6 to cut the code size without affecting behavior.
The numbers in that PR look great, thanks!
I was wondering if the 47k lines from serde_derive
were due to some quadratic behavior (which could potentially be improved), so I used Python to generate Rust code for 10 enum variants per line to easily comment out. I tried a few numbers of variants between 10 and 500, and the number of LLVM lines turned out perfectly linear.
print(''.join(' %s,\n' % ', '.join('V%s' % (i+j*10) for i in range(10)) for j in range(50)))
In a minimal bincode program based on the one in https://github.com/pyfisch/keyboard-types/pull/6 I also noticed that __Visitor::visit_enum
has four copies/monomorphizations reported by cargo llvm-lines
, where I would have expected one since we’re only using one (bincode) deserializer. Are multiple copies necessary?
I would also expect only one copy. It's worth looking in bincode for where those instantiations happen and hopefully fixing it.
It looks like bincode 1.3 fixed this and only generates one copy: https://github.com/servo/servo/pull/27063. Possibly in https://github.com/servo/bincode/pull/310, which I learned about via https://dos.cafe/blog/bincode-1.3.html.
visit_enum
for Key
still has the same size, but with one copy instead of four it goes from the 7th largest contributor of LLVM lines in Servo’s script
crate to the 56th or so.
At this point I’m happy to close this issue. Maybe serde_derive
could generate code closer to that in https://github.com/pyfisch/keyboard-types/pull/6 (is there a trade-off?) but to some extent being monomorphization-heavy and generating non-trivial code size is inherent to serde’s fundamental design, compared to something like miniserde.
One of the top functions in the output of
cargo llvm-lines
for Servo’sscript
crate (https://github.com/servo/servo/issues/26713) is this:That function code is generated by
serde_derive
for this enum: https://docs.rs/keyboard-types/0.5.0/keyboard_types/enum.Key.htmlThis enum has 286 variants so some code size is expected, but 47k lines still seems like a lot. Is there something we can do to reduce it?
The enum has one
Character(String)
variant, and all others variants are unit variants (without fields). This is only a random guess but would the generated code be simpler/smaller if the enum only had unit variants? (With theString
case moved to a separate type)CC @pyfisch, author of the
keyboard-types
crate