pinterest / elixir-thrift

A Pure Elixir Thrift Implementation
https://hexdocs.pm/thrift/
Apache License 2.0
214 stars 44 forks source link

Support schema annotations to opt-in to copy strings out of the binary #446

Open isaacsanders opened 5 years ago

isaacsanders commented 5 years ago

Hey all,

I have been working with this library for the past month or so, and it recently it occurred to me that the thrift messages may be using references to sub-binaries.

this is potentially fine for most people, but If one is process larger amounts of data and they want to copy the binaries instead, there isn't really a recourse for that currently.

I don't know if anyone who works on this cares, so if not, let me know and I might work on it myself at some point.

fishcakez commented 5 years ago

Hi @isaacsanders. Blindly copying all binaries is an anti-pattern because it causes unnecessary copying. This is especially true in newer versions of the BEAM VM where binaries can be more efficiently handled. If we want to support this it should via schema annotations so that we only copy specific binaries that are causing issues.

isaacsanders commented 5 years ago

Silently accruing refc binaries also feels like an anti-pattern that my team's use case has to always work around. We prefer to copy because our fields are almost always smaller than the refc limit.

fishcakez commented 5 years ago

For reference see the guide line comment in the docs (http://erlang.org/doc/man/binary.html#copy-1) by the OTP team. We could definitely support an annotation at any level of a schema to binary copy anything underneath that. This might be per file, per struct, per field etc.

isaacsanders commented 5 years ago

Yeah, we’ve read that. We have GB size/volume of binaries that were getting parsed by JSON libraries and not copying, which were handled by long lived processes. Most of our data is from uninterruptible streaming processes. Most of the Elixir community thinks about their performance differently than we do, in the three years we’ve been using it.

This is an advanced request. I know this is your due diligence.

fishcakez commented 5 years ago

Sure for clarity we don't want to pay the runtime overhead of always copy by default, checking an option at runtime to handle this, or generating twice the code. Once we have a new AST will try to fill do this feature as proof that AST/annotations works as we hope ;).

isaacsanders commented 5 years ago

No worries, I hope it didn’t seem I was asking for a default. :)

fishcakez commented 5 years ago

If you want to fork in the meantime, I think enforcing the copy is changing 5-10 lines in https://github.com/pinterest/elixir-thrift/blob/master/lib/thrift/generator/struct_binary_protocol.ex.