Open GoogleCodeExporter opened 8 years ago
I disagree with your implementation, at least for the serialization. That's not
asynchronous at all: the serialization itself blocks the current thread - even
if it's quite fast -, can allocate a potentially very large buffer, and doesn't
support streaming at all.
That said, it's a good alternative for a slow stream such as NetworkStream in a
server situation. However, if called from an UI thread, the Task.Run version is
way better. Since the best way depends on your current application, I wouldn't
advocate including this method directly inside the protobuf-net library itself.
A real asynchronous implementation would require major changes in the core
(ProtoReader/ProtoWriter classes) to be really efficient in every scenario.
Regarding Github or equivalent, I totally agree, it would be wonderful to be
able to send pull requests for protobuf-net.
Original comment by mrj...@gmail.com
on 21 May 2014 at 2:29
I disagree with your disagreement :-) Async APIs should be chunky, not chatty.
The one-async-call-per-field implementation you imagine would actually result
in disastrous performance, because there's huge overhead for every awaited
method call.
The CPU load is negligible. Nobody is sending gigabytes over the network.
People will mostly use this API to send small packets of data. The disk/network
latency is what we are optimizing here.
Memory allocations are bounded. The allocated buffer cannot be larger than the
data structure being serialized. It is therefore an acceptable use of RAM. Not
to mention that 99.9% of the time, we are talking about small messages of a few
KB each.
While my code could use some optimizations (MemoryStream has quite large
initial buffer allocation, for example), the overall chunky approach is optimal.
Original comment by robert.v...@gmail.com
on 21 May 2014 at 3:12
I don't imagine an async-call-per-field: as you said, that would be horrible
performance-wise. Buffering has to be done somewhere and the general problem is
quite hard to solve correctly. I suppose that's why there are no real async
serializers available out there.
The CPU load really depends on several factors. If the model hasn't been
compiled yet, it can be quite costly. I personally really dislike asynchronous
APIs that can still block the caller thread. Look at the old implementation of
Socket.BeginConnect which used to do DNS resolution on the caller side, then
only connecting asynchronously. If a good asynchronous API can't be written,
I'd rather only have the synchronous one and let the caller decide what's best
for him.
Regarding the allocated buffer, even with a perfect one, that's still twice the
memory. I do agree that in most cases we're only talking about a few kilobytes
though. The official protobuf page clearly states that protocol buffers aren't
a good solution for gigabytes of data.
However, let the user of the library decide that and implement buffering if
he/she needs to. Your methods are perfectly fine in some scenarios, but are a
no-go in others. If I had to use them today instead of a Task.Run call in a
project I'm working on, that would get me a nice UI freeze the first time data
is loaded. On the contrary, using Task.Run in a high-load server scenario would
just worsen performance by potentially spawning additional threads for nothing.
Now, regarding the implementation itself, you could use a buffer pool
(protobuf-net already manages one so that could be reused if it's public).
Besides, avoid calling ToArray() on the MemoryStream: it's better to reset the
stream position to zero and use CopyToAsync instead, to avoid allocating a
second buffer.
Original comment by mrj...@gmail.com
on 21 May 2014 at 4:06
It is really, really hard to do proper Async serialization / deserialization;
the "await" would need to propagate to basically every line of code (including
all generated code). Messy and inefficient. It is rightly noted that just doing
regular serialization then Async IO of buffered data gives a quite misleading
message about the Async nature of the call. I have yet to think of a great way
of handling this; I don't think this proposal is "it", sadly.
Original comment by marc.gravell
on 21 May 2014 at 4:15
Async is not supposed to guarantee non-blocking operation. Only threads can do
that. JIT compiler will break any async promise. I don't think UI developers
care about the time it takes to compile serializers. If they did, they would
have precompiled and ngen-ed their serializers.
The reason there are no async serialization libraries is that serialization
itself is CPU-bound while async was designed primarily to eliminate blocking on
disk and network. I only expect protobuf to perform I/O asynchronously, not
serialization, and I think most people expect the same. Async is not expected
to behave like a thread.
Buffering chunks of input/output as you suggest is not an option, because you
never know which field serializer will hit the end of the chunk. You then have
to make all field serializers async. I think there are only two possible
solutions: my proposal and the inefficient field-at-a-time technique.
As for application-specific choice, I think it's about reasonable defaults. 99%
of applications will be happy with one standard async implementation. The point
is to have one authoritative implementation that can be collectively improved
through fixes and optimizations. Duplicating half-baked async code in all
applications just makes protobuf harder to use and less efficient in practice.
Original comment by robert.v...@gmail.com
on 21 May 2014 at 5:10
For large data, you really don't want to buffer everything unless there's an
absolute need.
What I am *tempted* to do is to switch the flush code to a one-outstanding
write-op model; i.e. when it flushes, *instead of* calling Write, it checks
some iar token, and if that is non-null calls EndWrite; then it calls
BeginWrite and stashes the new iar (unless it completed synchronously). It
would still want to start the write via a Task.Run, because it would by
necessity involve writes on the serialize thread; I don't think you can avoid
that. But there's an interesting performance improvement to be had there by
overlapping the CPU and IO - completely unrelated to the async serialize step,
though.
The main point I'm trying to emphasize though: serialize to a buffer then async
write is *not* my preferred option here. The CPU work and IO work will be
directly proportional to each-other: any time the CPU work is trivial, frankly:
so is the IO.
Original comment by marc.gravell
on 21 May 2014 at 10:07
Network is usually about 1,000-times slower than protobuf serializer. I/O can
catch up with serializer only in the case of fast SSD disks. Network I/O can be
non-trivial even for really small messages. TCP buffers are only a few KB big.
Readers often hang on reads for minutes before setver sends new updates over
long-lived connection.
The problem with your iar approach is that your serializer has no way to stop
while I/O is in progress. Unless of course you run the serializer on dedicated
thread that can enter wait state at any moment. That's essentially identical to
running existing synchronous methods via Task.Run. You don't even need to
bother with iar if you have allocated dedicated thread already.
I could have implemented the async methods by simply calling Task.Run, but I
didn't do that, because Task.Run is very inefficient. Async methods shouldn't
need to eat into the thread pool, which is a scarce resource that is easily
depleted. .NET thread pool takes ages to allocate new threads, about one second
per thread once you exhaust reserved threads.
While you might not like the serialize-then-write approach, I am arguing all
the time that there is no other option.
Original comment by robert.v...@gmail.com
on 22 May 2014 at 2:27
I'm still not massively "sold" on that argument, but given that this is
*basically* what `SerializeWithLengthPrefix` already does, it isn't one that is
going to keep me awake at nights. The impl can be improved significantly, note,
but I can worry about that. The other "big issue" is framework targeting, but
then a .NET 4.5 specific build is probably overdue. I do **not** propose to
create a .NET 4.0 build that uses the Microsoft.Bcl.Async stubs.
Original comment by marc.gravell
on 22 May 2014 at 7:13
Thanks. I would be happy to see these methods in protobuf-net. .NET 4.0 build
is not necessary, because people using .NET 4.0 are used to workarounds WRT
await/async. I am aware the implementation can be improved. That's a part of
the reason I want it in protobuf-net.
Original comment by robert.v...@gmail.com
on 22 May 2014 at 10:50
Original issue reported on code.google.com by
robert.v...@gmail.com
on 20 May 2014 at 8:06