ProtoReader blocks on network stream when at end of message

GoogleCodeExporter commented 9 years ago


What steps will reproduce the problem?
1. A project that demonstrates the issue is attached.
2. When the CLOSE_STREAM preprocessor directive is not defined the ProtoReader 
never finishes processing it's read bytes.

What is the expected output? What do you see instead?
Expected output is to parse all the fields encoded in the binary message.

What version of the product are you using? On what operating system?
v2.0.0.668 targeting .net 4.5 on Win 8.1.

Please provide any additional information below.
The problem comes from the ProtoReader.TryReadUInt32VariantWithoutMoving 
method, it tries to ensure that there are at least 10 bytes left in the message 
before trying to read from the buffer.  Unfortunately if the remainder of the 
message fits into less than 10 bytes and there is nothing else coming down the 
network stream (and it's not closed) it will block indefinitely.

Perhaps we need to call Ensure() progressively as we read each byte out of the 
buffer so as to only call it when we really need to read more data from the 
underlying stream?

Maybe when calling TryReadUInt32VariantWithoutMoving from other methods we need 
to specify the maximum number of bytes that can be inspected - so that when we 
call it from ReadString we can set it at something sensible.

On the other hand, it's just as likely i've misunderstood how I should be 
encoding the message.

Original issue reported on code.google.com by mendel.m...@gmail.com on 18 Feb 2014 at 10:44

Attachments:

ProtobufNetBug.zip

GoogleCodeExporter commented 9 years ago

I will check the sample tomorrow, but this sounds like simply the well-known 
"protobufs are not self I terminating". This was a deliberate design decision 
by Google that allows messages to be merged by appending. As such, a single 
message serialized by itself indeed doesn't end: it has to read to EOF (or a 
known length) to know when to stop. To work around this, protobuf-net provides 
the *WithLengthPrefix methods - these work correctly even when (for example) 
sending multiple separate messages down a single open socket.

If this is your scenario: I suggest switching to the *WithLengthPrefix methods 
(note: both serialize and deserialize must use the same approach, obviously)

Original comment by marc.gravell on 18 Feb 2014 at 11:33

GoogleCodeExporter commented 9 years ago

Have reviewed the code - it is *basically* the same scenario, except of course 
you aren't using Deserialize (not sure why; but... whatever). Because 
protobuf-net knows that it can only be terminated by EOF or a pre-defined 
length, it has no reason not to use that. In your case, I would *strongly* 
suggest prefixing each message with the length. For example, if you chose to 
prefix with varint, this would be (in ReadRow):

    int len = ProtoReader.DirectReadVarintInt32(stream);
    ProtoReader reader = new ProtoReader(stream, null, null, len);

However, there are other DirectRead* methods for other common layouts. When 
`reader` knows the length, it will not get stuck on "Ensure".

Original comment by marc.gravell on 19 Feb 2014 at 8:42

GoogleCodeExporter commented 9 years ago

Thanks for the reply Marc.  Btw, I'm not using Deserialize because there's no 
model object to be deserialise.  My data is stored in a table-like structure 
that stores by column and I want to ship individual updates to cells/rows.

My idea was to create a ProtoReader once and have it sit listening to a stream, 
processing any incoming data and then block again on ReadFieldHeader().  This 
is extremely similar to the DataTableSerializer example except over a 
NetworkStream.

Maybe protocol buffers is not suited to this, I'm going to have a closer look 
at the specs.

Original comment by mendel.m...@gmail.com on 19 Feb 2014 at 10:04

GoogleCodeExporter commented 9 years ago

It'll work... But it needs to know whether to work to EOF or length-bound 
rules. Making the read work in a speculative way - well, theoretically 
possible, but a pain to do efficiently, and never needed in the core 
specification

Original comment by marc.gravell on 20 Feb 2014 at 1:08

GoogleCodeExporter commented 9 years ago

In order to know the length of the message before writing it I'd have to either 
calculate the size of all the individual fields before they are written or 
write to a temporary buffer.  In the first case the protobuf encoding is less 
abstracted away from my code I'm more coupled to the protobuf-net 
implementation, whilst the second case is inefficient.

The other alternative is not to write the message elements myself but rather 
create a temporary structure which defines a cell update and then use the 
*WithLengthPrefix methods. I have to think about whether I can do this 
efficiently (i.e. avoiding unnecessary allocations).

Original comment by mendel.m...@gmail.com on 20 Feb 2014 at 7:50

GoogleCodeExporter commented 9 years ago

You could spoof it; if you write the data using a fake outer sub-message of the 
length-prefixed style, you can then consume the data at the reader - it will be 
two varints (both readable via the direct read) - the first is the field header 
(you can ignore this), the second is the length.

Original comment by marc.gravell on 20 Feb 2014 at 10:53

GoogleCodeExporter commented 9 years ago

How would I know what length to write in the fake outer sub-message? It
seems like a cart before the horse situation on the writing side.

Original comment by mendel.m...@gmail.com on 20 Feb 2014 at 11:00

GoogleCodeExporter commented 9 years ago

You don't need to; the code takes care of that. Here's an example of using a 
reader and writer in that way - note that the reader here will not over-read:

        using(var ms = new MemoryStream())
        {
            // write it
            using(var writer = new ProtoWriter(ms, null, null))
            {
                ProtoWriter.WriteFieldHeader(1, WireType.String, writer);
                var token = ProtoWriter.StartSubItem(null, writer);
                // your stuff here
                ProtoWriter.WriteFieldHeader(1, WireType.String, writer);
                ProtoWriter.WriteString("abc", writer);
                // end of your stuff here
                ProtoWriter.EndSubItem(token, writer);
            }
            // some gibberish that we can use to check that we aren't over-reading
            ms.WriteByte(0);
            ms.WriteByte(0);
            ms.WriteByte(0);

            // now read it back
            ms.Position = 0;
            int header = ProtoReader.DirectReadVarintInt32(ms),
                len = ProtoReader.DirectReadVarintInt32(ms);
            using (var reader = new ProtoReader(ms, null, null, len))
            {
                int field;
                while((field = reader.ReadFieldHeader()) > 0)
                {
                    switch(field)
                    {
                        case 1:
                            Console.WriteLine(reader.ReadString());
                            break;
                        default:
                            Console.WriteLine("unexpected: {0}, {1}",
                                field, reader.WireType);
                            reader.SkipField();
                            break;
                    }
                }
            }

Original comment by marc.gravell on 20 Feb 2014 at 11:11

GoogleCodeExporter commented 9 years ago

btw - you can confirm (after the read) with ms.Position and ms.Length that it 
has not over-read

Original comment by marc.gravell on 20 Feb 2014 at 11:13

GoogleCodeExporter commented 9 years ago

I see, so you fake it out by telling the writer you're going to write a
string and then you jam in your sub messages and protobuf-net will
automatically calculate  and write the length...

I'll try it out tonight.  Btw, thanks for the time and effort spent on
this, way more than I could have asked for.

Original comment by mendel.m...@gmail.com on 20 Feb 2014 at 11:22

GoogleCodeExporter commented 9 years ago

That worked perfectly!  I would never have thought of writing a dummy string 
header + sub-item.

Original comment by mendel.m...@gmail.com on 20 Feb 2014 at 9:03

GoogleCodeExporter commented 9 years ago

Original comment by marc.gravell on 20 Feb 2014 at 9:35

Changed state: Verified

zling2001 / protobuf-net

ProtoReader blocks on network stream when at end of message #430