wang-xinhong / protobuf

Automatically exported from code.google.com/p/protobuf

Skip(count) fails when the offset exceeds 2^31; count type should be off_t, not int #302

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Pick a file larger than 2,150,000,000 bytes (~2 GB) and initialize a file descriptor fd for it.

2.  google::protobuf::io::ZeroCopyInputStream *input_stream = new 
google::protobuf::io::FileInputStream(fd);

3.  off_t offset = 2150000000; (the offset does not fit in a signed 32-bit int)

4.  input_stream->Skip(offset); will yield the failure below (a consolidated sketch of these steps follows the error output):

libprotobuf FATAL google/protobuf/io/zero_copy_stream_impl_lite.cc:268] CHECK 
failed: (count) >= (0): 
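Consolidated, the steps above look roughly like the sketch below. This is a minimal reproduction sketch, not code from the original report: the file name "large.dat" and the error handling are assumptions, and the includes reflect the protobuf 2.x header layout.

#include <fcntl.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

int main() {
  // Assumes "large.dat" is larger than 2^31 bytes.
  int fd = open("large.dat", O_RDONLY);

  google::protobuf::io::ZeroCopyInputStream* input_stream =
      new google::protobuf::io::FileInputStream(fd);

  off_t offset = 2150000000;  // does not fit in a signed 32-bit int

  // ZeroCopyInputStream::Skip() takes an int, so the off_t value is
  // silently narrowed to a negative int and the CHECK(count >= 0) fires.
  input_stream->Skip(offset);

  delete input_stream;
  return 0;
}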

What is the expected output? What do you see instead?

The Skip method should position the stream at the specified offset. Instead, an 
assertion is raised because the off_t offset is implicitly narrowed to an int, 
which becomes negative.

What version of the product are you using? On what operating system?

protobuf 2.3.0, on Linux, compiling with gcc (GCC) 4.1.2 20080704 (Red Hat 
4.1.2-46).
The latest version of protobuf is expected to have the same problem, since the 
Skip method signature appears unchanged in source control.

Suggested resolution: declare the Skip method with an off_t count rather than 
a 32-bit int.

Original issue reported on code.google.com by fac2...@gmail.com on 5 Jun 2011 at 6:07

GoogleCodeExporter commented 9 years ago
Here's a patch that fixes the issue against protobuf 2.4.1, written by Kevin 
C. Dorff and myself. Please incorporate it into the head branch. We distribute 
several tools that are affected by this issue, and while we can distribute the 
patch ourselves, it would be easier if the official protobuf release had the 
fix.

Also note that the Java code generated by protobuf takes a long as the 
argument to the corresponding skip method, so the C++ version should really 
match:

 public long skip(long l) throws java.io.IOException 
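For comparison, a simplified stand-in (illustrative only, not the attached patch) of what widening the count parameter on the C++ side might look like:

#include <cstdint>

// Simplified stand-in for the real interface, just to show the proposed
// signature change; the actual class lives in
// google/protobuf/io/zero_copy_stream.h.
class ZeroCopyInputStream {
 public:
  virtual ~ZeroCopyInputStream() {}
  // Before: virtual bool Skip(int count) = 0;
  virtual bool Skip(int64_t count) = 0;
};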

Original comment by fac2...@gmail.com on 6 Jun 2011 at 7:17

Attachments:

GoogleCodeExporter commented 9 years ago
Protobuf is not designed to handle large messages; messages larger than 2GB 
were not considered from the very beginning. Could you please try to break the 
files into smaller ones?

If we really need to change the count type, we need to update all functions 
dealing with byte-size/offset, not only the Skip function.

Original comment by liujisi@google.com on 5 Jul 2011 at 6:06

GoogleCodeExporter commented 9 years ago
The issue is not about large messages, but about large files. We use 
collections of small messages stored in large files (see 
http://campagnelab.org/software/goby/tutorials/developing-with-goby/). Please 
reconsider the issue, given that:

1. the current C++ API is in effect incompatible with the Java API, which has 
always supported skipping 64-bit offsets in large files;
2. the issue is not about supporting large messages;
3. we have provided a patch that solves the problem (see the post of June 6, 
2011). As the patch shows, the issue is limited to changing a few signatures 
so that the compiler no longer truncates a 64-bit integer to a 32-bit one.

Thanks.

Original comment by fac2...@gmail.com on 5 Jul 2011 at 12:46

GoogleCodeExporter commented 9 years ago
Well, for the Java version, I think you mean LimitedInputStream. That class 
has to use long to override its superclass, java.io.FilterInputStream.

If you take a look at c.g.protobuf.CodedInputStream, it uses int32 for the 
size type:
public void skipRawBytes(final int size) throws IOException

We cannot simply change Skip() to accept int64 because:
1> In protobuf, we use "int" to describe buffer sizes, e.g. in the constructor 
of CodedInputStream, in PushLimit(), GetDirectBufferPointer(), etc. We cannot 
change Skip() to accept int64 while leaving the others expecting int32; users 
may be confused by the inconsistent size type. Even if we changed Skip() to 
int64, the rest of the code would still assume the underlying buffer size fits 
in int32, which could cause problems.
2> The ZeroCopyStream interface uses int32. Changing the interface's function 
signatures means we would need to update all the implementation classes. We 
could do a large code refactoring inside Google, but we cannot change code 
outside the company, and changing the interface may break existing open-source 
protobuf users. I believe there are many clients who implement ZeroCopyStream 
for their I/O layer.

For your case, I think there are several workarounds, e.g. ensuring each chunk 
doesn't exceed int32 max and using ZeroCopyStream to operate on each chunk, or 
implementing your own ZeroCopyStream that maps large-offset operations onto 
the protobuf API (a sketch of the chunked approach follows).
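A minimal sketch of the chunked-skip workaround suggested above (not from the original thread; the helper name SkipLarge is made up, and it relies on Skip(int) returning false at end of stream):

#include <algorithm>
#include <cstdint>
#include <limits>
#include <google/protobuf/io/zero_copy_stream.h>

// Hypothetical helper (not part of protobuf): skip a 64-bit byte count by
// issuing Skip() calls that each stay within int range.
bool SkipLarge(google::protobuf::io::ZeroCopyInputStream* input,
               int64_t count) {
  const int64_t kMaxChunk = std::numeric_limits<int>::max();
  while (count > 0) {
    const int chunk = static_cast<int>(std::min(count, kMaxChunk));
    if (!input->Skip(chunk)) {
      return false;  // end of stream reached before the full count was skipped
    }
    count -= chunk;
  }
  return true;
}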

Original comment by liujisi@google.com on 5 Jul 2011 at 2:00