jasonjack2015 / protobuf

Automatically exported from code.google.com/p/protobuf
Other
0 stars 0 forks source link

Repeated fields do not serialize the 'size' into the file. Hence deserialize of repeated expects an unending sequence of repeated fields #689

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a MyStudent.proto with below contents
   -----------------------------------------------
  1 option optimize_for=SPEED;
  2 
  3 package My;
  4 
  5 message Course {
  6     required string courseName = 1;
  7     required int32 courseMarks = 2;
  8 };
  9 
 10 message Student {
 11     required string rollNo      = 1;
 12     required string studentName = 2;
 13     repeated Course Courses     = 3;
 14 };
 15 
 16 enum RequestType {
 17     ReadStudentRequest_e   = 0;
 18     WriteStudentRequest_e  = 1;
 19     DeleteStudentRequest_e = 3;
 20     SetMarksRequest_e      = 4;
 21     GetMarksRequest_e      = 5;
 22     DeleteMarksRequest_e   = 6;
 23 };
 24 
 25 enum ResponseType {
 26     RequestSuccessful_e = 0;
 27     RequestFailed_e     = 1;
 28 };
 29 
 30 message Request
 31 {
 32     required RequestType requestType = 1; // The request type
 33     optional Student student         = 2; // For the use of WriteStudentRequest_e
 34     optional string rollNo           = 3; // For the use of ReadStudentRequest_e, DeleteStudentRequest_e,
 35                                           //                SetMarksRequest_e, GetMarksRequest_e, DeleteMarksRequest_e
 36     optional string courseName       = 4; // For the use of GetMarksRequest_e, SetMarksRequest_e, DeleteMarksRequest_e
 37     optional int32 courseMarks       = 5; // For the use of SetMarksRequest_e
 38 };
 39 
 40 message Response
 41 {
 42     required RequestType requestType   = 1; // The request type
 43     required ResponseType responseType = 2; // The response type
 44     optional string responseString     = 3; // The response type
 45     optional Student student           = 4; // For the use of ReadStudentRequest_e
 46     optional string rollNo             = 5; // For the use of WriteStudentRequest_e, DeleteStudentRequest_e,
 47                                             //                SetMarksRequest_e, GetMarksRequest_e, DeleteMarksRequest_e
 48     optional string courseName         = 6; // For the use of GetMarksRequest_e, SetMarksRequest_e, DeleteMarksRequest_e
 49     optional int32 courseMarks         = 7; // For the use of GetMarksRequest_e
 50 };
   -----------------------------------------------

2. Compile it:
   -----------------------------------------------
   $ protoc --cpp_out=$PWD Student.proto
   -----------------------------------------------

3. In the file MyStudent.pb.cc, what we see is:
   -----------------------------------------------
...
 686 ::google::protobuf::uint8* Student::SerializeWithCachedSizesToArray(
 687     ::google::protobuf::uint8* target) const {
 688   // required string rollNo = 1;
 689   if (has_rollno()) {
 690     ::google::protobuf::internal::WireFormat::VerifyUTF8String(
 691       this->rollno().data(), this->rollno().length(),
 692       ::google::protobuf::internal::WireFormat::SERIALIZE);
 693     target =
 694       ::google::protobuf::internal::WireFormatLite::WriteStringToArray(
 695         1, this->rollno(), target);
 696   }
 697 
 698   // required string studentName = 2;
 699   if (has_studentname()) {
 700     ::google::protobuf::internal::WireFormat::VerifyUTF8String(
 701       this->studentname().data(), this->studentname().length(),
 702       ::google::protobuf::internal::WireFormat::SERIALIZE);
 703     target =
 704       ::google::protobuf::internal::WireFormatLite::WriteStringToArray(
 705         2, this->studentname(), target);
 706   }
 707 
 708   // repeated .My.Course Courses = 3;
 709   for (int i = 0; i < this->courses_size(); i++) {
 710     target = ::google::protobuf::internal::WireFormatLite::
 711       WriteMessageNoVirtualToArray(
 712         3, this->courses(i), target);
 713   }
 714 
 715   if (!unknown_fields().empty()) {
 716     target = ::google::protobuf::internal::WireFormat::SerializeUnknownFieldsToArray(
 717         unknown_fields(), target);
 718   }
 719   return target;
 720 }
...
   -----------------------------------------------

What is the expected output? What do you see instead?
-----------------------------------------------
Notice that in line 708 of the above code-snippet from MyStudent.pb.cc, we see 
that it writes only contents, but does not write the 'size' before writing the 
contents. This makes subsequent 'parse' unsafe since it will read infinite 
number of repeated elements (till end of file / array).
-----------------------------------------------

What version of the product are you using? On what operating system?
-----------------------------------------------
I am using the protobuf-devel version that comes with Fedora 21. Here is a 
snippet from 'yum info protobuf':
Name        : protobuf-devel
Arch        : x86_64
Version     : 2.5.0
Release     : 11.fc21
Size        : 963 k
Repo        : installed
From repo   : fedora
Summary     : Protocol Buffers C++ headers and libraries
URL         : http://code.google.com/p/protobuf/
License     : BSD
Description : This package contains Protocol Buffers compiler for all languages 
and
            : C++ headers and libraries
-----------------------------------------------

Please provide any additional information below.
-----------------------------------------------
- N/A -

Original issue reported on code.google.com by narayan....@gmail.com on 18 Feb 2015 at 3:12

GoogleCodeExporter commented 9 years ago
This is by design. Protobuf fields are serialized as <tag, value> pairs. The 
parsing routine will check the tag to decide whether a value belongs to a 
repeated field and act accordingly. There could be a potentially infinite 
number of such <tag, value> pairs in an input stream, but in practice, the size 
of a message is usually already known before parsing. For example, when you 
send a protobuf message over the wire, you'll first send the size of the data 
before serialized bytes of the message.

Original comment by xiaof...@google.com on 10 Mar 2015 at 9:10

GoogleCodeExporter commented 9 years ago
Thank you for the response. This is what I am doing - i.e., adding the size 
before sending the protobuf on the wire (file descriptor of a file or socket) - 
then serializing-to-array and parsing-from-array.

Since there are many protobuf message structures like this in my college 
project, I ended up writing a template methods as follows (where T is the type 
of the protobuf object and My::Size itself is another protobuf object with a 
single int64 field in it, named size).

The request here is : Can we do something better?

---------------------
The template function:
---------------------
namespace My
{
    ///
    /// @brief
    /// Handy template to read a protobuf message object
    /// along with its byte size from a file descriptor.
    ///
    /// @param[out] rObj
    /// Reference to the protobuf object.
    ///
    /// @param[in] iFd
    /// The file descriptor.
    ///
    template <class T> void ReadProtobufWithSizeFromFileDescriptor(T &rObj, int iFd)
    {
        My::Size objSize;

        objSize.set_size(0);

        char *pByteArray = new char[objSize.ByteSize()];

        if (!pByteArray)
        {
            ERRORprintf("Unable allocate memory.");
            MY_THROW("Allocate Error.");
        }

        int iResult = read(iFd, pByteArray, objSize.ByteSize());

        if (iResult < objSize.ByteSize())
        {
            ERRORprintf("Unable to read from file.");
            delete [] pByteArray;
            MY_THROW("File Read Error.");
        } 

        // Get the object size first.
        bool bResult = objSize.ParseFromArray(pByteArray, objSize.ByteSize());
        if (!bResult)
        {
            ERRORprintf("Unable to read from file.");
            MY_THROW("File Read Error.");
        }

        delete [] pByteArray;

        pByteArray = new char[objSize.size()]; 

        if (!pByteArray)
        {
            ERRORprintf("Unable allocate memory.");
            MY_THROW("Allocate Error.");
        }

        iResult = read(iFd, pByteArray, objSize.size());

        if (iResult < objSize.size())
        {
            ERRORprintf("Unable to read from file.");
            delete [] pByteArray;
            MY_THROW("File Read Error.");
        }

        bResult = rObj.ParseFromArray(pByteArray, objSize.size());

        if (!bResult)
        {
            ERRORprintf("Unable to read from file.");
            delete [] pByteArray;
            MY_THROW("File Read Error.");
        }

        delete [] pByteArray;
    }

    ///
    /// @brief
    /// Handy template to read a protobuf message object
    /// along with its byte size from a file descriptor.
    ///
    /// @param[out] rObj
    /// Reference to the protobuf object.
    ///
    /// @param[in] iFd
    /// The file descriptor.
    ///
    template <class T> void WriteProtobufWithSizeToFileDescriptor(T &rObj, int iFd)
    {
        My::Size objSize;

        objSize.set_size(rObj.ByteSize());

        char *pByteArray = new char[objSize.ByteSize()];

        if (!pByteArray)
        {
            ERRORprintf("Unable allocate memory.");
            MY_THROW("Allocate Error.");
        }

        // Write the object size first.
        bool bResult = objSize.SerializeToArray(pByteArray, objSize.ByteSize());

        if (!bResult)
        {
            ERRORprintf("Unable to write to file.");
            MY_THROW("File Write Error.");
        }

        int iResult = write(iFd, pByteArray, objSize.ByteSize());

        if (iResult < objSize.ByteSize())
        {
            ERRORprintf("Unable to write to file.");
            delete [] pByteArray;
            MY_THROW("File Write Error.");
        } 

        delete [] pByteArray;

        pByteArray = new char[objSize.size()]; 

        if (!pByteArray)
        {
            ERRORprintf("Unable allocate memory.");
            MY_THROW("Allocate Error.");
        }

        bResult = rObj.SerializeToArray(pByteArray, objSize.size());

        if (!bResult)
        {
            ERRORprintf("Unable to write to file.");
            delete [] pByteArray;
            MY_THROW("File Write Error.");
        }

        iResult = write(iFd, pByteArray, objSize.size());

        if (iResult < objSize.size())
        {
            ERRORprintf("Unable to write to file.");
            delete [] pByteArray;
            MY_THROW("File Write Error.");
        }

        delete [] pByteArray;
    }
}

--------------------------

Original comment by narayan....@gmail.com on 11 Mar 2015 at 1:33

GoogleCodeExporter commented 9 years ago
It sounds like you're looking for the Message.writeDelimitedTo and 
Message.parseDelimitedFrom functions. Those are available in Java, but not in 
C++ (see issue 472).

Original comment by dwahler@gmail.com on 11 Mar 2015 at 11:43

GoogleCodeExporter commented 9 years ago
Looks likeit.

Original comment by narayan....@gmail.com on 12 Mar 2015 at 1:18