Size fields could use a more optimal data type than size_t

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

1. Define a small bytes array:

message Foo
{
    optional bytes bar  = 1 [(nanopb).max_size = 2];
}

2. Observe the resulting C header definition:

typedef struct {
    size_t size;
    uint8_t bytes[2];
} mypackage_Foo_bar_t;

What is the expected output? What do you see instead?

This certainly works, but is not space efficient.  In cases where the bytes 
array is size smaller than 255 it would be better to use a uint8_t for the 
size.  Likewise, a uint16_t would be better for arrays sized between 256 and 
65535.  

What version of the product are you using? On what operating system?

Nanopb 0.1.7, protobuf 2.5.0, linux (but the generated code runs on both linux 
and an ARM Cortex M4 processor).

Please provide any additional information below.

Original issue reported on code.google.com by znfre...@gmail.com on 14 Aug 2013 at 4:21

GoogleCodeExporter commented 9 years ago

It might make sense to introduce a pb_size_t typedef, which would be controlled 
by the PB_FIELD_16BIT etc. settings. However, choosing the datatype per field 
would cause problems internally (all code that manipulates it would need two 
paths).

Original comment by Petteri.Aimonen on 14 Aug 2013 at 4:45

Changed state: Accepted
Added labels: Component-Decoder, Component-Encoder, Priority-Low, Type-Enhancement
Removed labels: Priority-Medium, Type-Defect

GoogleCodeExporter commented 9 years ago

Sure, controlling it by a define is 100% workable.  Thanks!

Original comment by znfre...@gmail.com on 14 Aug 2013 at 4:52

GoogleCodeExporter commented 9 years ago

Hmm, I'm still having very mixed feelings about this one.

There are basically three separate fields that could be made smaller:
1) The 'size' in bytes fields
2) The 'count' for array fields
3) The 'has' for optional fields

I guess it would make sense to make all of them as small as possible. When 
using non-packed structures, there won't be much space savings though. Also I 
feel it is somewhat dirtier to have things cluttered with 'pb_size_t' and 
'pb_bool8_t', instead of the standard size_t and bool.

It would be great if there was some realworld data on how much space savings 
this will give in realistic use cases.

Anyway, this is difficult to do without breaking backward compatibility, so I 
will reconsider it when time is due for nanopb-0.3.0.

Original comment by Petteri.Aimonen on 20 Oct 2013 at 7:59

Changed state: Tentative
Added labels: Milestone-0.3

GoogleCodeExporter commented 9 years ago

The main real-world problem comes when you have nested repeated fields and 
packed structures.  Here is a better example that is not far off from my actual 
product's usage (typically there are a few scalar fields scattered throughout):

message Foo
{
    optional bytes bar  = 1 [(nanopb).max_size = 2];
}

message Bar
{
    repeated Foo x = 1 [(nanopb).max_size = 50];
}

Today this generates a Bar struct that takes up I think 404 bytes.  My 
suggested optimization along with packing would be I think 151 bytes.  The 
over-the-wire size of the message will always be smaller than the nanopb struct 
representation, but it would be good to close the gap a bit.

Original comment by znfre...@gmail.com on 21 Oct 2013 at 4:30

GoogleCodeExporter commented 9 years ago

Hello all

I have the same problem. Using a small "bytes" data structure as repeated field 
within another message leads to a quite memory-intensive C representation.

enum TokenType {
    RFID = 1;
    CARD = 2;
    QRCODE = 3;
}

message Token {
    required TokenType type = 1;
    optional bytes rfid = 2 [(nanopb).max_size = 8];
}

message AddWhiteListCmd {
    repeated Token tokens = 1 [(nanopb).max_count = 20];
}

My idea is to generate the data type of the size and count fields depending on 
the max_size and max_count specifiers. For a quick test I've altered the 
nanobp-0.2.3 nanopb_generator.py this way:

class Field:
...
    def types(self):
        '''Return definitions for any special types this field might need.'''
        if self.pbtype == 'BYTES' and self.allocation == 'STATIC':
            result = 'typedef struct {\n'
            '''or140704: type of size field depends on max_size'''
            if self.max_size < 256:
                result += '    uint8_t size;\n'
            else:
                result += '    size_t size;\n'
            result += '    uint8_t bytes[%d];\n' % self.max_size
            result += '} %s;\n' % self.ctype
        else:
            result = None
        return result

The same for the repeated fields.
Is this feasible?

Original comment by gugla...@gmail.com on 9 Jul 2014 at 5:11

GoogleCodeExporter commented 9 years ago

Possibly, but the information about the data type of the size field has to be 
passed on to the C side somehow.

The max_size of the field cannot be used to determine it on the C side, as 
compiler padding can change the size. Instead, there would have to be some 
separate flag.

Original comment by Petteri.Aimonen on 9 Jul 2014 at 7:43

GoogleCodeExporter commented 9 years ago

I don't understand your doubts completely, but I've no knowledge about the
encoding/decoding mechanism of GPB messages.

As I understand the size field is just a helper to represent "bytes" fields
on the C side. So the type of the size field shouldn't be of interest for
the other (Jave, .NET, etc.) side. A Jave/.NET bytes array contains the
size information itself.
And the C side always has to cope with senders sending more bytes than
specified in the max_size field. So the type of the size field can't be a
problem.
Encoding should work too.

2014-07-09 9:43 GMT+02:00 <nanopb@googlecode.com>:

Original comment by gugla...@gmail.com on 9 Jul 2014 at 12:16

GoogleCodeExporter commented 9 years ago

The C side has to write to the size/bytes fields in the structure. In order to 
do this, it must know the data type in memory. Currently the datatype is the 
same for every field, so it is hardcoded in the source (see 
https://code.google.com/p/nanopb/source/browse/pb_encode.c#617 and 
https://code.google.com/p/nanopb/source/browse/pb.h#240 )

In order to work with varying data type, there would have to be two cases in 
the code also.

Original comment by Petteri.Aimonen on 9 Jul 2014 at 1:10

GoogleCodeExporter commented 9 years ago

Ok, now I understand the problem. But again, it would be very helpful if
you could provide a less memory-hungry implementation for "bytes" fields.
Embedded systems are always short of memory. So GPB applications running on
embedded systems won't use big-sized arrays or very frequently repeated
fields.
A user-definable data type for the size and count fields, as you have
considered some time ago, would relax my current memory situation.

Thank you for the discussion.

2014-07-09 15:10 GMT+02:00 <nanopb@googlecode.com>:

Original comment by gugla...@gmail.com on 10 Jul 2014 at 6:38

GoogleCodeExporter commented 9 years ago

I think changing the type to pb_size_t in nanopb-0.3.0 is reasonable.

pb_size_t will be 8, 16 or 32 bit depending on the compile time #defines.

Original comment by Petteri.Aimonen on 4 Aug 2014 at 4:24

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

This issue was updated by revision 1dd9f1900fca.

Original comment by Petteri.Aimonen on 18 Aug 2014 at 5:11

Changed state: FixedInGit

GoogleCodeExporter commented 9 years ago

Fix released in nanopb 0.3.0

Original comment by Petteri.Aimonen on 26 Aug 2014 at 3:28

Changed state: FixReleased

iamandi / nanopb

Size fields could use a more optimal data type than size_t #82