nccgroup / blackboxprotobuf

Blackbox Protobuf is a set of tools for working with encoded Protocol Buffers (protobuf) without the matching protobuf definition.
MIT License
516 stars 86 forks source link

Does BBPB support repeated type data? #32

Closed waconde closed 7 months ago

waconde commented 7 months ago

Hi, Thank you very much for developing such a great tool as BBPB, but I'm having some issues at the moment. Can you help me?

I'm using the latest version of the BBPB library, but today found that it doesn't seem to parse data of type repeated correctly.

I have a protobuf data like this:

1 {
  1: [
    999,
    888,
    -777
  ]
}

Then I want to generate a proto file for it, which might look like this:

message TestClass {
  message field1_type {
    repeated int64 field1 = 1;
  }
  field1_type field1 = 1;
}

But in practice, when I call BBPB's decode_message() method, the resulting typedef return value looks like this:

{'1': {'type': 'message', 'message_typedef': {'1': {'type': 'int'}}, 'field_order': ['1', '1', '1']}}

After I wrap the typedef around a layer of a TestClass class name, the typedef looks like this:

{'TestClass': {'1': {'type': 'message', 'message_typedef': {'1': {'type': 'int'}}, 'field_order': ['1', '1', '1']}}}

The proto file generated by calling the export_proto() method with that typedef looks like this:

message TestClass {
  message field1_type {
    int64 field1 = 1;
  }
  field1_type field1 = 1;
}

Apparently the repeated modifier is missing.

Is this a bug? or am I using it in the wrong way?

rwinkelmaier-ncc commented 7 months ago

Hi!

It looks like the decoder is not tagging the field with seen_repeated: true during decoding. The library should still be able to decode and re-encode messages with repeated fields. It's just the .proto export that will care about the seen_repeated. The seen_repeated should be getting populated though, and should also be used to force certain field values into arrays, so I'll work on getting that fixed.

On a protocol level, protobuf doesn't actually distinguish between repeated and non-repeated fields unless they are packed, so bbpb doesn't usually care whether a field should be repeated or not. Messages are encoded with <tag><value><tag><value>... and if bbpb sees the same tag twice, then it store the values as an array. Similarly, when it's encoding, if the argument is an array bbpb will encode each element of the array with it's own tag. field: 1 and field: [1] are identical once encoded. So the best we can do is try to remember if we've ever seen a tag more than once for that field.

The .proto file support is still pretty rough and there are probably a few edge cases it doesn't handle well, but this should be a case it's able to handle, so I'll work on fixing that.

waconde commented 7 months ago

Hi!

It looks like the decoder is not tagging the field with seen_repeated: true during decoding. The library should still be able to decode and re-encode messages with repeated fields. It's just the .proto export that will care about the seen_repeated. The seen_repeated should be getting populated though, and should also be used to force certain field values into arrays, so I'll work on getting that fixed.

On a protocol level, protobuf doesn't actually distinguish between repeated and non-repeated fields unless they are packed, so bbpb doesn't usually care whether a field should be repeated or not. Messages are encoded with <tag><value><tag><value>... and if bbpb sees the same tag twice, then it store the values as an array. Similarly, when it's encoding, if the argument is an array bbpb will encode each element of the array with it's own tag. field: 1 and field: [1] are identical once encoded. So the best we can do is try to remember if we've ever seen a tag more than once for that field.

The .proto file support is still pretty rough and there are probably a few edge cases it doesn't handle well, but this should be a case it's able to handle, so I'll work on fixing that.

Oh, Yes, You're absolutely right! OK, now I know what's going on, I guess I need to manually add the repeated modifier to the recurring values before fixing it.

rwinkelmaier-ncc commented 7 months ago

Hi, I just pushed version 1.1.1 to github and pypi which I think should handle the seen_repeated flag better, and therefore should add the repeated modifier if it gets multiple values. Note that it won't be able to detect repeated if the field is repeated but has only one element, such as [1].

Let me know if that does or doesn't work for you. Thanks!

waconde commented 7 months ago

Hi, I just pushed version 1.1.1 to github and pypi which I think should handle the seen_repeated flag better, and therefore should add the repeated modifier if it gets multiple values. Note that it won't be able to detect repeated if the field is repeated but has only one element, such as [1].

Let me know if that does or doesn't work for you. Thanks!

It is very effective! You can see that it does a good job of adding a 'repeated' modifier for simple or complex types, and I also see a 'seen_repeated: True' added to the 'message_type' property of the corresponding field in the return value 'typedef' of 'decode_message()'. Thank you very much! image