Closed tinix0 closed 1 year ago
Thank you for the bug report. This sounds like a potentially serious bug. Would you mind providing the reproducible example?
Yes a repro would be very helpful here. I was not able to reproduce it:
// test.proto
syntax = "proto2";
enum E {
ONE = 1;
THIRTY_ONE = 31;
THIRTY_TWO = 32;
ONE_HUNDRED = 100;
}
message M {
optional E e = 1;
}
# test.py
from test_pb2 import M, E
import unittest
class TestEnumValueSerialize(unittest.TestCase):
def do_test_val(self, val):
m1 = M(e = val)
serialized = m1.SerializeToString()
m2 = M()
m2.ParseFromString(serialized)
self.assertEqual(val, m2.e)
def test_enum_values(self):
self.do_test_val(E.ONE)
self.do_test_val(E.THIRTY_ONE)
self.do_test_val(E.THIRTY_TWO)
self.do_test_val(E.ONE_HUNDRED)
if __name__ == '__main__':
unittest.main()
Output:
$ venv/bin/python test.py --verbose
test_enum_values (__main__.TestEnumValueSerialize) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Hello, here is the repro with output. Note that I have also seen the issue with holes in the numeric ranges, but the total number of values was still larger than 31.
// test.proto
syntax = "proto2";
package Test;
message Person {
enum PhoneType {
blaa = 0;
HOME = 1;
WORK = 2;
a = 3;
b = 4;
c = 5;
d = 6;
e = 7;
f = 8;
g = 9;
h = 10;
i = 11;
j = 12;
k = 13;
l = 14;
m = 15;
n = 16;
o = 17;
p = 18;
q = 19;
r = 20;
s = 21;
t = 22;
u = 23;
v = 24;
w = 25;
x = 26;
y = 27;
z = 28;
aa = 29;
ab = 30;
ac = 31;
ad = 32;
ae = 33;
}
optional PhoneType TaskType = 1;
optional string Name = 2;
}
# main.py
import test_pb2
person = test_pb2.Person()
person.TaskType = 32
person.Name = "Name1"
serialized = person.SerializeToString()
personDe = test_pb2.Person()
print("a", person)
print("---")
personDe.ParseFromString(serialized)
print("b", personDe)
print("---")
person2 = test_pb2.Person()
person2.TaskType = 31
person2.Name = "Name2"
serialized2 = person2.SerializeToString()
personDe2 = test_pb2.Person()
print("a", person2)
print("---")
personDe2.ParseFromString(serialized2)
print("b", personDe2)
Ouput:
a TaskType: ad
Name: "Name1"
b Name: "Name1"
a TaskType: ac
Name: "Name2"
b TaskType: ac
Name: "Name2"
This must be a Windows-specific bug. When I try your repro on Linux, I get:
a TaskType: ad
Name: "Name1"
---
b TaskType: ad
Name: "Name1"
---
a TaskType: ac
Name: "Name2"
---
b TaskType: ac
Name: "Name2"
I suspect this is related to long
being only 32-bits on Windows. I will investigate further.
I also cannot reproduce the bug on (64-bit) Linux.
I can also confirm that I cannot reproduce it inside WSL. It does indeed seem to be Windows specific (both machines I tried this on were either Windows 11 or Windows 10).
Can you print out len(serialized)
? I suspect it will be non-zero, but want to verify.
(I don't have quick access to a Windows machine to verify).
len(serialized): 9 len(serialized2): 9 serialized in base64: 'CCASBU5hbWUx' serialized2 in base64: 'CB8SBU5hbWUy'
Great, thanks. It appears to be serializing correctly. When we parse it though, the decoder thinks the value (32) is not in the enum for some reason. In proto2, when you parse an unrecognized enum value, it goes into the unknown field set, which is what appears to be happening here.
The only question now is why the decoder thinks that 32 is an unknown value for this enum.
This is the code performing the check. The bug is likely here or nearby, but I can't see it: https://github.com/protocolbuffers/upb/blob/f45eeec625a1b5b4514683aba4b38d1864f0e211/upb/decode.c#L410-L440
In particular, this check looks correct to me. We cast 1ULL
to unsigned long long
, so it should be 64-bit even on Windows, which has 32-bit longs: https://github.com/protocolbuffers/upb/blob/f45eeec625a1b5b4514683aba4b38d1864f0e211/upb/decode.c#L437
Another helpful thing I can point out is that this is an regression introduced in 4.21.x. Protobuf 3.20.3 deserializes the message correctly.
Another helpful thing I can point out is that this is an regression introduced in 4.21.x.
That makes perfect sense, as 4.21.x introduced a completely rewritten native extension. More info here: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
I would be mildly surprised if the bug turned out to be in decode.c since that code has not changed much at all. I'm wondering whether this may be related to the mini descriptors.
It's true that decode.c has not changed much, but we also have much less usage on Windows than other platforms.
Keep in mind that we're releasing from the 21.x branch, which has not gotten any of the recent changes around MiniDescriptors: https://github.com/protocolbuffers/upb/tree/21.x/upb
I have a root cause and a fix for the bug. The problem was in the UPB_LIKELY()
macro, which uses __builtin_expect()
. The arguments to __buitin_expect()
are long
, so on Windows this truncates the arguments to 32 bits.
The fix is to cast the argument to bool
:
--- a/upb/port_def.inc
+++ b/upb/port_def.inc
@@ -96,8 +96,8 @@
/* Hints to the compiler about likely/unlikely branches. */
#if defined (__GNUC__) || defined(__clang__)
-#define UPB_LIKELY(x) __builtin_expect((x),1)
-#define UPB_UNLIKELY(x) __builtin_expect((x),0)
+#define UPB_LIKELY(x) __builtin_expect((bool)(x),1)
+#define UPB_UNLIKELY(x) __builtin_expect((bool)(x),0)
#else
#define UPB_LIKELY(x) (x)
#define UPB_UNLIKELY(x) (x)
A fix for this has just been pushed so I am closing.
What version of protobuf and what language are you using? Version: protoc 3.21.9 Language: Python - protobuf 4.21.9
OS: Windows 11
Interpreter: Python 3.11.0
What did you do?
What did you expect to see The deserialized protobuf has the enum field set to the value larger than 31
What did you see instead? The deserialized protobuf has the field clear (the serialized form does not contain the value at all).
Anything else we should know about your project / environment I can provide minimal reproduction example if desired, but it was pretty simple to reproduce. The issue was also reproduceable on protobuf lib versions as low as 4.21.3 and with grpcio-tools as low as 1.49.1. Lastly, switching the syntax to proto3 does fix the issue, however it is currently not viable for us.