wlav / cppyy


SIGSEGV when accessing an array of uint8_t #172

Closed: fabbbbbbbb closed this issue 1 year ago

fabbbbbbbb commented 1 year ago

With cppyy 2.3.1 and Python 3.8, I could do the following:

import cppyy
cppyy.cppdef(''' struct T { uint8_t t[10]; }; ''')
t = cppyy.gbl.T()
bytes(t.t)

With cppyy 2.4.0 and above, I get a segfault (SIGSEGV):

#8 <signal handler called>
#9 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
#10 0x000056085fd8d582 in memcpy (__len=2147483640, __src=<optimized out>, __dest=0x7f0f27d56030)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
#11 PyBuffer_ToContiguous (buf=buf@entry=0x7f0f27d56030, src=src@entry=0x7ffc116933b0, len=2147483640, order=order@entry=67 'C')
    at Objects/memoryobject.c:997
#12 0x000056085fe188d9 in _PyBytes_FromBuffer (x=<optimized out>) at Objects/bytesobject.c:2689
#13 0x000056085fd56279 in bytes_new_impl (errors=0x0, encoding=0x0, x=0x7f0fa7dba6b0, type=0x56086000bac0 <PyBytes_Type>)
    at Objects/bytesobject.c:2667
#14 bytes_new (type=0x56086000bac0 <PyBytes_Type>, args=<optimized out>, kwargs=<optimized out>) at Objects/clinic/bytesobject.c.h:894
#15 0x000056085fda0bd5 in type_call (type=type@entry=0x56086000bac0 <PyBytes_Type>, args=args@entry=0x7f0fa7df0c70, kwds=kwds@entry=0x0)
    at Objects/typeobject.c:1100

When I simply print t.t[0], I get a random number above 255, and assigning to it has no effect.

I've tried installing both from pip and from source, with both Python 3.8 and 3.11.

Interestingly enough, if I use "unsigned char" instead of "uint8_t", it works as expected; however, I have plenty of uint8_t in my code, so switching is not an option.

The cppyy 2.4.0 changelog mentions something about uint8_t enums that were previously treated as 1-character strings, so it seems that uint8_t is now handled incorrectly.

Note: this may be related to issue 154, but it is different, since it crashes Python, and I don't get the correct values even when I cast t.t to an "unsigned char*".

wlav commented 1 year ago

Although uint8_t is normally implemented as an unsigned char, its intent is to be an 8-bit integer type. In Python, you can't really mix the two the way you can in C, as there is no char type, only str (or bytes) of length 1, so mapping uint8_t to int is mainly a convention for the most common use case. However, Cling doesn't consistently keep uint8_t as a distinct type that can be mapped to a Python int: in some cases it serves up unsigned char instead, and, even worse, in this case it translates it into an Int_t (an internal typedef).
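To make the two conventions concrete without needing cppyy, here is a minimal sketch using Python's standard ctypes module, with a ctypes.Structure standing in for struct T (an illustrative assumption; this is not how cppyy maps types internally). It reads the same buffer as 8-bit ints, as length-1 bytes that need ord(), and as 4-byte integers. The last view is roughly what a mapping to a wider type such as Int_t (assumed here to be a 4-byte int) would produce: values above 255 assembled from neighboring elements.

```python
import ctypes
import sys

# Stand-in for `struct T { uint8_t t[10]; };`, declared with ctypes (not cppyy).
class T(ctypes.Structure):
    _fields_ = [("t", ctypes.c_uint8 * 10)]

t = T()
t.t[0] = 200  # writes go through to the struct's own buffer
t.t[1] = 1

# Integer convention: each uint8_t element is read back as a Python int.
print(t.t[0], type(t.t[0]))            # 200 <class 'int'>

# Character convention: the same memory read as char gives length-1 bytes,
# so comparing against an int requires ord().
as_chars = ctypes.cast(t.t, ctypes.POINTER(ctypes.c_char))
print(as_chars[0], ord(as_chars[0]))   # b'\xc8' 200

# Wrong-width view (assumed 4-byte Int_t): reading the buffer as a 4-byte int
# merges four elements into one value, which easily exceeds 255.
as_int32 = ctypes.cast(t.t, ctypes.POINTER(ctypes.c_int32))
expected = int.from_bytes(bytes(t.t)[:4], sys.byteorder, signed=True)
print(as_int32[0], expected)           # 456 456 on a little-endian machine

# The buffer protocol exposes the raw 10 bytes, which is what bytes(t.t)
# in the original report expects to receive.
print(bytes(t.t))
```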

fabbbbbbbb commented 1 year ago

Thank you for the analysis!

In the end, what do you think is the best way to fix this problem?

From a user's point of view, uint8_t is a bit awkward to use in v2.3, because you have to call ord(x) every time you want to compare it to an int. But at least it worked, so I wouldn't complain if it were simply reverted to the previous behavior.

wlav commented 1 year ago

This one is fixed in the repo, as are several other cases that I tried to cover. There are still more, though, e.g. when int8_t or uint8_t is used as a template argument.

fabbbbbbbb commented 1 year ago

Thank you for the quick fix. I can confirm that this bug is fixed.

I just noticed one thing, though: if I use an alias for uint8_t in the struct (for instance, typedef uint8_t UBYTE; struct T { UBYTE a; };), then t.a is a str, as in cppyy 2.3. Not a big deal, I can just keep the ord(t.a) in this case, but it's slightly inconsistent. Anyway, I'll close this bug since it is solved.
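Code that has to cope with both behaviors (an int for plain uint8_t, a length-1 str for an alias such as UBYTE) could normalize values at the point of use instead of sprinkling ord() calls. A minimal sketch, assuming the binding hands back either a Python int or a length-1 str/bytes for an 8-bit member; as_uint8 is a hypothetical helper, not part of cppyy:

```python
def as_uint8(value):
    """Return an 8-bit value as a Python int.

    Hypothetical helper: accepts either an int (the uint8_t-as-integer
    mapping) or a length-1 str/bytes (the char mapping).
    """
    if isinstance(value, (str, bytes)):
        if len(value) != 1:
            raise ValueError(f"expected a single character, got {value!r}")
        value = ord(value)  # code point of the single character
    value = int(value)
    if not 0 <= value <= 255:
        raise ValueError(f"{value} does not fit in uint8_t")
    return value


print(as_uint8("\x07"), as_uint8(b"\xff"), as_uint8(42))  # 7 255 42
```

Comparisons then read the same regardless of which mapping is in effect, e.g. as_uint8(t.a) == 7.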

wlav commented 1 year ago

Yes, that case is even harder: the only types the mapping code deals with are the one that is actually used (UBYTE in this case) and the canonical one (unsigned char here). Cling can, of course, figure out that there's an intermediate uint8_t type, but that's just another corner to fill (as I said, there are still several I know of; it's a complex one).
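The following toy model illustrates why the alias falls through. It is a hypothetical, heavily simplified sketch covering 8-bit types only, not cppyy's implementation: the TYPEDEFS table, the INT_LIKE set and both mapping functions are invented for illustration. Checking only the name as written and the canonical type misses UBYTE, while walking every intermediate typedef finds uint8_t.

```python
# Hypothetical model of typedef resolution for 8-bit types only;
# NOT cppyy's actual implementation.
TYPEDEFS = {
    "UBYTE": "uint8_t",          # user alias from the example above
    "uint8_t": "unsigned char",  # typical <cstdint> definition
}
INT_LIKE = {"uint8_t", "int8_t"}  # 8-bit names meant to map to Python int


def canonical(name):
    """Follow the typedef chain down to the underlying (canonical) type."""
    while name in TYPEDEFS:
        name = TYPEDEFS[name]
    return name


def map_used_and_canonical(name):
    """Consult only the name as written and its canonical type."""
    if name in INT_LIKE or canonical(name) in INT_LIKE:
        return "int"
    return "str"


def map_full_chain(name):
    """Also consult every intermediate typedef along the chain."""
    while True:
        if name in INT_LIKE:
            return "int"
        if name not in TYPEDEFS:
            return "str"
        name = TYPEDEFS[name]


for tname in ("unsigned char", "uint8_t", "UBYTE"):
    print(tname, map_used_and_canonical(tname), map_full_chain(tname))
```

Under the first strategy UBYTE maps to str, which matches the behavior reported for the alias case; under the second it maps to int.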

This being Python, you can bake in a workaround, or create a pythonization:

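import cppyy

# Declare the alias and a struct that uses it.
cppyy.cppdef("""\
typedef uint8_t UBYTE; struct T { UBYTE a; };
""")

# C++ accessor whose declared return type is uint8_t itself (not the alias).
cppyy.cppdef("""\
uint8_t T_get_a(T* t) {
    return t->a;
}""")

print("PRE:", type(cppyy.gbl.T().a))   # str, as reported above for the alias

# Replace the data member with a read-only Python property that calls the accessor.
cppyy.gbl.T.a = property(cppyy.gbl.T_get_a)
print("POST:", type(cppyy.gbl.T().a))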