wlav / cppyy

uint8_t array access returns wrong value #154

Closed yanghao closed 8 months ago

yanghao commented 1 year ago

This was tested under cppyy 3.0.0, but the problem seems to have been present since version 2.4.0 (the last working version is 2.3.1).

OS: Ubuntu 22.04
cppyy: 3.0.0
Python: 3.10.6

Reproduce source code:

test.h:

#include <stdint.h>

extern uint8_t test[8];

test.c:

#include <stdint.h>

uint8_t test[8] = {0x12, 0x34, 0x56, 0x78};

test.py:

from cppyy import gbl as cpp
from cppyy import cppdef, include, load_library, add_include_path

include("test.h")
load_library("test.so")
print([hex(cpp.test[i]) for i in range(8)])

Reproduce steps:

$ gcc -o test.so -shared test.c
$ python3 test.py

Expected prints: 0x12, 0x34, 0x56, 0x78, 0x0, 0x0, 0x0, 0x0

Actual prints (varies for different runs): ['0x7f43335ae020', '0x7f433381b238', '0x1', '0x3', '0x7f432a57cf70', '0x7f432a57cf30', '0x0', '0x5613b4416e10']

Any idea why?

yanghao commented 1 year ago

By the way, it seems that only uint8_t/int8_t do not work; all other data types work (16/32/64-bit integer types, char, unsigned char).

This is very strange; it seems uint8_t is somehow treated differently. However, according to the documentation, uint8_t should be mapped as "unsigned char"?

yanghao commented 1 year ago

@wlav any idea?

yanghao commented 1 year ago

A bit more information: the content of the cppyy-pythonized test seems to be some meta-information: ['0x7f43335ae020', '0x7f433381b238', '0x1', '0x3', '0x7f432a57cf70', '0x7f432a57cf30', '0x0', '0x5613b4416e10']

Here 0x7f43335ae020 is the actual address of the test variable after the library is loaded.

When test.c contains a statically initialized pointer, uint8_t *ptest = test; the cppyy-pythonized ptest does point to the proper buffer: ['0x78563412', '0x7ff8710cb030', '0x0', '0x2e31312075746e75', '0x756275312d302e33', '0x2e32327e3175746e', '0x332e313120293430', '0x302e']

But it seems the data type is wrong (e.g. 4 bytes per item instead of 1 byte), so 0x78563412 aggregates the first four (or even eight) bytes.
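The 0x78563412 value is what one gets when several consecutive bytes of the buffer are read as a single little-endian integer. A minimal sketch using only the standard struct module (no cppyy involved; the 4-byte and 8-byte widths are assumptions taken from the observation above):

```python
import struct

# Contents of `test` as initialized in test.c; the unlisted elements are zero.
raw = bytes([0x12, 0x34, 0x56, 0x78, 0x00, 0x00, 0x00, 0x00])

# Element 0 read as a 4-byte little-endian unsigned integer merges four bytes.
(as_u32,) = struct.unpack_from("<I", raw, 0)
print(hex(as_u32))

# An 8-byte read gives the same number here, because bytes 4..7 are zero.
# That is why the printed value alone cannot tell 4-byte from 8-byte items.
(as_u64,) = struct.unpack_from("<Q", raw, 0)
print(hex(as_u64))

# Reading with a 1-byte element size gives the expected per-element values.
as_u8 = list(struct.unpack_from("<8B", raw, 0))
print([hex(b) for b in as_u8])
```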

When casting ptest through cppyy's low-level interface (cppyy.ll):

from cppyy import ll

x = ll.cast['uint8_t *'](cpp.ptest)
x.reshape((8,))
print([hex(i) for i in list(x)])

The correct result can be recovered!

In summary: cppyy's pythonization of uint8_t[] appears to be broken. For uint8_t *, however, at least the buffer address is right, so casting it back to the proper data type still works.

wlav commented 8 months ago

Release 3.1.1 has LowLevelViews specialized for int8_t/uint8_t which solves this.
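For code that has to run on several cppyy releases, a small version check can decide whether the cast workaround from earlier in this thread is still needed. The sketch below is an illustration only: the helper names are hypothetical, and the 2.4.0 lower bound is the reporter's "seems to appear since" estimate, not a confirmed first broken release. The version string would come from the installed cppyy package.

```python
def parse_version(version):
    """Turn a string like '3.1.1' into (3, 1, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_uint8_cast_workaround(version):
    """True for releases reported as affected in this thread: from 2.4.0 up to, not including, 3.1.1."""
    return (2, 4, 0) <= parse_version(version) < (3, 1, 1)


for v in ("2.3.1", "2.4.0", "3.0.0", "3.1.1"):
    print(v, needs_uint8_cast_workaround(v))
```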

yanghao commented 7 months ago

I can confirm it is fixed, thank you @wlav.

If you want to point out how and where this is fixed I can try to learn a bit of cppyy internals :-)

wlav commented 7 months ago

The hard problem is that int8_t and uint8_t must not be further resolved, otherwise they become char and unsigned char respectively, so stringy types rather than integers. However, in all other cases, typedefs need to be resolved, in particular to make sure that typedefs and the types they refer to map to the same Python instances. If e.g. int8_t is used in and of itself, that's simple enough, but if it's part of a chain of typedefs or used in a template name, it's virtually impossible to get right (until there's some support in Cling at some point).
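The "stringy versus integer" distinction can be seen by analogy in Python's standard ctypes module, which also treats char as a character type and uint8 as an integer type. This is only an analogy under that assumption; it does not show how cppyy's converters work internally.

```python
import ctypes

# Same byte pattern as `test` in test.c.
raw = bytes([0x12, 0x34, 0x56, 0x78, 0, 0, 0, 0])

# Viewed as an array of char, each element comes back as a length-1 bytes object.
as_chars = (ctypes.c_char * 8).from_buffer_copy(raw)

# Viewed as an array of uint8, each element comes back as a Python int.
as_ints = (ctypes.c_uint8 * 8).from_buffer_copy(raw)

first_char = as_chars[0]
first_int = as_ints[0]
print(type(first_char).__name__, type(first_int).__name__)
print([hex(v) for v in as_ints])
```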

Anyway, I have a few places in cppyy-backend and cppyy-cling where types having int8_t and uint8_t are filtered out and not further resolved. These are just hacks, nothing earth-shattering. After that, it's simple: create specialized converters for these types (in this case, for custom LowLevelView objects).
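The idea of filtering out int8_t and uint8_t during typedef resolution can be sketched with a toy resolver. Everything below is hypothetical: the typedef table, the names, and the function are made up for illustration and are not taken from cppyy-backend or cppyy-cling, where the real typedef information comes from Cling.

```python
# Hypothetical typedef table standing in for what the C++ interpreter knows.
TYPEDEFS = {
    "uint8_t": "__uint8_t",
    "__uint8_t": "unsigned char",
    "int8_t": "__int8_t",
    "__int8_t": "signed char",
    "byte_t": "uint8_t",       # a user typedef chained onto uint8_t
    "my_size": "size_t",
    "size_t": "unsigned long",
}

# Names that must stay as they are so they can get integer converters.
KEEP_AS_IS = frozenset({"int8_t", "uint8_t"})


def resolve(name, keep=KEEP_AS_IS):
    """Follow the typedef chain, but stop as soon as a protected name is reached."""
    seen = set()
    while name in TYPEDEFS and name not in keep and name not in seen:
        seen.add(name)
        name = TYPEDEFS[name]
    return name


print(resolve("my_size"))                  # ordinary typedefs resolve fully
print(resolve("byte_t"))                   # the chain stops at uint8_t
print(resolve("byte_t", keep=frozenset())) # without the filter it ends at a char type
```

A dictionary lookup like this only covers plain typedef chains; names embedded in template arguments, which the comment above calls out as the hard case, are not handled by this sketch.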