robotpy / pyntcore

Moved to https://github.com/robotpy/mostrobotpy

[BUG]: MemoryError when publishing arrays #40

Closed: jwbonner closed this issue 1 year ago

jwbonner commented 1 year ago

Problem description

When setting an array value (int[], float[], double[], boolean[], string[], or raw) using a publisher or entry, a MemoryError: std::bad_alloc is thrown. This does not occur for non-array types. Only tested on Raspbian.

Operating System

Raspbian

Installed Python Packages

pyntcore          2023.2.1.1
robotpy-wpinet    2023.2.1.0
robotpy-wpiutil   2023.2.1.0

Reproducible example code

import ntcore
ntcore.NetworkTableInstance.getDefault().getDoubleArrayTopic("/Topic").publish().set([])
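Since the report says every array type fails, a slightly fuller reproduction that tries each typed array publisher in turn might look like the sketch below. It assumes pyntcore 2023 exposes getIntegerArrayTopic, getFloatArrayTopic, getBooleanArrayTopic and getStringArrayTopic alongside getDoubleArrayTopic; the /Test/... topic names and the check_set and run_all helpers are hypothetical. Raw topics are left out to keep the sketch short.

```python
try:
    import ntcore  # pyntcore; optional so check_set can be exercised without it
except ImportError:
    ntcore = None


def check_set(set_fn, value):
    """Call a publisher's set() and report whether it raised MemoryError."""
    try:
        set_fn(value)
    except MemoryError as exc:  # pybind11 surfaces std::bad_alloc as MemoryError
        return f"MemoryError: {exc}"
    return "ok"


def run_all():
    """Publish one value on each typed array topic (hypothetical topic names)."""
    inst = ntcore.NetworkTableInstance.getDefault()
    cases = {
        "int[]": (inst.getIntegerArrayTopic("/Test/int"), [1, 2]),
        "float[]": (inst.getFloatArrayTopic("/Test/float"), [1.0, 2.0]),
        "double[]": (inst.getDoubleArrayTopic("/Test/double"), [1.0, 2.0]),
        "boolean[]": (inst.getBooleanArrayTopic("/Test/bool"), [True, False]),
        "string[]": (inst.getStringArrayTopic("/Test/string"), ["a", "b"]),
    }
    publishers = []  # keep publishers alive until the script exits
    for name, (topic, value) in cases.items():
        pub = topic.publish()
        publishers.append(pub)
        print(f"{name:10} {check_set(pub.set, value)}")


if __name__ == "__main__" and ntcore is not None:
    run_all()
```

On an affected install every line would be expected to report MemoryError; on a working one each should print "ok".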
virtuald commented 1 year ago

I'm not able to duplicate your error using that example code. Do you have a more complete example?

jwbonner commented 1 year ago

This is the specific place we're trying to publish an array, though all of the other test programs we've tried have exhibited the same issue:

https://github.com/Mechanical-Advantage/AdvantageKit/blob/ns-dev-array/akit/py/northstar/output/OutputPublisher.py#L40

virtuald commented 1 year ago

I'm still not able to reproduce even when tweaking your example. Does it happen immediately or is there something that needs to trigger it? Or maybe it only happens on raspbian... I'm testing on Linux.

jwbonner commented 1 year ago

No, it happens whenever we call set. All of our testing has been with the server connected. Also, a correction: this is running on an Orange Pi with Ubuntu, so the Linux arm64 build.

virtuald commented 1 year ago

Yeah, it doesn't happen for me when I call set, so there must be something else to this.

jwbonner commented 1 year ago

Could it be the wrong Python version? I can double check what we were running (I don't have access to the device right now), but I think it was 3.10. The same issue happened when running on a Le Potato with Armbian (I can check versions on that too if it's useful).

Are there any other logs or tests that would be useful to debug this? For example, I could do a verbose pip install and see if it was doing anything unusual (rebuilding things it shouldn't or something like that).

virtuald commented 1 year ago

I think if you could identify what is throwing the bad_alloc, that would be good. Might be some uninitialized variable on the C++ side. This might help you do that with gdb: https://stackoverflow.com/questions/6835728/how-to-break-when-a-specific-exception-type-is-thrown-in-gdb
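The gdb approach from that link (a `catch throw` catchpoint, then a backtrace) can be run non-interactively. The helper below is a hypothetical sketch that only builds the command line; it relies on the standard gdb options `-batch`, `-ex` and `--args`, and `repro.py` stands in for whatever script contains the failing set() call.

```python
import shlex
import sys


def gdb_catch_throw_cmd(script, python=sys.executable):
    """Build a gdb command that stops at the first C++ throw and prints a backtrace."""
    return [
        "gdb",
        "-batch",              # exit after the -ex commands have run
        "-ex", "catch throw",  # stop whenever a C++ exception is thrown
        "-ex", "run",          # start the program under gdb
        "-ex", "bt",           # print the backtrace at the throw site
        "--args", python, script,
    ]


if __name__ == "__main__":
    # repro.py is a hypothetical script holding the failing publish().set([]) call
    print(shlex.join(gdb_catch_throw_cmd("repro.py")))
```

If some other library throws (and catches) a C++ exception first, the catchpoint may stop there instead; in an interactive session, `continue` until the bad_alloc shows up.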

jwbonner commented 1 year ago

The Python version is 3.10.6. I don't know much about gdb, but here's the backtrace from the exception:

#0  0x0000007ff66d2cdc in __cxa_throw () from /lib/aarch64-linux-gnu/libstdc++.so.6
#1  0x0000007ff66d3318 in operator new(unsigned long) () from /lib/aarch64-linux-gnu/libstdc++.so.6
#2  0x0000007ff61aab60 in nt::Value::MakeDoubleArray(std::span<double const, 18446744073709551615ul>, long) ()
   from /home/orangepi/.local/lib/python3.10/site-packages/ntcore/lib/libntcore.so
#3  0x0000007ff61cdb44 in nt::SetDoubleArray(unsigned int, std::span<double const, 18446744073709551615ul>, long) ()
   from /home/orangepi/.local/lib/python3.10/site-packages/ntcore/lib/libntcore.so
#4  0x0000007ff5ee8b10 in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, nt::DoubleArrayPublisher, std::span<double const, 18446744073709551615ul>, long, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::doc>(void (nt::DoubleArrayPublisher::*)(std::span<double const, 18446744073709551615ul>, long), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::doc const&)::{lambda(nt::DoubleArrayPublisher*, std::span<double const, 18446744073709551615ul>, long)#1}, void, nt::DoubleArrayPublisher*, std::span<double const, 18446744073709551615ul>, long, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::doc>(pybind11::cpp_function::initialize<void, nt::DoubleArrayPublisher, std::span<double const, 18446744073709551615ul>, long, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::doc>(void (nt::DoubleArrayPublisher::*)(std::span<double const, 18446744073709551615ul>, long), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::doc const&)::{lambda(nt::DoubleArrayPublisher*, std::span<double const, 18446744073709551615ul>, long)#1}&&, void (*)(nt::DoubleArrayPublisher*, std::span<double const, 18446744073709551615ul>, long), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const [clone .isra.0] () from /home/orangepi/.local/lib/python3.10/site-packages/ntcore/_ntcore.cpython-310-aarch64-linux-gnu.so
#5  0x0000007ff5ec14c0 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /home/orangepi/.local/lib/python3.10/site-packages/ntcore/_ntcore.cpython-310-aarch64-linux-gnu.so
#6  0x00000055556536f4 in ?? ()
#7  0x000000555564a0a0 in _PyObject_MakeTpCall ()
#8  0x0000005555662f9c in ?? ()
#9  0x0000005555640bf0 in _PyEval_EvalFrameDefault ()
#10 0x0000005555739a80 in ?? ()
#11 0x0000005555739904 in PyEval_EvalCode ()
#12 0x000000555576f1ec in ?? ()
#13 0x00000055557668d8 in ?? ()
#14 0x000000555576ee9c in ?? ()
#15 0x000000555576e004 in _PyRun_SimpleFileObject ()
#16 0x000000555576dca4 in _PyRun_AnyFileObject ()
#17 0x000000555575c9b0 in Py_RunMain ()
#18 0x000000555572ab08 in Py_BytesMain ()
#19 0x0000007ff7d173fc in ?? () from /lib/aarch64-linux-gnu/libc.so.6
#20 0x0000007ff7d174cc in __libc_start_main () from /lib/aarch64-linux-gnu/libc.so.6
#21 0x000000555572a9f0 in _start ()
virtuald commented 1 year ago

That's a good start! There's no smoking gun yet, but I'll dig a bit more.

virtuald commented 1 year ago

Good news! I upgraded my odroid-c2 to Ubuntu 22.04 and was able to duplicate your error. Won't have time to dig into it until maybe very late tonight.

virtuald commented 1 year ago

So Peter found out that GCC changed the ABI for std::span between GCC 10 and 11. My Ubuntu 22.04 has GCC 11, and I'm guessing yours does too, but the wpilib artifacts are built with 10.
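Why an ABI mismatch ends in bad_alloc can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the two compilers lay out a span's pointer and length in opposite orders; the real GCC 10 vs GCC 11 difference may not be a simple field swap, and pass_span, receive_span and bytes_to_allocate are hypothetical names. The backtrace shows nt::Value::MakeDoubleArray calling operator new; the sketch assumes it requests length × 8 bytes for the copy.

```python
import struct
from typing import Tuple

# Hypothetical layouts: the caller writes (pointer, length); the callee reads
# the same 16 bytes as (length, pointer). Both fields are 64-bit unsigned.
LAYOUT = "<QQ"  # little-endian, two uint64 fields


def pass_span(ptr: int, length: int) -> bytes:
    """Caller side: lay the span out the way its compiler expects."""
    return struct.pack(LAYOUT, ptr, length)


def receive_span(raw: bytes) -> Tuple[int, int]:
    """Callee side: read the same bytes with the fields in the other order."""
    length, ptr = struct.unpack(LAYOUT, raw)
    return ptr, length


def bytes_to_allocate(length: int, elem_size: int = 8) -> int:
    # Assumed: copying `length` doubles needs length * sizeof(double) bytes
    return length * elem_size


if __name__ == "__main__":
    fake_ptr = 0x7FF0001000  # hypothetical heap address
    ptr, length = receive_span(pass_span(fake_ptr, 0))  # even an empty list
    print(f"callee sees length={length}, wants {bytes_to_allocate(length)} bytes")
```

In this model even `set([])` makes the callee treat an address as the element count, so the requested allocation runs to terabytes and operator new throws std::bad_alloc, which pybind11 reports as MemoryError.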

Currently we only publish Python 3.9 aarch64 artifacts. I'm going to roll out 3.8-3.11 artifacts as well.
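The extension module in the backtrace is named `_ntcore.cpython-310-aarch64-linux-gnu.so`, i.e. a CPython 3.10 build, which had no published aarch64 wheel before this change. One plausible reading (an inference, not stated in the thread) is that pip fell back to building pyntcore from source with the device's own GCC 11 while linking against the GCC 10 wpilib artifacts. The hypothetical helpers below just check whether an interpreter version falls in each set of published wheels.

```python
import platform
import sys

# Python versions with published aarch64 wheels, per the comment above
BEFORE = {(3, 9)}
AFTER = {(3, 8), (3, 9), (3, 10), (3, 11)}


def interpreter_tag(version=None):
    """CPython wheel tag such as 'cp310' for a (major, minor[, ...]) version."""
    major, minor = tuple(version or sys.version_info)[:2]
    return f"cp{major}{minor}"


def has_prebuilt_wheel(version, published):
    """True if a wheel was published for this interpreter's major.minor."""
    return tuple(version)[:2] in published


if __name__ == "__main__":
    here = tuple(sys.version_info)[:2]
    print(interpreter_tag(here), platform.machine())
    print("before:", has_prebuilt_wheel(here, BEFORE))
    print("after: ", has_prebuilt_wheel(here, AFTER))
```

On the Orange Pi's Python 3.10.6 this would report cp310 with no wheel before the rollout and one after it.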

virtuald commented 1 year ago

I'm heading to bed, but I've started the deploy process. Hopefully by the end of the night there should be some aarch64 artifacts at https://tortall.net/~robotpy/wheels/2023/raspbian/. You should be able to:

pip install --find-links https://tortall.net/~robotpy/wheels/2023/raspbian/ pyntcore==2023.2.1.2

And that will install a Python 3.10 wheel, and it shouldn't have the problem you reported. I haven't built the wheel locally, so I haven't tested this either... but it seems pretty likely that this will fix it.

virtuald commented 1 year ago

Tried it on my odroid, that fixed it.