Closed knight-of-ni closed 6 years ago
Hey,
Yes it would be helpful if you could run it under valgrind (maybe be very slow, especially on pi zero) or gdb. While its possible this is a bug in the code, its possible its a binary compatibility issue.
I have never tried running Leosac on the Pi Zero though, so it might be an other issue.
I know I have recommended people to run 0.6.3
because it has less dependency and is easier to build
and run, however it seems its not really the case anymore.
The way forward in the short term is probably one of those:
develop
branch, as you did on Jessie, but hopefully we can do it on Stretch too. Then document this properly.develop
branch. Every dependency has been cross-compiled. The problem with this is that the full build is rather big. I ran valgrind last night, and immediately remembered I don't have the debug symbols. I can't say I'm an expert at reading these, but I'm not sure how useful this info is. Valgrind stopped at an un-recognized instruction in /usr/lib/arm-linux-gnueabihf/libarmmem.so.
That library is part of raspi-copies-and-fills package and that package has an interesting description:
ARM-accelerated versions of selected functions from string.h
Optimized implementations of memset, memcpy, memcmp and memmove for
ARM1176JZF-S and Cortex A7
So it's conceivable, to me anyway, that the copy of valgrind I have does not know about these modified instructions. That means option number 2 from the output below is conceivable, and this is a red herring.
Let me know if you've got any other insights into what is going on.
--2688-- REDIR: 0x4dd3b20 (rindex) redirected to 0x4835ef8 (rindex)
--2688-- REDIR: 0x4dd3630 (strlen) redirected to 0x48366ec (strlen)
--2688-- REDIR: 0x4dd4bf0 (bcmp) redirected to 0x483875c (bcmp)
--2688-- REDIR: 0x4dd2f40 (strcmp) redirected to 0x483739c (strcmp)
--2688-- REDIR: 0x4c22398 (operator new(unsigned int, std::nothrow_t const&)) redirected to 0x4834108 (operator new(unsigned int, std::nothrow_t const&))
--2688-- REDIR: 0x4dced98 (malloc) redirected to 0x48338ec (malloc)
--2688-- REDIR: 0x4dcf8d0 (calloc) redirected to 0x48356b8 (calloc)
==2688== Use of uninitialised value of size 4
==2688== at 0x400AA7C: _dl_lookup_symbol_x (dl-lookup.c:729)
==2688== by 0x400F9C3: _dl_fixup (dl-runtime.c:111)
==2688== by 0x401655B: _dl_runtime_resolve (dl-trampoline.S:57)
==2688== by 0x48A320B: ??? (in /usr/lib/libleosac_lib.so)
==2688== Uninitialised value was created by a stack allocation
==2688== at 0x4001BE4: handle_ld_preload (rtld.c:983)
==2688==
--2688-- REDIR: 0x4c22310 (operator new(unsigned int)) redirected to 0x4833ea8 (operator new(unsigned int))
disInstr(arm): unhandled instruction: 0xF4A174DD
cond=15(0xF) 27:20=74(0x4A) 4:4=1 3:0=13(0xD)
==2688== valgrind: Unrecognised instruction at address 0x4844f2c.
==2688== at 0x4844F2C: ??? (in /usr/lib/arm-linux-gnueabihf/libarmmem.so)
==2688== Your program just tried to execute an instruction that Valgrind
==2688== did not recognise. There are two possible reasons for this.
==2688== 1. Your program has a bug and erroneously jumped to a non-code
==2688== location. If you are running Memcheck and you just saw a
==2688== warning about a bad jump, it's probably your program's fault.
==2688== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2688== i.e. it's Valgrind's fault. If you think this is the case or
==2688== you are not sure, please let us know and we'll try to fix it.
==2688== Either way, Valgrind will now raise a SIGILL signal which will
==2688== probably kill your program.
==2688==
==2688== Process terminating with default action of signal 4 (SIGILL)
==2688== Illegal opcode at address 0x4844F2C
==2688== at 0x4844F2C: ??? (in /usr/lib/arm-linux-gnueabihf/libarmmem.so)
--2688-- REDIR: 0x4dcf54c (free) redirected to 0x4834a88 (free)
--2688-- REDIR: 0x4dd5380 (memset) redirected to 0x4838c94 (memset)
==2688== Use of uninitialised value of size 4
==2688== at 0x4E66934: free_mem (in /lib/arm-linux-gnueabihf/libc-2.19.so)
==2688== by 0x4E663C3: __libc_freeres (in /lib/arm-linux-gnueabihf/libc-2.19.so)
==2688== by 0x4023633: _vgnU_freeres (vg_preloaded.c:61)
==2688== by 0x7FFFFFFF: ???
==2688== Uninitialised value was created by a stack allocation
==2688== at 0x4001BE4: handle_ld_preload (rtld.c:983)
The way forward in the short term is probably one of those:
- We can reproduce a clean build on the Pi of develop branch, as you did on Jessie, but hopefully we can do it on Stretch too. Then document this properly.
Yes, totally agree. I've gone back to figuring out how to build the develop branch. I'll get there. It's just been a little bumpy.
Been bouncing back and forth between 0.6.3 release and the develop branch.
I had a little success today. With a few modifications, I was able to build 0.6.3 release deb and install it (see #100 - would be glad to put that in the wiki if you want). Unfortunately when I try to run it, it crashes in the same manner as the deb hosted on the github releases page, so I think that rules out binary incompatibility.
After switching to a Raspberry Pi 2 and installing the 0.7.0 pre-release, leosac continues to crash when I try to assign a default value of 1 to any gpio configured as an output, using the sysfsgpio module.
This appears to be easily reproducible. From a working system, open kernel xml and add <value>true</value>
or <value>1</value>
to the gpio which represents the access granted output. Then restart the service. It will immediately fail with SEGV. At least it does for me.
Indeed I can reproduce too.
I've identified the issue and will commit a fix later today. If you're in a hurry you can comment this line: https://github.com/leosac/leosac/blob/develop/src/modules/sysfsgpio/SysFSGPIOPin.cpp#L128
Turns out it was not binary compat issue, just bad code :)
Raspberry Pi Zero W Raspbian Jessie Leosac 0.6.3 installed via deb from the releases page Using SysFS GPIO module
Relevant data is shown below.
Description: Upon adding led & beeper control to leosac, the beeper in the reader immediately went on and the led tuned green. Apparently, these functions are active low. In order to reverse the logic, I attempted to set the default value for these two outputs to 1. When I set a default value to either "1" or "true", leosacd crashes on startup with 11/SEGV.
Unfortunately the log entries shown below are not particularly helpful. Leosac just dies after setting up the gpios. If I set the default value to 0 or false, leosac starts up again.... along with the beeper in the card reader.
I can fire up valgrind to try to get more information if it is not obvious why the seg fault is occurring. Let me know.
OS:
Relevant portion of kernel.xml:
Relevant log entries: