Open LucAce opened 2 months ago
Build command for libmodbus and NUT
build_libmodbus.sh
#!/bin/bash
if [ "$EUID" -ne 0 ]; then
echo -e "ERROR: Please run as root\n"
exit
fi
# Install dependencies
apt-get install -y \
autoconf libusb-1.0-0-dev libusb-dev git libtool gawk
# Build RTU USB Library
git clone -b rtu_usb https://github.com/networkupstools/libmodbus
cd libmodbus
./autogen.sh
export DEB_HOST_MULTIARCH=aarch64-linux-gnu
./configure \
--prefix=/usr \
--sysconfdir=/etc/nut \
--includedir=/usr/include \
--mandir=/usr/share/man \
--libdir=/usr/lib/${DEB_HOST_MULTIARCH} \
--libexecdir=/usr/libexec \
--with-libusb \
--enable-static \
--enable-shared \
--with-pkgconfig-dir=/usr/lib/${DEB_HOST_MULTIARCH}/pkgconfig
make -j4 && make install
build_nut.sh
#!/bin/bash
if [ "$EUID" -ne 0 ]; then
echo -e "ERROR: Please run as root\n"
exit
fi
# Install dependencies
apt-get install -y \
libssl-dev libnss3-dev libgd-dev augeas-tools \
libaugeas-dev augeas-lenses libcppunit-dev \
ccache time perl curl make autoconf automake \
libltdl-dev libtool cppcheck pkg-config gcc g++ \
clang valgrind libsystemd-dev
# Build NUT
git clone https://github.com/networkupstools/nut
cd nut
./autogen.sh
export DEB_HOST_MULTIARCH="aarch64-linux-gnu"
./configure \
--prefix=/usr \
--with-doc=no \
--with-drivers="apc_modbus,usbhid-ups" \
--with-usb \
--with-modbus \
--sysconfdir=/etc/nut \
--includedir=/usr/include \
--mandir=/usr/share/man \
--libdir=/usr/lib/${DEB_HOST_MULTIARCH} \
--libexecdir=/usr/libexec \
--with-ssl \
--with-nss \
--with-cgi \
--with-dev \
--with-statepath=/run/nut \
--with-altpidpath=/run/nut \
--with-drvpath=/lib/nut \
--with-cgipath=/usr/lib/cgi-bin/nut \
--with-htmlpath=/usr/share/nut/www \
--with-pidpath=/run/nut \
--datadir=/usr/share/nut \
--with-pkgconfig-dir=/usr/lib/${DEB_HOST_MULTIARCH}/pkgconfig \
--with-user=nut \
--with-group=nut \
--with-udev-dir=/lib/udev \
--with-systemdsystemunitdir=/lib/systemd/system \
--with-systemdshutdowndir=/lib/systemd/system-shutdown \
--with-systemdtmpfilesdir=/usr/lib/tmpfiles.d
make -j4
echo ""
read -p "Do you wish to continue? [Y/N] " yesno
case $yesno in
[Yy]* )
echo "Installing..."
;;
* )
echo "Exiting..."
exit
;;
esac
make install
CC @EchterAgo : does this ring any bells, please?
In an effort to rule out the RPi being a factor, I recreated the build on an x86 system with Ubuntu 24.04.1 which results in:
$ ./apc_modbus -DDDDDD -a ups -x port=auto
0.000001 [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools driver 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) - NUT APC Modbus driver 0.10
0.000145 [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.27 (API: 0x100010a)"
0.000160 [D1] Using USB implementation: libusb-1.0.27 (API: 0x100010a)
0.000311 [D5] do_upsconf_args: confupsname=(null), var=maxretry, val=3
0.000316 [D5] do_upsconf_args: call do_global_args()
0.000319 [D3] do_global_args: var='maxretry' val='3'
0.000388 [D5] do_upsconf_args: confupsname=ups, var=driver, val=apc_modbus
0.000392 [D5] do_upsconf_args: call main_arg()
0.000396 [D3] main_arg: var='driver' val='apc_modbus'
0.000399 [D5] do_upsconf_args: not a main_arg()
0.000402 [D5] do_upsconf_args: this is a 'driver' setting, may we proceed?
0.000406 [D6] testval_reloadable: var=driver, oldval=apc_modbus, newval=apc_modbus, reloadable=0, reload_flag=0
0.000410 [D6] testval_reloadable: verdict for (re)loading var=driver value: -1
0.000413 [D5] do_upsconf_args: 'driver' setting already applied with this value
0.000417 [D5] do_upsconf_args: confupsname=ups, var=port, val=auto
0.000421 [D5] do_upsconf_args: call main_arg()
0.000424 [D3] main_arg: var='port' val='auto'
0.000428 [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
0.000433 [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
0.000439 [D5] send_to_all: SETINFO driver.parameter.port "auto"
0.000449 [D3] main_arg: var='port' val='auto'
0.000453 [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
0.000456 [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
0.000460 [D1] Network UPS Tools version 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) built with gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 and configured with flags: --prefix=/usr --with-doc=no --with-drivers=apc_modbus,usbhid-ups --with-usb --with-modbus --sysconfdir=/etc/nut --includedir=/usr/include --mandir=/usr/share/man --libdir=/usr/lib/x86_64-linux-gnu --libexecdir=/usr/libexec --with-ssl --with-nss --with-cgi --with-dev --with-statepath=/run/nut --with-altpidpath=/run/nut --with-drvpath=/lib/nut --with-cgipath=/usr/lib/cgi-bin/nut --with-htmlpath=/usr/share/nut/www --with-pidpath=/run/nut --datadir=/usr/share/nut --with-pkgconfig-dir=/usr/lib/x86_64-linux-gnu/pkgconfig --with-user=nut --with-group=nut --with-udev-dir=/lib/udev --with-systemdsystemunitdir=/lib/systemd/system --with-systemdshutdowndir=/lib/systemd/system-shutdown --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d
0.000484 [D1] debug level is '6'
0.000489 [D5] send_to_all: SETINFO driver.debug "6"
0.000495 [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
0.001387 [D1] Succeeded to become_user(nut): now UID=110 GID=111
0.001413 [D1] Signalling UPS [ups]: driver.exit (quietly, no fuss if no driver is running or responding)
0.001425 Can't open /run/nut/apc_modbus-ups: No such file or directory
0.001430 [D1] Request for other driver to exit returned code -1
0.001434 [D1] Socket dialog with the other driver instance (may be absent) failed: No such file or directory
0.001440 [D5] send_to_all: SETINFO device.type "ups"
0.001444 [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 3
Considering device #1 (051d:0003)
0.025093 [D2] _apc_modbus_usb_callback: Matched American Power Conversion Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003)
0.025120 [D2] _apc_modbus_usb_callback: Checking American Power Conversion Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003) report descriptors
0.025264 [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
0.025279 [D5] send_to_all: SETINFO ups.vendorid "051d"
0.025287 [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 1 (Path 1-8):
Vendor ID: 0x051d
Product ID: 0x0003
Bytes flushed (63)
0.036922 [D5] send_to_all: SETINFO driver.state "init.quiet"
0.036947 [D5] send_to_all: SETINFO driver.version "2.8.2.1004-1004-g38d8078ab"
0.036966 [D5] send_to_all: SETINFO driver.version.internal "0.10"
0.036982 [D5] send_to_all: SETINFO driver.name "apc_modbus"
0.036995 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Aborted (core dumped)
$ ./upsdrvctl start
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2)
Network UPS Tools driver 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) - NUT APC Modbus driver 0.10
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Driver exited abnormally
Thanks for the diligence about multi-platform check. RPi's USB chips were also suspects a few times.
But the segfault was reported in several issues recently - so seeing it misbehave differently per platform is at least "most intriguing!" :)
CC @EchterAgo again
I might at best look at code in a couple of weeks or more, and have no similar APC device to check against.
@jimklimov Thanks for the reply. I will keep a look out for updates, let me know if there is anything you'd like me to try.
To add another device: I'm seeing a similar behavior using a Smart-UPS 750 running FW 9.3 on NUT 2.8.2 with libmodbus c02f62b58aa4ce7538e087ef46266d2a048b0bd6.
/lib64/nut/apc_modbus -DDDDDD -s ups -x port=auto
0.000000 [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools - NUT APC Modbus driver 0.10 (2.8.2)
0.000105 [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.27 (API: 0x100010a)"
0.000114 [D1] Using USB implementation: libusb-1.0.27 (API: 0x100010a)
0.000128 [D3] main_arg: var='port' val='auto'
0.000135 [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
0.000141 [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
0.000156 [D5] send_to_all: SETINFO driver.parameter.port "auto"
0.000162 [D1] Network UPS Tools version 2.8.2 (release/snapshot of 2.8.2) built with x86_64-pc-linux-gnu-gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614 and configured with flags: --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --datarootdir=/usr/share --disable-dependency-tracking --disable-silent-rules --disable-static --docdir=/usr/share/doc/nut-2.8.2-r1 --htmldir=/usr/share/doc/nut-2.8.2-r1/html --with-sysroot=/ --libdir=/usr/lib64 --datadir=/usr/share/nut --datarootdir=/usr/share/nut --disable-static --disable-strip --disable-Werror --sysconfdir=/etc/nut --with-dev --with-drvpath=/lib64/nut --with-group=nut --with-htmlpath=/usr/share/nut/html --with-logfacility=LOG_DAEMON --with-statepath=/var/lib/nut --with-systemdsystemunitdir=/lib/systemd/system --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d --with-udev-dir=/lib/udev --with-user=nut --without-powerman --without-python --without-python2 --with-altpidpath=/run/nut --with-pidpath=/run/nut --with-usb --with-modbus --with-modbus-includes=-I/usr/local/include/modbus --with-modbus-libs='-L/usr/local/lib -lmodbus' --without-cgi --without-gpio --without-linux_i2c --without-freeipmi --without-ipmi --without-nut_monitor --without-pynut --without-serial --without-snmp --with-ssl --without-libsystemd --without-wrap --with-usb --without-neon --without-avahi --with-doc=man --without-python3
0.000197 [D1] debug level is '6'
0.000203 [D5] send_to_all: SETINFO driver.debug "6"
0.000210 [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
0.000409 [D1] Succeeded to become_user(nut): now UID=84 GID=84
0.000433 [D5] send_to_all: SETINFO device.type "ups"
0.000440 [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 25
Considering device #0 (9022:d483)
libusb_open for device #0 failed: Access denied (insufficient permissions)
Considering device #1 (1c4f:0002)
libusb_open for device #1 failed: Access denied (insufficient permissions)
Considering device #2 (10c4:ea60)
libusb_open for device #2 failed: Access denied (insufficient permissions)
Considering device #4 (10c4:ea60)
libusb_open for device #4 failed: Access denied (insufficient permissions)
Considering device #5 (1a40:0201)
libusb_open for device #5 failed: Access denied (insufficient permissions)
Considering device #6 (051d:0003)
0.026713 [D2] _apc_modbus_usb_callback: Matched American Power Conversion Smart-UPS 750 FW:UPS 09.3 / ID=18 (USB VID/PID 051d:0003)
0.026745 [D2] _apc_modbus_usb_callback: Checking American Power Conversion Smart-UPS 750 FW:UPS 09.3 / ID=18 (USB VID/PID 051d:0003) report descriptors
0.026906 [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
0.026926 [D5] send_to_all: SETINFO ups.vendorid "051d"
0.026945 [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 6 (Path 12-1):
Vendor ID: 0x051d
Product ID: 0x0003
Bytes flushed (63)
0.047954 [D5] send_to_all: SETINFO driver.state "init.quiet"
0.047967 [D5] send_to_all: SETINFO driver.version "2.8.2"
0.047975 [D5] send_to_all: SETINFO driver.version.internal "0.10"
0.047981 [D5] send_to_all: SETINFO driver.name "apc_modbus"
0.047997 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
UPDATE: Seems like a practical duplicate of #2289
To summarize current findings in one place, we see several similar patterns ending up in driver crash (typically with bit out of range 0 - FD_SETSIZE on fd_set
or buffer overflow detected
), or a graceful exit (with ERROR Connection timed out: select
; not sure if _apc_modbus_read_registers ... timed out
cases are all in the same bucket):
https://github.com/networkupstools/nut/issues/2609#issue-2499677698 - on Raspberry Pi 3B+ 64-bit build (Pi-OS lite based on Debian 12 bookworm)
...
0.058663 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
ERROR Connection timed out: select
0.567189 _apc_modbus_read_registers: Read of 516:636 failed: Connection timed out (auto)
Bytes flushed (0)
1.577519 Can't read inventory information from the UPS
1.577610 [D5] send_to_all: SETINFO driver.state "cleanup.upsdrv"
1.578214 [D5] send_to_all: SETINFO driver.state "cleanup.exit"
https://github.com/networkupstools/nut/issues/2609#issuecomment-2323657557 - same device as above, on an x86 system with Ubuntu 24.04.1:
...
0.036995 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Aborted (core dumped)
https://github.com/networkupstools/nut/pull/2063#issuecomment-1905299561
Network UPS Tools - UPS driver controller 2.8.1
Network UPS Tools - NUT APC Modbus driver 0.01 (2.8.1)
*** buffer overflow detected ***: terminated
Driver exited abnormally
** Notably, Jan 23 05:00:27 nut kernel: [ 250.593376] usb 2-1: usbfs: process 918 (apc_modbus) did not claim interface 0 before use
https://github.com/networkupstools/nut/issues/2289#issue-2104336348
...
0.050384 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][58]
Waiting for a confirmation...
*** buffer overflow detected ***: terminated
Aborted
https://github.com/networkupstools/nut/issues/2289#issuecomment-2310851725
...
0.040156 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Aborted (core dumped)
https://github.com/networkupstools/nut/issues/2289#issuecomment-2313515279
...
5.438365 [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
***** buffer overflow detected *: terminated
Aborted
Separately from this (maybe... seems similar to first sample above), in the later after-party discussion of original PR #2063 there were at least occasional (fixed by cable re-plug) faults with:
Network UPS Tools - UPS driver controller 2.8.1
Network UPS Tools - NUT APC Modbus driver 0.01 (2.8.1)
_apc_modbus_read_registers: Read of 516:604 failed: Connection timed out (auto)
Can't read inventory information from the UPS
And also an strace
log was posted at https://gist.github.com/ZerglingGo/1cc0d27ae30e26d4000d674b0f7da6ec with the end-game part looking like below, and notably exposing Resource temporarily unavailable
which might be relevant (especially vs. the piece above about cable re-plugging helping for a while):
...
407494 18:43:17.850107 write(2, " 0.215807\t[D5] send_to_all: SE"..., 63) = 63 <0.000024>
407494 18:43:17.850202 write(1, "[01][03][02][04][00][78]\n", 25) = 25 <0.000026>
407494 18:43:17.850293 ioctl(8, USBDEVFS_SUBMITURB, 0x5ff638237f70) = 0 <0.000031>
407494 18:43:17.850382 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLOUT}], 3, 60000) = 1 ([{fd=8, revents=POLLOUT}]) <0.004183>
407494 18:43:17.854644 ioctl(8, USBDEVFS_REAPURBNDELAY, 0x7fffd7dcecd0) = 0 <0.000029>
407494 18:43:17.854738 ioctl(8, USBDEVFS_REAPURBNDELAY, 0x7fffd7dcecd0) = -1 EAGAIN (Resource temporarily unavailable) <0.000027>
407494 18:43:17.854834 write(1, "Waiting for a confirmation...\n", 30) = 30 <0.000033>
407494 18:43:17.854944 writev(2, [{iov_base="*** ", iov_len=4}, {iov_base="bit out of range 0 - FD_SETSIZE "..., iov_len=41}, {iov_base=" ***: terminated\n", iov_len=17}], 3) = 62 <0.000030>
407494 18:43:17.855054 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7771af1f4000 <0.000034>
407494 18:43:17.855161 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 <0.000024>
...
The Waiting for a confirmation
line and what I think are byte dumps after it come from libmodbus (if modbus_set_debug()
was enabled), src/modbus.c::_modbus_receive_msg()
which indeed does FD_SET
soon after the message:
I suppose a run with gdb
or similar would be helpful to identify the line at which it actually crashes - is it in FD_SET
or in some later loop? Note that FD_SET()
is probably a macro, and probably defined differently depending on architecture, e.g. here are some hits from the systems I could look at quickly:
:; grep -rw FD_SET /usr/include/
...
/usr/include/aarch64-linux-gnu/sys/select.h:#define FD_SET(fd, fdsetp) __FD_SET (fd, fdsetp)
/usr/include/x86_64-linux-gnu/sys/select.h:#define FD_SET(fd, fdsetp) __FD_SET (fd, fdsetp)
...
:; grep -rw -A5 __FD_SET /usr/include/
...
usr/include/aarch64-linux-gnu/bits/select.h:#define __FD_SET(d, s) \
/usr/include/aarch64-linux-gnu/bits/select.h- ((void) (__FDS_BITS (s)[__FD_ELT(d)] |= __FD_MASK(d)))
...
/usr/include/x86_64-linux-gnu/bits/select.h:#define __FD_SET(d, s) \
/usr/include/x86_64-linux-gnu/bits/select.h- ((void) (__FDS_BITS (s)[__FD_ELT(d)] |= __FD_MASK(d)))
...
There are other FD macros nearby, all defined in terms of access to __FDS_BITS (s)[]
array - hence the likely overflow if that array is too small for the offset number that lands into it as an index.
https://stackoverflow.com/questions/22827510/how-to-avoid-bad-fd-set-buffer-overflow-crash suggests there are problems with 1000+ socket/FD numbers, and also reminds to check if there are FD leaks (loops with a FD not closed so another number gets opened and issued for some next operation).
Another question is what ctx->s
is in FD_SET(ctx->s, &rset);
expression, my random guess is it might be mis-populated in the new code for libmodbus rtu_usb
branch?..
CC @EchterAgo
I suppose ctx->s
is initialized to -1
in _modbus_init_common()
and remains that way. At least, I do not see any calls to modbus_set_socket()
or direct access to the field in src/modbus-rtu-usb.c
.
That's about all I can say for now, lacking a device to test against, time (gotta run in fact), etc.
But I really hope the data-dig above can get someone started!
@jimklimov I can provide you with system. If you want, Please send me a mail.
@ZerglingGo : thanks I will, but first...
@ all : were all those attempts done with the proper NUT setup, drivers running unprivileged by default?
That "Resource temporarily unavailable" clue above bothers me since I saw it yesterday. Can you please try running the driver as root
(with -x root
option if running on command line, or user=root
in its ups.conf
section)?
The hypothesis I'm trying to check here is if permissions for nut
or ups
etc. account to use the devfs node for that USB device did not get set up correctly, so the driver does not succeed (fully?) opening the device, does not abort either, and gets into some sort of undefined state (e.g. polling 0 open/responding file descriptors or something).
In the debug dumps posted earlier I see _apc_modbus_usb_callback
lines with apparently device details, but maybe it is something cached by the OS USB handler (name/model/version) and not reliant on access to the actual device from the driver program.
I guess that should be -u root
and not -x root
? If so: Same output. https://gist.github.com/adlerweb/8803c7c4c088a62b4036f4179f0c246b
Sorry, yes. Thought about -x user=root
then edited and lost a bit :D
https://gist.github.com/ZerglingGo/1cc0d27ae30e26d4000d674b0f7da6ec#file-strace_apc_modbus-log-2
Here it is running result with -u root
On an APC SMT1500C device using the rtu_usb version of libmodbus and a USB cable, reads fail with a timeout.
Since support for USB based APC Modbus is not mainlined, I am building the required RTU_USB version of libmodbus and compiling the main branch of NUT with usb and modbus support per the https://github.com/networkupstools/nut/wiki/APC-UPS-with-Modbus-protocol wiki page. The build appears to be successful but I am not able to at get the apc_modbus driver to communicate with the UPS.
I have attached build logs and included the apc_modbus output below. I have tested the connection using APC's PowerChute program (on a different system) and PowerChute is able to collect all of the ModBus only fields (like load and temperatures, etc). So I am fairly confident it is not a physical connection issue. Also when using the standard Debian bookworm build of NUT I am able to reliably communicate using the usbhid-ups driver but as it only provides a few fields, the Modbus driver is prefered.
Any pointers would be appreciated but I am not sure where to go from here.
NUT configuration indicating the new RTU USB features are included in the NUT build
Using apc_modbus to probe the unit this is the output:
Host Details
ups.conf
upsd.conf
nut.conf