networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
2.12k stars 355 forks source link

APC_MODBUS _apc_modbus_read_registers Timeouts #2609

Open LucAce opened 2 months ago

LucAce commented 2 months ago

On an APC SMT1500C device using the rtu_usb version of libmodbus and a USB cable, reads fail with a timeout.

Since support for USB based APC Modbus is not mainlined, I am building the required RTU_USB version of libmodbus and compiling the main branch of NUT with usb and modbus support per the https://github.com/networkupstools/nut/wiki/APC-UPS-with-Modbus-protocol wiki page. The build appears to be successful but I am not able to at get the apc_modbus driver to communicate with the UPS.

I have attached build logs and included the apc_modbus output below. I have tested the connection using APC's PowerChute program (on a different system) and PowerChute is able to collect all of the ModBus only fields (like load and temperatures, etc). So I am fairly confident it is not a physical connection issue. Also when using the standard Debian bookworm build of NUT I am able to reliably communicate using the usbhid-ups driver but as it only provides a few fields, the Modbus driver is prefered.

Any pointers would be appreciated but I am not sure where to go from here.

NUT configuration indicating the new RTU USB features are included in the NUT build

checking for libmodbus version via pkg-config... 3.1.10 found
checking for libmodbus cflags... -I/usr/include/modbus
checking for libmodbus ldflags... -lmodbus
checking for modbus.h... yes
checking for modbus_new_rtu... yes
checking for modbus_new_tcp... yes
checking for modbus_set_byte_timeout... yes
checking for modbus_set_response_timeout... yes
checking for modbus_new_rtu_usb... yes
checking types of arguments for modbus_set_byte_timeout... sec_usec_uint32_cast_timeval_fields
Found types to use for modbus_set_byte_timeout: sec_usec_uint32_cast_timeval_fields

Using apc_modbus to probe the unit this is the output:

$ sudo ./apc_modbus -DDDDDD -s ups -x port=auto
   0.000001     [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools driver 2.8.2.989-989-g4bf05e115 (development iteration after 2.8.2) - NUT APC Modbus driver 0.10
   0.000408     [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.26 (API: 0x1000109)"
   0.000447     [D1] Using USB implementation: libusb-1.0.26 (API: 0x1000109)
   0.000504     [D3] main_arg: var='port' val='auto'
   0.000533     [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
   0.000561     [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
   0.000604     [D5] send_to_all: SETINFO driver.parameter.port "auto"
   0.000624     [D1] Network UPS Tools version 2.8.2.989-989-g4bf05e115 (development iteration after 2.8.2) built with gcc (Debian 12.2.0-14) 12.2.0 and configured with flags: --prefix=/usr --with-doc=no --with-drivers=apc_modbus,usbhid-ups --with-usb --with-modbus --sysconfdir=/etc/nut --includedir=/usr/include --mandir=/usr/share/man --libdir=/usr/lib/aarch64-linux-gnu --libexecdir=/usr/libexec --with-ssl --with-nss --with-cgi --with-dev --with-statepath=/run/nut --with-altpidpath=/run/nut --with-drvpath=/lib/nut --with-cgipath=/usr/lib/cgi-bin/nut --with-htmlpath=/usr/share/nut/www --with-pidpath=/run/nut --datadir=/usr/share/nut --with-pkgconfig-dir=/usr/lib/aarch64-linux-gnu/pkgconfig --with-user=nut --with-group=nut --with-udev-dir=/lib/udev --with-systemdsystemunitdir=/lib/systemd/system --with-systemdshutdowndir=/lib/systemd/system-shutdown --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d
   0.000695     [D1] debug level is '6'
   0.000724     [D5] send_to_all: SETINFO driver.debug "6"
   0.000754     [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
   0.001755     [D1] Succeeded to become_user(nut): now UID=106 GID=112
   0.001830     [D1] Signalling UPS [ups]: driver.exit (quietly, no fuss if no driver is running or responding)
   0.001900     Can't open /run/nut/apc_modbus-ups: No such file or directory
   0.001938     [D1] Request for other driver to exit returned code -1
   0.001967     [D1] Socket dialog with the other driver instance (may be absent) failed: No such file or directory
   0.002003     [D5] send_to_all: SETINFO device.type "ups"
   0.002034     [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 5
Considering device #0 (051d:0003)
   0.047244     [D2] _apc_modbus_usb_callback: Matched American Power Conversion  Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003)
   0.047332     [D2] _apc_modbus_usb_callback: Checking American Power Conversion  Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003) report descriptors
   0.048073     [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
   0.048160     [D5] send_to_all: SETINFO ups.vendorid "051d"
   0.048223     [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 0 (Path 1-1.1.2):
  Vendor ID: 0x051d
  Product ID: 0x0003
Bytes flushed (0)
   0.058556     [D5] send_to_all: SETINFO driver.state "init.quiet"
   0.058602     [D5] send_to_all: SETINFO driver.version "2.8.2.989-989-g4bf05e115"
   0.058623     [D5] send_to_all: SETINFO driver.version.internal "0.10"
   0.058645     [D5] send_to_all: SETINFO driver.name "apc_modbus"
   0.058663     [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
ERROR Connection timed out: select
   0.567189     _apc_modbus_read_registers: Read of 516:636 failed: Connection timed out (auto)
Bytes flushed (0)
   1.577519     Can't read inventory information from the UPS
   1.577610     [D5] send_to_all: SETINFO driver.state "cleanup.upsdrv"
   1.578214     [D5] send_to_all: SETINFO driver.state "cleanup.exit"
   1.578280     upsnotify: notify about state 4 with libsystemd: was requested, but not running as a service unit now, will not spam more about it

   1.578310     upsnotify: failed to notify about state 4: no notification tech defined, will not spam more about it

Host Details

* Hardware: Raspberry Pi 3B+
* OS: Raspberry Pi OS Lite (64-bit)
* Kernel version: 6.6
* Debian version: 12 (bookworm)
* uname: Linux rpi 6.6.31+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux

ups.conf

[ups]
    driver = "apc_modbus"
    port = "auto"

upsd.conf

LISTEN 0.0.0.0 3493

nut.conf

MODE=netserver
LucAce commented 2 months ago

Build command for libmodbus and NUT

build_libmodbus.sh

#!/bin/bash

if [ "$EUID" -ne 0 ]; then
    echo -e "ERROR: Please run as root\n"
    exit
fi

# Install dependencies
apt-get install -y \
    autoconf libusb-1.0-0-dev libusb-dev git libtool gawk

# Build RTU USB Library
git clone -b rtu_usb https://github.com/networkupstools/libmodbus
cd libmodbus
./autogen.sh

export DEB_HOST_MULTIARCH=aarch64-linux-gnu

./configure \
    --prefix=/usr \
    --sysconfdir=/etc/nut \
    --includedir=/usr/include \
    --mandir=/usr/share/man \
    --libdir=/usr/lib/${DEB_HOST_MULTIARCH} \
    --libexecdir=/usr/libexec \
    --with-libusb \
    --enable-static \
    --enable-shared \
    --with-pkgconfig-dir=/usr/lib/${DEB_HOST_MULTIARCH}/pkgconfig

make -j4 && make install

build_nut.sh

#!/bin/bash

if [ "$EUID" -ne 0 ]; then
    echo -e "ERROR: Please run as root\n"
    exit
fi

# Install dependencies
apt-get install -y \
    libssl-dev libnss3-dev libgd-dev augeas-tools \
    libaugeas-dev augeas-lenses libcppunit-dev \
    ccache time perl curl make autoconf automake \
    libltdl-dev libtool cppcheck pkg-config gcc g++ \
    clang valgrind libsystemd-dev

# Build NUT
git clone https://github.com/networkupstools/nut
cd nut
./autogen.sh

export DEB_HOST_MULTIARCH="aarch64-linux-gnu"

./configure \
    --prefix=/usr \
    --with-doc=no \
    --with-drivers="apc_modbus,usbhid-ups" \
    --with-usb \
    --with-modbus \
    --sysconfdir=/etc/nut \
    --includedir=/usr/include \
    --mandir=/usr/share/man \
    --libdir=/usr/lib/${DEB_HOST_MULTIARCH} \
    --libexecdir=/usr/libexec \
    --with-ssl \
    --with-nss \
    --with-cgi \
    --with-dev \
    --with-statepath=/run/nut \
    --with-altpidpath=/run/nut \
    --with-drvpath=/lib/nut \
    --with-cgipath=/usr/lib/cgi-bin/nut \
    --with-htmlpath=/usr/share/nut/www \
    --with-pidpath=/run/nut \
    --datadir=/usr/share/nut \
    --with-pkgconfig-dir=/usr/lib/${DEB_HOST_MULTIARCH}/pkgconfig \
    --with-user=nut \
    --with-group=nut \
    --with-udev-dir=/lib/udev \
    --with-systemdsystemunitdir=/lib/systemd/system \
    --with-systemdshutdowndir=/lib/systemd/system-shutdown \
    --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d

make -j4

echo ""
read -p "Do you wish to continue? [Y/N] " yesno
case $yesno in
    [Yy]* )
        echo "Installing..."
    ;;
    * )
        echo "Exiting..."
        exit
    ;;
esac

make install
jimklimov commented 2 months ago

CC @EchterAgo : does this ring any bells, please?

LucAce commented 2 months ago

In an effort to rule out the RPi being a factor, I recreated the build on an x86 system with Ubuntu 24.04.1 which results in:

$ ./apc_modbus -DDDDDD -a ups -x port=auto
   0.000001     [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools driver 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) - NUT APC Modbus driver 0.10
   0.000145     [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.27 (API: 0x100010a)"
   0.000160     [D1] Using USB implementation: libusb-1.0.27 (API: 0x100010a)
   0.000311     [D5] do_upsconf_args: confupsname=(null), var=maxretry, val=3
   0.000316     [D5] do_upsconf_args: call do_global_args()
   0.000319     [D3] do_global_args: var='maxretry' val='3'
   0.000388     [D5] do_upsconf_args: confupsname=ups, var=driver, val=apc_modbus
   0.000392     [D5] do_upsconf_args: call main_arg()
   0.000396     [D3] main_arg: var='driver' val='apc_modbus'
   0.000399     [D5] do_upsconf_args: not a main_arg()
   0.000402     [D5] do_upsconf_args: this is a 'driver' setting, may we proceed?
   0.000406     [D6] testval_reloadable: var=driver, oldval=apc_modbus, newval=apc_modbus, reloadable=0, reload_flag=0
   0.000410     [D6] testval_reloadable: verdict for (re)loading var=driver value: -1
   0.000413     [D5] do_upsconf_args: 'driver' setting already applied with this value
   0.000417     [D5] do_upsconf_args: confupsname=ups, var=port, val=auto
   0.000421     [D5] do_upsconf_args: call main_arg()
   0.000424     [D3] main_arg: var='port' val='auto'
   0.000428     [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
   0.000433     [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
   0.000439     [D5] send_to_all: SETINFO driver.parameter.port "auto"
   0.000449     [D3] main_arg: var='port' val='auto'
   0.000453     [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
   0.000456     [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
   0.000460     [D1] Network UPS Tools version 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) built with gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 and configured with flags: --prefix=/usr --with-doc=no --with-drivers=apc_modbus,usbhid-ups --with-usb --with-modbus --sysconfdir=/etc/nut --includedir=/usr/include --mandir=/usr/share/man --libdir=/usr/lib/x86_64-linux-gnu --libexecdir=/usr/libexec --with-ssl --with-nss --with-cgi --with-dev --with-statepath=/run/nut --with-altpidpath=/run/nut --with-drvpath=/lib/nut --with-cgipath=/usr/lib/cgi-bin/nut --with-htmlpath=/usr/share/nut/www --with-pidpath=/run/nut --datadir=/usr/share/nut --with-pkgconfig-dir=/usr/lib/x86_64-linux-gnu/pkgconfig --with-user=nut --with-group=nut --with-udev-dir=/lib/udev --with-systemdsystemunitdir=/lib/systemd/system --with-systemdshutdowndir=/lib/systemd/system-shutdown --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d
   0.000484     [D1] debug level is '6'
   0.000489     [D5] send_to_all: SETINFO driver.debug "6"
   0.000495     [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
   0.001387     [D1] Succeeded to become_user(nut): now UID=110 GID=111
   0.001413     [D1] Signalling UPS [ups]: driver.exit (quietly, no fuss if no driver is running or responding)
   0.001425     Can't open /run/nut/apc_modbus-ups: No such file or directory
   0.001430     [D1] Request for other driver to exit returned code -1
   0.001434     [D1] Socket dialog with the other driver instance (may be absent) failed: No such file or directory
   0.001440     [D5] send_to_all: SETINFO device.type "ups"
   0.001444     [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 3
Considering device #1 (051d:0003)
   0.025093     [D2] _apc_modbus_usb_callback: Matched American Power Conversion  Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003)
   0.025120     [D2] _apc_modbus_usb_callback: Checking American Power Conversion  Smart-UPS_1500 FW:UPS 15.5 / ID=1015 (USB VID/PID 051d:0003) report descriptors
   0.025264     [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
   0.025279     [D5] send_to_all: SETINFO ups.vendorid "051d"
   0.025287     [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 1 (Path 1-8):
  Vendor ID: 0x051d
  Product ID: 0x0003
Bytes flushed (63)
   0.036922     [D5] send_to_all: SETINFO driver.state "init.quiet"
   0.036947     [D5] send_to_all: SETINFO driver.version "2.8.2.1004-1004-g38d8078ab"
   0.036966     [D5] send_to_all: SETINFO driver.version.internal "0.10"
   0.036982     [D5] send_to_all: SETINFO driver.name "apc_modbus"
   0.036995     [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Aborted (core dumped)
$ ./upsdrvctl start
Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2)
Network UPS Tools driver 2.8.2.1004-1004-g38d8078ab (development iteration after 2.8.2) - NUT APC Modbus driver 0.10
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
Driver exited abnormally
jimklimov commented 2 months ago

Thanks for the diligence about multi-platform check. RPi's USB chips were also suspects a few times.

But the segfault was reported in several issues recently - so seeing it misbehave differently per platform is at least "most intriguing!" :)

CC @EchterAgo again

I might at best look at code in a couple of weeks or more, and have no similar APC device to check against.

LucAce commented 2 months ago

@jimklimov Thanks for the reply. I will keep a look out for updates, let me know if there is anything you'd like me to try.

adlerweb commented 2 months ago

To add another device: I'm seeing a similar behavior using a Smart-UPS 750 running FW 9.3 on NUT 2.8.2 with libmodbus c02f62b58aa4ce7538e087ef46266d2a048b0bd6.

/lib64/nut/apc_modbus  -DDDDDD -s ups -x port=auto
   0.000000     [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools - NUT APC Modbus driver 0.10 (2.8.2)
   0.000105     [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.27 (API: 0x100010a)"
   0.000114     [D1] Using USB implementation: libusb-1.0.27 (API: 0x100010a)
   0.000128     [D3] main_arg: var='port' val='auto'
   0.000135     [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
   0.000141     [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
   0.000156     [D5] send_to_all: SETINFO driver.parameter.port "auto"
   0.000162     [D1] Network UPS Tools version 2.8.2 (release/snapshot of 2.8.2) built with x86_64-pc-linux-gnu-gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614 and configured with flags: --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --datarootdir=/usr/share --disable-dependency-tracking --disable-silent-rules --disable-static --docdir=/usr/share/doc/nut-2.8.2-r1 --htmldir=/usr/share/doc/nut-2.8.2-r1/html --with-sysroot=/ --libdir=/usr/lib64 --datadir=/usr/share/nut --datarootdir=/usr/share/nut --disable-static --disable-strip --disable-Werror --sysconfdir=/etc/nut --with-dev --with-drvpath=/lib64/nut --with-group=nut --with-htmlpath=/usr/share/nut/html --with-logfacility=LOG_DAEMON --with-statepath=/var/lib/nut --with-systemdsystemunitdir=/lib/systemd/system --with-systemdtmpfilesdir=/usr/lib/tmpfiles.d --with-udev-dir=/lib/udev --with-user=nut --without-powerman --without-python --without-python2 --with-altpidpath=/run/nut --with-pidpath=/run/nut --with-usb --with-modbus --with-modbus-includes=-I/usr/local/include/modbus --with-modbus-libs='-L/usr/local/lib -lmodbus' --without-cgi --without-gpio --without-linux_i2c --without-freeipmi --without-ipmi --without-nut_monitor --without-pynut --without-serial --without-snmp --with-ssl --without-libsystemd --without-wrap --with-usb --without-neon --without-avahi --with-doc=man --without-python3
   0.000197     [D1] debug level is '6'
   0.000203     [D5] send_to_all: SETINFO driver.debug "6"
   0.000210     [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
   0.000409     [D1] Succeeded to become_user(nut): now UID=84 GID=84
   0.000433     [D5] send_to_all: SETINFO device.type "ups"
   0.000440     [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 25
Considering device #0 (9022:d483)
libusb_open for device #0 failed: Access denied (insufficient permissions)
Considering device #1 (1c4f:0002)
libusb_open for device #1 failed: Access denied (insufficient permissions)
Considering device #2 (10c4:ea60)
libusb_open for device #2 failed: Access denied (insufficient permissions)
Considering device #4 (10c4:ea60)
libusb_open for device #4 failed: Access denied (insufficient permissions)
Considering device #5 (1a40:0201)
libusb_open for device #5 failed: Access denied (insufficient permissions)
Considering device #6 (051d:0003)
   0.026713     [D2] _apc_modbus_usb_callback: Matched American Power Conversion  Smart-UPS 750 FW:UPS 09.3 / ID=18 (USB VID/PID 051d:0003)
   0.026745     [D2] _apc_modbus_usb_callback: Checking American Power Conversion  Smart-UPS 750 FW:UPS 09.3 / ID=18 (USB VID/PID 051d:0003) report descriptors
   0.026906     [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
   0.026926     [D5] send_to_all: SETINFO ups.vendorid "051d"
   0.026945     [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 6 (Path 12-1):
  Vendor ID: 0x051d
  Product ID: 0x0003
Bytes flushed (63)
   0.047954     [D5] send_to_all: SETINFO driver.state "init.quiet"
   0.047967     [D5] send_to_all: SETINFO driver.version "2.8.2"
   0.047975     [D5] send_to_all: SETINFO driver.version.internal "0.10"
   0.047981     [D5] send_to_all: SETINFO driver.name "apc_modbus"
   0.047997     [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][78]
Waiting for a confirmation...
*** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated
jimklimov commented 2 months ago

UPDATE: Seems like a practical duplicate of #2289

jimklimov commented 2 months ago

To summarize current findings in one place, we see several similar patterns ending up in driver crash (typically with bit out of range 0 - FD_SETSIZE on fd_set or buffer overflow detected), or a graceful exit (with ERROR Connection timed out: select; not sure if _apc_modbus_read_registers ... timed out cases are all in the same bucket):

Separately from this (maybe... seems similar to first sample above), in the later after-party discussion of original PR #2063 there were at least occasional (fixed by cable re-plug) faults with:

Network UPS Tools - UPS driver controller 2.8.1
Network UPS Tools - NUT APC Modbus driver 0.01 (2.8.1)
_apc_modbus_read_registers: Read of 516:604 failed: Connection timed out (auto)
Can't read inventory information from the UPS

And also an strace log was posted at https://gist.github.com/ZerglingGo/1cc0d27ae30e26d4000d674b0f7da6ec with the end-game part looking like below, and notably exposing Resource temporarily unavailable which might be relevant (especially vs. the piece above about cable re-plugging helping for a while):

...
407494 18:43:17.850107 write(2, "   0.215807\t[D5] send_to_all: SE"..., 63) = 63 <0.000024>
407494 18:43:17.850202 write(1, "[01][03][02][04][00][78]\n", 25) = 25 <0.000026>
407494 18:43:17.850293 ioctl(8, USBDEVFS_SUBMITURB, 0x5ff638237f70) = 0 <0.000031>
407494 18:43:17.850382 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLOUT}], 3, 60000) = 1 ([{fd=8, revents=POLLOUT}]) <0.004183>
407494 18:43:17.854644 ioctl(8, USBDEVFS_REAPURBNDELAY, 0x7fffd7dcecd0) = 0 <0.000029>
407494 18:43:17.854738 ioctl(8, USBDEVFS_REAPURBNDELAY, 0x7fffd7dcecd0) = -1 EAGAIN (Resource temporarily unavailable) <0.000027>
407494 18:43:17.854834 write(1, "Waiting for a confirmation...\n", 30) = 30 <0.000033>
407494 18:43:17.854944 writev(2, [{iov_base="*** ", iov_len=4}, {iov_base="bit out of range 0 - FD_SETSIZE "..., iov_len=41}, {iov_base=" ***: terminated\n", iov_len=17}], 3) = 62 <0.000030>
407494 18:43:17.855054 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7771af1f4000 <0.000034>
407494 18:43:17.855161 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 <0.000024>
...

The Waiting for a confirmation line and what I think are byte dumps after it come from libmodbus (if modbus_set_debug() was enabled), src/modbus.c::_modbus_receive_msg() which indeed does FD_SET soon after the message:

I suppose a run with gdb or similar would be helpful to identify the line at which it actually crashes - is it in FD_SET or in some later loop? Note that FD_SET() is probably a macro, and probably defined differently depending on architecture, e.g. here are some hits from the systems I could look at quickly:

:; grep -rw FD_SET /usr/include/
...
/usr/include/aarch64-linux-gnu/sys/select.h:#define     FD_SET(fd, fdsetp)      __FD_SET (fd, fdsetp)
/usr/include/x86_64-linux-gnu/sys/select.h:#define      FD_SET(fd, fdsetp)      __FD_SET (fd, fdsetp)
...

:; grep -rw -A5  __FD_SET /usr/include/
...
usr/include/aarch64-linux-gnu/bits/select.h:#define __FD_SET(d, s) \
/usr/include/aarch64-linux-gnu/bits/select.h-  ((void) (__FDS_BITS (s)[__FD_ELT(d)] |= __FD_MASK(d)))
...
/usr/include/x86_64-linux-gnu/bits/select.h:#define __FD_SET(d, s) \
/usr/include/x86_64-linux-gnu/bits/select.h-  ((void) (__FDS_BITS (s)[__FD_ELT(d)] |= __FD_MASK(d)))
...

There are other FD macros nearby, all defined in terms of access to __FDS_BITS (s)[] array - hence the likely overflow if that array is too small for the offset number that lands into it as an index.

https://stackoverflow.com/questions/22827510/how-to-avoid-bad-fd-set-buffer-overflow-crash suggests there are problems with 1000+ socket/FD numbers, and also reminds to check if there are FD leaks (loops with a FD not closed so another number gets opened and issued for some next operation).

Another question is what ctx->s is in FD_SET(ctx->s, &rset); expression, my random guess is it might be mis-populated in the new code for libmodbus rtu_usb branch?..

CC @EchterAgo

jimklimov commented 2 months ago

I suppose ctx->s is initialized to -1 in _modbus_init_common() and remains that way. At least, I do not see any calls to modbus_set_socket() or direct access to the field in src/modbus-rtu-usb.c.

That's about all I can say for now, lacking a device to test against, time (gotta run in fact), etc.

But I really hope the data-dig above can get someone started!

ZerglingGo commented 2 months ago

@jimklimov I can provide you with system. If you want, Please send me a mail.

jimklimov commented 2 months ago

@ZerglingGo : thanks I will, but first...

@ all : were all those attempts done with the proper NUT setup, drivers running unprivileged by default?

That "Resource temporarily unavailable" clue above bothers me since I saw it yesterday. Can you please try running the driver as root (with -x root option if running on command line, or user=root in its ups.conf section)?

The hypothesis I'm trying to check here is if permissions for nut or ups etc. account to use the devfs node for that USB device did not get set up correctly, so the driver does not succeed (fully?) opening the device, does not abort either, and gets into some sort of undefined state (e.g. polling 0 open/responding file descriptors or something).

In the debug dumps posted earlier I see _apc_modbus_usb_callback lines with apparently device details, but maybe it is something cached by the OS USB handler (name/model/version) and not reliant on access to the actual device from the driver program.

adlerweb commented 2 months ago

I guess that should be -u root and not -x root? If so: Same output. https://gist.github.com/adlerweb/8803c7c4c088a62b4036f4179f0c246b

jimklimov commented 2 months ago

Sorry, yes. Thought about -x user=root then edited and lost a bit :D

ZerglingGo commented 2 months ago

https://gist.github.com/ZerglingGo/1cc0d27ae30e26d4000d674b0f7da6ec#file-strace_apc_modbus-log-2 Here it is running result with -u root