networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.72k stars 334 forks source link

apc_modbus "buffer overflow detected" - usb enabled libmodbus #2289

Open cbrherms opened 5 months ago

cbrherms commented 5 months ago

Running in to a similar (if not the same), issue as mentioned in https://github.com/networkupstools/nut/pull/2063#issuecomment-1905299561

This is with a 2020 SMT2200RM12U, connected via USB and attempting to use apc_modbus. Have been able to build the modified libmodbus fine (and attempted to use apc-test-client but showed a similar error), but upon building nut with this modified lib for the driver, copying driver to the correct place, and installing over the apt package of nut as described in wiki/comments on the issue thread, the driver won't load.

Output when trying to start the driver directly:

cbrherms@nutpi:~$ sudo /lib/nut/apc_modbus -a apc-2200 -u nut -DDDDDD -d1
   0.000001     [D5] send_to_all: SETINFO driver.state "init.starting"
Network UPS Tools - NUT APC Modbus driver 0.01 (2.8.1-403-g746cb8611)
   0.000478     [D5] send_to_all: SETINFO driver.version.usb "libusb-1.0.25 (API: 0x1000109)"
   0.000527     [D1] Using USB implementation: libusb-1.0.25 (API: 0x1000109)
   0.000716     [D5] do_upsconf_args: confupsname=(null), var=pollinterval, val=1
   0.000755     [D5] do_upsconf_args: call do_global_args()
   0.000792     [D3] do_global_args: var='pollinterval' val='1'
   0.000829     [D6] testval_reloadable: var=pollinterval, oldval=2, newval=1, reloadable=1, reload_flag=0
   0.000891     [D1] testval_reloadable: setting 'pollinterval' exists and differs: new value '1' vs. '2'
   0.000934     [D6] testval_reloadable: verdict for (re)loading var=pollinterval value: 1
   0.000989     [D5] do_upsconf_args: confupsname=(null), var=maxretry, val=3
   0.001034     [D5] do_upsconf_args: call do_global_args()
   0.001086     [D3] do_global_args: var='maxretry' val='3'
   0.001129     [D5] do_upsconf_args: confupsname=apc-2200, var=driver, val=apc_modbus
   0.001175     [D5] do_upsconf_args: call main_arg()
   0.001216     [D3] main_arg: var='driver' val='apc_modbus'
   0.001252     [D5] do_upsconf_args: not a main_arg()
   0.001295     [D5] do_upsconf_args: this is a 'driver' setting, may we proceed?
   0.001338     [D6] testval_reloadable: var=driver, oldval=apc_modbus, newval=apc_modbus, reloadable=0, reload_flag=0
   0.001402     [D6] testval_reloadable: verdict for (re)loading var=driver value: -1
   0.001456     [D5] do_upsconf_args: 'driver' setting already applied with this value
   0.001522     [D5] do_upsconf_args: confupsname=apc-2200, var=port, val=auto
   0.001583     [D5] do_upsconf_args: call main_arg()
   0.001631     [D3] main_arg: var='port' val='auto'
   0.001673     [D6] testinfo_reloadable: var=port, infoname=driver.parameter.port, newval=auto, reloadable=0, reload_flag=0
   0.001723     [D6] testinfo_reloadable: verdict for (re)loading var=port value: 1
   0.001784     [D5] send_to_all: SETINFO driver.parameter.port "auto"
   0.001848     [D5] do_upsconf_args: confupsname=apc-2200, var=vendorid, val=051D
   0.001902     [D5] do_upsconf_args: call main_arg()
   0.001943     [D3] main_arg: var='vendorid' val='051D'
   0.001990     [D5] do_upsconf_args: not a main_arg()
   0.002042     [D5] do_upsconf_args: process as value
   0.002091     [D6] testvar_reloadable: searching for var=vendorid, vartype=2, reload_flag=0
   0.002150     [D6] testvar_reloadable: found var=vendorid, val='(null)' => '051D', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.002205     [D6] testvar_reloadable: verdict for (re)loading var=vendorid value: 1
   0.002260     [D5] send_to_all: SETINFO driver.parameter.vendorid "051D"
   0.002314     [D5] do_upsconf_args: confupsname=apc-2200, var=productid, val=0003
   0.002377     [D5] do_upsconf_args: call main_arg()
   0.002419     [D3] main_arg: var='productid' val='0003'
   0.002466     [D5] do_upsconf_args: not a main_arg()
   0.002522     [D5] do_upsconf_args: process as value
   0.002569     [D6] testvar_reloadable: searching for var=productid, vartype=2, reload_flag=0
   0.002625     [D6] testvar_reloadable: found var=productid, val='(null)' => '0003', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.002680     [D6] testvar_reloadable: verdict for (re)loading var=productid value: 1
   0.002739     [D5] send_to_all: SETINFO driver.parameter.productid "0003"
   0.002804     [D5] do_upsconf_args: confupsname=apc-2200, var=product, val=Smart-UPS 2200 FW:UPS 15.0 / ID=18
   0.002867     [D5] do_upsconf_args: call main_arg()
   0.002908     [D3] main_arg: var='product' val='Smart-UPS 2200 FW:UPS 15.0 / ID=18'
   0.002957     [D5] do_upsconf_args: not a main_arg()
   0.003013     [D5] do_upsconf_args: process as value
   0.003061     [D6] testvar_reloadable: searching for var=product, vartype=2, reload_flag=0
   0.003116     [D6] testvar_reloadable: found var=product, val='(null)' => 'Smart-UPS 2200 FW:UPS 15.0 / ID=18', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.003172     [D6] testvar_reloadable: verdict for (re)loading var=product value: 1
   0.003228     [D5] send_to_all: SETINFO driver.parameter.product "Smart-UPS 2200 FW:UPS 15.0 / ID=18"
   0.003283     [D5] do_upsconf_args: confupsname=apc-2200, var=serial, val=AS2034242877
   0.003347     [D5] do_upsconf_args: call main_arg()
   0.003391     [D3] main_arg: var='serial' val='AS2034242877'
   0.003438     [D5] do_upsconf_args: not a main_arg()
   0.003487     [D5] do_upsconf_args: process as value
   0.003539     [D6] testvar_reloadable: searching for var=serial, vartype=2, reload_flag=0
   0.003595     [D6] testvar_reloadable: found var=serial, val='(null)' => 'AS2034242877', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.003650     [D6] testvar_reloadable: verdict for (re)loading var=serial value: 1
   0.003709     [D5] send_to_all: SETINFO driver.parameter.serial "AS2034242877"
   0.003770     [D5] do_upsconf_args: confupsname=apc-2200, var=vendor, val=American Power Conversion
   0.003832     [D5] do_upsconf_args: call main_arg()
   0.003880     [D3] main_arg: var='vendor' val='American Power Conversion'
   0.003933     [D5] do_upsconf_args: not a main_arg()
   0.003982     [D5] do_upsconf_args: process as value
   0.004037     [D6] testvar_reloadable: searching for var=vendor, vartype=2, reload_flag=0
   0.004093     [D6] testvar_reloadable: found var=vendor, val='(null)' => 'American Power Conversion', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.004165     [D6] testvar_reloadable: verdict for (re)loading var=vendor value: 1
   0.004227     [D5] send_to_all: SETINFO driver.parameter.vendor "American Power Conversion"
   0.004277     [D5] do_upsconf_args: confupsname=apc-2200, var=bus, val=001
   0.004338     [D5] do_upsconf_args: call main_arg()
   0.004391     [D3] main_arg: var='bus' val='001'
   0.004432     [D5] do_upsconf_args: not a main_arg()
   0.004478     [D5] do_upsconf_args: process as value
   0.004531     [D6] testvar_reloadable: searching for var=bus, vartype=2, reload_flag=0
   0.004583     [D6] testvar_reloadable: found var=bus, val='(null)' => '001', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.004639     [D6] testvar_reloadable: verdict for (re)loading var=bus value: 1
   0.004698     [D5] send_to_all: SETINFO driver.parameter.bus "001"
   0.004752     [D5] do_upsconf_args: confupsname=apc-2200, var=device, val=004
   0.004805     [D5] do_upsconf_args: call main_arg()
   0.004860     [D3] main_arg: var='device' val='004'
   0.004908     [D5] do_upsconf_args: not a main_arg()
   0.004960     [D5] do_upsconf_args: process as value
   0.005014     [D6] testvar_reloadable: searching for var=device, vartype=2, reload_flag=0
   0.005065     [D6] testvar_reloadable: found var=device, val='(null)' => '004', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.005120     [D6] testvar_reloadable: verdict for (re)loading var=device value: 1
   0.005180     [D5] send_to_all: SETINFO driver.parameter.device "004"
   0.005233     [D5] do_upsconf_args: confupsname=apc-2200, var=busport, val=003
   0.005282     [D5] do_upsconf_args: call main_arg()
   0.005342     [D3] main_arg: var='busport' val='003'
   0.005390     [D5] do_upsconf_args: not a main_arg()
   0.005443     [D5] do_upsconf_args: process as value
   0.005497     [D6] testvar_reloadable: searching for var=busport, vartype=2, reload_flag=0
   0.005547     [D6] testvar_reloadable: found var=busport, val='(null)' => '003', vartype=2 => 2, found=0, reloadable=0, reload_flag=0
   0.005602     [D6] testvar_reloadable: verdict for (re)loading var=busport value: 1
   0.005678     [D5] send_to_all: SETINFO driver.parameter.busport "003"
   0.005794     [D1] Built-in default or configured user for drivers 'nut' was ignored due to 'nut' specified on command line
   0.005876     [D1] Network UPS Tools version 2.8.1-403-g746cb8611 (release/snapshot of 2.8.1.1) built with gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 and configured with flags: --with-drivers=usbhid-ups,apc_modbus,apcsmart --with-usb --prefix= --sysconfdir=/etc/nut --with-statepath=/run/nut --with-user=nut --with-group=nut --with-modbus --with-modbus-includes=-I/home/cbrherms/libmodbus/prefix/include/modbus --with-modbus-libs='-L/home/cbrherms/libmodbus/prefix/lib -lmodbus'
   0.005951     [D1] debug level is '6'
   0.006000     [D5] send_to_all: SETINFO driver.debug "6"
   0.006059     [D5] send_to_all: SETFLAGS driver.debug RW NUMBER
   0.013012     [D1] Succeeded to become_user(nut): now UID=114 GID=123
   0.013108     [D5] send_to_all: SETINFO device.type "ups"
   0.013152     [D5] send_to_all: SETINFO driver.state "init.device"
Number of USB devices: 4
Considering device #0 (051d:0003)
   0.036808     [D3] match_function_regex: matching a device...
   0.037914     [D2] _apc_modbus_usb_callback: Matched American Power Conversion  Smart-UPS 2200 FW:UPS 15.0 / ID=18 (USB VID/PID 051d:0003)
   0.038000     [D2] _apc_modbus_usb_callback: Checking American Power Conversion  Smart-UPS 2200 FW:UPS 15.0 / ID=18 (USB VID/PID 051d:0003) report descriptors
   0.038460     [D1] _apc_modbus_usb_callback: Found report ids RX=0x90 TX=0x89
   0.038523     [D5] send_to_all: SETINFO ups.vendorid "051d"
   0.038567     [D5] send_to_all: SETINFO ups.productid "0003"
Found Device 0 (Path 1-1.3):
  Vendor ID: 0x051d
  Product ID: 0x0003
Bytes flushed (63)
   0.050102     [D5] send_to_all: SETINFO driver.state "init.quiet"
   0.050195     [D5] send_to_all: SETINFO driver.version "2.8.1-403-g746cb8611"
   0.050259     [D5] send_to_all: SETINFO driver.version.internal "0.01"
   0.050323     [D5] send_to_all: SETINFO driver.name "apc_modbus"
   0.050384     [D5] send_to_all: SETINFO driver.state "init.info"
[01][03][02][04][00][58]
Waiting for a confirmation...
*** buffer overflow detected ***: terminated
Aborted

And my syslog has the entry at the time of trying to run the driver of: kernel: [ 2104.577931] usb 1-1.3: usbfs: process 1371 (apc_modbus) did not claim interface 0 before use

Not really sure where to start on this one if anyone has any ideas?

Can't say i've done any debugging before, but if someone could possibly give me a rundown on what i'd need to do to get something useful out of gdb, I could run through it. I'm aware that on Ubuntu i've apport installed by default for crash dumps and i've enabled it taking these in crashdb.conf

jimklimov commented 5 months ago

CC @EchterAgo

jimklimov commented 5 months ago

Searching in sources, the error does not seem to be reported - by neither NUT, nor libmodubs (rtu_usb branch), nor libusb (at least 1.0.26 that I have locally). Probably there is some security-conscious libmalloc variant used on your platform which detects (or claims) an overflow and causes the program to abort.

Arbitrarily, as common errors go, I would guess that in the driver we provide a buffer for I/O, maybe even pass the length, and the other side in one of those libraries does not honour this limit. Points to check:

Maybe it would symptomatically help to find the buffer involved and just allocate it larger than the expected transfers. Brute, uncivilized, prone to repeating if something changes later and we are none the wiser from debugging - but a quick and dirty hotfix, at least.

You both mentioned that kernel complains about lack of claiming the device. Maybe it is also involved, but I am not sure how :) If the kernel did hand off the device (and nobody else intervenes, especially with virtualization and pass-through involved in that #2063 comment), I suppose it does pass all the data between devices and programs honestly.

jimklimov commented 5 months ago

For gdb, generally you can start your driver program (with all competitors disabled) like gdb --args ./apc_modbus [-x key=val...] and it would run in the debugger until it fails. Then you can type bt to get a backtrace, unless that memory library causes some sort of clean exit which obscures things.

It may help to pass CFLAGS="-g3 -gdwarf-2 -O0" to configure to have debugging symbols built into the program.

jimklimov commented 5 months ago

For some cases, it may be simpler to run programs through strace, ptrace, truss or whatever your system offers, which should at least report the syscalls made -- often this is also sufficiently revealing.

cbrherms commented 5 months ago

I should probably mention that this is all being done on a raspberry pi 3b+

But, okay so here's the weird thing.... I've just gone and re-configured both libmodbus and nut with the CFLAGS and make / installed, and now the driver just started working. Albeit with a large number of connection lost and established messages:

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:25 2024):
Communications with UPS apc-2200@localhost established

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:25 2024):
Communications with UPS apc-2200@localhost established

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:35 2024):
Communications with UPS apc-2200@localhost lost

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:35 2024):
Communications with UPS apc-2200@localhost lost

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:40 2024):
Communications with UPS apc-2200@localhost established

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:22:40 2024):
Communications with UPS apc-2200@localhost established

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:23:50 2024):
Communications with UPS apc-2200@localhost lost

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:23:50 2024):
Communications with UPS apc-2200@localhost lost

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:23:55 2024):
Communications with UPS apc-2200@localhost established

Broadcast message from nut@nutpi (somewhere) (Mon Jan 29 00:23:55 2024):
Communications with UPS apc-2200@localhost established

The only thing that I can think of having changed between the two times building libmodbus, was that inbetween i'd installed nut and nut-cgi via apt, and as part of the nut package installation it installed libusb-0.1-4 as a dependency. I'd previously just had libusb-1.0-0-dev (2:1.0.25-1ubuntu2) installed the first time. May that be related?

jimklimov commented 5 months ago

I think the libmodbus changes for USB support might primarily require libusb-1.x - so wonder if it got somehow built against wrong implementation then (would rely on having the "dev" packages and so headers for both, and then mixing something up while building modbus and NUT). But that's mostly a guess.

I think the USB-capable drivers should report which libusb they use, in debug and recently (since 2.8.0) in upsc reports, e.g. on a box nearby I see:

driver.version.data: MGE HID 1.46
driver.version.internal: 0.52
driver.version.usb: libusb-1.0.24 (API: 0x1000108)

Regarding how it started working... maybe the nut package re-installation restarted udev service so it recognized the IDs needed for APC modbus? :\