Closed richardtector closed 1 year ago
A little more info via apctest:
2023-05-12 17:08:04 apctest 3.14.14 (31 May 2016) freebsd
Checking configuration ...
sharenet.type = Network & ShareUPS Disabled
cable.type = Custom Cable Smart
mode.type = APC Smart UPS (any)
Setting up the port ...
Doing prep_device() ...
You are using a SMART cable type, so I'm entering SMART test mode
Hello, this is the apcupsd Cable Test program.
This part of apctest is for testing Smart UPSes.
Please select the function you want to perform.
1) Query the UPS for all known values
2) Perform a Battery Runtime Calibration
3) Abort Battery Calibration
4) Monitor Battery Calibration progress
5) Program EEPROM
6) Enter TTY mode communicating with UPS
Q) Quit
Select function number: 1
I am going to run through the series of queries of the UPS
that are used in initializing apcupsd.
Simulating UPSlinkCheck ...
Wrote: Y Got: SM
Attempting to use smart_poll() ...
Sent: Y Got: SM Good -- smart_poll() works!.
Going to ask for valid commands...
Wrote: a Got: 4.?=*.
')-/789<>@ABCDEFGKLMNOPQRSUVWXYZ\^abcdefgjklmnopqrsuwxyz.▒▒▒▒▒▒▒
Protocol version is: 4
Alert characters are: ?=*^D
Command characters are: ^A^E^I^L^N^T^V^Z')-/789<>@ABCDEFGKLMNOPQRSUVWXYZ\^abcdefgjklmnopqrsuwxyz.^▒^^"^0^=^>^?
Now running through apcupsd get_UPS capabilities().
NA indicates that the feature is Not Available
UPS Status: 08
Line quality: FF
Reason for last transfer to batteries: L
Self-Test Status: NO
Line Voltage: 237.6
Line Voltage Max: 237.6
Line Voltage Min: 234.7
Output Voltage: 226.9
Batt level percent: 100.0
Batt voltage: 54.54
UPS Load: 027.0
Line freq: 50.03
Runtime left: 0043
UPS Internal temp: 025.2
Dip switch settings: 00
Register 1: 00
Register 2: 00
Register 3: 00
Sensitivity: A
Wakeup delay: 000
Sleep delay: 020
Low transfer voltage: 161
High transfer voltage: 253
Batt charge for return: 00
Alarm status: 0
Low battery shutdown level: 02
UPS Name: UPS_IDEN
UPS Self test interval: 336
UPS manufacture date: 10/22/11
UPS serial number: QS1143231718
Date battery replaced: 10/22/11
Output voltage when on batteries: 230
Nominal battery voltage: 048
Percent humidity: NA
Ambient temperature: NA
Firmware revision: 411.8.I
Number of external batteries installed: 001
Number of bad batteries installed: 000
UPS model as defined by UPS: Smart-UPS RT 1000 XL
UPS EPROM capabilities string: uI43253242265276lI43161196184173oI43230240220225e4820015253550607590s411Aq4820205071012151820p483020060120240480720960000k4410TLNr483000020060120240480720960E443336168ON OFF▒445AUTO 60/.160/3 50/.150/3
The EPROM string is 202 characters long!
Hours since last self test: NA
That is all for now.
Is the issue related to the EEPROM capabilities string?
I think you are going to have to turn up debugging even more and probably add debug calls in the source, which means building your own. I would then want to hand analyze the bytes from the serial port, and compare to the bytes that apcupsd-world programs get (and any spec). Basically keep looking harder at any part of this which seems to be going wrong. I don't have any APC wisdom, but I did something sort of similar for old Best Fortress which I got working somewhat better than it had been.
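As a starting point for that kind of byte-level comparison, here is a minimal sketch of a hex-dump helper in C. It is not part of NUT or apcupsd; `hex_line` is a hypothetical name, and the idea is only to make unprintable bytes (such as the ones the terminal shows as ▒ above) visible next to their ASCII neighbours.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a NUT/apcupsd function).
 * Renders up to 16 bytes as "hh hh ... |ascii|" into out.
 * Non-printable bytes appear as '.' in the ASCII column.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the output buffer is too small.
 */
static int hex_line(const unsigned char *buf, size_t len,
                    char *out, size_t outsz)
{
    size_t i, pos = 0;

    assert(buf != NULL && out != NULL);

    if (len > 16)
        len = 16;

    /* 3 chars per hex byte, 1 per ASCII byte, two '|' and the NUL */
    if (outsz < len * 4 + 3)
        return -1;

    for (i = 0; i < len; i++)
        pos += (size_t)snprintf(out + pos, outsz - pos, "%02x ",
                                (unsigned int)buf[i]);

    out[pos++] = '|';
    for (i = 0; i < len; i++)
        out[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
    out[pos++] = '|';
    out[pos] = '\0';

    return (int)pos;
}
```

For example, the four bytes 0x01 '4' '4' '5' render as `01 34 34 35 |.445|`. Feeding it the raw bytes read from the serial port (rather than what a terminal displays) should make it easier to compare what the NUT driver and apctest each receive.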
In a wider context, worth linking this question to https://github.com/networkupstools/nut/issues/139#issuecomment-1369527363
While that is not (originally) about Smart protocol support, importing apcupsd
capabilities into NUT might be an answer.
@richardtector : Cheers, any luck tracing the issue further? (Debug and whatnot?)
I've recently documented some use of IDEs to develop and debug the NUT codebase. I wonder if NetBeans would fire up on FreeBSD to make it easier for you to step through the drivers as they talk to the device.
Hello again, a recent investigation with issue #2015 and PR #2016 seems fruitful, at least resolving the part about "nument/entlen out of range". Can you please check if a build from current NUT master branch gets your device recognized better?
https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests can help about providing a build, to test right from the workspace (not changing the OS until you desire to).
TL/DR: SOLVED! - by replacing current 2.8.0 version of /lib/nut/apcsmart-old
with the older 2.7.4 version.
After upgrading Proxmox VE 7.4 to 8.0 (thus Debian 11 -> 12, NUT 2.7.4 -> NUT 2.8.0) I see the same problem:
# /lib/nut/apcsmart-old -a main
Network UPS Tools - APC Smart protocol driver 2.2 (2.8.0)
APC command table version 2.2
do_capabilities: nument (-1) or entlen (3) out of range
nument or entlen out of range
Please report this error
ERROR: capability overflow!
The 'main' UPS defined as:
[main]
driver = apcsmart-old
port = /dev/ttyS0
desc = "SURT2000XLI"
cable = 940-0024C
After replacing binary /lib/nut/apcsmart-old
with the older version from 2.7.4 package it works again.
Great that you found that, but that's not "SOLVED"; it is "worked around by running unmaintained code, in a way that is not sustainable".
It would be good to read the code diffs from 2.7.4 to 2.8.0, and bisect. Also to build the 2.7.4 sources on your Debian 12 system, to separate "bug introduced in apcsmart-old" from "this code compiles wrong on D12". I suspect an issue in the code more than the build environment, but when debugging one should avoid assumptions.
FWIW, I haven't seen any reports above about testing if the current code (as in "master" branch builds) fixes some alleged regression of 2.8.0 release.
Yes, it's actually a workaround, sorry. At least it gives an idea of where the error might be.
I see that there are actually only 2 insertions in the code since 2.7.4, apart from formatting and more precise type picking for some variables. They are just assertions:
if (ptr[2] < 48 || ptr[3] < 48) {
    upsdebugx(0,
        "%s: nument (%d) or entlen (%d) out of range",
        __func__, (ptr[2] - 48), (ptr[3] - 48));
    fatalx(EXIT_FAILURE,
        "nument or entlen out of range\n"
        "Please report this error\n"
        "ERROR: capability overflow!");
}
and
if (cnt > INT_MAX || cnt < 0) {
    fatalx(EXIT_FAILURE, "Error: %s: cnt (%ld) is out of range", __func__, cnt);
}
Obviously the former gives the error.
Is this condition ptr[2] < 48 || ptr[3] < 48 really critical? Because everything was working without this check.
Let me know if I can help.
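For what it's worth, the reported pair (-1, 3) can be reproduced by hand from the tail of the apctest dump above. The following is a minimal sketch in C, not the NUT parser. It assumes each capability record is laid out as command char, locale char, count digit, length digit, followed by count × length payload bytes, and it assumes the unprintable byte before `445AUTO` is a single command character (stood in for by `\x01` here). `walk_caps` and `struct cap_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record header: cmd, locale, then two ASCII digits. */
struct cap_header {
    char cmd;
    char loc;
    int nument;   /* number of entries: ptr[2] - '0' */
    int entlen;   /* length of each entry: ptr[3] - '0' */
};

/*
 * Walk a capability string record by record (assumed layout, see above).
 * Stops at the first header whose count or length decodes below zero,
 * or whose payload would run past the end of the string, and copies
 * that header into *bad. Returns the number of records walked cleanly.
 * If the whole string is consumed, bad->cmd stays '\0'.
 */
static size_t walk_caps(const char *caps, struct cap_header *bad)
{
    const char *ptr, *end;
    size_t good = 0;

    assert(caps != NULL && bad != NULL);

    memset(bad, 0, sizeof(*bad));
    ptr = caps;
    end = caps + strlen(caps);

    while (end - ptr >= 4) {
        struct cap_header h;
        size_t payload;

        h.cmd = ptr[0];
        h.loc = ptr[1];
        h.nument = ptr[2] - '0';
        h.entlen = ptr[3] - '0';

        if (h.nument < 0 || h.entlen < 0) {
            *bad = h;
            return good;
        }

        payload = (size_t)h.nument * (size_t)h.entlen;
        if ((size_t)(end - ptr) - 4 < payload) {
            *bad = h;
            return good;
        }

        ptr += 4 + payload;
        good++;
    }

    return good;
}
```

Called on the tail of the dump, `walk_caps("E443336168ON OFF" "\x01" "445AUTO 60/.160/3 50/.150/3", &bad)` walks two records and then lands on the leftover `50/3`. Since '/' is ASCII 47, that decodes to nument = -1 and entlen = 3, the same pair the driver printed. This relies on how the terminal rendered the bytes, so a raw capture would still be needed to confirm it.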
Well, on one hand, commit https://github.com/networkupstools/nut/commit/bc14bf8f67c733eb446df087fa7f622d4eea66e2 in current master (from PR #2016 for issue #2015) removes the fatal part, so it (mis?-)behaves like it did before with these values.
On another, the following lines are why the check appeared (otherwise we have a byte-wrap overflow or find negative caps values).
nument = (size_t)ptr[2] - 48;
entlen = (size_t)ptr[3] - 48;
Modulo careful type casting, this was also in 2.7.4 : https://github.com/networkupstools/nut/blob/v2.7.4/drivers/apcsmart-old.c#L387
There ptr is technically a signed char* array, and nument/entlen are (signed) architecture-dependent int values.
Looking some more at it now, I guess if they're negative, the for (i = 0; i < nument; i++) {...} loop effectively does not run and that's it.
Guess another revision is due to make sure this is what would happen despite size_t :\
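To illustrate the concern: with int arithmetic a character below '0' decodes to a small negative number and the loop body never runs, while the same subtraction done in size_t wraps around to a huge count. A small sketch in C follows; the helper names are hypothetical and not taken from the NUT source, and the loops are capped so the unsigned case terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count digit decoded with plain (signed) int arithmetic. */
static int decode_signed(char c)
{
    return c - 48;
}

/* Same digit decoded via size_t: values below '0' wrap around. */
static size_t decode_unsigned(char c)
{
    return (size_t)c - 48;
}

/* Iterations of "for (i = 0; i < nument; i++)" with int, up to cap. */
static size_t iterations_signed(char c, size_t cap)
{
    size_t n = 0;
    int i, nument = decode_signed(c);

    assert(cap > 0);
    for (i = 0; i < nument && n < cap; i++)
        n++;
    return n;
}

/* The same loop with size_t counters, up to cap. */
static size_t iterations_unsigned(char c, size_t cap)
{
    size_t n = 0;
    size_t i, nument = decode_unsigned(c);

    assert(cap > 0);
    for (i = 0; i < nument && n < cap; i++)
        n++;
    return n;
}

/* Guard in the spirit of the 2.8.0 check: test before any cast. */
static int digit_in_range(char c)
{
    return c >= 48;
}
```

With '/' (ASCII 47) as input, `decode_signed` gives -1 and the signed loop runs zero times, whereas `decode_unsigned` gives SIZE_MAX and the unsigned loop only stops at the cap. Checking `digit_in_range` first and skipping the entries (rather than aborting) would keep the old behaviour without relying on signedness.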
@albert-a @richardtector : can you give a shot to https://github.com/jimklimov/nut/tree/issue-1941 ?
Follow e.g. https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests with a
git clone https://github.com/jimklimov/nut -b issue-1941
to initiate the workspace, in which to build and run the driver for a test data dump.
Hi - apologies for not following up with this. I should have a little more time next week to do some digging.
I've just installed sysutils/nut-devel from FreeBSD ports which is currently built from networkupstools-nut-2023.10.07-5dc797025.
The driver test appears to complete okay - output below.
Network UPS Tools - APC Smart protocol driver 3.31 (2.8.0.1)
APC command table version 3.1
0.000000 [D1] Network UPS Tools version 2.8.0.1 (release/snapshot of 2.8.0.1) built with FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152); Target: x86_64-unknown-freebsd13.2; Thread model: posix and configured with flags: --sysconfdir=/usr/local/etc/nut --program-transform-name= --localstatedir=/var/db/nut --datadir=/usr/local/etc/nut --with-devd-dir=/usr/local/etc/devd --with-drvpath=/usr/local/libexec/nut --with-statepath=/var/db/nut --with-altpidpath=/var/db/nut --with-pidpath=/var/db/nut --with-pkgconfig-dir=/usr/local/libdata/pkgconfig --with-user=nut --with-group=nut --with-python=/usr/local/bin/python3.10 --without-python2 --with-python3=/usr/local/bin/python3.10 --with-ltdl --with-nut-scanner --without-avahi --without-cgi --with-dev --without-freeipmi --without-ipmi --with-doc=no --without-modbus --without-neon --without-nss --with-openssl --without-powerman --with-serial --without-snmp --without-usb --prefix=/usr/local --mandir=/usr/local/man --disable-silent-rules --infodir=/usr/local/share/info/ --build=amd64-portbld-freebsd13.2
0.000029 [D1] debug level is '1'
0.000279 [D1] Succeeded to become_user(nut): now UID=316 GID=316
0.057101 [D1] attempting firmware lookup using [V]
0.126868 [D1] detected firmware version: 5YI
0.126919 [D1] attempting var/cmdset lookup using [a]
1.075106 [D1] parsing out supported cmds/vars
1.075122 [D1] preread_data: ups.model [0x01]
1.364266 [D1] ups.model [0x01] - variable supported
1.364278 [0x05] unrecognized
1.364282 [0x09] unrecognized
1.364287 [D1] preread_data: ups.display.language [0x0c]
1.414029 [D1] ups.display.language [0x0c] - variable supported
1.414038 [D1] load.on [0x0e] - command supported
1.414042 [0x14] unrecognized
1.414046 [0x16] unrecognized
1.414051 [D1] preread_data: output.current [/]
1.474014 preread_data: output.current [/] timed out or not supported
1.474024 output.current [/] - variable invalid
1.474028 [7] unrecognized
1.474032 [D1] preread_data: input.quality [9]
1.533997 [D1] input.quality [9] - variable supported
1.534005 [D1] preread_data: battery.packs.bad [<]
1.603762 [D1] battery.packs.bad [<] - variable supported
1.603769 [D1] preread_data: battery.packs [>]
1.683708 [D1] battery.packs [>] - variable supported
1.683718 [D1] shutdown.return [@] - command supported
1.683724 [D1] test.panel.start [A] - command supported
1.683728 [D1] preread_data: battery.voltage [B]
1.783263 [D1] battery.voltage [B] - variable supported
1.783271 [D1] preread_data: ups.temperature [C]
1.883228 [D1] ups.temperature [C] - variable supported
1.883242 [D1] calibrate.start [D] - command supported
1.883248 [D1] calibrate.stop [D] - command supported
1.883253 [D1] preread_data: ups.test.interval [E]
1.952999 [D1] ups.test.interval [E] - variable supported
1.953008 [D1] preread_data: input.frequency [F]
2.052947 [D1] input.frequency [F] - variable supported
2.052960 [D1] preread_data: input.transfer.reason [G]
2.102691 [D1] input.transfer.reason [G] - variable supported
2.102703 [D1] shutdown.stayoff [K] - command supported
2.102708 [D1] preread_data: input.voltage [L]
2.202678 [D1] input.voltage [L] - variable supported
2.202687 [D1] preread_data: input.voltage.maximum [M]
2.302209 [D1] input.voltage.maximum [M] - variable supported
2.302219 [D1] preread_data: input.voltage.minimum [N]
2.402180 [D1] input.voltage.minimum [N] - variable supported
2.402190 [D1] preread_data: output.voltage [O]
2.501710 [D1] output.voltage [O] - variable supported
2.501720 [D1] preread_data: ups.load [P]
2.601678 [D1] ups.load [P] - variable supported
2.601689 [D1] shutdown.return [S] - command supported
2.601695 [D1] test.failure.start [U] - command supported
2.601701 [D1] test.battery.start [W] - command supported
2.601706 [D1] test.battery.stop [W] - command supported
2.601710 [D1] preread_data: ups.test.result [X]
2.661626 [D1] ups.test.result [X] - variable supported
2.661640 [D1] load.off [Z] - command supported
2.661645 [\] unrecognized
2.661651 [D1] bypass.start [^] - command supported
2.661656 [D1] bypass.stop [^] - command supported
2.661660 [D1] preread_data: ups.id [c]
2.801163 [D1] ups.id [c] - variable supported
2.801172 [d] unrecognized
2.801178 [D1] preread_data: battery.charge.restart [e]
2.861122 [D1] battery.charge.restart [e] - variable supported
2.861132 [D1] preread_data: battery.charge [f]
2.960688 [D1] battery.charge [f] - variable supported
2.960697 [D1] preread_data: battery.voltage.nominal [g]
3.030858 [D1] battery.voltage.nominal [g] - variable supported
3.030867 [D1] preread_data: battery.runtime [j]
3.130429 [D1] battery.runtime [j] - variable supported
3.130441 [D1] preread_data: battery.alarm.threshold [k]
3.180164 [D1] battery.alarm.threshold [k] - variable supported
3.180172 [D1] preread_data: input.transfer.low [l]
3.270348 [D1] input.transfer.low [l] - variable supported
3.270358 [D1] preread_data: ups.mfr.date [m]
3.409846 [D1] ups.mfr.date [m] - variable supported
3.409855 [D1] preread_data: ups.serial [n]
3.592734 [D1] ups.serial [n] - variable supported
3.592743 [D1] preread_data: output.voltage.nominal [o]
3.659154 [D1] output.voltage.nominal [o] - variable supported
3.659163 [D1] preread_data: ups.delay.shutdown [p]
3.729330 [D1] ups.delay.shutdown [p] - variable supported
3.729342 [D1] preread_data: battery.runtime.low [q]
3.788864 [D1] battery.runtime.low [q] - variable supported
3.788873 [D1] preread_data: ups.delay.start [r]
3.858650 [D1] ups.delay.start [r] - variable supported
3.858659 [D1] preread_data: input.sensitivity [s]
3.908828 [D1] input.sensitivity [s] - variable supported
3.908836 [D1] preread_data: input.transfer.high [u]
3.998584 [D1] input.transfer.high [u] - variable supported
3.998591 [w] unrecognized
3.998596 [D1] preread_data: battery.date [x]
4.138093 [D1] battery.date [x] - variable supported
4.138100 [D1] preread_data: ups.firmware [b]
4.247905 [D1] ups.firmware [b] - variable combination supported
4.247917 [D1] parsing out caps
6.572518 [D1] input.transfer.high [u(I)] - capability supported
6.572555 [D1] input.transfer.low [l(I)] - capability supported
6.572571 [D1] output.voltage.nominal [o(I)] - capability supported
6.572587 [D1] battery.charge.restart [e(4)] - capability supported
6.572609 [D1] input.sensitivity [s(4)] - capability supported
6.572618 [D1] battery.runtime.low [q(4)] - capability supported
6.572647 [D1] ups.delay.shutdown [p(4)] - capability supported
6.572670 [D1] battery.alarm.threshold [k(4)] - capability supported
6.572686 [D1] ups.delay.start [r(4)] - capability supported
6.572714 [D1] ups.test.interval [E(4)] - capability supported
6.572734 [D1] detected Smart-UPS RT 1000 XL [QS1143231718 ] on /dev/cuau0
6.572744 [D1] update_status: [Q]
6.632493 [D1] update_info: starting scan (all vars)
6.632503 [D1] poll_data: ups.temperature [C]
6.732442 [D1] poll_data: ups.load [P]
6.831998 [D1] poll_data: ups.test.interval [E]
6.901749 [D1] poll_data: ups.test.result [X]
6.961723 [D1] poll_data: ups.delay.start [r]
7.031485 [D1] poll_data: ups.delay.shutdown [p]
7.101255 [D1] poll_data: ups.id [c]
7.241175 [D1] poll_data: ups.display.language [0x0c]
7.290963 [D1] poll_data: input.voltage [L]
7.390939 [D1] poll_data: input.frequency [F]
7.490483 [D1] poll_data: input.sensitivity [s]
7.540229 [D1] poll_data: input.quality [9]
7.600214 [D1] poll_data: input.transfer.low [l]
7.689972 [D1] poll_data: input.transfer.high [u]
7.779722 [D1] poll_data: input.transfer.reason [G]
7.829896 [D1] poll_data: input.voltage.maximum [M]
7.929469 [D1] poll_data: input.voltage.minimum [N]
8.029419 [D1] poll_data: output.voltage [O]
8.128971 [D1] poll_data: output.voltage.nominal [o]
8.198718 [D1] poll_data: battery.date [x]
8.338638 [D1] poll_data: battery.charge [f]
8.438221 [D1] poll_data: battery.charge.restart [e]
8.498162 [D1] poll_data: battery.voltage [B]
8.598148 [D1] poll_data: battery.voltage.nominal [g]
8.667894 [D1] poll_data: battery.runtime [j]
8.767435 [D1] poll_data: battery.runtime.low [q]
8.827403 [D1] poll_data: battery.packs [>]
8.907384 [D1] poll_data: battery.packs.bad [<]
8.977138 [D1] poll_data: battery.alarm.threshold [k]
9.026894 [D1] poll_data: ups.serial [n]
9.209780 [D1] poll_data: ups.mfr.date [m]
9.346335 [D1] poll_data: ups.model [0x01]
9.635605 [D1] poll_data: ups.firmware [b]
9.745352 [D1] update_info: scan completed
9.745472 Running as foreground process, not saving a PID file
9.745493 [D1] Driver initialization completed, beginning regular infinite loop
There's a little bit of upset during the driver startup, but otherwise it seems to be running fine and reporting all the right info:
root@daffy:/usr/ports/sysutils/nut-devel # /usr/local/etc/rc.d/nut start
Network UPS Tools - UPS driver controller 2.8.0.1
Network UPS Tools - APC Smart protocol driver 3.31 (2.8.0.1)
APC command table version 3.1
[0x05] unrecognized
[0x09] unrecognized
[0x14] unrecognized
[0x16] unrecognized
preread_data: output.current [/] timed out or not supported
output.current [/] - variable invalid
[7] unrecognized
[\] unrecognized
[d] unrecognized
[w] unrecognized
Starting nut.
Network UPS Tools upsd 2.8.0.1
fopen /var/db/nut/upsd.pid: No such file or directory
Could not find PID file '/var/db/nut/upsd.pid' to see if previous upsd instance is already running!
/usr/local/etc/nut/upsd.conf is world readable
listening on 10.0.2.2 port 3493
Connected to UPS [ups1]: apcsmart-ups1
Found 1 UPS defined in ups.conf
I can give your branch a test but it'll be next week I think.
Richard
@richardtector : thanks! I suppose the version in ports includes the earlier fix to not fail the driver fatally; however, it skips some further initialization when it sees "bad" caps, which NUT v2.7.4 and this new PR should be performing. I'm not immediately sure whether this is what the "upset" messages above are about, but it might be.
Hi,
I'm running an APC Smart UPS RT 1000 XL which I'm having a little trouble monitoring from a FreeBSD 13 system, nut 2.8.0 installed from ports. The UPS is connected with the OEM serial cable to COM1. ups.conf:
Testing with both the apcsmart and apcsmart-old driver I get a similar error: "do_capabilities: nument (-1) or entlen (3) out of range"
I've tried with "port = /dev/ttyu0" with the same outcome.
I can monitor successfully with apcupsd (output at the bottom if it helps) but I was hoping to get a native solution working with NUT. I'm happy to tinker to get this working if someone could guide me.
Thanks,
Richard
Change ups.conf to new driver and:
apcaccess output: