networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
2.09k stars 351 forks source link

`apcsmart` lost communication with UPS results in intense syslog flood #704

Open tomiisp opened 5 years ago

tomiisp commented 5 years ago

Hi,

I get this issue second time, nutups lost communication with UPS (via USB/Serial cable) and nut tools and syslog start eating all 4 cores (cpu quickly reach temperature 78C ), it produce huge log file (my poor SD card...) and it produce about 4500lines in log per second ! Entries in log looks:

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Warning: excessive comm failures, limiting error reporting

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: message repeated 9 times: [ Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error]

That USB/serial is only temporary solution, later UPS will be connected directly to onboard UART but this is insane amount of error messages and rate. Is this a bug or there is a option to limit this error messages ?

Orange PI PC2 - Armbian 4.19.38-sunxi64 #5.86 SMP
Network UPS Tools - UPS driver controller 2.7.4

/Tomi

fatbasstard commented 2 years ago

Hitting the same behaviour now, any progress on this?

jimklimov commented 2 years ago

I am not aware of anyone addressing this specifically, so probably fair to say it is a bug, and probably it is still present. Tested PRs for throttling the message emission (maybe slower backoff to retry connecting?) are welcome.

olicooper commented 1 year ago

Just had the same happen to me on a Raspberry Pi. Filled my 250GB SSD which subsequently made the Home Assistant database get corrupted. No way of catching it that quickly since it happened while I was sleeping. I'm not happy about this at all.

Any solution or workaround to this? I've just disabled nut for now.

jimklimov commented 1 year ago

Was that also with apcsmart driver? Probably a solution in NUT could be to throttle it sending the error message (or add a config toggle for that effect - e.g. send disconnect infos once at all, or once every N minutes).

With HA involved, the practical solution would also depend on getting modern NUT running there instead of the older package (see wiki for contributed article about custom-building a container).

Another vector could be to configure your syslog daemon log rotation and/or throttling of same messages (would help storage at least, if not cpu stress).

Finally, try to figure out the nature of disconnects and how to cause a reconnect or driver restart - PRs welcome. This would be an actual fix :)

Jim

On Tue, May 2, 2023, 13:53 Oli Cooper @.***> wrote:

Just had the same happen to me on a Raspberry Pi. Filled my 250GB SSD which subsequently made the Home Assistant database get corrupted. No way of catching it that quickly since it happened while I was sleeping. I'm not happy about this at all.

Any solution or workaround to this? I've just disabled nut for now.

— Reply to this email directly, view it on GitHub https://github.com/networkupstools/nut/issues/704#issuecomment-1531337019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMPTFEFZ7ZG3NKD34H3TITXEDYSNANCNFSM4HY2BTEQ . You are receiving this because you commented.Message ID: @.***>

nnov4k commented 3 months ago

Hello! i have some problem, is any ideas, how to fix syslog flooding by apcsmart?

jimklimov commented 3 months ago

Not that I'm aware of, nobody posted related PRs or investigation notes...

In-driver throttling could be relatively easy, there are a few precedents in other NUT programs.

Restarting the driver or the connection if the situation persists (and if it does mean the running copy of driver becomes useless for data collection) might kickstart from that same timestamp tracking involved in throttling.

Getting to the root of it (faulty HW/FW? any mitigations?) would be harder.

nnov4k commented 3 months ago

baaaad :( i made a script, which kills stupid driver when UPS deads... it fixes a system stability, but problem still remains...

ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="067b", ATTR{idProduct}=="2303", MODE="0660", GROUP="nut", RUN+="/bin/nut-switch-restart.sh"
ACTION=="remove", SUBSYSTEM=="usb", ATTR{idVendor}=="067b", ATTR{idProduct}=="2303", MODE="0660", GROUP="nut", RUN+="/bin/systemctl stop nut-driver"
jimklimov commented 3 months ago

Just in case, would you care to contribute your script and a README about it to NUT, for others to use?

Wiki page https://github.com/networkupstools/nut/wiki/Troubleshooting-eventual-disconnections-(Data-stale) refers to a couple of such scripted and documented know-how remedies, one more would be good... At least, better have a bad way forward for the time being, than none at all.

jimklimov commented 3 months ago

Finally got to dig around the code and try to guess the issue.

So, my guess is that at some point the UPS controller pushes out some message into the buffer, so FD is sort of ready to be read. But something fails while reading it, maybe the apc_read_*() methods fail early (leaving part of buffer unread) and there would be a logged message just before the flood?

Supposedly, at this point the buffer is not empty and dstate_poll_fds() returns immediately.

Further attempts to upsdrv_updateinfo() start with setting of smartmode and that fails, maybe because the controller is still pushing out that earlier message and the buffer was not emptied by the driver.

After a few quiet retries, the driver bombards the controller with smartmode requests as part of failure recovery, probably DoS'ing the UPS controller even further.

Some ideas to try would be:

UPDATE: One more thing of note: in apc_read() there are timeouts for the select() from the upsfd, different for varying flags bits. Maybe those are insufficient if the controller on the UPS is old and clocked out of spec, or the host (or its USB bus) is busy context-switching and traffic-sharing, and does not deliver the message in time?

jimklimov commented 3 months ago

Posted a PR based on the findings and guesses above. Testing would be very much welcome.

jimklimov commented 3 months ago

Did anyone manage to try testing the proposed code? Instructions are in the PR #2564

jimklimov commented 3 months ago

Bump? I'm eager to merge that change, and head towards a new NUT release, but would love confirmations first that it does not actually break the driver into a worse state than now :)

nnov4k commented 2 months ago

compiled 2.8.2, service starts OK, but does not accept connections to itself at start. After some minutes connects OK. testing... :)

log: root@ha:/etc/nut# upsdrvctl status Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) UPSNAME UPSDRV RUNNING PF_PID S_RESPONSIVE S_PID S_STATUS apc apcsmart N/A -3 NOT_RESPONSIVE N/A root@ha:/etc/nut# upsdrvctl start Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) Network UPS Tools driver 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) - APC Smart protocol driver 3.33 APC command table version 3.1 [d] unrecognized root@ha:/etc/nut# upsdrvctl status Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) UPSNAME UPSDRV RUNNING PF_PID S_RESPONSIVE S_PID S_STATUS apc apcsmart RUNNING 6652 RESPONSIVE 6652 "OL LB"

root@ha:/etc/nut# service nut-driver start root@ha:/etc/nut# service nut-driver status ● nut-driver.service - Network UPS Tools - power device driver controller Loaded: loaded (/lib/systemd/system/nut-driver.service; static) Active: inactive (dead)

Aug 20 21:33:37 ha upsdrvctl[6662]: [d] unrecognized Aug 20 21:33:37 ha apcsmart[6662]: [d] unrecognized Aug 20 21:33:43 ha apcsmart[6683]: Startup successful Aug 20 21:33:43 ha apcsmart[6683]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 20 21:33:43 ha systemd[1]: Started Network UPS Tools - power device driver controller. Aug 20 21:33:43 ha systemd[1]: Stopping Network UPS Tools - power device driver controller... Aug 20 21:33:43 ha upsdrvctl[6684]: Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) Aug 20 21:33:43 ha apcsmart[6683]: Signal 15: exiting Aug 20 21:33:45 ha systemd[1]: nut-driver.service: Succeeded. Aug 20 21:33:45 ha systemd[1]: Stopped Network UPS Tools - power device driver controller. root@ha:/etc/nut# service nut-server status ● nut-server.service - Network UPS Tools - power devices information server Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2024-08-20 21:29:54 MSK; 4min 20s ago Process: 6414 ExecStartPre=/usr/bin/systemd-tmpfiles --create /nut-common-tmpfiles.conf (code=exited, status=1/FAILURE) Process: 6416 ExecStartPost=/bin/grep -E Units|Max open files /proc/${MAINPID}/limits (code=exited, status=0/SUCCESS) Main PID: 6415 (upsd) Tasks: 1 (limit: 2171) Memory: 1.5M CGroup: /system.slice/nut-server.service └─6415 /usr/sbin/upsd -F

Aug 20 21:29:54 ha nut-server[6415]: Running as foreground process, not saving a PID file Aug 20 21:29:54 ha nut-server[6415]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 20 21:29:54 ha upsd[6415]: Running as foreground process, not saving a PID file Aug 20 21:29:54 ha upsd[6415]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 20 21:29:58 ha nut-server[6415]: User net@192.168.2.3 logged into UPS [apc] Aug 20 21:29:58 ha upsd[6415]: User net@192.168.2.3 logged into UPS [apc] Aug 20 21:31:07 ha nut-server[6415]: Connected to UPS [apc]: apcsmart-apc Aug 20 21:31:07 ha upsd[6415]: Connected to UPS [apc]: apcsmart-apc Aug 20 21:33:15 ha nut-server[6415]: Connected to UPS [apc]: apcsmart-apc Aug 20 21:33:15 ha upsd[6415]: Connected to UPS [apc]: apcsmart-apc

root@ha:/etc/nut# upsc -L apc: APC SC420i @ HA root@ha:/etc/nut# upsc -V Network UPS Tools upsc 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) root@ha:/etc/nut# upsc apc Error: Driver not connected root@ha:/etc/nut# upsc apc@localhost Error: Driver not connected

but.... after 5 minutes...

root@ha:/var/log# upsdrvctl start Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) Network UPS Tools driver 2.8.2.922-922-g575d42369 (development iteration after 2.8.2) - APC Smart protocol driver 3.33 APC command table version 3.1 [d] unrecognized root@ha:/var/log# upsc apc battery.alarm.threshold: 0 battery.charge: 100.0 battery.charge.low: 80 battery.charge.restart: 00 battery.date: 11/14/06 battery.runtime: 360 battery.runtime.low: 600 battery.voltage: 13.69 battery.voltage.nominal: 012 device.mfr: APC device.model: Smart-UPS SC420 device.serial: QS0647321282 device.type: ups driver.debug: 0 driver.flag.allow_killpower: 0 driver.name: apcsmart driver.parameter.override.battery.charge.low: 80 driver.parameter.pollinterval: 2 driver.parameter.port: /dev/serial/by-id/usb-Prolific_Technology_Inc._USB_2.0_To_COM_Device-if00-port0 driver.parameter.synchronous: auto driver.state: updateinfo driver.version: 2.8.2.922-922-g575d42369 driver.version.internal: 3.33 input.frequency: 50.00 input.quality: FF input.sensitivity: M input.transfer.high: 253 input.transfer.low: 200 input.transfer.reason: unacceptable utility voltage rate of change input.voltage: 213.1 input.voltage.maximum: 213.1 input.voltage.minimum: 211.6 output.voltage: 211.6 output.voltage.nominal: 230 ups.delay.shutdown: 060 ups.delay.start: 000 ups.firmware: 725.1.I ups.id: UPS_IDEN ups.load: 031.8 ups.mfr: APC ups.mfr.date: 11/14/06 ups.model: Smart-UPS SC420 ups.serial: QS0647321282 ups.status: OL LB ups.test.interval: 1209600 ups.test.result: NO

jimklimov commented 2 months ago

Just to clarify: there are several programs (wrapped into services) connecting not to "itself" but to each other, e.g. the driver (e.g. apcsmart) talking over a local unix socket to the data server (upsd, nut-server.service), which in turn talks over TCP to clients like upsc or upsmon (nut-monitor).

Logs seem to indicate that the driver did not start, and so upsd can not talk to it.

You were starting it by upsdrvctl as in the olden times; nowadays on Linux the nut-driver-enumerator service (if installed and activated) could have created the nut-driver@yourdevicename.service instance and it would conflict with a driver started manually - more details on Wiki. This is consistent with it eventually working, as systemd restarted a failed service instance (I suppose).

jimklimov commented 2 months ago

In any case, thanks for the report - also in the area that upsdrvctl might warn about systemd support being present, and its implications. :)

I also thought the two copies of the driver with modern protocol model would find the older running copy and arrange... something.

Not sure what [d] unrecognized is about, but if they both did manage to connect to the serial port and each process reads every other byte or so, that could be confusing too...

nnov4k commented 2 months ago

after a while (~20h) driver kills syslog with flood write :)

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

13773 root 20 0 440556 86688 5812 R 148.3 4.3 25:49.76 rsyslogd 7534 nut 20 0 2432 1552 1408 R 100.0 0.1 21:51.33 apcsmart

service nut-server stop service nut-driver stop does not kills process

logs: daemon.log: Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data) Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error Aug 21 19:32:21 ha nut-server[6415]: Data for UPS [apc] is stale - check driver Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error flood...

kernel.log detach + attach usb hub + ups usb-com converter

messages: Aug 21 19:30:05 ha rsyslogd[13773]: imjournal: journal files changed, reloading... [v8.2102.0 try https://www.rsyslog.com/e/0 ] Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2 Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected Aug 21 19:32:21 ha kernel: pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2 Aug 21 19:32:22 ha kernel: [79786.860498] usb 1-1.3.4: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00 Aug 21 19:32:22 ha kernel: [79786.860525] usb 1-1.3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Aug 21 19:32:22 ha kernel: [79786.860532] usb 1-1.3.4: Product: USB 2.0 To COM Device Aug 21 19:32:22 ha kernel: [79786.860537] usb 1-1.3.4: Manufacturer: Prolific Technology Inc. Aug 21 19:32:22 ha kernel: [79786.926207] pl2303 1-1.3.4:1.0: pl2303 converter detected Aug 21 19:32:22 ha kernel: [79786.933207] usb 1-1.3.4: pl2303 converter now attached to ttyUSB7 Aug 21 19:32:22 ha kernel: usb 1-1.3.4: New USB device found, idVendor=067b, idProduct=2303, bcdDevice= 3.00 Aug 21 19:32:22 ha kernel: usb 1-1.3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Aug 21 19:32:22 ha kernel: usb 1-1.3.4: Product: USB 2.0 To COM Device Aug 21 19:32:22 ha kernel: usb 1-1.3.4: Manufacturer: Prolific Technology Inc. Aug 21 19:32:22 ha kernel: pl2303 1-1.3.4:1.0: pl2303 converter detected Aug 21 19:32:22 ha kernel: usb 1-1.3.4: pl2303 converter now attached to ttyUSB7 Aug 21 19:32:34 ha rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="13773" x-info="https://www.rsyslog.com"] rsyslogd was HUPed Aug 21 19:32:34 ha rsyslogd[13773]: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="13773" x-info="https://www.rsyslog.com"] rsyslogd was HUPed Aug 21 19:32:40 ha rsyslogd: rsyslogd[internal_messages]: 23744 messages lost due to rate-limiting (500 allowed within 5 seconds) Aug 21 19:32:40 ha rsyslogd[13773]: rsyslogd[internal_messages]: 23744 messages lost due to rate-limiting (500 allowed within 5 seconds) Aug 21 19:32:46 ha rsyslogd: rsyslogd[internal_messages]: 23840 messages lost due to rate-limiting (500 allowed within 5 seconds) Aug 21 19:32:46 ha rsyslogd[13773]: rsyslogd[intern

syslog: Aug 21 19:32:21 ha kernel: [79785.353339] pl2303 ttyUSB2: usb_serial_generic_read_bulk_callback - urb stopped: -32 Aug 21 19:32:21 ha kernel: [79785.353339] pl2303 ttyUSB2: usb_serial_generic_read_bulk_callback - urb stopped: -32 Aug 21 19:32:21 ha kernel: [79785.519601] pl2303 ttyUSB2: pl2303_read_int_callback - usb_submit_urb failed with result -19 Aug 21 19:32:21 ha kernel: [79785.519601] pl2303 ttyUSB2: pl2303_read_int_callback - usb_submit_urb failed with result -19 Aug 21 19:32:21 ha kernel: [79785.553105] pl2303 ttyUSB2: error sending break = -19 Aug 21 19:32:21 ha kernel: [79785.553105] pl2303 ttyUSB2: error sending break = -19 Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2 Aug 21 19:32:21 ha kernel: [79785.565042] pl2303 ttyUSB2: pl2303 converter now disconnected from ttyUSB2 Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected Aug 21 19:32:21 ha kernel: [79785.568009] pl2303 1-1.3.4:1.0: device disconnected Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data) Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port read timeout: 816(poll_data) Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error Aug 21 19:32:21 ha nut-server[6415]: Data for UPS [apc] is stale - check driver Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error flood... Aug 21 19:32:21 ha apcsmart[7534]: Communications with UPS lost: serial port write error: 1447(smartmode): Input/output error Aug 21 19:32:21 ha apcsmart[7534]: WARNING: send_to_all: write 34 bytes to socket 6 failed (ret=-1), disconnecting: Resource temporarily unavailable Aug 21 19:32:21 ha apcsmart[7534]: send_to_all: synchronous mode was 'auto', will try 'on' for next connections

jimklimov commented 2 months ago

Whoa, wait. It seems you've built from commit 575d42369 which is current tip of master branch. The PR is still open - its changes are not in your build.

Can you please retry with its actual source branch? :)

e.g.

:; git clone https://github.com/jimklimov/nut -b issue-704 nut-issue-704
:; cd nut-issue-704
# the config/build rituals
nnov4k commented 2 months ago

of course, wait a moment

jimklimov commented 2 months ago

Cheers, any news, by chance? :)

nnov4k commented 2 months ago

i'm sorry for delay... ) everything is OK, driver correctly works with hardware restarts uptime 3 days

root@ha:/var/log.hdd# service nut-server status ● nut-server.service - Network UPS Tools - power devices information server Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2024-08-21 23:44:20 MSK; 2 days ago Process: 56418 ExecStartPre=/usr/bin/systemd-tmpfiles --create /nut-common-tmpfiles.conf (code=exited, status=1/FAILURE) Process: 56421 ExecStartPost=/bin/grep -E Units|Max open files /proc/${MAINPID}/limits (code=exited, status=0/SUCCESS) Main PID: 56420 (upsd) Tasks: 1 (limit: 2171) Memory: 2.2M CGroup: /system.slice/nut-server.service └─56420 /usr/sbin/upsd -F

Warning: journal has been rotated since unit was started, output may be incomplete. root@ha:/var/log.hdd# service nut-driver status ● nut-driver.service - Network UPS Tools - power device driver controller Loaded: loaded (/lib/systemd/system/nut-driver.service; static) Active: inactive (dead) root@ha:/var/log.hdd# upsc apc battery.alarm.threshold: 0 battery.charge: 100.0 battery.charge.low: 80 battery.charge.restart: 00 battery.date: 11/14/06 battery.runtime: 1020 battery.runtime.low: 600 battery.voltage: 13.77 battery.voltage.nominal: 012 device.mfr: APC device.model: Smart-UPS SC420 device.serial: QS0647321282 device.type: ups driver.debug: 0 driver.flag.allow_killpower: 0 driver.name: apcsmart driver.parameter.override.battery.charge.low: 80 driver.parameter.pollinterval: 2 driver.parameter.port: /dev/serial/by-id/usb-Prolific_Technology_Inc._USB_2.0_To_COM_Device-if00-port0 driver.parameter.synchronous: auto driver.state: updateinfo driver.version: 2.7.4.10178.17-10195-gd7f6d61b2 driver.version.internal: 3.34 input.frequency: 50.00 input.quality: FF input.sensitivity: M input.transfer.high: 253 input.transfer.low: 200 input.transfer.reason: simulated power failure or UPS test input.voltage: 217.4 input.voltage.maximum: 217.4 input.voltage.minimum: 214.5 output.voltage: 214.5 output.voltage.nominal: 230 ups.delay.shutdown: 060 ups.delay.start: 000 ups.firmware: 725.1.I ups.id: UPS_IDEN ups.load: 018.2 ups.mfr: APC ups.mfr.date: 11/14/06 ups.model: Smart-UPS SC420 ups.serial: QS0647321282 ups.status: OL ups.test.interval: 1209600 ups.test.result: NO root@ha:/var/log.hdd#

jimklimov commented 2 months ago

Super! And no more log-floods?

jimklimov commented 2 months ago

In case of a hiccup that would flood before, it should log "recovered the connection" now eventually.

Off-topic, version: 2.7.4.10178.17-10195-gd7f6d61b2 looks fun. I guess your checkout somehow missed newer git tags, so based the local view of history on the newest it had... 10k commits on master since Arnaud last released NUT, how time flies...

jimklimov commented 2 months ago

Also, nowadays nut-driver.service as such should be inert. NDE (see wiki) would manage variable-named instances like nut-driver@apc.service.

nnov4k commented 2 months ago

as a result i think i have bad china USB to RS232 converter, not fully transparent for UPS. but new version really fixes flood.

===autorun failed Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Scheduled restart job, restart counter is at 5. Aug 26 18:57:01 ha systemd[1]: Stopped Network UPS Tools - power device monitor and shutdown controller. Aug 26 18:57:01 ha systemd[1]: nut-server.service: Start request repeated too quickly. Aug 26 18:57:01 ha systemd[1]: nut-server.service: Failed with result 'exit-code'. Aug 26 18:57:01 ha systemd[1]: Failed to start Network UPS Tools - power devices information server. Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Start request repeated too quickly. Aug 26 18:57:01 ha systemd[1]: nut-monitor.service: Failed with result 'exit-code'. Aug 26 18:57:01 ha systemd[1]: Failed to start Network UPS Tools - power device monitor and shutdown controller.

===after manual upsdrvctl start Aug 26 22:11:31 ha systemd[1]: Starting Network UPS Tools - power device driver controller... Aug 26 22:11:31 ha upsdrvctl[15789]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) Aug 26 22:11:31 ha upsdrvctl[15790]: Network UPS Tools driver 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) - APC Smart protocol driver 3.34 Aug 26 22:11:31 ha upsdrvctl[15790]: APC command table version 3.1 Aug 26 22:11:31 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32 Aug 26 22:11:31 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32 Aug 26 22:11:33 ha upsdrvctl[15790]: [d] unrecognized Aug 26 22:11:33 ha apcsmart[15790]: [d] unrecognized Aug 26 22:11:39 ha apcsmart[15792]: Startup successful Aug 26 22:11:39 ha apcsmart[15792]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 26 22:11:39 ha systemd[1]: Started Network UPS Tools - power device driver controller. Aug 26 22:11:39 ha systemd[1]: Stopping Network UPS Tools - power device driver controller... Aug 26 22:11:39 ha upsdrvctl[15793]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) Aug 26 22:11:39 ha apcsmart[15792]: Signal 15: exiting Aug 26 22:11:40 ha kernel: pl2303 ttyUSB0: error sending break = -32 Aug 26 22:11:41 ha systemd[1]: nut-driver.service: Succeeded. Aug 26 22:11:41 ha systemd[1]: Stopped Network UPS Tools - power device driver controller. Aug 26 22:13:16 ha systemd[1]: Starting Network UPS Tools - power devices information server... Aug 26 22:13:16 ha nut-server[15899]: Limit Soft Limit Hard Limit Units Aug 26 22:13:16 ha nut-server[15899]: Max open files 1048576 1048576 files Aug 26 22:13:16 ha systemd[1]: Started Network UPS Tools - power devices information server. Aug 26 22:13:16 ha nut-server[15898]: Network UPS Tools upsd 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) Aug 26 22:13:16 ha nut-server[15898]: listening on 127.0.0.1 port 3493 Aug 26 22:13:16 ha upsd[15898]: listening on 127.0.0.1 port 3493 Aug 26 22:13:16 ha upsd[15898]: listening on ::1 port 3493 Aug 26 22:13:16 ha nut-server[15898]: listening on ::1 port 3493 Aug 26 22:13:16 ha nut-server[15898]: listening on 192.168.2.10 port 3493 Aug 26 22:13:16 ha upsd[15898]: listening on 192.168.2.10 port 3493 Aug 26 22:13:16 ha upsd[15898]: Can't connect to UPS [apc] (apcsmart-apc): No such file or directory Aug 26 22:13:16 ha nut-server[15898]: Can't connect to UPS [apc] (apcsmart-apc): No such file or directory Aug 26 22:13:16 ha nut-server[15898]: Found 1 UPS defined in ups.conf Aug 26 22:13:16 ha upsd[15898]: Found 1 UPS defined in ups.conf Aug 26 22:13:16 ha nut-server[15898]: Running as foreground process, not saving a PID file Aug 26 22:13:16 ha nut-server[15898]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 26 22:13:16 ha upsd[15898]: Running as foreground process, not saving a PID file Aug 26 22:13:16 ha upsd[15898]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 26 22:13:33 ha systemd[1]: Starting Network UPS Tools - power device driver controller... Aug 26 22:13:33 ha upsdrvctl[15908]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) Aug 26 22:13:33 ha upsdrvctl[15909]: Network UPS Tools driver 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) - APC Smart protocol driver 3.34 Aug 26 22:13:33 ha upsdrvctl[15909]: APC command table version 3.1 Aug 26 22:13:33 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32 Aug 26 22:13:35 ha upsdrvctl[15909]: [d] unrecognized Aug 26 22:13:35 ha apcsmart[15909]: [d] unrecognized Aug 26 22:13:41 ha apcsmart[15910]: Startup successful Aug 26 22:13:41 ha apcsmart[15910]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 26 22:13:41 ha systemd[1]: Started Network UPS Tools - power device driver controller. Aug 26 22:13:41 ha systemd[1]: Stopping Network UPS Tools - power device driver controller... Aug 26 22:13:41 ha upsdrvctl[15911]: Network UPS Tools upsdrvctl - UPS driver controller 2.7.4.10178.17-10195-gd7f6d61b2 (development iteration after 2.7.4) Aug 26 22:13:41 ha apcsmart[15910]: Signal 15: exiting Aug 26 22:13:42 ha kernel: pl2303 ttyUSB0: error sending break = -32 Aug 26 22:13:43 ha systemd[1]: nut-driver.service: Succeeded. Aug 26 22:13:43 ha systemd[1]: Stopped Network UPS Tools - power device driver controller. Aug 26 22:14:51 ha kernel: pl2303 ttyUSB0: pl2303_get_line_request - failed: -32 Aug 26 22:14:53 ha apcsmart[15983]: [d] unrecognized Aug 26 22:14:59 ha apcsmart[15985]: Startup successful Aug 26 22:14:59 ha apcsmart[15985]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it Aug 26 22:14:59 ha nut-server[15898]: Connected to UPS [apc]: apcsmart-apc Aug 26 22:14:59 ha upsd[15898]: Connected to UPS [apc]: apcsmart-apc Aug 26 22:15:00 ha apcsmart[15985]: sock_connect: enabling asynchronous mode (auto) Aug 26 22:15:00 ha apcsmart[15985]: sock_connect: enabling asynchronous mode (auto)

nnov4k commented 2 months ago

reconnect log: Aug 27 19:30:08 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:09 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:10 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:11 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:12 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:13 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:14 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:15 ha apcsmart[15985]: Warning: excessive comm failures, limiting error reporting Aug 27 19:30:15 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:25 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:35 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:45 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:55 ha apcsmart[15985]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error Aug 27 19:30:57 ha apcsmart[15985]: Trying to reconnect to the UPS Aug 27 19:31:02 ha kernel: pl2303 ttyUSB7: pl2303_get_line_request - failed: -32 Aug 27 19:31:02 ha kernel: pl2303 ttyUSB7: pl2303_get_line_request - failed: -32 Aug 27 19:31:03 ha apcsmart[15985]: Communications with UPS re-established Aug 27 19:31:04 ha apcsmart[15985]: [d] unrecognized Aug 27 19:31:08 ha nut-server[15898]: UPS [apc] data is no longer stale Aug 27 19:31:08 ha upsd[15898]: UPS [apc] data is no longer stale Aug 27 19:31:10 ha apcsmart[15985]: upsdrv_updateinfo: recovered the connection

jimklimov commented 2 months ago

Looks great now, thanks!

jimklimov commented 2 months ago

PR merged, keeping issue open for HCL/DDL update reminder.

vadegdadeg commented 1 month ago

PR merged, keeping issue open for HCL/DDL update reminder.

Hello! Tell me, is your patch already in some release or do you have to apply it yourself? Forgive my ignorance, but I have a hard time with Git Hub and all its tools...

jimklimov commented 1 month ago

Currently it is only on master branch, should be easy to build though (see wiki footer links). Will be part of release 2.8.3 and later.

UPDATE: link: https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

vadegdadeg commented 1 month ago

Thank you very much for the work done! After all, it was a time bomb. If the USB to serial converter fell off, the disk became tightly clogged! But now it doesn't. It even tries to reconnect, although I didn’t have time to plug the converter back in and the driver closed. I compiled it in Gentoo as version 9999. If you need any tests on this topic, write to me. My UPS Smart UPS SC620 2 pieces.

Oct 12 12:39:12 test_box apcsmart[31543]: [d] unrecognized
Oct 12 12:39:17 test_box apcsmart[31546]: Startup successful
Oct 12 12:39:17 test_box apcsmart[31546]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:40:18 test_box upsd[31578]: listening on 127.0.0.1 port 3493
Oct 12 12:40:18 test_box upsd[31578]: Connected to UPS [1-SC620]: apcsmart-1-SC620
Oct 12 12:40:18 test_box upsd[31578]: Found 1 UPS defined in ups.conf
Oct 12 12:40:18 test_box upsd[31581]: Startup successful
Oct 12 12:40:18 test_box upsd[31581]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:40:18 test_box apcsmart[31546]: sock_connect: enabling asynchronous mode (auto)
Oct 12 12:41:17 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 859(update_status): Input/output error
Oct 12 12:41:17 test_box upsd[31581]: Data for UPS [1-SC620] is stale - check driver
Oct 12 12:41:27 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:28 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:29 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:30 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:31 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:32 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:33 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:34 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:35 test_box apcsmart[31546]: Warning: excessive comm failures, limiting error reporting
Oct 12 12:41:35 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:45 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:41:55 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:05 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:15 test_box apcsmart[31546]: Communications with UPS lost: serial port write error: 1478(smartmode): Input/output error
Oct 12 12:42:17 test_box apcsmart[31546]: Trying to reconnect to the UPS
Oct 12 12:42:22 test_box apcsmart[31546]: Fatal error: unusable configuration
Oct 12 12:42:22 test_box upsd[31581]: Can't connect to UPS [1-SC620] (apcsmart-1-SC620): No such file or directory
Oct 12 12:43:49 test_box /etc/init.d/upsdrv[31711]: ERROR: upsdrv failed to stop
Oct 12 12:44:25 test_box /etc/init.d/upsdrv[31730]: WARNING: upsdrv has already been started
Oct 12 12:44:33 test_box apcsmart[31753]: [d] unrecognized
Oct 12 12:44:39 test_box apcsmart[31754]: Startup successful
Oct 12 12:44:39 test_box apcsmart[31754]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Oct 12 12:44:40 test_box upsd[31581]: Connected to UPS [1-SC620]: apcsmart-1-SC620
Oct 12 12:44:40 test_box apcsmart[31754]: sock_connect: enabling asynchronous mode (auto)
jimklimov commented 1 month ago

@vadegdadeg : I suppose, as a data point - a DDL dump with modern driver builds (to know what they report, what not, how correctly) would be great. Can be directly as a PR to https://github.com/networkupstools/nut-ddl/