networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/

Default "scan all" by `nut-scanner` aborts on error of one bus (at least "init libusb") and does not complete other buses #2575

Open jimklimov opened 1 month ago

jimklimov commented 1 month ago
...
0.007398     [D2] Entering run_snmp for 1 IP address range(s)
0.007411     [D1] nutscan_scan_ip_range_snmp: Scanning SNMP for IP address range(s): (1)[10.94.56.0 .. 10.94.56.255]
0.007774     [D2] Entering list_nut_devices_thready for 10.94.56.0
0.007803     [D2] Entering list_nut_devices_thready for 10.94.56.1
0.008031     [D2] nutscan_scan_xml_http_thready: scanning IP '10.94.56.0' with a unicast, attempt 1 of 3 with a timeout of 5000000 usec
0.008073     nutscan_scan_xml_http_thready: Error sending Eaton <SCAN_REQUEST/> to 10.94.56.0, #1/3
0.008436     Failed to init libusb 1.0: Permission denied
~/nut $ echo $?
1

...or

~/nut $ LD_LIBRARY_PATH=`pwd`/clients/.libs ./tools/nut-scanner/nut-scanner -DDDDDD -m auto
...
   0.024553     Scanning USB bus.
   0.024727     Scanning SNMP bus.
   0.024745     [D1] SNMP SCAN: starting pthread_create with run_snmp...
   0.024931     Scanning XML/HTTP bus.
   0.024949     [D1] XML/HTTP SCAN: starting pthread_create with run_xml...
   0.025069     Scanning NUT bus (old libupsclient connect method).
   0.025113     [D1] NUT bus (old) SCAN: starting pthread_create with run_nut_old...
   0.025302     Scanning NUT simulation devices.
   0.025356     [D1] NUT simulation devices SCAN: starting pthread_create with run_nut_simulation...
   0.025491     [D2] Entering run_snmp for 1 IP address range(s)
   0.025500     [D1] NUT bus (avahi) SCAN: not requested or supported, SKIPPED
   0.025634     [D1] IPMI SCAN: not requested or supported, SKIPPED
   0.025640     [D1] SERIAL SCAN: not requested or supported, SKIPPED
   0.025645     [D1] USB SCAN: join back the pthread
   0.025662     [D1] Scanning: /usr/local/ups/etc
   0.025670     [D2] Entering run_nut_old for 1 IP address range(s)
   0.025800     [D1] nutscan_scan_nut_simulation: Failed to open /usr/local/ups/etc: No such file or directory
   0.025836     [D4] nutscan_scan_ip_range_nut: sem_init() for 1021 threads
   0.025871     Failed to open /usr/local/ups/etc, skip NUT simulation scan
   0.025907     [D1] nutscan_scan_ip_range_nut: Scanning "Old NUT" bus for IP address range(s): (1)[10.29.148.0 .. 10.29.148.255]
   0.025806     [D2] Entering run_xml for 1 IP address range(s)
   0.025971     [D4] nutscan_ip_ranges_iter_init: beginning iteration with first IP range [10.29.148.0 .. 10.29.148.255]
   0.025640     [D1] nutscan_scan_ip_range_snmp: Scanning SNMP for IP address range(s): (1)[10.29.148.0 .. 10.29.148.255]
   0.026096     [D5] nutscan_ip_ranges_iter_init: got IP from range: 10.29.148.0
   0.026032     [D1] nutscan_scan_ip_range_xml_http: Scanning XML/HTTP bus for IP address range(s): (1)[10.29.148.0 .. 10.29.148.255]
   0.026213     [D4] nutscan_scan_ip_range_xml_http: sem_init() for 1021 threads
   0.026235     [D4] nutscan_ip_ranges_iter_init: beginning iteration with first IP range [10.29.148.0 .. 10.29.148.255]
   0.026277     [D5] nutscan_ip_ranges_iter_init: got IP from range: 10.29.148.0
   0.026539     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 10.29.148.1
   0.026575     [D4] nutscan_scan_ip_range_nut: max_threads_scantype=1021 curr_threads=0 thread_count=1 stwST=0 stwS=0 pass=1
   0.027006     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 10.29.148.2
   0.027041     [D4] nutscan_scan_ip_range_nut: max_threads_scantype=1021 curr_threads=0 thread_count=2 stwST=0 stwS=0 pass=1
   0.027062     [D5] nutscan_ip_ranges_iter_inc: got IP from range: 10.29.148.1
   0.027096     [D4] nutscan_scan_ip_range_xml_http: max_threads_scantype=1021 curr_threads=0 thread_count=1 stwST=0 stwS=0 pass=1
   0.027182     No USB device found: No such file or directory

~/nut $ echo $?
1

On one hand, this is inconsistent with the situation where a missing library causes a scan to be silently skipped. On the other hand, if a scan was requested and could not be fulfilled, that should be an error.

Maybe the explicit settings -O, -U, etc. should be treated as "required to succeed", while automatic enablement with no flag at all should be best-effort (require any one to succeed, or hold no expectations at all?), and enablement via -C is the most questionable (require all to succeed, or at least one)?
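To make the idea above concrete, here is a minimal sketch of such a policy. Everything in it is hypothetical: the `scan_request` enum, the `scan_failure_is_error()` helper and the `complete_is_strict` knob are illustration names, not existing `nut-scanner` code, and the `-C` case is left as a parameter precisely because the question is still open.

```c
#include <stdbool.h>

/* Hypothetical: how a given bus scan came to be enabled. */
enum scan_request {
    SCAN_NOT_REQUESTED, /* bus disabled or unsupported */
    SCAN_AUTO,          /* no flag given: default "scan all" */
    SCAN_COMPLETE,      /* enabled via -C */
    SCAN_EXPLICIT       /* enabled via its own flag, e.g. -U or -O */
};

/*
 * Decide whether a failure of one bus scan should count as an error.
 * Explicit requests are "required to succeed", automatic enablement is
 * best-effort, and the undecided -C case follows complete_is_strict.
 */
bool scan_failure_is_error(enum scan_request how, bool complete_is_strict)
{
    switch (how) {
    case SCAN_EXPLICIT:
        return true;
    case SCAN_COMPLETE:
        return complete_is_strict;
    case SCAN_AUTO:
    case SCAN_NOT_REQUESTED:
    default:
        return false;
    }
}
```

With this shape, a failed USB init under the default "scan all" would only be logged, while the same failure under an explicit -U would still be reported as an error.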

In any case, returning an error or success exit code should be deferred until all possible scans have completed (barring malloc, ulimit or pthreads errors, etc., and better to step back gracefully there too). Here the other scans (netxml, snmp, oldnut) were started, processed 1-2 IP addresses, and were not allowed to finish.
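A rough sketch of the "defer the verdict" approach, under stated assumptions: the `scan_job` structure, `run_all_scans()` and the two `demo_*` stand-in scanners are invented for illustration and do not mirror the real `nut-scanner` internals (which run the buses in threads). The point is only that each bus records its own status and the exit code is computed once, after every bus has had its turn.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-bus scan callback: 0 on success, -1 on failure. */
typedef int (*scan_fn)(void);

struct scan_job {
    const char *name;   /* label for diagnostics only */
    scan_fn     run;    /* NULL means "not requested or supported" */
    int         status; /* set by run_all_scans(): 0 ok, -1 failed, 1 skipped */
};

/* Stand-in scanners for the example: one fails like the USB init above. */
int demo_ok_calls = 0;
int demo_scan_fails(void) { return -1; }
int demo_scan_ok(void)    { demo_ok_calls++; return 0; }

/*
 * Run every requested scan, remembering failures instead of exiting.
 * The exit code is decided only after the loop has visited all jobs.
 */
int run_all_scans(struct scan_job *jobs, size_t count)
{
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        if (jobs[i].run == NULL) {
            jobs[i].status = 1; /* skipped, not an error */
            continue;
        }
        jobs[i].status = jobs[i].run();
        if (jobs[i].status != 0)
            failures++;
    }
    return failures ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

Here a failing first job no longer prevents the later ones from running; whether that failure should then turn into a non-zero exit code is the separate policy question.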

jimklimov commented 1 month ago

Seems limited to USB scans:

:; grep fatal  nut/tools/nut-scanner/scan_*
nut/tools/nut-scanner/scan_usb.c:               fatal_with_errno(EXIT_FAILURE, "Failed to init libusb 1.0");
nut/tools/nut-scanner/scan_usb.c:               fatal_with_errno(EXIT_FAILURE, "No USB device found");
nut/tools/nut-scanner/scan_usb.c:                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                               fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_xml_http.c:          "%s: Had to abort scan for %s, see fatal details above",

Note that malloc()==NULL handling is also non-fatal in the other scanners (and in one place even in USB), where they abandon the current loop iteration and try to go on to the next one:

nut/tools/nut-scanner/scan_ipmi.c:                              upsdebugx(0, "%s: Memory allocation error", __func__);
nut/tools/nut-scanner/scan_ipmi.c:                                      upsdebugx(0, "%s: Memory allocation error", __func__);

nut/tools/nut-scanner/scan_nut.c:                               upsdebugx(0, "%s: Memory allocation error", __func__);

nut/tools/nut-scanner/scan_snmp.c:              upsdebugx(0, "%s: Memory allocation error", __func__);
nut/tools/nut-scanner/scan_snmp.c:                              upsdebugx(0, "%s: Memory allocation error", __func__);

nut/tools/nut-scanner/scan_usb.c:                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                               fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                                       fatal_with_errno(EXIT_FAILURE, "Out of memory");
nut/tools/nut-scanner/scan_usb.c:                                       upsdebugx(0, "%s: Memory allocation error", __func__);

nut/tools/nut-scanner/scan_xml_http.c:                                  upsdebugx(0, "%s: Memory allocation error", __func__);
nut/tools/nut-scanner/scan_xml_http.c:                                  upsdebugx(0, "%s: Memory allocation error", __func__);
nut/tools/nut-scanner/scan_xml_http.c:          upsdebugx(0, "%s: Memory allocation error", __func__);
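
For comparison, the non-fatal pattern seen in the other scanners could look roughly like the sketch below. It is not taken from the NUT sources: `copy_names()`, the `alloc_fn` type, the `had_alloc_error` flag and the `demo_alloc_fail_second()` allocator are hypothetical, and `fprintf()` stands in for `upsdebugx()` so that the example is self-contained.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook, so a failure can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Duplicate each name into out[]. When an allocation fails, log it,
 * remember that it happened, and move on to the next item instead of
 * terminating the whole program. Returns how many names were copied.
 */
size_t copy_names(const char *const *names, size_t count,
                  char **out, alloc_fn alloc, int *had_alloc_error)
{
    size_t copied = 0;

    *had_alloc_error = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;
        char *buf = alloc(len);

        if (buf == NULL) {
            fprintf(stderr, "%s: Memory allocation error\n", __func__);
            *had_alloc_error = 1;
            continue; /* skip this item, try the next cycle */
        }
        memcpy(buf, names[i], len);
        out[copied++] = buf;
    }
    return copied;
}

/* Demo allocator for the example: fails on its second call only. */
void *demo_alloc_fail_second(size_t size)
{
    static int calls = 0;

    calls++;
    return (calls == 2) ? NULL : malloc(size);
}
```

The flag lets the caller know that some items were never attempted, which matters when deciding later whether an empty result can be trusted.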

jimklimov commented 1 month ago

Partial solution posted: the single-commit PR #2577 addresses the acute problem from this issue, to not abort other scan types if there are problems with USB (other scanners do not seem to indulge in fatal behaviors).

It does not (yet?) address the ideas about whether to return an error exit code based on failures of explicitly requested scan types (and/or on all or any of them returning no findings?). Probably there should be explicit CLI options about what we want to see as an error, for scripting.

Perhaps separately from that, memory errors that were not retried and recovered from should end up in an error exit code regardless, as we did not ATTEMPT scanning some of the items we otherwise could (so the lack of findings may be a false negative)?
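
One possible way to express both ideas (opt-in error conditions for scripting, plus memory-related skips that always count) is a small bitmask, sketched here. The `scan_problem` flags and `exit_code_for()` are hypothetical names for illustration; no such CLI options or helpers exist in `nut-scanner` today as far as this issue describes.

```c
#include <stdlib.h>

/* Hypothetical conditions a scan run may end up in. */
enum scan_problem {
    PROBLEM_NONE           = 0,
    PROBLEM_SCAN_FAILED    = 1 << 0, /* a requested bus could not be scanned */
    PROBLEM_NOTHING_FOUND  = 1 << 1, /* scans ran but found no devices */
    PROBLEM_MEMORY_SKIPPED = 1 << 2  /* items not attempted after malloc failure */
};

/*
 * seen:           OR-ed problems observed during the run
 * treat_as_error: OR-ed problems the caller opted into (e.g. via CLI options)
 * Memory-related skips are always treated as errors, whatever was opted into.
 */
int exit_code_for(unsigned seen, unsigned treat_as_error)
{
    unsigned effective = treat_as_error | PROBLEM_MEMORY_SKIPPED;

    return (seen & effective) ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A script that only cares about "did anything respond" could opt into `PROBLEM_NOTHING_FOUND` alone, while still getting a failure code if part of the scan was silently skipped for lack of memory.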