Open jimklimov opened 3 months ago
Experimented with a change to log `errno` - and yes: at `nut-scanner` level, at least for this use-case, we do know the cause of the problem:
diff --git a/tools/nut-scanner/nut-scanner.c b/tools/nut-scanner/nut-scanner.c
index a3d785f5a..711dc3307 100644
--- a/tools/nut-scanner/nut-scanner.c
+++ b/tools/nut-scanner/nut-scanner.c
@@ -84,7 +84,7 @@
* Another +1 is for NetSNMP which wants to open MIB files,
* potential per-host configuration files, etc.
*/
-# define RESERVE_FD_COUNT 4
+# define RESERVE_FD_COUNT 0
# endif /* HAVE_SYS_RESOURCE_H */
# endif /* HAVE_PTHREAD_TRYJOIN || HAVE_SEMAPHORE_UNNAMED || HAVE_SEMAPHORE_NAMED */
#endif /* HAVE_PTHREAD */
diff --git a/tools/nut-scanner/scan_snmp.c b/tools/nut-scanner/scan_snmp.c
index a8c3b42cb..fc3826454 100644
--- a/tools/nut-scanner/scan_snmp.c
+++ b/tools/nut-scanner/scan_snmp.c
@@ -969,7 +969,7 @@ static void * try_SysOID_thready(void * arg)
/* Open the session */
handle = wrap_nut_snmp_sess_open(&snmp_sess); /* establish the session */
if (handle == NULL) {
- upsdebugx(2,
+ upsdebug_with_errno(2,
"Failed to open SNMP session for %s",
sec->peername);
goto try_SysOID_free;
...leads to:
...
0.296940 [D2] Entering try_SysOID_thready for 172.28.67.252
0.297073 [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.254
0.297115 [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1022 thread_count=1022 stwST=-1 stwS=0 pass=1
0.297190 [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.67.255
0.297235 [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1023 thread_count=1023 stwST=-1 stwS=0 pass=1
0.297083 [D2] Entering try_SysOID_thready for 172.28.67.253
0.297190 [D2] Entering try_SysOID_thready for 172.28.67.254
0.297351 [D2] Entering try_SysOID_thready for 172.28.67.255
0.297359 [D5] nutscan_ip_ranges_iter_inc: got IP from range: 172.28.68.0
0.297396 [D4] nutscan_scan_ip_range_snmp: max_threads_scantype=0 curr_threads=1024 thread_count=1024 stwST=-1 stwS=-1 pass=0
0.297413 [D2] nutscan_scan_ip_range_snmp: Running too many scanning threads (1024), waiting until older ones would finish
/var/lib/snmp/hosts/172.28.67.165.local.conf: Too many open files
0.378710 [D2] Failed to open SNMP session for 172.28.65.167: Too many open files
0.378813 [D2] Failed to open SNMP session for 172.28.65.113: Too many open files
0.378755 [D2] Failed to open SNMP session for 172.28.67.165: Too many open files
^C
As briefly noted in issue #2575 and in PRs that dealt with parallelized scans in `nut-scanner`, depending on platform defaults, the particular OS deployment and third-party library specifics, `nut-scanner` may run out of file descriptors despite already trying to adapt the maximums to `ulimit` information where available.

As seen recently and culminating in commit 2c3a09ef0cbc845d53f603fdf9316c6f0f901979 of PR #2539 (issue #2511), certain libnetsnmp builds can consume FDs for network sockets, for local filesystem lookups of per-host configuration files or MIB files, for directory scanning during those searches, etc. This is a variable beyond our control: different implementations and versions of third-party code can behave as they please. Example staged with that commit reverted and a scan of a large network range:
What we can do is not abort the scans upon any hiccup, but check for `errno==EMFILE`, then delay and retry later (or maybe even actively decrease the thread maximum variable of the process). We already have a way to detect `Running too many scanning threads (NUM), waiting until older ones would finish`, so this is mostly about detecting the issue and extending the criteria.
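
The delay-and-retry idea could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual `nut-scanner` code: `open_with_emfile_retry()`, `open_fn_t` and `flaky_open()` are hypothetical names, the linear backoff is an arbitrary choice, and the sketch assumes the wrapped opener leaves a meaningful `errno` behind on failure (as the log above suggests happens in this use-case). In practice the opener would presumably be a thin wrapper around `wrap_nut_snmp_sess_open()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical opener type: returns a handle, or NULL with errno set. */
typedef void *(*open_fn_t)(void *arg);

/* Call open_fn up to max_attempts times. Retry only when the failure is
 * FD exhaustion (EMFILE/ENFILE), sleeping base_delay_ms * attempt between
 * tries; any other error is returned to the caller immediately.
 * attempts_used (optional) receives the number of calls actually made. */
static void *open_with_emfile_retry(open_fn_t open_fn, void *arg,
                                    int max_attempts, long base_delay_ms,
                                    int *attempts_used)
{
    void *handle = NULL;
    int attempt;

    assert(open_fn != NULL);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        errno = 0;
        handle = open_fn(arg);
        if (attempts_used != NULL)
            *attempts_used = attempt;

        if (handle != NULL)
            return handle;                 /* success */
        if (errno != EMFILE && errno != ENFILE)
            return NULL;                   /* unrelated error: give up */

        if (attempt < max_attempts) {
            /* Out of descriptors: let older threads finish and release theirs. */
            struct timespec ts;
            long ms = base_delay_ms * attempt;
            int saved_errno = errno;

            ts.tv_sec = ms / 1000;
            ts.tv_nsec = (ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
            errno = saved_errno;           /* keep the cause for the caller's log */
        }
    }
    return NULL;                           /* still EMFILE/ENFILE after all tries */
}

/* Demo stand-in for a real opener: fails with EMFILE until the counter
 * pointed to by arg reaches zero, then "succeeds" by returning arg. */
static void *flaky_open(void *arg)
{
    int *remaining_failures = arg;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        errno = EMFILE;
        return NULL;
    }
    return arg;
}
```

With such a helper, the `handle == NULL` branch in `try_SysOID_thready()` would only be reached once the retries are exhausted or a different error occurred. Whether the waiting belongs inside each thread, or rather in the dispatcher next to the existing "too many scanning threads" check (possibly lowering the thread maximum as well), is a design question this sketch does not settle.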