dhedberg opened this issue 1 year ago (status: Open)
Hey @dhedberg ,
Thanks for the bug report and sorry for the late reply.
We tried to replicate the issue, but without success. We suspect this could be a bug in glib.
Does this issue still happen or have you found a solution since then? Do you have some minimal reproducer, so we can replicate the issue?
Thanks for looking into this.
As far as I can tell from our logs and CPU usage monitoring, the issue still persists. For now we're working around it with a cron job that regularly kills any lingering openvas --scan-start processes that match tasks marked as Done in the API. We also automatically update to the latest stable images at least once a week.
I don't have a minimal reproducer at hand. We don't do much customization: we create a custom port list from the IANA one with one port removed, a custom scan config based on "Full and fast" with around 10 OIDs disabled, and a few tasks that scan a bunch of IP ranges with SSH authentication enabled. Usually only one task is running at a time.
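For reference, here is a minimal sketch of the kind of cron-driven cleanup described above. It is not taken from the thread: it assumes the set of Done scan IDs has already been exported (for example by a separate GMP query) to a plain-text file with one UUID per line, and the file path, function names, and SIGKILL choice are all illustrative.

```python
#!/usr/bin/env python3
# Hypothetical cleanup sketch: kill openvas --scan-start processes whose scan
# ID is already reported as Done by the management API. The list of finished
# scan IDs must be produced separately; here it is just read from a file.
import os
import re
import signal
import subprocess
import sys


def finished_scan_ids(path):
    # One scan UUID per line, produced by whatever queries the API.
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def lingering_openvas_pids(finished):
    pattern = re.compile(r"openvas\s+--scan-start\s+([0-9a-f-]{36})")
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        pid_str, _, args = line.strip().partition(" ")
        match = pattern.search(args)
        if match and match.group(1) in finished:
            yield int(pid_str), match.group(1)


if __name__ == "__main__":
    for pid, scan_id in lingering_openvas_pids(finished_scan_ids(sys.argv[1])):
        print(f"killing pid {pid} (scan {scan_id} already Done)")
        os.kill(pid, signal.SIGKILL)
```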
Hi all, I seem to have the same issue.
OSPD[7] 2023-09-11 11:04:31,470: INFO: (ospd.command.command) Scan 3b93c961-e15f-460e-859a-15fa12c26368 added to the queue in position 2.
OSPD[7] 2023-09-11 11:04:40,670: INFO: (ospd.ospd) Currently 1 queued scans.
OSPD[7] 2023-09-11 11:04:41,325: INFO: (ospd.ospd) Starting scan 3b93c961-e15f-460e-859a-15fa12c26368.
OSPD[7] 2023-09-11 11:47:09,521: INFO: (ospd.ospd) 3b93c961-e15f-460e-859a-15fa12c26368: Host scan finished.
OSPD[7] 2023-09-11 11:47:09,528: INFO: (ospd.ospd) 3b93c961-e15f-460e-859a-15fa12c26368: Scan finished.
root@ospd-openvas:/ospd-openvas# ps fax
    PID TTY      STAT   TIME COMMAND
  33270 pts/0    Ss     0:00 bash
  33286 pts/0    R+     0:00  _ ps fax
      1 ?        Ss     0:00 /sbin/docker-init -- /usr/bin/tini -- /usr/local/bin/entrypoint ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker
      6 ?        S      0:00 /usr/bin/tini -- /usr/local/bin/entrypoint ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker
      7 ?        Sl     3:19  _ /usr/bin/python3 /usr/local/bin/ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker --notus
     20 ?        Sl     0:09  _ /usr/bin/python3 /usr/local/bin/ospd-openvas -f --config /etc/gvm/ospd-openvas.conf --mqtt-broker-address mqtt-broker --n
   1133 ?        RNL    3:50 openvas --scan-start 3b93c961-e15f-460e-859a-15fa12c26368
The process seems to stay in memory and is never cleaned up. Restarting the Docker container helps, but somewhere ospd-openvas is not releasing the memory it claimed.
R = running or runnable (on run queue)
N = low-priority (nice to other users)
L = has pages locked into memory (for real-time and custom IO)
The last flag (L) in particular applies here. Is this a bug in ospd-openvas? I never experienced this issue on Debian 11.5 with the same specs as below; it only happens on Debian 12.
Debian 12 (bookworm)
8 vCPU
16 GB memory
Used the Docker solution from OpenVAS; latest images pulled and installed.
Extra information:
Tasks: 169 total, 4 running, 165 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.8 us, 0.2 sy, 70.7 ni, 23.3 id, 0.0 wa, 0.0 hi, 2.5 si, 2.6 st
MiB Mem : 15823.6 total, 249.8 free, 2185.4 used, 13855.3 buff/cache
MiB Swap: 8192.0 total, 8075.0 free, 117.0 used. 13638.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1153249 docker 30 10 10.8t 16228 7556 R 100.0 0.1 9,57 openvas --scan-start 57278eb0-2a35-456b-8661-ec049570ea49
1153257 docker 30 10 10.8t 16240 7568 R  99.7 0.1 9,53 openvas --scan-start c0ce0ed3-6856-48d1-9321-86d0593afcd1
1153253 docker 30 10 10.8t 16220 7552 R  99.3 0.1 9,55 openvas --scan-start 3b420acc-947c-4fd3-a4f8-2e0b94465c9c
The scans keep running indefinitely. Attaching strace to the PID shows no output, and the process is not a zombie. Somewhere along the way the process is simply not getting killed.
I've tried with the default docker-compose file on a Debian 12 server. The VM is running on Amazon, by the way, but that doesn't seem to make a difference: I previously ran Debian 11.5 on Amazon (same architecture, same docker-compose file) and it worked just fine.
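To make the stuck scanners described above (the RNL, 100%-CPU openvas --scan-start entries) easier to spot, something like the following stdlib-only sketch can be run on the host or inside the container. It is only an illustration and assumes the Linux /proc layout; the function name and output format are mine.

```python
#!/usr/bin/env python3
# Illustrative, Linux-only sketch: list every "openvas --scan-start" process
# with its scheduler state and accumulated CPU time, to spot scanners that
# keep burning CPU after their task is finished.
import os

CLK_TCK = os.sysconf("SC_CLK_TCK")


def openvas_scan_processes():
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                argv = fh.read().split(b"\0")
            with open(f"/proc/{entry}/stat") as fh:
                stat = fh.read()
        except OSError:
            continue  # process exited while we were looking
        if not argv or b"openvas" not in argv[0] or b"--scan-start" not in argv:
            continue
        # /proc/<pid>/stat: "pid (comm) state ppid ... utime stime ..."
        fields = stat.rsplit(")", 1)[1].split()
        state = fields[0]  # R, S, D, Z, ...
        cpu_seconds = (int(fields[11]) + int(fields[12])) / CLK_TCK
        scan_id = argv[argv.index(b"--scan-start") + 1].decode()
        yield int(entry), scan_id, state, cpu_seconds


if __name__ == "__main__":
    for pid, scan_id, state, cpu in openvas_scan_processes():
        print(f"pid={pid} state={state} cpu={cpu:.0f}s scan={scan_id}")
```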
I am seeing the same thing with our deployment using our helm chart: https://github.com/fpm-git/Greenbone-Community-Edition-Helm
We've been investigating this, and it appears to be similar to the OP but with a more recent build of OpenVAS.
Including the quoted bit below.
(gdb) bt
#0  0x00007f39f099ef1e in resize_set (reallocated_buckets_bitmap=0x7472682d5010, old_size=2660584736, hash_table=0x563b9e954760) at ../../../glib/ghash.c:834
#1  g_hash_table_resize (hash_table=0x563b9e954760) at ../../../glib/ghash.c:888
#2  0x00007f39f09a0d3e in g_hash_table_destroy (hash_table=0x563b9e954760) at ../../../glib/ghash.c:1488
#3  0x00007f39f0aa18c0 in destroy_scan_globals () from /usr/local/lib/libopenvas_misc.so.22
#4  0x0000563b9d621107 in openvas ()
#5  0x00007f39f0591d0a in __libc_start_main (main=0x563b9d61cd50 <main>, argc=3, argv=0x7ffe314163d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe314163c8) at ../csu/libc-start.c:308
#6  0x0000563b9d61cd8a in _start ()
We seem to see it only when authenticated scans are being used.
We're on the latest greenbone/ospd-openvas:stable, dated 2022-12-12T14:39:05.303393489Z, and we're seeing an issue where openvas processes hang around after their tasks have been marked as Done, each taking up 100% of a CPU core.
The container is running in Kubernetes, with a deployment converted (with some small adaptations) from the community-container docker-compose file.
Example:
The task is already marked as finished. Attaching with gdb gives:
it seems to be stuck in:
.. which I think corresponds to
in glib/ghash.c
That's as far as I have dug; I'm guessing it might be some sort of memory corruption or concurrency issue?
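For anyone who wants to capture the same kind of backtrace from a stuck openvas process without an interactive gdb session, a small wrapper like the one below may help. It is only a sketch and assumes gdb and the relevant debug symbols are installed inside the container; the function name is mine.

```python
#!/usr/bin/env python3
# Sketch: grab a backtrace from a stuck openvas process non-interactively,
# similar to the trace quoted above.
import subprocess
import sys


def backtrace(pid: int) -> str:
    # "gdb --batch" exits after running the -ex commands; "bt" prints the
    # backtrace of the attached process, which should show whether it is
    # spinning inside glib's resize_set()/g_hash_table_destroy().
    result = subprocess.run(
        ["gdb", "--batch", "-p", str(pid), "-ex", "bt"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(backtrace(int(sys.argv[1])))
```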