Open y0d4a opened 1 year ago
Hey @y0d4a
Thanks for creating the issue.
Just to make sure: you are scanning the products "Boundary" and "Vault" from HashiCorp?
How do you know that the posted NASL script is the culprit? Do you have additional logs or other reasons to believe so?
Hi, yes, those two products. While monitoring the process I saw that when it reaches IPs with those services it keeps scanning for a very long time, eventually uses all of the memory and interrupts the scan (sometimes it even crashes the openvas service when run from Docker).
Ref to two relevant community forum postings (for tracking purposes)
and a more recent issue posted over at greenbone/ospd-openvas#974
In https://github.com/greenbone/ospd-openvas/issues/974 I've posted my findings. To summarize:
I dropped the db that seemed to be transient:
-server_time_usec:1709405329080351
-uptime_in_seconds:588
+server_time_usec:1709406156005681
+uptime_in_seconds:1415
...
# Memory
-used_memory:3344099096
-used_memory_human:3.11G
-used_memory_rss:3001507840
-used_memory_rss_human:2.80G
+used_memory:171441592
+used_memory_human:163.50M
+used_memory_rss:194514944
+used_memory_rss_human:185.50M
...
# Keyspace
db0:keys=1,expires=0,avg_ttl=0
db1:keys=177456,expires=0,avg_ttl=0
-db6:keys=3468,expires=0,avg_ttl=0
Dropping those 3400 entries in db6 freed a whopping 2GB.
The keys I freed look like:
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6
redis /run/redis/redis.sock[6]> keys *
1) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"
2) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/WEB-INF/local.properties"
3) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vropspluginui/rest/services/.env.example"
And it turned out that the contents were those of a HashiCorp Vault instance: any URL after /ui/ would return a 200 and about 700 kB of HTML:
$ curl --fail -k https://vault.example.com:8200/ui/whatever -o/dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 786k 100 786k 0 0 4990k 0 --:--:-- --:--:-- --:--:-- 5009k
With 28000+ URLs scanned, this would quickly add up (about 350kB stored in redis per URL: 10GB).
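If you want to check where the memory goes on your own setup, redis-cli can report per-key memory; a sketch reusing the container and socket names from above (MEMORY USAGE needs Redis >= 4.0, and the key is just one of the examples listed earlier):

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 --bigkeys
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 MEMORY USAGE "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"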
Changes for the community docker-compose.yml:
redis-server:
  image: greenbone/redis-server
  command:
    # https://forum.greenbone.net/t/redis-oom-killed-on-one-host-scan/15722/5
    - /bin/sh
    - -c
    - 'rm -f /run/redis/redis.sock && cat /etc/redis/redis.conf >/run/redis/redis.conf && printf "%s\n" "maxmemory 12884901888" "maxmemory-policy allkeys-lru" "maxclients 150" "tcp-keepalive 15" >>/run/redis/redis.conf && redis-server /run/redis/redis.conf'
  logging:
    driver: journald
  restart: on-failure
  volumes:
    - redis_socket_vol:/run/redis/
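To confirm that the appended settings actually took effect after recreating the container, something like this should do (same container and socket names as above):

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory-policy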
The allkeys-lru above is wrong: you'll end up losing the important data in keyspaces 0 and 1. volatile-ttl would be a better fit, but it effectively does nothing here because none of the stored items has a non-infinite TTL. So for now, I went with noeviction.
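You can verify the missing TTLs yourself: TTL returns -1 for a key that exists but has no expiry set, which is what you get for the cache keys listed earlier, e.g.:

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 TTL "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"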
The settings:
- maxmemory 12884901888: 12 GB, adjust as needed
- maxmemory-policy noeviction
- maxclients 150: a single run with 6 simultaneous hosts and 3 simultaneous scans per host already opens about 40 redis connections; tweak as appropriate
- tcp-keepalive 15: not sure, copied from the forum

Redis now won't die, but instead the users of redis report failures:
kernel: openvas[166925]: segfault at 0 ip 000055c611a13fe6 sp 00007ffe63419aa0 error 4 in openvas[55c611a13000+9000]
redis.exceptions.OutOfMemoryError: Command # 1 (LRANGE internal/results 0 -1) of pipeline caused error: command not allowed when used memory > 'maxmemory'.
This also aborts the scan.
As reported elsewhere, the immediate culprit was "caching of web pages during CGI scanning".
An alternative fix that appears to work is this:
--- greenbone-community-container_vt_data_vol/_data/http_keepalive.inc.orig 2024-03-18 15:46:31.480951508 +0100
+++ greenbone-community-container_vt_data_vol/_data/http_keepalive.inc 2024-03-18 15:52:51.764904305 +0100
@@ -726,7 +726,8 @@ function http_get_cache( port, item, hos
# Internal Server Errors (5xx)
# Too Many Requests (429)
# Request Timeout (408)
- if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" )
+ # Size of response must be less than 1.5*64k
+ if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" && strlen( res ) < 98304 )
replace_kb_item( name:"Cache/" + host + "/" + port + "/" + key + "/URL_" + item, value:res );
}
This reduces the effectiveness of caching, but now all these large web results are not cached and memory stays well below 2GB even when running multiple scans simultaneously.
Limiting caching to pages shorter than 96 kB is a rather crude approach. It would be better if we could make the limit more dynamic:
Right now I don't know of a way to get the current memory usage of a keyspace from redis, but the library storing the values could track it itself in a separate redis key using INCRBY, and stop adding to the cache once it hits a limit.
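To illustrate the bookkeeping idea with plain redis commands (just a sketch: the Cache/__total_bytes key name is made up, and in practice the accounting would live in the library code that calls replace_kb_item):

# after caching a 98304-byte response, add its size to a hypothetical counter key
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 INCRBY "Cache/__total_bytes" 98304
# before caching the next response, read the counter and skip caching once it exceeds the chosen budget
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 GET "Cache/__total_bytes"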
Links into the source / places to look when considering a fix:
Usage of the following might also be an option (AFAICT this needs adjustments to redis-server.conf):
When OpenVAS starts scanning, we are getting a memory leak in the process:
This seems to be some kind of bug. If you can, please try to replicate it; all the software involved is free.