Open y0d4a opened 1 year ago
Hey @y0d4a
Thanks for creating the issue.
Just to make sure: you are scanning the products "Boundary" and "Vault" from HashiCorp?
How do you know that the posted NASL script is the culprit? Do you have additional logs or other reasons to believe so?
Hi, yes, those two products. While monitoring the process I saw that when it reaches IPs with those services it keeps scanning for a very long time, eventually uses all of the memory and interrupts the scan (sometimes it even crashes the openvas service when run from Docker).
Ref to two relevant community forum postings (for tracking purposes)
and a more recent issue posted over at greenbone/ospd-openvas#974
In https://github.com/greenbone/ospd-openvas/issues/974 I've posted my findings. To summarize:
I dropped the db that seemed to be transient:
-server_time_usec:1709405329080351
-uptime_in_seconds:588
+server_time_usec:1709406156005681
+uptime_in_seconds:1415
...
# Memory
-used_memory:3344099096
-used_memory_human:3.11G
-used_memory_rss:3001507840
-used_memory_rss_human:2.80G
+used_memory:171441592
+used_memory_human:163.50M
+used_memory_rss:194514944
+used_memory_rss_human:185.50M
...
# Keyspace
db0:keys=1,expires=0,avg_ttl=0
db1:keys=177456,expires=0,avg_ttl=0
-db6:keys=3468,expires=0,avg_ttl=0
Dropping those 3400 entries in db6 freed a whopping 2GB.
The keys I freed look like:
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6
redis /run/redis/redis.sock[6]> keys *
1) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"
2) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/WEB-INF/local.properties"
3) "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vropspluginui/rest/services/.env.example"
And it turned out that the contents were those of a HashiCorp Vault instance: any URL after /ui/ would return a 200 and about 700 kB of HTML:
$ curl --fail -k https://vault.example.com:8200/ui/whatever -o/dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 786k 100 786k 0 0 4990k 0 --:--:-- --:--:-- --:--:-- 5009k
With 28000+ URLs scanned, this would quickly add up (about 350kB stored in redis per URL: 10GB).
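If you want to check where the memory goes on your own setup, redis-cli can report per-key memory; a sketch reusing the container and socket names from above (MEMORY USAGE needs Redis >= 4.0, and the key is just one of the examples listed earlier):

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 --bigkeys
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 MEMORY USAGE "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"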
Changes for the community docker-compose.yml:
redis-server:
  image: greenbone/redis-server
  command:
    # https://forum.greenbone.net/t/redis-oom-killed-on-one-host-scan/15722/5
    - /bin/sh
    - -c
    - 'rm -f /run/redis/redis.sock && cat /etc/redis/redis.conf >/run/redis/redis.conf && printf "%s\n" "maxmemory 12884901888" "maxmemory-policy allkeys-lru" "maxclients 150" "tcp-keepalive 15" >>/run/redis/redis.conf && redis-server /run/redis/redis.conf'
  logging:
    driver: journald
  restart: on-failure
  volumes:
    - redis_socket_vol:/run/redis/
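To confirm that the appended settings actually took effect after recreating the container, something like this should do (same container and socket names as above):

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock CONFIG GET maxmemory-policy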
The allkeys-lru above is wrong: you'll end up losing the important data in keyspaces 0 and 1. volatile-ttl would be a better fit, but it effectively does nothing here because none of the stored items has a non-infinite TTL. So for now, I went with noeviction.
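You can verify the missing TTLs yourself: TTL returns -1 for a key that exists but has no expiry set, which is what you get for the cache keys listed earlier, e.g.:

$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 TTL "Cache/node.example.com/8200/excluding_404_body/URL_/ui/vcav-bootstrap/rest/vcav-providers/config.neon"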
The settings:
- maxmemory 12884901888: 12 GB, adjust as needed
- maxmemory-policy noeviction
- maxclients 150: a single run with 6 simultaneous hosts and 3 simultaneous scans per host already opens about 40 redis connections; tweak as appropriate
- tcp-keepalive 15: not sure, copied from the forum

Redis now won't die, but instead the users of redis report failures:
kernel: openvas[166925]: segfault at 0 ip 000055c611a13fe6 sp 00007ffe63419aa0 error 4 in openvas[55c611a13000+9000]
redis.exceptions.OutOfMemoryError: Command # 1 (LRANGE internal/results 0 -1) of pipeline caused error: command not allowed when used memory > 'maxmemory'.
This also aborts the scan.
As reported elsewhere, the immediate culprit was "caching of web pages during CGI scanning".
An alternative fix that appears to work is this:
--- greenbone-community-container_vt_data_vol/_data/http_keepalive.inc.orig 2024-03-18 15:46:31.480951508 +0100
+++ greenbone-community-container_vt_data_vol/_data/http_keepalive.inc 2024-03-18 15:52:51.764904305 +0100
@@ -726,7 +726,8 @@ function http_get_cache( port, item, hos
# Internal Server Errors (5xx)
# Too Many Requests (429)
# Request Timeout (408)
- if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" )
+ # Size of response must be less than 1.5*64k
+ if( res !~ "^HTTP/1\.[01] (5(0[0-9]|1[01])|4(08|29))" && strlen( res ) < 98304 )
replace_kb_item( name:"Cache/" + host + "/" + port + "/" + key + "/URL_" + item, value:res );
}
This reduces the effectiveness of caching, but now all these large web results are not cached and memory stays well below 2GB even when running multiple scans simultaneously.
Limiting caching to pages shorter than 96 kB is a rather crude approach. It would be better if we could make the limit more dynamic:
Right now I don't know of a way to get the current memory usage of a keyspace from redis, but the library storing the values could track it itself in a separate redis key using INCRBY, and stop adding to the cache once it hits a limit.
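To illustrate the bookkeeping idea with plain redis commands (just a sketch: the Cache/__total_bytes key name is made up, and in practice the accounting would live in the library code that calls replace_kb_item):

# after caching a 98304-byte response, add its size to a hypothetical counter key
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 INCRBY "Cache/__total_bytes" 98304
# before caching the next response, read the counter and skip caching once it exceeds the chosen budget
$ sudo docker exec -it greenbone-community-container_redis-server_1 redis-cli -s /run/redis/redis.sock -n 6 GET "Cache/__total_bytes"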
Links into the source / places to look when considering a fix:
Usage of the following might also be an option (AFAICT this needs adjustments to redis-server.conf):
When OpenVAS starts scanning, we are getting a memory leak in the process:
This seems to be some kind of bug. If you can, please try to replicate it; all the software involved is free.