greenbone / gvmd

Greenbone Vulnerability Manager - The database backend for the Greenbone Community Edition
GNU Affero General Public License v3.0

[gvm-11 / gvm-20.08] Short network connection error will automatically stop running scan task (possible ospd-openvas / redis issue) #1188

Closed (wisukind closed this issue 3 years ago)

wisukind commented 4 years ago

Environment:

Ubuntu 18.04
Greenbone Vulnerability Manager 9.0.1~git-e250176b-gvmd-9.0
GIT revision e250176b-gvmd-9.0
Manager DB revision 221
gvm@ov-master-eqi:~$ gsad --version
Greenbone Security Assistant 9.0.1~git-9fb2e63cd-gsa-9.0
gvm@ov-master-eqi:~$ openvas --version
OpenVAS 7.0.1
gvm-libs 11.0.1

Current behaviour: When using a master/slave architecture, a short network connection error with the slave automatically stops the related running tasks. This happens randomly and is not related to the similar bug #1061.

Expected behaviour: A short network connection error should not interrupt running tasks. Gvmd should wait a bit longer before considering the network connection lost.
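To illustrate the kind of tolerance meant here, below is a minimal sketch (not gvmd's actual reconnect code) that probes the slave's scanner port with exponential backoff before giving up. The hostname and port are taken from the gvmd log excerpt further down; the attempt count and delays are purely illustrative.

```python
import socket
import time

def wait_for_scanner(host: str, port: int,
                     attempts: int = 6, initial_delay: float = 5.0) -> bool:
    """Return True as soon as a TCP connection to the scanner succeeds.

    Retries with exponential backoff instead of giving up after a few
    seconds, which is the behaviour requested in this report.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=10):
                return True
        except OSError as err:
            print(f"attempt {attempt}/{attempts} failed: {err}; retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= 2  # back off: 5s, 10s, 20s, ...
    return False

# Host and port taken from the log excerpt below.
if __name__ == "__main__":
    print(wait_for_scanner("ov-slave-eqi", 9390))
```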

On gvmd, the following errors are logged; note that only about eight seconds pass between the first failed connection attempt and the task being marked Stopped:

lib  serv:WARNING:2020-07-09 13h49.49 UTC:8819: Failed to connect to server
md manage:WARNING:2020-07-09 13h49.49 UTC:8819: Could not connect to Scanner at ov-slave-eqi:9390
lib  serv:WARNING:2020-07-09 13h49.54 UTC:8819: Failed to connect to server
md manage:WARNING:2020-07-09 13h49.54 UTC:8819: Could not connect to Scanner at ov-slave-eqi:9390
lib  serv:WARNING:2020-07-09 13h49.56 UTC:8819: Failed to connect to server
md manage:WARNING:2020-07-09 13h49.56 UTC:8819: Could not connect to Scanner at ov-slave-eqi:9390
event task:MESSAGE:2020-07-09 13h49.57 UTC:8819: Status of task Pacy, All Networks, Offices (93694043-92df-4f1e-a194-0457b7d78191) has changed to Stopped

On the remote ospd-openvas side, no particular error is logged, but 0 results are returned:

2020-07-09 13:49:02,624 OSPD - openvas: DEBUG: (ospd.ospd) Returning 0 results
2020-07-09 13:49:02,751 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 39850)
2020-07-09 13:49:17,800 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 39860)
2020-07-09 13:49:28,065 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 39868)
2020-07-09 13:49:38,100 OSPD - openvas: DEBUG: (ospd.ospd) Returning 0 results
2020-07-09 13:49:38,126 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 39874)
2020-07-09 13:49:53,192 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 39884)
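To confirm that the master kept reconnecting (rather than the scan simply dying on the slave), a small log-analysis sketch like the one below can count the "New connection from" entries per source address in the ospd-openvas log. The log path is an assumption; adjust it to wherever ospd-openvas logs on the slave.

```python
import re
from collections import Counter

# Path is an assumption; adjust to the slave's ospd-openvas log location.
LOG = "/var/log/gvm/ospd-openvas.log"

# Matches lines like: New connection from ('10.194.157.7', 39850)
pattern = re.compile(r"New connection from \('([\d.]+)', \d+\)")

counts = Counter()
with open(LOG) as fh:
    for line in fh:
        m = pattern.search(line)
        if m:
            counts[m.group(1)] += 1

for addr, n in counts.most_common():
    print(f"{addr}: {n} connections")
```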

Around the same time, openvas logs its last scanning events:

sd   main:MESSAGE:2020-07-09 13h49.01 utc:30643: The remote host 10.201.31.57 is dead
sd   main:MESSAGE:2020-07-09 13h49.02 utc:30643: Finished testing 10.201.31.57. Time : 3.59 secs
sd   main:MESSAGE:2020-07-09 13h49.02 utc:31385: Testing 10.204.42.238 [31385]
sd   main:MESSAGE:2020-07-09 13h49.06 utc:31385: The remote host 10.204.42.238 is dead
sd   main:MESSAGE:2020-07-09 13h49.06 utc:31385: Finished testing 10.204.42.238. Time : 3.53 secs
sd   main:MESSAGE:2020-07-09 13h49.06 utc:32003: Testing 10.201.31.88 [32003]
sd   main:MESSAGE:2020-07-09 13h49.10 utc:32003: The remote host 10.201.31.88 is dead
sd   main:MESSAGE:2020-07-09 13h49.10 utc:32003: Finished testing 10.201.31.88. Time : 3.82 secs
sd   main:MESSAGE:2020-07-09 13h49.11 utc:32574: Testing 10.204.41.45 [32574]
sd   main:MESSAGE:2020-07-09 13h49.14 utc:32574: The remote host 10.204.41.45 is dead
sd   main:MESSAGE:2020-07-09 13h49.14 utc:32574: Finished testing 10.204.41.45. Time : 3.54 secs

And shortly after that, openvas logs the following:

lib nvticache:MESSAGE:2020-07-09 13h54.44 utc:33535: Updated NVT cache from version 0 to 202007071016

After this, openvas stopped logging any events until a scan was restarted. No NVT update was performed at this time. It seems something went wrong with redis, since the redis service restarted at the same moment (the redis log below is in local time, UTC+2h), although redis itself logged no shutdown or error:

33348:C 09 Jul 15:49:20.210 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
33348:C 09 Jul 15:49:20.210 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=33348, just started
33348:C 09 Jul 15:49:20.210 # Configuration loaded
33370:M 09 Jul 15:49:20.263 * Running mode=standalone, port=6379.
33370:M 09 Jul 15:49:20.263 # Server initialized
33370:M 09 Jul 15:49:20.264 * DB loaded from disk: 0.000 seconds
33370:M 09 Jul 15:49:20.264 * Ready to accept connections
33370:M 09 Jul 15:49:20.264 * The server is now ready to accept connections at /var/run/redis/redis.sock
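To make the time correlation explicit: the redis and systemd logs above are in local time (CEST, UTC+2), while the gvmd and openvas logs are in UTC. A quick standard-library check like the one below confirms that the redis restart at 15:49:20 local time is 13:49:20 UTC, i.e. about half a minute before gvmd first fails to reach the scanner at 13:49:49 UTC.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # local time zone of the redis/systemd logs

redis_restart_local = datetime(2020, 7, 9, 15, 49, 20, tzinfo=CEST)
gvmd_first_failure_utc = datetime(2020, 7, 9, 13, 49, 49, tzinfo=timezone.utc)

redis_restart_utc = redis_restart_local.astimezone(timezone.utc)
print("redis restart (UTC):", redis_restart_utc)          # 2020-07-09 13:49:20+00:00
print("gap to first gvmd failure:",
      gvmd_first_failure_utc - redis_restart_utc)          # 0:00:29
```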

And the redis service status confirms it was killed with the special USR1 signal:

root@ov-slave-eqi:~# systemctl status redis
● redis-server.service - Advanced key-value store
   Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-07-09 15:49:20 CEST; 21h ago
     Docs: http://redis.io/documentation,
           man:redis-server(1)
  Process: 63294 ExecStop=/bin/kill -s TERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 33348 ExecStart=/usr/bin/redis-server /etc/redis/redis.conf (code=exited, status=0/SUCCESS)
 Main PID: 33370 (redis-server)
    Tasks: 4 (limit: 9830)
   CGroup: /system.slice/redis-server.service
           └─33370 /usr/bin/redis-server 127.0.0.1:6379

juil. 09 15:49:19 ov-slave-eqi systemd[1]: redis-server.service: Main process exited, code=killed, status=10/USR1
juil. 09 15:49:19 ov-slave-eqi systemd[1]: redis-server.service: Failed with result 'signal'.
juil. 09 15:49:19 ov-slave-eqi systemd[1]: redis-server.service: Service hold-off time over, scheduling restart.
juil. 09 15:49:19 ov-slave-eqi systemd[1]: redis-server.service: Scheduled restart job, restart counter is at 2.
juil. 09 15:49:19 ov-slave-eqi systemd[1]: Stopped Advanced key-value store.
juil. 09 15:49:19 ov-slave-eqi systemd[1]: Starting Advanced key-value store...
juil. 09 15:49:20 ov-slave-eqi systemd[1]: redis-server.service: Can't open PID file /var/run/redis/redis-server.pid (yet?) after start: No such file or directory
juil. 09 15:49:20 ov-slave-eqi systemd[1]: Started Advanced key-value store.

At this time, nobody was logged in on the system, and since USR1 is such a specific signal, could it be that redis was actually killed with USR1 by ospd-openvas? Only ospd-openvas / openvas use redis on this system.
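As a diagnostic aid, the read-only sketch below checks whether the running redis-server main process has a handler installed for SIGUSR1 by reading the SigCgt mask in /proc. If it does not, the default action of SIGUSR1 is to terminate the process, which would match the status=10/USR1 seen in the systemd journal. Finding the PID via pgrep is just one convenient option and is assumed to be available.

```python
import signal
import subprocess

def catches_signal(pid: int, signum: int) -> bool:
    """Check the SigCgt bitmask in /proc/<pid>/status for the given signal."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("SigCgt:"):
                mask = int(line.split()[1], 16)
                return bool(mask & (1 << (signum - 1)))
    return False

# Oldest matching PID is the redis-server main process.
pid = int(subprocess.check_output(["pgrep", "-o", "redis-server"]).split()[0])
if catches_signal(pid, signal.SIGUSR1):
    print(f"redis-server (pid {pid}) has a SIGUSR1 handler installed")
else:
    print(f"redis-server (pid {pid}) does NOT catch SIGUSR1; "
          "a stray SIGUSR1 would terminate it (default action)")
```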

Thanks

liuk78 commented 4 years ago

Same problem

wisukind commented 4 years ago

Note that this problem also occurs on gvmd 20.08, latest build

cfi-gb commented 3 years ago

Might be already solved by https://github.com/greenbone/gvmd/pull/1452

y0urself commented 3 years ago

@wisukind does this still appear in the latest build, or is it fixed for you?

wisukind commented 3 years ago

Hello,

I confirm I no longer see these issues with the latest build.