prbinu / tls-scan

An Internet scale, blazing fast SSL/TLS scanner ( non-blocking, event-driven )
https://prbinu.github.io/tls-scan

Orphaned Sockets in FIN-WAIT-1 State Causing tls-scan to Halt #56

Closed taco-killchain closed 1 year ago

taco-killchain commented 1 year ago

I have been encountering a significant issue while using tls-scan version 1.5.1 on Ubuntu 20.04 LTS to conduct scans against a large number of hosts. The tool seems to generate a substantial number of orphaned sockets during operation, leading to a state where it ceases to progress further. Notably, the tool doesn't crash, nor does it produce any error message. Instead, it just appears to stop executing without indication.

This problem becomes particularly evident when the ss -s command is executed, revealing thousands of sockets stuck in the FIN-WAIT-1 state. This excessive number of sockets in the FIN-WAIT-1 state, coupled with the system's kernel repeatedly logging "TCP: too many orphaned sockets," points to a bottleneck in the system's ability to manage these sockets.
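
For a per-state breakdown rather than the summary that ss -s prints, the output of ss -tan can be tallied by its State column. This is a minimal sketch; the function name count_tcp_states is illustrative and not part of tls-scan, and it assumes the first column of ss -tan output is the socket state.

```shell
# count_tcp_states: hypothetical helper, not part of tls-scan.
# Reads `ss -tan`-style output on stdin and prints "<count> <state>"
# for each TCP state, most frequent first.
count_tcp_states() {
    # NR > 1 skips the header line; $1 is the State column.
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Usage while a scan is running:
#   ss -tan | count_tcp_states
```

Running this every few seconds during a scan shows whether the FIN-WAIT-1 count keeps growing or levels off.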

Kernel log examples:

Apr 27 08:19:42 curious-rosetaupe-raven kernel: [ 5932.164427] TCP: too many orphaned sockets
Apr 27 08:20:28 curious-rosetaupe-raven kernel: [ 5978.499299] TCP: too many orphaned sockets
Apr 27 08:20:36 curious-rosetaupe-raven kernel: [ 5986.275047] TCP: too many orphaned sockets
Apr 27 08:20:39 curious-rosetaupe-raven kernel: [ 5990.115057] TCP: too many orphaned sockets
Apr 27 08:21:08 curious-rosetaupe-raven kernel: [ 6018.946519] TCP: too many orphaned sockets
Apr 27 08:22:00 curious-rosetaupe-raven kernel: [ 6070.657708] TCP: too many orphaned sockets
Apr 27 08:22:02 curious-rosetaupe-raven kernel: [ 6072.193706] TCP: too many orphaned sockets
Apr 27 08:22:04 curious-rosetaupe-raven kernel: [ 6074.337643] TCP: too many orphaned sockets

The accumulation of orphaned sockets, particularly those in the FIN-WAIT-1 state, seems to be directly related to tls-scan's halting issue. This becomes a significant problem when scanning larger sets of hosts. Any insights into mitigating the excessive generation of these orphaned sockets or advice on potential workarounds would be extremely beneficial.

prbinu commented 1 year ago

Hi @taco-killchain, thanks for trying tls-scan! Although casual use of tls-scan works out of the box (say, to scan a few hundred or a few thousand hosts), scanning a large set of hosts quickly requires tuning the system's kernel and TCP/IP parameters, in addition to having enough general system capacity (cores, memory, network bandwidth, etc.).

You are seeing FIN-WAIT-1 because tls-scan has closed the connection and has not yet received an ACK from the server. I believe this is expected, and if I remember correctly, we use SO_LINGER with a linger interval of zero to reset the connection immediately, so that the port becomes available quickly for the next scan.

I believe this is system behavior rather than a tls-scan issue. If you have any suggestions, I'm happy to try them out.

Meanwhile, here are a few things you could try to see if they help.

  1. tls-scan has the options -t --timeout and --concurrency=<number>, which can help throttle your requests.
  2. If you are scanning large IP blocks, consider tuning your kernel parameters (and use a powerful machine with a high-bandwidth connection). Here are some helpful links:
     (a) https://levelup.gitconnected.com/linux-kernel-tuning-for-high-performance-networking-high-volume-incoming-connections-196e863d458a
     (b) https://medium.com/@pawilon/tuning-your-linux-kernel-and-haproxy-instance-for-high-loads-1a2105ea553e
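
As a rough illustration of the first suggestion, a throttled run could be wrapped like this. It is only a sketch under stated assumptions: the wrapper name throttled_scan and the TLS_SCAN_BIN variable are hypothetical, the default values are arbitrary, and the --infile and --port flags are assumed from typical tls-scan usage, so check the tool's own usage output before relying on them.

```shell
# throttled_scan: hypothetical wrapper, not part of tls-scan.
# Arguments: hosts file, concurrency (default 100), timeout in seconds (default 5).
# TLS_SCAN_BIN is an illustrative override for the binary path.
throttled_scan() {
    hosts_file=$1
    concurrency=${2:-100}
    timeout=${3:-5}
    # --concurrency and --timeout are the options mentioned above;
    # --infile and --port are assumptions, verify them for your version.
    "${TLS_SCAN_BIN:-tls-scan}" --infile="$hosts_file" --port=443 \
        --concurrency="$concurrency" --timeout="$timeout"
}

# Usage:
#   throttled_scan hosts.txt 50 3 > results.json
```

Lowering the concurrency value reduces how many connections are closed at the same moment, which in turn limits how many sockets can be orphaned at once.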

taco-killchain commented 1 year ago

Hey @prbinu,

Thanks for the reply. After posting this, I was playing with some network settings, and found lowering the tcp_fin_timeout to 5 helped keep the orphans under control, without adversely affecting the efficacy of the collection.

sudo sysctl -w net.ipv4.tcp_fin_timeout=5
sudo sysctl -w net.ipv4.tcp_max_orphans=10000
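
To check whether settings like these are keeping orphans under control, the kernel's current orphan count can be read from /proc/net/sockstat and compared with net.ipv4.tcp_max_orphans. This is a minimal sketch; the helper name orphan_count is illustrative and not part of tls-scan.

```shell
# orphan_count: hypothetical helper, not part of tls-scan.
# Reads /proc/net/sockstat-format text on stdin and prints the number
# of orphaned TCP sockets from the "TCP:" line.
orphan_count() {
    # The TCP line is a series of "<name> <value>" pairs after "TCP:".
    awk '$1 == "TCP:" {
        for (i = 2; i < NF; i += 2)
            if ($i == "orphan") print $(i + 1)
    }'
}

# Usage:
#   orphan_count < /proc/net/sockstat
#   sysctl -n net.ipv4.tcp_max_orphans
```

If the first number stays well below the second during a scan, the "too many orphaned sockets" message should stop appearing in the kernel log.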