ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License
5.73k stars 415 forks source link

TCPListener may have memory exploit by clients that don't send FIN. #1998

Open mfelsche opened 7 years ago

mfelsche commented 7 years ago

EDIT by @jemc: This turns out to be due to a bug in wrk/wrk2, but may have security implications for TCP in general, when dealing with a malicious/misbehaving client. Read on to the rest of the comments for more info.

I wanted to reproduce the benchmark discussed on the pony user mailing list: https://pony.groups.io/g/user/topic/pony_did_poorly_in_a_recent/5411536

A just slightly modified example of the example HTTPServer is used to return only the string "Hello world." and to run with DiscardLog.

A run of wrk with Connection: close header set results in > 99% of the requests to result in socket read errors and the RAM of the httpserver process increases rapidly (a 30 s benchmark run resulted in 2.8 GB VIRT RAM) in contrast to a fresh httpserver process with a benchmark without Connection: close where RAM remains at 621MB VIRT during and after the whole benchmark.

Example output:

~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 -H 'Connection: close' http://localhost:50000
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.97ms   14.85ms 333.71ms   98.19%
    Req/Sec     1.07k   244.22     1.79k    78.19%
  127616 requests in 30.02s, 6.57MB read
  Socket errors: connect 0, read 127615, write 0, timeout 0
Requests/sec:   4251.25
Transfer/sec:    224.19KB

Example output without Connection: close:

 ~/dev/wrk/wrk -t 4 -c 10 -d30s --timeout 2000 http://localhost:50000 
Running 30s test @ http://localhost:50000
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.96ms    1.50ms  52.50ms   98.83%
    Req/Sec    50.00     10.00    60.00    100.00%
  6000 requests in 30.01s, 457.20KB read
Requests/sec:    199.96
Transfer/sec:     15.24KB

I dug throught the net and net/http source code to understand the architecture and possible problems of the tcp/http code in pony. I also tried to profile the server but i have not been very successful. Most of the time seems to be spent in epoll which is fine afaik. But then there is lots of time spent in socket close syscalls but most importantly in gc and alloc activity around the _sessions set where connections are kept in order to close them. But i am not 100% sure about this.

My best guess was the _sessions code to be the bottleneck, as it kept references to the tcp-connection actors. Those could only be fully gc'ed after their reference has been removed from the server actor. But removing the code storing a reference to them did not change the results.

I guess there is a bug in how tcp connection close is handled but i was not able to find the root cause. That is why i file this issue here.

Operating system: Ubuntu 16.04 on Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz Pony: 0.14.0-0d920b8ff [debug] compiled with: llvm 3.9.1 -- cc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Further observations

mfelsche commented 7 years ago

To make it clear, when i talk about the results that are problematic here i mean that nearly all connections (requests) have been closed with socket read errors on client side. This issue is actually not concerned about performance.

mfelsche commented 7 years ago

when testing with wrk2 from another machine, the number of successful requests reported by the tool equals the number of connections configured with the limit parameter to HTTPServer. All further requests seem to run into a socket timeout.

Looking at the wireshark package capture data, i can see that:

An example session wireshark session screenshot is attached:

image

It also seems it could be related to wrk/wrk2 misbehaving:

mfelsche commented 7 years ago

aaaaaaaaaand .....

it really is a wrk2 problem. Running it from https://github.com/giltene/wrk2/pull/33 fixes the memory and socket read error issue.

I actually don't know if this issue is worth keeping and safeguarding against malicious clients making the httpserver eating up memory. So i keep it open and leave the decision to you.

jemc commented 7 years ago

Great work with your investigation, it's much appreciated!

I'm going to change the ticket title and add a "needs discussion" tag to this to discuss whether it is an important security issue to keep open (for TCP in general, not just HTTP), and how it might be mitigated.