Closed: jdmcd closed this issue 6 years ago.
Just to confirm, this does not appear to be happening on macOS Sierra.
Engine: 2.1.3, Vapor: 2.1.2
```
~$ wrk -t 4 -c 128 -d 10 http://localhost:8080/hello
Running 10s test @ http://localhost:8080/hello
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.90ms  130.49us  13.97ms   92.81%
    Req/Sec    17.66k     3.60k   23.53k    54.46%
  710091 requests in 10.10s, 56.89MB read
Requests/sec:  70303.63
Transfer/sec:      5.63MB
~$
```
But I'm able to recreate it on Ubuntu 16.04 Desktop with the same dependency versions.
```
parallels@ubuntu:~/Desktop/Speedtest$ wrk -t 4 -c 128 -d 10 http://localhost:8080/hello
Running 10s test @ http://localhost:8080/hello
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    40.72ms   23.94ms    1.00s    99.82%
    Req/Sec      81.64     38.22   151.00    59.50%
  1631 requests in 10.03s, 194.08KB read
  Socket errors: connect 0, read 0, write 0, timeout 9
Requests/sec:    162.64
Transfer/sec:     19.35KB
```
Looks right to me
As far as I can tell, this appears to be an issue with `wrk` not properly handling multiple writes to the stream. @vzsg @vi4m were you able to recreate this issue with tools other than `wrk`?
Hm, I think @vzsg did it with multiple versions of Engine though? I'm not sure why different versions would cause different results with wrk in particular
But it's totally possible I'm missing something too.
@mcdappdev Prior to Engine 2.0.3, if the response was less than 2048 bytes in length, it would be sent entirely in one call to `libc.send()`. As of 2.0.3, the body is always sent in a separate write call.
Running `wrk` on macOS Sierra against Vapor running in a Linux VM:
```
~$ wrk -t 4 -c 128 -d 10 http://10.211.55.3:8080/hello
Running 10s test @ http://10.211.55.3:8080/hello
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   109.77us   34.20us   5.30ms    95.09%
    Req/Sec    16.18k     3.62k   18.64k     82.79%
  196749 requests in 10.10s, 22.70MB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:  19480.57
Transfer/sec:      2.25MB
```
Running `wrk` from inside the Linux VM that Vapor is running in:
```
parallels@ubuntu:~/Desktop/Speedtest$ wrk -t 4 -c 128 -d 5 http://localhost:8080/hello
Running 5s test @ http://localhost:8080/hello
  4 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    41.91ms   43.22ms    1.00s    99.80%
    Req/Sec     100.38     35.86   161.00    60.00%
  501 requests in 5.03s, 59.99KB read
  Socket errors: connect 0, read 0, write 0, timeout 4
Requests/sec:     99.70
Transfer/sec:     11.94KB
```
It seems likely that the implementation of `wrk` on Linux does not continue reading from the socket until it has received the entire response.
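The suspected failure mode can be sketched like this: a server that sends headers and body in two separate writes (as Engine does since 2.0.3), and a client read loop that keeps calling `recv()` until `Content-Length` bytes of body have arrived. A benchmark client that stops after a single read would only see the first write. Everything below is illustrative Python, not wrk or Vapor internals.

```python
# Sketch: server splits headers and body across two writes; a correct
# client must keep reading until the full Content-Length body arrives.
import socket
import threading
import time

BODY = b"Hello, world!"
HEADERS = ("HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(BODY)).encode()

def serve(listener):
    conn, _ = listener.accept()
    conn.sendall(HEADERS)   # first write: headers only
    time.sleep(0.05)        # encourage the writes to land in separate packets
    conn.sendall(BODY)      # second write: body
    conn.close()

def read_full_response(sock):
    """Keep reading until Content-Length bytes of body have arrived."""
    data = b""
    while b"\r\n\r\n" not in data:           # read until end of headers
        data += sock.recv(4096)
    head, _, body = data.partition(b"\r\n\r\n")
    length = None
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":")[1].decode())
    while len(body) < length:                # keep reading until the body is complete
        body += sock.recv(4096)
    return body

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
body = read_full_response(client)
print(body.decode())
```

A client that instead did a single `recv()` and treated its result as the whole response would, with the timing above, usually see only the header bytes.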
Closing this issue since all the indications I can find point to this being a bug in `wrk`. I'd attempt to find a workaround so that Vapor's `wrk` scores on Linux still look good, but that seems like a waste of time considering the Swift Server APIs could soon replace our HTTP server implementation.
I was running wrk on macOS in all tests, never inside the container, or perhaps I'm misunderstanding something about the experiments above.
(Those Swift Server APIs cannot arrive soon enough.)
I just tried benchmarking with Apache Benchmark (`ab`), and the results got even weirder. It seems that `ab` gets stuck after each received response, waiting 30 seconds until timeout, rendering the benchmark pretty useless. What's really strange is that Node doesn't show this symptom with `ab`... Anyway, I'll look for another benchmark tool.
Another benchmark tool, vegeta, works fine however, and shows the slowdown between 2.0.2 and 2.1.3:

```
Latencies  [mean, 50, 95, 99, max]  2.173735ms, 2.313166ms, 2.78579ms, 2.878385ms, 5.671501ms     // 2.0.2
...
Latencies  [mean, 50, 95, 99, max]  33.34679ms, 50.587758ms, 50.835336ms, 50.937763ms, 61.981379ms  // 2.1.3
```
Just to clarify @tanner0101 - I ran my tests using the `wrk` client on macOS, not Linux, same as @vzsg.
Also, I see that you run your Vapor server on Parallels - could you try with an Ubuntu Docker image too? Our production servers run on Docker, so for me it's the only way to scale in production.
@tanner0101 I think the issue is related to the fact that you don't init your socket with `TCP_CORK`, or don't call `send` with `MSG_MORE`:
```
TCP_CORK (since Linux 2.2)
       If set, don't send out partial frames. All queued partial
       frames are sent when the option is cleared again. This is
       useful for prepending headers before calling sendfile(2), or
       for throughput optimization.

MSG_MORE (since Linux 2.4.4)
       The caller has more data to send. This flag is used with TCP
       sockets to obtain the same effect as the TCP_CORK socket
       option (see tcp(7)), with the difference that this flag can be
       set on a per-call basis.
```
If this is the case, I recommend adding an optional `flags` parameter to this function and, on the first call that writes out the headers, passing in `MSG_MORE`.
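As a hedged sketch of that suggestion: on Linux, passing `MSG_MORE` to `send()` when writing the headers tells the kernel that more data is coming, so the headers and the following body write can be coalesced into a single TCP segment instead of two. This is just an illustration of the flag using Python's socket module (which exposes `socket.MSG_MORE` on Linux), not Vapor's actual send path.

```python
# Illustration of MSG_MORE: the header write is held by the kernel until
# the body write (without the flag) flushes the combined segment.
import socket
import threading

received = bytearray()

def serve(listener):
    conn, _ = listener.accept()
    headers = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\n"
    conn.send(headers, socket.MSG_MORE)  # "caller has more data to send"
    conn.send(b"hello")                  # final write; segment is flushed
    conn.close()

def read_all(sock):
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return
        received.extend(chunk)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
read_all(client)
print(received.decode())
```

Note that `MSG_MORE` only changes how the kernel packetizes the data; the receiver still sees one ordered byte stream either way.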
@vzsg seems like `ab` also expects the entirety of the response to arrive in one `read` call.
@vzsg @vi4m `wrk` works fine on macOS for me (I got 70303.63 req/s in https://github.com/vapor/engine/issues/153#issuecomment-320771021). The only problem I had with `wrk` was running the Linux-compiled version. Are you saying you're getting bad benchmark results with `wrk` on macOS?
@BrettRToomey I haven't been able to find anything stating that sending an HTTP request/response in multiple write calls is not allowed (in fact, it's required for things like SSE). Does anyone know of documentation about what behavior should be expected in terms of the number of calls to `send`? Perhaps, even though it's not specifically mentioned in the HTTP spec, it has become a de facto standard because of how much simpler it is to implement an HTTP parser that doesn't need to handle that edge case.
I'll take a look at implementing socket flushing using the `MSG_MORE` option and see if that fixes it.
Yes, I've been continuously testing with `wrk` (and `ab` and `vegeta`) running on macOS, while the Vapor hello-world was running inside Docker.
`wrk` seems to be buggy on macOS High Sierra here. I read the HTTP docs recently and didn't see anything about specific requirements for sending data on a socket.
`TCP_NODELAY` may be of use too; it sounds like Nagle's algorithm is definitely reducing throughput by delaying small packets.
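For reference, disabling Nagle's algorithm is a per-socket option. A minimal sketch of setting it via `setsockopt`, using Python's socket module as a stand-in for the equivalent libc call:

```python
# Disable Nagle's algorithm on a single socket with TCP_NODELAY.
# With Nagle on (the default), small writes may be delayed waiting for
# outstanding ACKs; with TCP_NODELAY set, they are sent immediately.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY =", nodelay)
sock.close()
```

The same option name works from C via `setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...)`; whether it actually helps here depends on whether the slowdown is Nagle-related at all.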
This isn't an HTTP issue. If it's the blocking on small packets, it's a TCP issue, a layer below.
@robertjpayne Yeah, seems something like that. It may be easier to check in Wireshark or something and see the actual packets being sent out.
I ended up running it in virtual box myself. It looks like it comes in as two TCP packets (as expected) and then it gets reassembled into a single packet and then interpreted as HTTP, as expected. I'll try running this again with wrk and try to filter through the noise.
Edit: Wow... It looks like when spamming requests, the response time for HTTP packets goes up a lot! It smells a lot like you're getting hit by Nagle's algorithm. ~If this is true, hypothetically, I should be able to increase the payload size and the performance should go up.~ Let me go try that.
Edit edit: I think I successfully disabled Nagle's algorithm system-wide on Linux and I'm still seeing poor performance under wrk. Although, while using cURL all of the statistics, including latency, seem to be perfectly fine.
@vi4m I think Docker has something to do with it too -- running Vapor inside Docker on macOS (regular Docker.app install), a simple "Hello World" plain-text benchmark caps at about 200 req/s. This is with Docker set to use 10GB of RAM and 6 vCPUs, since it runs inside a VM.
I'd be interested to see if this fixes the issue: https://github.com/vapor/engine/pull/168 (for anyone who can recreate it). That PR modifies the basic server in Engine to send the entire response in one `libc.send()` call (unless it's chunked, of course).
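The idea behind that change can be sketched roughly as follows: serialize the status line, headers, and body into a single buffer and hand it to the kernel in one send call, so a client that reads only once is still likely to see the whole response. The names below are illustrative, not Engine's actual API.

```python
# Sketch: build the whole response in memory, then one sendall() instead
# of separate writes for headers and body.
import socket
import threading

def serialize_response(status, headers, body):
    head = "HTTP/1.1 %s\r\n" % status
    headers = dict(headers)
    headers.setdefault("Content-Length", str(len(body)))
    head += "".join("%s: %s\r\n" % kv for kv in headers.items())
    return head.encode() + b"\r\n" + body

def serve(listener):
    conn, _ = listener.accept()
    # Single write for the entire response.
    conn.sendall(serialize_response("200 OK", {}, b"Hello, world!"))
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname())
response = b""
while True:  # read until the server closes the connection
    chunk = client.recv(65536)
    if not chunk:
        break
    response += chunk
print(response.decode())
```

Strictly speaking TCP still doesn't guarantee a single `recv()` returns everything even when the sender used one write, but for small responses on loopback it almost always does, which would explain why the benchmark tools' single-read assumption only breaks with the multi-write server.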
Doesn't compile for me :(
```
/code/.build/checkouts/engine.git-3458089043233217718/Sources/HTTP/Server/BasicServer.swift:156:44: error: cannot convert value of type 'UnsafeRawBufferPointer' to expected argument type 'UnsafeBufferPointer<UInt8>'
    static let empty = DispatchData(bytes: UnsafeRawBufferPointer(start: nil, count: 0))
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/code/.build/checkouts/engine.git-3458089043233217718/Sources/HTTP/Server/BasicServer.swift:110:48: error: cannot convert value of type 'UnsafeRawBufferPointer' to expected argument type 'UnsafeBufferPointer<UInt8>'
    let data = DispatchData(bytes: buffer)
                                   ^~~~~~
/code/.build/checkouts/engine.git-3458089043233217718/Sources/HTTP/Server/BasicServer.swift:122:48: error: cannot convert value of type 'UnsafeRawBufferPointer' to expected argument type 'UnsafeBufferPointer<UInt8>'
    let data = DispatchData(bytes: buffer)
                                   ^~~~~~
```
Ah looks like a Swift 4 change.
Please see https://github.com/vapor/vapor/issues/1101