richardhundt / luv

libuv bindings for Lua
Apache License 2.0

tcp server load test #7

Open dvv opened 11 years ago

dvv commented 11 years ago

Hi! I've drafted a simple hello-world HTTP server to test luv under load via ab -n100000 -c500 ...

The result is that the server stopped responding after circa 50000 requests. What could be wrong?

I wonder, do we have any explicit or implicit limit on the number of concurrent coroutines?

richardhundt commented 11 years ago

Hey, thanks for the feedback.

No, there's no limit on the number of coroutines. If you're throwing 500 concurrent connections at it, then you need to increase your listen backlog:

server:listen(1024) -- default is 128

I don't think that was the problem though, ab would just get connection refused and it would have to back off.

So if you try calling client:shutdown() after client:write(), I think you'll get better behaviour from ab. close is a bit harsh, so ab is probably getting incomplete responses. I've also seen these 20-second pauses (although node.js was much worse under the same workload), which might have to do with file-descriptor reuse and lingering on them.
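
A minimal sketch of that suggestion, using only the client:write(), client:shutdown() and client:close() calls that appear in this thread (the surrounding accept/handler loop from the gist is assumed):

client:write(http_response)
client:shutdown() -- half-close once pending writes have flushed, rather than an abrupt client:close()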

Finally, you can do ab -k for keepalive and try this guy:

https://gist.github.com/3857202

and run with:

ab -k -r -n 10000 -c 500 http://127.0.0.1:8080/

I'm getting in the order of 16k req/sec with very few I/O errors.

miko commented 11 years ago

Seems I have the same issue:

$ ab -k -r -n 10000 -c 500 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
apr_poll: The timeout specified has expired (70007)
Total of 3955 requests completed

The server printed lots of: got zero, EOF?

I have modified the line to: print("got zero, EOF?", str)

and then got:

accept: userdata: 0x945124c
accept: userdata: 0x94517ec
enter child
enter child
got zero, EOF?  GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

got zero, EOF?  GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

got zero, EOF?  GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

So clearly there is some buffering issue: request data gets mixed up.

richardhundt commented 11 years ago

I've pushed a change to git head which should fix it. I wasn't pushing the length of the read buffer if there was pending data.

richardhundt commented 11 years ago

Try git head now.

miko commented 11 years ago

Much better now: I get all the request data. But when it runs successfully, I get at exit: lua: src/unix/stream.c:934: uv__read: Assertion `!uv__io_active(&stream->read_watcher)' failed.

But most of the time it just hangs as in the original report, until ab times out.

miko commented 11 years ago

Repeated the test with luajit instead of lua (and modified print line), got:

got zero, EOF?  GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

� � got zero, EOF?  GET / HTTP/1.0
Connection: Keep-Alive
Host: 127.0.0.1:8080
User-Agent: ApacheBench/2.3
Accept: */*

(…and more of the same, some dumps prefixed with binary garbage)

(In case it is not visible: the data has some binary bytes appended, which my terminal interprets as Unicode.)

After another modification, print("got zero, EOF?", #str, str), I can see that all the data has a size of 65536 bytes, which can't be true, so some garbage is being added somewhere to the received data.

richardhundt commented 11 years ago

Ah, okay, buf.len is the size of the buffer and not how much is read. Try now.

richardhundt commented 11 years ago

This error comes when you try to read after an error has occurred.

Try doing:

if client:write(http_response) == -1 then
  client:close()
end

I'll have to fix it so that it doesn't crash like that.

Thanks again for the report.

dvv commented 11 years ago

Seems to behave much better, thank you. I'm still not fluent in the coroutine-based async style -- could you tweak my example to add a delay to the response logic, to simulate async work? TIA

miko commented 11 years ago

The buffer size issue is now resolved: when using ab everything works great! I do still get the uv__read assertion, and I do use client:close(), as in your httpd-k.lua example. The reason may be that ab just drops the connection, but the server should still not crash like this. I get this even with ab -k -n 1 -c 1 http://127.0.0.1:8080/

miko commented 11 years ago

dvv: I have put my version of httpd-k.lua with timers in https://gist.github.com/3857202#comments

richardhundt commented 11 years ago

Hi miko,

can you try git head now? I've made stream:close() yield and made a couple of other changes which hopefully should fix it.

Thanks for all your support.

richardhundt commented 11 years ago

I've been throwing ab at this thing all morning and it seems that it's a bit pathological. It'll pump in the same headers repeatedly even without keepalive, so I'm seeing some 20k buffers coming in in a single libuv read_cb from the socket.

So basically libuv will keep happily reading from the socket as long as ab is writing, and ab doesn't stop writing as long as libuv is reading. So you get these big chunks occasionally.

What I've done now is give stream:read a parameter which specifies the buffer size to allocate. So stream:read(1024) stops libuv from reading past that size, and it fires the callback to rouse the coroutine.
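
A minimal sketch of a handler loop using this bounded read, reusing the client:read()/client:write()/client:close() calls from this thread; the exact EOF/error return convention is an assumption:

while true do
  local str = client:read(1024)          -- wake up with at most 1024 bytes
  if not str or #str == 0 then break end -- assume nil/empty means EOF or error
  client:write(http_response)
end
client:close()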

Another issue is that I'm getting occasional 20-second stalls, and I think it's related to this discussion:

https://github.com/joyent/libuv/pull/498

Unfortunately there's no uv_tcp_linger implementation (just the keepalive probes) :(

Another problem is reading EOF. Sometimes, immediately after a socket is accepted, reading from it gets an EOF from libuv. I think the only sane thing to do is to propagate this back to the caller and wake the coroutine, but what's tricky then is knowing whether to close the socket or to try again.

If you close the socket, ab will barf with EPIPE or ECONNRESET, which sucks.

To summarize, I'm finding that libuv + ab aren't really happy with each other. Perhaps it's just my code. I'll keep at it, since you guys seem hell-bent on building an HTTP daemon out of this :)

dvv commented 11 years ago

I switched to https://github.com/wg/wrk and things are going well so far. ab is too slow and dumb to test luv :)

richardhundt commented 11 years ago

Awesome, thanks for the tip!

dvv commented 11 years ago

That's what I got on my slow setup:

$ wrk -t8 -c2048 -r1m http://localhost:8080/
Making 1000000 requests to http://localhost:8080/
  8 threads and 2048 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   214.97ms    2.25s    1.88m    98.43%
    Req/Sec     0.07      8.53     1.00k    99.99%
  1000035 requests in 4.85m, 60.08MB read
  Socket errors: connect 0, read 2406, write 0, timeout 159084
Requests/sec:   3438.60
Transfer/sec:    211.55KB

miko commented 11 years ago

Thanks, that fixed it for me! And yes, an HTTP daemon built into an application is nice ;)

dvv commented 11 years ago

I believe we just need an http-parser binding and the Lua-level HTTP request/response logic. I wonder if @creationix's luvit/web collection would fit.

creationix commented 11 years ago

I want luvit/web to be as portable as possible. But I don't think I can get away from having a defined spec for readable and writable streams as well as a data format (currently lua strings). We can probably add support for multiple data formats (lua strings, ffi cdata, and lev cdata buffers).

My stream interface is very simple, but is callback based. I'm not sure how that fits into this project.

Readable stream: any table that has a :read()(callback(err, chunk)) method. Read is a method that returns a function that accepts a callback that gets err and chunk.

Writable stream: any table that has a :write(chunk)(callback(err)) method. Write is a method that accepts the chunk and returns a function that accepts the callback that gets the err.

Since the continuable is returned from the methods, it should be easy to write wrappers for this for other systems. I have coro sugar in my continuable library where you can do things like chunk = await(stream:read())
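
A self-contained sketch of that shape with illustrative names (the actual luvit/web stream and await implementations may differ): a readable stream whose :read() returns a continuable, plus coroutine sugar that suspends until the callback fires.

-- fake readable stream: :read() returns a function that accepts callback(err, chunk)
local fake = {}
function fake:read()
  return function(callback)
    callback(nil, "hello") -- completes immediately for this demo
  end
end

-- coroutine sugar: run a continuable and suspend until its callback has fired
local function await(continuable)
  local co = assert(coroutine.running(), "await must be called inside a coroutine")
  local done, err, chunk
  continuable(function(e, c)
    err, chunk, done = e, c, true
    if coroutine.status(co) == "suspended" then
      coroutine.resume(co) -- wake the waiting coroutine
    end
  end)
  if not done then coroutine.yield() end -- callback hasn't fired yet: go to sleep
  if err then error(err) end
  return chunk
end

coroutine.wrap(function()
  print(await(fake:read())) --> hello
end)()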

creationix commented 11 years ago

For data encoding, we could add encoding arguments to both :read(encoding) and :write(chunk, encoding) to allow supporting multiple data types. :write could probably even auto-detect the type and convert for you.
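
Illustrative only (neither :read(encoding) nor :write(chunk, encoding) exists yet; this is just what the proposal might look like at the call site):

stream:read("cdata")(function(err, chunk)
  -- chunk would arrive as ffi cdata rather than a Lua string
end)

stream:write(buf, "string")(function(err)
  -- buf converted to (or validated as) a Lua string before writing
end)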

The harder issue for "luv" is that I use continuables (functions that return functions that accept callbacks).

richardhundt commented 11 years ago

Yeah, I think we'd need to start with https://github.com/joyent/http-parser.git and work our way up from there, as you did.

dvv commented 11 years ago

I wonder how to write a wrapper for callback-style logic so it can be used with this project's fibers? Then we could easily keep the HTTP stuff external.

dvv commented 11 years ago

Just FYI, I explored common logic for http_parser in C at https://github.com/dvv/luv/blob/master/src/uhttp.c#L270 a while ago -- got stuck on C memory management. But if we want a fairly generic HTTP parser which reports data/end, I believe most of its callbacks can be hardcoded in C.

richardhundt commented 11 years ago

You could do something like:

my.wrapper.read = function()
  local curr = luv.self() -- store the current state
  local data
  some.event.library.read(fd, function(...)
    data = ...
    curr:ready()
  end)
  curr:suspend() -- go to sleep until the callback fires
  return data
end

richardhundt commented 11 years ago

I'd start at the top. I think the API should look something like this:

local req = httpd:recv()

where req is a Rack/PSGI-like [1] environment table with:

req = {
  REQUEST_METHOD = …,
  SCRIPT_NAME    = …,
  CONTENT_LENGTH = …,
  -- … the rest of the CGI environment vars …

  luv_input  = <input stream>,
  luv_output = <output stream>,
  luv_errors = <errors stream>,
}

The body of the request (if any) is read, by the application, from the luv_input stream once the headers are parsed and you know your CONTENT_LENGTH. For all the rest, the parsing and its C callbacks should be internal. There's no need to expose that. You can still do non-blocking reads via libuv callbacks and feed the HTTP parser with chunks; you just don't call luvL_state_ready() until you've got the headers, wired up the pipes and built the req table. There does not need to be a 1-1 correspondence between a libuv callback and waking a suspended fiber. You can run all the C callbacks you like until you're ready.

A response can be either streamed out via luv_output (streaming video, or whatever), or sent as:

httpd:send({ 200, { ["Content-Type"] = "text/html", … }, })
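
A sketch of how an application loop might consume this proposed API; httpd:recv(), httpd:send() and req.luv_input are the hypothetical interfaces described above, not an existing luv API, and the send table follows the Rack/PSGI status/headers/body convention:

while true do
  local req = httpd:recv()               -- suspend this fiber until a request arrives
  local len = tonumber(req.CONTENT_LENGTH or 0)
  if len > 0 then
    local body = req.luv_input:read(len) -- read (and here discard) the request body
  end
  httpd:send({ 200, { ["Content-Type"] = "text/html" }, { "<h1>hello</h1>" } })
end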

[1] http://search.cpan.org/~miyagawa/PSGI-1.101/PSGI.pod#The_Environment

dvv commented 11 years ago

That looks interesting, I'd love to try it.

miko commented 11 years ago

After updating to the latest head I no longer get segfaults on Arch Linux. Thanks! I think this issue can be closed now.

Regarding HTTP parsing, I suggest opening a new issue (feature request), as this one is getting hard to follow.

dvv commented 11 years ago

Indeed. #10