Open meilke opened 3 years ago
Hey, thanks for the detailed report. I think the solution would be to ensure the state machine goes through aborting,
like it would when the client connection aborts. But I'm a bit unsure, because I thought that calling setkeepalive
on a connection where data remains to be read on the wire would be safely refused and the connection closed automatically - that is, it wouldn't be reused. Your symptoms suggest otherwise.
I'll try to write some tests that provoke the situation, hopefully soon. Sorry, I realise this is actually a pretty serious issue. If you're able to get a test case to demonstrate the issue that would be great.
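To make the idea concrete, here is a minimal sketch in plain Lua of the release decision described above: a connection goes back to the keepalive pool only if the response was read completely and without error, otherwise it is closed. The helper name release_connection and its ok flag are hypothetical and exist only for illustration. The client argument is assumed to expose set_keepalive() and close() the way a lua-resty-http client does. This is not Ledge's actual code.

```lua
-- Hypothetical helper, not part of Ledge or lua-resty-http.
-- `client` is assumed to provide set_keepalive() and close(),
-- as a lua-resty-http client object does.
-- `ok` is true only when the response was read fully and without error.
local function release_connection(client, ok)
    if ok then
        -- Response fully consumed: the socket is clean and may be pooled.
        local res = client:set_keepalive()
        if res then
            return "pooled"
        end
        -- Pooling was refused; fall through and close explicitly.
    end
    -- Timeout, abort, or partial read: never hand this socket to another request.
    client:close()
    return "closed"
end
```

The point of routing every error path through close() is that the decision no longer depends on the pool noticing unread data on its own.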
I never tried to get the test cases involved, I only ever used the library in an actual openresty installation. I can reproduce it fairly easily by setting a ridiculously low value for upstream_read_timeout
and doing a lot of other requests in parallel or shortly afterwards. Our UI makes this easy, as there is a frameset involved that refreshes a few frames from HTTP requests (which then show the wrong content in the wrong frame).
I could reproduce the issue at #200 in a test case (without my - probably - incorrect fix, of course). I have one timed out request and three other requests where the timed out request response shows up later for a different request.
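For readers who want to see the failure mode in isolation, the following is a toy model in plain Lua, not the test case from the PR. A "connection" is just a queue of responses still on the wire and the pool is a Lua table. All names (new_conn, checkout, late_reply, read_response) are made up for this illustration, and no real sockets or Ledge code are involved.

```lua
-- Toy model only: a "connection" is a queue of responses still on the wire,
-- and the pool is a plain Lua table. No real sockets are involved.
local pool = {}

local function new_conn()
    return { wire = {} }
end

-- Take a pooled connection if there is one, otherwise open a new one.
local function checkout()
    return table.remove(pool) or new_conn()
end

-- The origin's reply lands on the wire of the given connection.
local function late_reply(conn, body)
    conn.wire[#conn.wire + 1] = body
end

-- Read the next response on the wire (nil stands for a read timeout).
local function read_response(conn)
    return table.remove(conn.wire, 1)
end

-- Request A times out, but the connection is pooled anyway (the bug).
local a = checkout()
assert(read_response(a) == nil)   -- timeout: nothing has arrived yet
pool[#pool + 1] = a               -- buggy release: it should have been closed
late_reply(a, "response for A")   -- the origin finally answers A

-- Request B reuses the same connection and reads A's response first.
local b = checkout()
late_reply(b, "response for B")
local got = read_response(b)
print(got)  --> response for A
```

In this model, closing the connection instead of pooling it after the timeout would force request B onto a fresh connection, so A's late reply could never be read by B.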
Hi @pintsized, as my colleague has moved on in the meantime, I'm trying to take over here. Did you find @meilke's suggested test case helpful so far? Do you see anything else we could contribute atm for advancing this issue? Thx!
Hey, at this stage no, but thanks. The PR is super useful, I just need a little time to run it myself and figure out what the best place for a fix is (and to understand what detail I'm missing that led to this issue). Your best short term workaround is a longer timeout, because I'm pretty sure this can only happen when our socket timeout is exhausted (i.e. it can't happen if the downstream client disconnects or if the origin gives up before our timeout).
I'll try to get to it asap.
I noticed that from time to time I get wrong responses to my HTTP requests. In one of our UIs, some users would suddenly get HTML responses that originated from other users and obviously did not look like the data they expected.
After looking into the openresty logs I would see 524 responses near the actual problematic requests (the ones receiving the wrong response data). The 524 did not happen while establishing the connection but during the actual request made by the HTTP client. The wrong response data came from these timed-out requests. So it seems like the timed-out request is still going on, and when its data arrives it might get "re-used" elsewhere.
A similar problem was reported in lua-resty-http: https://github.com/ledgetech/lua-resty-http/issues/138
I found that I can solve the issue by calling client:close() after generating the 524, but that might not be enough. I could follow up with a PR, but maybe there is a different solution out there.
My setup:
lua-ffi-zlib v0.5-0
lua-resty-cookie v0.1.0-1
lua-resty-http v0.16.1-0
lua-resty-qless v0.11-0
lua-resty-redis-connector v0.10-0
My config: