openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/
11.27k stars 2.03k forks source link

Response body missing when using ngx.location.capture() to fetch response from a location that uses proxy_pass, cache & aio #587

Open ram-nadella opened 8 years ago

ram-nadella commented 8 years ago

I have a setup where requests are captured by a location that has a content_by_lua_file directive. The lua code here contains some pre and post processing, but the important bit is that it uses ngx.location.capture to make a request to another nginx location. This location is setup with proxy_pass and cache.

I wanted to measure the impact of the dedicated io threads feature that was added to nginx core recently and so recompiled openresty (1.9.3.1) with --with-threads and --with-file-aio flags passed to the configure script and after the rebuild and nginx restart, added the aio threads; directive to the second location that handles proxying and caching.

The result was that the response body was missing when I hit the location handled by lua. The headers are present. Response looks fine when I hit the cached location directly instead of hitting the lua location that uses ngx.localtion.capture

Here is what the example setup looks like:

location /location1 {
    content_by_lua_file "location1.lua";
}

location /location2 {
    rewrite                 /location2/(.*) /$1 break;
    proxy_pass              http://localhost:8081;

    proxy_cache             my_cache;
    proxy_cache_key         $uri$is_args$args;

    proxy_cache_lock        on;
    proxy_cache_valid       200 7d;
    proxy_ignore_headers    "Expires" "Cache-Control" "Set-Cookie";

    # add a cache hit status header
    more_set_headers "X-Nginx-Cache: $upstream_cache_status";

    # newly added
    aio     threads;
}

The contents of location1.lua look something like this:

local url = "/location2" .. ngx.var.uri
sr_response = ngx.location.capture(url)

-- additional logic

ngx.status = sr_response.status
ngx.print(sr_response.body)

The goal is to take advantage of nginx's HTTP caching capabilities and to execute some additional logic in lua based on the response.

Removing the aio directive makes it work as expected again. Same results with aio on; instead of aio threads;

Tried it with the error_log level set to debug, didn't see anything except for a couple lines that said route matched location2 and that the connection was closed after that.

agentzh commented 8 years ago

@ram-nadella You need to call ngx.req.read_body() before the ngx.location.capture call. It's your duty to explicitly read the request body (since there's various different ways to read the request body).

ram-nadella commented 8 years ago

Hey @agentzh. Thanks for the quick reply and hope you are doing well.

Isn't ngx.req.read_body() meant for the request body coming from the client? Am I overlooking something? In my case, the response body is missing.

(I omitted some details in my sample conf above so I could make a clear point. I have the lua_need_request_body on; directive in my nginx conf and I also pass in {always_forward_body = true} to ngx.location.capture()). As far as I can tell, I don't think there is a problem with the request body, because the requests I am testing with are all GET requests.

The problem is that the response body is missing when I include the aio threads; directive in the location2 block and hit location1 (which will call location2 using ngx.location.capture()).

agentzh commented 8 years ago

@ram-nadella Okay, I've just had a closer look at your description. Yes, it's a known issue with this particular combination (aio + proxy_cache + ngx.location.capture) [1]. I haven't had the chance to investigate and fix this. You're welcome to do in-depth investigation on your side :)

[1] https://groups.google.com/d/topic/openresty/ixthVWfwc8Y/discussion

agentzh commented 8 years ago

@ram-nadella If you don't have the tuits to take care of this, then I'll eventually look into this issue as soon as I can manage. Too busy at the moment. Sorry.

agentzh commented 8 years ago

@ram-nadella I appreciate your report.

ram-nadella commented 8 years ago

Thanks for looking into this, let me see what I can do with my limited knowledge :) Will post here if I find anything.

agentzh commented 8 years ago

@ram-nadella Thanks!

ram-nadella commented 8 years ago

I re-compiled nginx with the debug flag and got the debug logs. Here is the comparison:

Without aio:

2016/03/07 20:41:40 [debug] 13138#0: *1 http proxy header done
2016/03/07 20:41:40 [debug] 13138#0: *1 http file cache send: /var/nginx-cache/data/88/e1/88722a410a3607dd19e8b081354fe188
2016/03/07 20:41:40 [debug] 13138#0: *1 headers more header filter, uri "<uri>"
2016/03/07 20:41:40 [debug] 13138#0: *1 http script var: "HIT"
2016/03/07 20:41:40 [debug] 13138#0: *1 http script copy: ""
2016/03/07 20:41:40 [debug] 13138#0: *1 lua capture header filter, uri "<uri>"
2016/03/07 20:41:40 [debug] 13138#0: *1 lua capturing response body
2016/03/07 20:41:40 [debug] 13138#0: *1 http output filter "<uri_with_query_params>"
2016/03/07 20:41:40 [debug] 13138#0: *1 http copy filter: "<uri_with_query_params>"
2016/03/07 20:41:40 [debug] 13138#0: *1 malloc: 000000000138EE50:4533
2016/03/07 20:41:40 [debug] 13138#0: *1 read: 15, 000000000138EE50, 4533, 911
2016/03/07 20:41:40 [debug] 13138#0: *1 lua capture body filter, uri "<uri>"
2016/03/07 20:41:40 [debug] 13138#0: *1 lua capture body filter capturing response body, uri "<uri>"
2016/03/07 20:41:40 [debug] 13138#0: *1 lua allocate new chainlink and new buf of size 4533, cl:00000000013195A0
2016/03/07 20:41:40 [debug] 13138#0: *1 malloc: 0000000001390010:4533
2016/03/07 20:41:40 [debug] 13138#0: *1 http copy filter: 0 "<uri_with_query_params>"
2016/03/07 20:41:40 [debug] 13138#0: *1 http finalize request: 0, "<uri_with_query_params>" a:0, c:3
2016/03/07 20:41:40 [debug] 13138#0: *1 lua run post subrequest handler, rc:0 c:3
2016/03/07 20:41:40 [debug] 13138#0: *1 lua restoring write event handler
2016/03/07 20:41:40 [debug] 13138#0: *1 malloc: 00000000013911D0:4533
2016/03/07 20:41:40 [debug] 13138#0: *1 http wake parent request: "<uri_with_query_params>"
2016/03/07 20:41:40 [debug] 13138#0: *1 http finalize request: -4, "<uri_with_query_params>" a:0, c:2
2016/03/07 20:41:40 [debug] 13138#0: *1 http request count:2 blk:0
2016/03/07 20:41:40 [debug] 13138#0: *1 http posted request: "<uri_with_query_params>"
2016/03/07 20:41:40 [debug] 13138#0: *1 lua run subrequests done, resuming lua thread
2016/03/07 20:41:40 [debug] 13138#0: *1 lua handle subrequest responses
2016/03/07 20:41:40 [debug] 13138#0: *1 free: 00000000013911D0
2016/03/07 20:41:40 [debug] 13138#0: *1 lua run thread, top:0 c:1

With aio:

2016/03/07 21:05:04 [debug] 23884#23884: *4 http proxy header done
2016/03/07 21:05:04 [debug] 23884#23884: *4 http file cache send: /var/nginx-cache/data/88/e1/88722a410a3607dd19e8b081354fe188
2016/03/07 21:05:04 [debug] 23884#23884: *4 headers more header filter, uri "<uri>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 http script var: "HIT"
2016/03/07 21:05:04 [debug] 23884#23884: *4 http script copy: ""
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua capture header filter, uri "<uri>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua capturing response body
2016/03/07 21:05:04 [debug] 23884#23884: *4 http output filter "<uri_with_query_params>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 http copy filter: "<uri_with_query_params>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 malloc: 0000000001564F30:4533
2016/03/07 21:05:04 [debug] 23884#23884: *4 thread read: 14, 0000000001564F30, 4533, 911
2016/03/07 21:05:04 [debug] 23884#23884: pthread_mutex_lock(0000000001507DE8) enter
2016/03/07 21:05:04 [debug] 23884#23884: pthread_cond_signal(0000000001507E28)
2016/03/07 21:05:04 [debug] 23884#23884: pthread_mutex_unlock(0000000001507DE8) exit
2016/03/07 21:05:04 [debug] 23884#23884: task #4 added to thread pool "default"
2016/03/07 21:05:04 [debug] 23884#23884: *4 http copy filter: -2 "<uri_with_query_params>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 http finalize request: -2, "<uri_with_query_params>" a:0, c:2
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua run post subrequest handler, rc:-2 c:2
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua restoring write event handler
2016/03/07 21:05:04 [debug] 23884#23884: *4 event timer add: 13: 60000:1457384764022
2016/03/07 21:05:04 [debug] 23884#23884: *4 http posted request: "<uri_with_query_params>"
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua run subrequests done, resuming lua thread
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua handle subrequest responses
2016/03/07 21:05:04 [debug] 23884#23884: *4 lua run thread, top:0 c:2

The http copy filter and http finalize request lines both have a return code 0 in the first case where with aio it is -2

The http copy filter debug log line appears to come from here

_Note: The <uri> and <uri_with_query_params> are the actual request uri, which I removed from the log lines._

agentzh commented 8 years ago

@ram-nadella Okay, I see what is going on here. Thanks for the debugging logs. Basically, when the subrequest is using AIO, the post_subrequest handler registered by the parent request is called prematurely (that is, it is called with the rc parameter set to NGX_AGAIN, as you have noted). And when the AIO completes, it does not call the post_subrequest handler again, which is counter-intuitive, but just call the parent request's write event handler directly. One possible fix is to register a special write event handler for the parent request in the post_subrequest handler when rc == NGX_AGAIN and in that special write event handler of the parent, we prepare the subrequest's response body data and continue calling the original write event handler of the parent. Patches welcome :)

totallyunknown commented 8 years ago

I've hit this bug right now. In the meantime, any ideas how to fix this?

ram-nadella commented 8 years ago

@totallyunknown your options are to either turn off aio on the target location where your sub-request is pointing or to avoid the sub-request until this issue is fixed.

See @agentzh's comment above yours that has hints for implementing a fix. Here is my understanding of it: ngx_http_lua_post_subrequest is being called with rc -2 (NGX_AGAIN) but the body is not available at that point, so we need to detect this case and capture the body in the parent request because we are not seeing a call to ngx_http_lua_post_subrequest later on as expected (when the body is available).

BTW, thanks @agentzh for all those details. I made a quick attempt but haven't been able to come up with a solution yet. Is it fair to say this observed behaviour of not calling the post sub-request handler when body becomes available later is unexpected? If so, should this be fixed in nginx?

agentzh commented 8 years ago

@ram-nadella I'm not sure if it's the desired behavior or a bug in nginx. This is not documented anyway. We'll have a deeper look at the logic there. @doujiang24 Will you have a look please? ;)

piotrp commented 5 years ago

Hello,

Any progress on this issue?

agentzh commented 5 years ago

@Crack Just turn off aio in that proxy location when doing subrequests for now. We don't have the time to look into this so we rely on community patches to address this for real for now.