Open wangfakang opened 5 years ago
@wangfakang Thanks for the report. Will you create a proper github pull request with a test case demonstrating the problem and covering the fix? Thanks!
@agentzh Thanks for your reply; https://github.com/openresty/lua-nginx-module/pull/1555 has been submitted. The problem occurs within our internal module and causes the process to core dump. In the log and charset modules of Nginx, the merge_loc_conf function changes the value of main_conf, but the change is insignificant, so it is a little hard to cover with a test case. Do you have any good suggestions?
@wangfakang I suggest you devise a minimal fake nginx module that mimics the problem in your internal module and make it part of our test suite. We already have some fake nginx modules in our existing test suite to cover this kind of thing:
https://github.com/openresty/lua-nginx-module/tree/master/t/data
Thanks for your contribution!
Cool, that's a good idea.
@agentzh I have added the ngx_http_fake_merge_module module to reproduce the problem with the following configuration. Plz review https://github.com/openresty/lua-nginx-module/pull/1555, thx.
The ngx_http_fake_merge_module module adds an internal variable fake_var with a default value of 1.
http {
    #init_worker_by_lua '
    #    local a = 1
    #';
    server {
        listen 80;
        location /t {
            content_by_lua '
                ngx.say("fake_var = ", ngx.var.fake_var)
            ';
        }
    }
}
curl localhost/t
result: fake_var = 1
http {
    init_worker_by_lua '
        local a = 1
    ';
    server {
        listen 80;
        location /t {
            content_by_lua '
                ngx.say("fake_var = ", ngx.var.fake_var)
            ';
        }
    }
}
curl localhost/t
result: fake_var = 0
@wangfakang Thanks!
@thibaultcha Will you please help take care of this? Many thanks!
@thibaultcha Please help to review it. Thx.
Closed by #1555. Thanks for the catch!
Cloning the main conf seems to make the issue worse, since it's common for 3rd-party modules to use main conf to keep track of global state. Cloning the main conf would make the init worker context go out of sync with the request context (we should keep in mind that the init worker context may have a long life when it fires off recurring timers). So modifying main conf later is indeed expected by design and does not justify cloning main conf (it actually argues against it). I guess there might be something deeper here. But cloning main conf just to make this particular use case pass is definitely going in completely the wrong direction.
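A minimal sketch of the out-of-sync concern described above. All names here (demo_main_conf_t, demo_request_handler, and so on) are hypothetical stand-ins and not NGINX or ngx_lua structures; the point is only that a copy taken once stops tracking later updates made through the original.

```c
/* Hypothetical stand-in for a 3rd-party module's main conf that
 * keeps track of global state; not an actual NGINX structure. */
typedef struct {
    int active_peers;
} demo_main_conf_t;

/* Shallow clone, like a snapshot taken for the init worker context. */
demo_main_conf_t
demo_clone_main_conf(const demo_main_conf_t *orig)
{
    return *orig;
}

/* A request handler updating global state through the original conf. */
void
demo_request_handler(demo_main_conf_t *mcf)
{
    mcf->active_peers++;
}

/* Returns how far the clone (as seen by a long-lived timer) lags behind
 * the original (as seen by request handlers) after n requests. */
int
demo_drift_after_requests(int n)
{
    demo_main_conf_t original = { 0 };
    demo_main_conf_t clone = demo_clone_main_conf(&original);
    int i;

    for (i = 0; i < n; i++) {
        demo_request_handler(&original);
    }

    return original.active_peers - clone.active_peers;
}
```

In this toy model, every state change made after the clone is taken is invisible to whoever holds the clone, which is the situation a recurring timer started in init_worker would be in.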
We should revert #1326 in master before the next release.
Agreed; what sparked this conversation with @agentzh is a failing test that appeared with lua-resty-upstream-healthcheck after having merged this fix (#1326). Here is the failure in question: https://travis-ci.org/openresty/lua-resty-upstream-healthcheck/jobs/636662963.
This failure is due to the underlying lua-upstream-nginx-module relying on ngx_http_upstream_main_conf_t having its upstreams array member populated. However, the mock main_conf allocated by the fix in #1326 isn't properly initialized (since configuration parsing isn't triggered), and ngx_http_upstream_main_conf_t has exactly 0 upstreams in ngx_lua's init_worker phase.
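To make the zero-upstreams symptom concrete, here is a simplified sketch. The demo_upstream_main_conf_t type and its helpers are hypothetical; the real ngx_http_upstream_main_conf_t stores its upstreams in an NGINX array and is filled in during configuration parsing. The sketch only shows that a freshly allocated conf that was never parsed into reports no upstreams.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for ngx_http_upstream_main_conf_t. */
typedef struct {
    size_t       nupstreams;
    const char  *names[8];
} demo_upstream_main_conf_t;

/* Mimics configuration parsing registering one upstream block. */
int
demo_add_upstream(demo_upstream_main_conf_t *umcf, const char *name)
{
    if (umcf->nupstreams >= 8) {
        return -1;
    }

    umcf->names[umcf->nupstreams++] = name;
    return 0;
}

/* Mimics allocating a mock main conf: zeroed memory, never parsed into.
 * The caller owns the result and must free() it. */
demo_upstream_main_conf_t *
demo_create_mock_main_conf(void)
{
    return calloc(1, sizeof(demo_upstream_main_conf_t));
}

/* What an upstream lookup would see in the given main conf. */
size_t
demo_count_upstreams(const demo_upstream_main_conf_t *umcf)
{
    return umcf->nupstreams;
}
```

A conf that went through demo_add_upstream() reports its upstreams, while the one from demo_create_mock_main_conf() reports none, mirroring what the upstream lookup observes in the init_worker phase.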
In more detail, this is what happens:
1. In init_worker, the user calls hc.spawn_checker() (as per the module's usage).
2. hc.spawn_checker() calls lua-upstream-nginx-module's upstream.get_primary_peers().
3. upstream.get_primary_peers() calls ngx_http_lua_upstream_find_upstream() (https://github.com/openresty/lua-upstream-nginx-module/blob/master/src/ngx_http_lua_upstream_module.c#L263).
4. ngx_http_lua_upstream_find_upstream() reads the mock main conf, and thus reports 0 upstreams (https://github.com/openresty/lua-upstream-nginx-module/blob/master/src/ngx_http_lua_upstream_module.c#L519).
In short, both approaches (with and without the fix in #1326) present some issues:
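1. Without the fix, modules that modify main_conf in merge_srv_conf or merge_loc_conf can experience side effects (since these functions are effectively called twice: once by NGINX and once by ngx_lua, both with the original main_conf pointer).
2. With the fix, modules that read main_conf in the init_worker phase see a mock main_conf that is not properly initialized (as detailed above).
Here are a few fixes that come to my mind: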
2. Ensure the mock main_conf is properly initialized; two possibilities:
2a. Make a full copy of each module's main_conf (this would be tricky with NGINX memory pools).
2b. Re-parse the configuration file and call ngx_http_*_init_main_conf() for each mock main_conf, which would, in this example, ensure that ngx_http_upstream properly populates its umcf->upstreams array (I think this is absolutely out of the question).
@thibaultcha Thanks for the detailed analysis. I don't like the idea of making a new copy of the main conf at all, since it should be unique by design. Keeping a copy might incur other out-of-sync problems (for example, a request handler makes a state change in main conf, but the timer handler in the init worker context would never see it).
It is really a surprise to see that the merge loc conf handlers cannot run more than once. I think the best way is to make them safe to call repeatedly. Is that possible?
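One conceivable shape for a repeat-safe handler, purely as a sketch: the types and the merged flag below are hypothetical and not taken from NGINX or ngx_lua, and whether real merge handlers can be guarded this way is an open question.

```c
/* Hypothetical main conf with a counter that a merge handler bumps as a
 * side effect, loosely modelled on the mcf->recodes example. */
typedef struct {
    int recodes;
} demo_merge_main_conf_t;

/* Hypothetical location conf carrying an "already merged" flag. */
typedef struct {
    int merged;
} demo_merge_loc_conf_t;

/* Unsafe to repeat: every call changes main conf again. */
void
demo_merge_unsafe(demo_merge_main_conf_t *mcf, demo_merge_loc_conf_t *lcf)
{
    (void) lcf;
    mcf->recodes++;
}

/* Safe to repeat: the side effect on main conf is applied at most once
 * per location conf, no matter how often the handler runs. */
void
demo_merge_idempotent(demo_merge_main_conf_t *mcf,
    demo_merge_loc_conf_t *lcf)
{
    if (lcf->merged) {
        return;
    }

    mcf->recodes++;
    lcf->merged = 1;
}
```

Calling demo_merge_unsafe() twice on the same pair leaves recodes at 2, while calling demo_merge_idempotent() twice leaves it at 1. The flag assumes the same location conf object is passed on both runs, which may not hold for every module.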
I suggest we hold off on this until a new OpenResty release is made. It is nontrivial to fix.
For now, I just reverted the fix in c2390ab.
In the ngx_http_lua_init_worker function, http_ctx.main_conf = cycle->conf_ctx->main_conf is used for initialization. However, the value of main_conf may be modified later by the merge_srv_conf or merge_loc_conf functions, which may cause some unexpected problems. For example, the ngx_http_charset_merge_loc_conf function modifies the value of mcf->recodes.
So we should create our own http_ctx.main_conf instead of directly reusing the current http_ctx.main_conf. Maybe it's fixed by PR , Plz check.