openresty / lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
https://openresty.org/
11.3k stars 2.03k forks source link

Maybe it's a bug about the ngx_http_lua_init_worker function. #1553

Open wangfakang opened 5 years ago

wangfakang commented 5 years ago

In the ngx_http_lua_init_worker function, use the http_ctx.main_conf = cycle->conf_ctx->main_conf to initialize. However, the value of main_conf may be modified in the later by merge_srv_conf or merge_loc_conf functions, which may cause some unexpected problems.

Forexample ngx_http_charset_merge_loc_conf function will be modify the value of mcf->recodes.

static char *
ngx_http_charset_merge_loc_conf(ngx_conf_t *cf, void *parent, void *child)
{
    ngx_http_charset_loc_conf_t *prev = parent;
    ngx_http_charset_loc_conf_t *conf = child;

    ngx_uint_t                     i;
    ngx_http_charset_recode_t     *recode;
    ngx_http_charset_main_conf_t  *mcf;

    if (ngx_http_merge_types(cf, &conf->types_keys, &conf->types,
                             &prev->types_keys, &prev->types,
                             ngx_http_charset_default_types)
        != NGX_OK)
    {
        return NGX_CONF_ERROR;
    }

    ngx_conf_merge_value(conf->override_charset, prev->override_charset, 0);
    ngx_conf_merge_value(conf->charset, prev->charset, NGX_HTTP_CHARSET_OFF);
    ngx_conf_merge_value(conf->source_charset, prev->source_charset,
                         NGX_HTTP_CHARSET_OFF);

    if (conf->charset == NGX_HTTP_CHARSET_OFF
        || conf->source_charset == NGX_HTTP_CHARSET_OFF
        || conf->charset == conf->source_charset)
    {
        return NGX_CONF_OK;
    }

    if (conf->source_charset >= NGX_HTTP_CHARSET_VAR
        || conf->charset >= NGX_HTTP_CHARSET_VAR)
    {
        return NGX_CONF_OK;
    }

    mcf = ngx_http_conf_get_module_main_conf(cf,
                                             ngx_http_charset_filter_module);
    recode = mcf->recodes.elts;
    for (i = 0; i < mcf->recodes.nelts; i++) {
        if (conf->source_charset == recode[i].src
            && conf->charset == recode[i].dst)
        {
            return NGX_CONF_OK;
        }
    }

    recode = ngx_array_push(&mcf->recodes);
    if (recode == NULL) {
        return NGX_CONF_ERROR;
    }

    recode->src = conf->source_charset;
    recode->dst = conf->charset;

    return NGX_CONF_OK;
}

So we should create of our own http_ctx.main_conf instead of directly reusing the current http_ctx.main_conf. Maybe it's fixed by PR , Plz check.

agentzh commented 5 years ago

@wangfakang Thanks for the report. Will you create a proper github pull request with a test case demonstrating the problem and covering the fix? Thanks!

wangfakang commented 5 years ago

@agentzh Thanks for your reply and https://github.com/openresty/lua-nginx-module/pull/1555 has been submitted. The problem within our internal module that will cause the process coredump. In the log and charset modules of Nginx, the merge_loc_conf function will change the value of main_conf, but the change is insignificant, so the test case is a little hard to cover. Do you have any good Suggestions?

agentzh commented 5 years ago

@wangfakang I suggest you devise a minimal fake nginx module to mimic the same problem in your internal module and make it part of our test suite. We already have some fake nginx modules in our existing test suite to cover such kind of things:

https://github.com/openresty/lua-nginx-module/tree/master/t/data

Thanks for your contribution!

wangfakang commented 5 years ago

Cool, that's a good idea.

wangfakang commented 5 years ago

@agentzh I have added the ngx_http_fake_merge_module module to reproduce the problem, and use the following configuration to reproduce the problem. Plz Review https://github.com/openresty/lua-nginx-module/pull/1555, thx.

Introduction

The ngx_http_fake_merge_module module adds a internal variable fake_var with a default value of 1.

Case 1 - expectations

http {

    #init_worker_by_lua ' 
    #    local a = 1
    #';

    server {
        listen 80;
        location /t {
            content_by_lua '
                ngx.say("fake_var = ", ngx.var.fake_var)
            ';
        }
    }
}   

curl localhost/t
result: fake_var = 1

Case 2 - un-expectations

http {

    init_worker_by_lua '
        local a = 1
    ';

    server {
        listen 80;
        location /t {
            content_by_lua '
                ngx.say("fake_var = ", ngx.var.fake_var)
            ';
        }
    }
}   

curl localhost/t
result: fake_var = 0

agentzh commented 5 years ago

@wangfakang Thanks!

@thibaultcha Will you please help take care of this? Many thanks!

wangfakang commented 5 years ago

@thibaultcha Please help to review it. Thx.

thibaultcha commented 5 years ago

Closed by #1555. Thanks for the catch!

agentzh commented 4 years ago

Cloning the main conf seems to make the issue worse since it's common for 3rd-party modules to use main conf to keep track of global state. Cloning the main conf would make the init worker context go out of sync with the request context (we should keep in mind that the init worker context may have a long life when it fires off recurring timers). So modifying main conf later is indeed expected by design and does not justify the cloning of main conf (but actually go against it). I guess there might be something deeper here. But definitely cloning main conf to make this particular use case pass is going the completely wrong direction.

agentzh commented 4 years ago

We should revert #1326 in master before the next release.

thibaultcha commented 4 years ago

Agreed; what sparked this conversation with @agentzh is a failing test that appeared with lua-resty-upstream-healthcheck after having merged this fix (#1326). Here is the failure in question: https://travis-ci.org/openresty/lua-resty-upstream-healthcheck/jobs/636662963.

This failure is due to the underlying lua-upstream-nginx-module relying on ngx_http_upstream_main_conf_t having its upstreams array member populated. However, the mock main_conf allocated by the fix in #1326 isn't properly initialized (since configuration parsing isn't triggered), and ngx_http_upstream_main_conf_t has exactly 0 upstreams in ngx_lua's init_worker phase.

In more details, this is what happens:

  1. In init_worker, the user calls hc.spawn_checker() (as per the module's usage)
  2. hc.spawn_checker() calls lua-upstream-nginx-module's upstream.get_primary_peers()
  3. upstream.get_primary_peers() calls ngx_http_lua_upstream_find_upstream() (https://github.com/openresty/lua-upstream-nginx-module/blob/master/src/ngx_http_lua_upstream_module.c#L263)
  4. ngx_http_lua_upstream_find_upstream() reads the mock main conf, and thus reports 0 upstreams (https://github.com/openresty/lua-upstream-nginx-module/blob/master/src/ngx_http_lua_upstream_module.c#L519)
  5. lua-resty-upstream-healthchecks tests break (there should be more than 0 upstreams)

In short, both approaches (with and without the fix in #1326) present some issues:

Here are a few fixes that come up to my mind:

  1. First and foremost, reverting #1326 since the behavior - while not ideal as per this issue - is at least already established, thus the next release would not be breaking anything.
  2. Reintroduce the fix, but ensure that the mock main_conf is properly initialized; two possibilities: 2a. Make a full copy of each module's main_conf (this would be tricky with NGINX memory pools) 2b. Re-parse the configuration file and call ngx_http_*_init_main_conf() for each mock main_conf, which would, in this example, ensure that ngx_http_upstream properly populates its umcf->upstreams array (I think this is absolutely out of question)
agentzh commented 4 years ago

@thibaultcha Thanks for the detailed analysis. I don't like the idea of making a new copy of the main conf at all. Since it should be unique by design. Keeping a copy might incur other out-of-sync problems (like a request handler populates a state change in main conf but the timer handler in the init worker context would never see it).

It is really a surprise to see that the merge loc conf handlers cannot run more than once. I think the best way is to make this thing safe to get called repeatedly. Is that possible?

agentzh commented 4 years ago

I suggest hold this thing until a new OpenResty release is made. It is nontrivial to fix.

thibaultcha commented 4 years ago

For now, I just reverted the fix in c2390ab.