openresty / memc-nginx-module

An extended version of the standard memcached module that supports set, add, delete, and many more memcached commands.
http://wiki.nginx.org/NginxHttpMemcModule

Any way to make nginx return empty_gif or result with "return ..;" instead of memcache result? #18

Open Profforgr opened 9 years ago

Profforgr commented 9 years ago

Hi.

Is there any way to make nginx return empty_gif or a result with "return ...;" instead of returning the result received from memcache?

If this is read by "agentzh": I was inspired by your slides at http://agentzh.org/misc/slides/nginx-conf-scripting/nginx-conf-scripting.html.

location ~ /app/counter/(\d+)$ {
   empty_gif;
   #echo_exec /safe-memc?cmd=incr&key=id$1&val=1;
   echo_location_async /safe-memc?cmd=incr&key=id$1&val=1&exptime=0;
}

# (not quite) REST interface to our memcached server
location = /memc {
   internal;
   set $memc_cmd $arg_cmd;
   set $memc_key $arg_key;
   set $memc_value $arg_val;
   set $memc_exptime $arg_exptime;

   memc_pass memcached_upstream2;
}

# (not quite) REST interface to our memcached server
location = /safe-memc {
   internal;
   set $memc_cmd $arg_cmd;
   set $memc_key $arg_key;
   set $memc_value $arg_val;
   set $memc_exptime $arg_exptime;

   memc_pass memcached_upstream2;

   error_page 404 @safe-memc-add-and-retry;
}

location @safe-memc-add-and-retry {
   internal;
   echo_location /memc?cmd=add&key=$arg_key&val=0;
   echo_location /memc?$query_string;
}

I basically use the example from http://agentzh.org/misc/slides/nginx-conf-scripting/nginx-conf-scripting.html#45, slightly modified to adapt to my case.

I want to do a simple counter here.

The only problem is that a request to /app/counter/1 and so on returns a number (the result of the incr query to memcache). But I want it to simply return empty_gif. I've tried various settings but can't get it working. It completely ignores the empty_gif and return directives. Can you give me an idea please?

If possible, it would also be cool to know how to make it really async. Right now, when it returns the result, it basically has to wait for it (otherwise it would have nothing to return). I want this operation to happen in the background after the connection closes (or better, after the request ends; I'm not sure whether that is even possible - to keep the connection alive but end the current request).

agentzh commented 9 years ago

@Profforgr It sounds trivial if you use the ngx_lua module with the (nonblocking) lua-resty-memcached library. BTW, I think it's much more efficient and simpler to use ngx_lua's shm-backed lua_shared_dictionary to store your counters instead of using memcached. See https://blog.cloudflare.com/pushing-nginx-to-its-limit-with-lua/ for an example at CloudFlare.
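
For reference, a minimal sketch of that shared-dict idea (the dict name "counters" and the location are illustrative, not from this thread). The log phase runs after the response has been sent, so the client never waits on the counter update:

# in the http block:
lua_shared_dict counters 10m;

location ~ ^/app/counter/(\d+)$ {
   set $id $1;
   empty_gif;
   log_by_lua '
       local dict = ngx.shared.counters
       local key = "id" .. ngx.var.id
       -- incr fails until the key exists, so add and retry on a miss
       local newval, err = dict:incr(key, 1)
       if not newval then
           dict:add(key, 0)
           dict:incr(key, 1)
       end
   ';
}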

BTW, you're welcome to join the openresty-en mailing list to discuss further. Please see http://openresty.org/#Community Thank you!

Profforgr commented 9 years ago

It sounds trivial if you use the ngx_lua module with the (nonblocking) lua-resty-memcached library.

OK, I see you like Lua a lot -) Well, I always believe you. I see Lua is used in most high-load environments. I can't fight anymore -) I decided to write it in Lua...

I would love it if you could comment on my code and answer a few noob questions.

location ~ ^/app/counter/(\d+)$ {
    more_clear_headers "Date" "Content-Type"; # disables CloudFlare gzip & less traffic

    set $id $1;

    access_by_lua '
        ngx.eof()
        -- ngx.sleep(10) -- sec

        -- simple abuse prevention (assuming we have no more than 100 million ids)
        if tonumber(ngx.var.id) > 100000000 then
            ngx.say("id should be less than 100 million")
            return
        end
        local memcached = require "resty.memcached"
        local memc, err = memcached:new()
        if not memc then
            ngx.say("failed to instantiate memc: ", err)
            return
        end

        memc:set_timeout(100000) -- 100 sec

        -- local ok, err = memc:connect("127.0.0.1", 11211)
        local ok, err = memc:connect("unix:/tmp/memcached.sock")
        if not ok then
            ngx.say("failed to connect: ", err)
            return
        end

        local key = ngx.var.lower_host .. "|post" .. ngx.var.id
        local ok, err = memc:incr(key, 1)
        if not ok then
            ngx.say("failed to incr " .. key, err)
            local ok, err = memc:add(key, "0")
            if not ok then
                ngx.say("failed to add " .. key, err)
            else
                ngx.say("add ok")
                local ok, err = memc:incr(key, 1)
                if not ok then
                    ngx.say("failed to incr " .. key .. " after add", err)
                    return
                else
                    ngx.say("incr ok after add")
                end
            end
        else
            ngx.say("incr" .. key .. "ok")
        end

        -- put it into the connection pool of size 100,
        -- with 10 seconds max idle timeout
        local ok, err = memc:set_keepalive(10000, 100)
        if not ok then
            ngx.say("cannot set keepalive: ", err)
            return
        end
    ';
}

My questions:

  1. Is this code fine in terms of performance and asynchronous/non-blocking behaviour?
  2. Should I compile the code and use access_by_lua_file instead? Will it improve performance/throughput?
  3. Should I use another *_by_lua here? The point is to run "as early as possible" while still being able to execute the functions used here.
  4. Is ngx.eof() the best way to do a "post_action" (the result is returned to the user without delay, then the memcached request happens in the background), or is there a better alternative?
  5. Why is more than one keepalive connection preferred? Is this to provide non-blocking behaviour while staying keepalive? What if all these keepalive connections are blocked? So by making a pool of 100 connections we limit the persistent connections, and the others (more than 100) will be done without keepalive (so the number of concurrent requests = total number of connections)? Let's say we want to support 1 million concurrent requests. It means we just need memcached to support this number of connections (-c 1000000), right?

BTW, I think it's much more efficient and simpler to use ngx_lua's shm-backed lua_shared_dictionary to store your counters instead of using memcached.

I thought about it and came to the conclusion that I won't be able to make an external application extract the keys from the lua_shared_dictionary without issues. I need the external application to fetch the keys periodically (once a minute). But ngx.shared.DICT.get_keys will block the entire nginx, which is not acceptable. And ngx.shared.DICT.get for all the possible keys (~1 million and growing), I assume, will be slower than memcached because of HTTP and some nginx logic. I could also do it without an external application, using Lua again, but then it would depend on nginx's state (an nginx restart would destroy everything). For now I cannot live this way; I restart nginx rather often. You most probably never fully restart nginx (or at least you do it safely when you need to update the binary), so it's fine for you.

BTW, you're welcome to join the openresty-en mailing list to discuss further. Please see http://openresty.org/#Community Thank you!

I know, thanks. But I find the Google Groups style uncomfortable.

agentzh commented 9 years ago

@Profforgr Please don't use GitHub issues for such general, lengthy discussions; it makes me uncomfortable.

Regarding the mailing lists, they are mailing lists. You can use emails to read, post and reply in your own favourite mail client (be it gmail, thunderbird, outlook, or whatever). See https://openresty.org/#Community

agentzh commented 9 years ago

Hello!

On Fri, Jun 19, 2015 at 7:11 AM, Profforgr wrote:

location ~ ^/app/counter/(\d+)$ {
    default_type image/gif;

set $id $1;

access_by_lua '
    ngx.eof()

To quote the related documentation for ngx.eof:

"When you disable the HTTP 1.1 keep-alive feature for your downstream connections, you can rely on descent HTTP clients to close the connection actively for you when you call this method. This trick can be used do back-ground jobs without letting the HTTP clients to wait on the connection, as in the following example: ... A better way to do background jobs is to use the ngx.timer.at API."

https://github.com/openresty/lua-nginx-module#ngxeof

Pay special attention to the last sentence if you do need HTTP 1.1 keep-alive and HTTP 1.1 pipelining.
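
For illustration only (a sketch, not code from this thread; the key naming is simplified), the ngx.timer.at approach serves the response immediately and runs the memcached update in a detached handler, where cosockets are available but ngx.var is not, so the id is passed as a timer argument:

content_by_lua '
    local id = ngx.var.id

    local function incr_counter(premature, id)
        if premature then
            return  -- nginx is shutting down
        end
        local memcached = require "resty.memcached"
        local memc, err = memcached:new()
        if not memc then
            ngx.log(ngx.ERR, "failed to instantiate memc: ", err)
            return
        end
        memc:set_timeout(1000)  -- 1 sec; keep timeouts short
        local ok, err = memc:connect("unix:/tmp/memcached.sock")
        if not ok then
            ngx.log(ngx.ERR, "failed to connect: ", err)
            return
        end
        local key = "id" .. id
        -- the same incr-then-add-then-incr dance as in the snippet above
        if not memc:incr(key, 1) then
            memc:add(key, "0")
            memc:incr(key, 1)
        end
        memc:set_keepalive(10000, 100)
    end

    -- schedule the background job, then finish the response at once
    local ok, err = ngx.timer.at(0, incr_counter, id)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
    ngx.exit(ngx.HTTP_OK)
';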

    memc:set_timeout(100000) -- 100 sec

Use of such a long timeout may accumulate pending handlers and open connections very quickly when the route to your memcached always drops packets. This hurts reliability.

Is this code fine in terms of performance and asynchronous/non-blocking behaviour?

Everything is nonblocking if you use ngx_lua's Lua API and Lua libraries specifically designed atop this API.

Should I compile the code and use access_by_lua_file instead? Will it improve performance/throughput?

You already use access_by_lua in the code snippet above?

Use of content_by_lua has no measurable difference compared to access_by_lua. And the former is recommended; otherwise you need to use ngx.exit(code) where code >= 200 in earlier handlers (like access_by_lua) to ensure the request handler is short-circuited.
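
That is, if the work stays in access_by_lua, the handler has to end the request explicitly (a minimal sketch, not code from this thread):

access_by_lua '
    -- ... do the memcached work and emit the response ...
    ngx.say("ok")
    ngx.exit(ngx.HTTP_OK)  -- code >= 200 short-circuits the later phases
';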

Should I use another *_by_lua here? The point is to run "as early as possible" while still being able to execute the functions used here.

content_by_lua is recommended.

Is ngx.eof() the best way to do a "post_action" (the result is returned to the user without delay, then the memcached request happens in the background), or is there a better alternative?

post_action is buggy and that's why it is undocumented. One should always avoid it in production.

Why is more than one keepalive connection preferred?

For concurrent client requests. If the idle connections in the pool are already exhausted, nginx has to create new connections to the backend.

Is this to provide non-blocking behaviour while staying keepalive?

Keep-alive and nonblocking are two separate unrelated things. Nonblocking is a must-have in the nginx context while keep-alive is optional, depending on your use cases.

What if all these keepalive connections are blocked?

I'd assume you mean that all the idle connections in the pool are taken and in use. In that case, new backend requests will result in short connections. The connection pool limit does not limit backend concurrency; it only limits the number of idle connections kept in the pool. So you need ngx_limit_conns or other mechanisms to limit downstream concurrency for DDoS protection instead. This may change in the near future.

So by making a pool of 100 connections we limit the persistent connections, and the others (more than 100) will be done without keepalive (so the number of concurrent requests = total number of connections)?

Yes.

Let's say we want to support 1 million concurrent requests. It means we just need memcached to support this number of connections (-c 1000000), right?

I don't think a single memcached instance can support this much concurrency on common hardware. But, yes. You can queue up the excess concurrent requests in Lua yourself so that you don't have to equate the backend concurrency level with the downstream concurrency level. ngx_lua's built-in cosocket pool will provide this queueing feature in the near future.

BTW, I think it's much more efficient and simpler to use ngx_lua's shm-backed lua_shared_dictionary to store your counters instead of using memcached.

I thought about it and came to the conclusion that I won't be able to make an external application extract the keys from the lua_shared_dictionary without issues. I need the external application to fetch the keys periodically (once a minute). But ngx.shared.DICT.get_keys will block the entire nginx, which is not acceptable. And ngx.shared.DICT.get for all the possible keys (~1 million and growing), I assume, will be slower than memcached because of HTTP and some nginx logic.

Maybe you can store the key names in a single value yourself, under a special key named "keys" in the shdict. But yeah, 1 million may be too many for this.
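
A rough sketch of that idea (illustrative, with a hypothetical dict name; note that the read-modify-write on the "keys" entry is not atomic across workers, which is one more reason it only suits small key sets):

local dict = ngx.shared.counters  -- hypothetical lua_shared_dict
local key = "id" .. ngx.var.id

local newval, err = dict:incr(key, 1)
if not newval then
    -- first hit for this key: create it and record its name in the index
    if dict:add(key, 1) then
        local idx = dict:get("keys") or ""
        dict:set("keys", idx .. key .. "\n")  -- non-atomic append
    else
        dict:incr(key, 1)  -- another worker created it first
    end
end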

I could also do it without an external application, using Lua again, but then it would depend on nginx's state (an nginx restart would destroy everything). For now I cannot live this way; I restart nginx rather often. You most probably never fully restart nginx (or at least you do it safely when you need to update the binary), so it's fine for you.

Another option is to use Redis, which supports persistence. And you can also utilize lua-resty-redis's pipeline API to "batch" the backend queries, so that nginx does not have to send a backend request to the socket upon every downstream request it handles.
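
A minimal sketch of the pipeline API (illustrative keys; the logic for batching counter updates across downstream requests is left out):

local redis = require "resty.redis"
local red, err = redis:new()
if not red then
    ngx.log(ngx.ERR, "failed to instantiate redis: ", err)
    return
end
red:set_timeout(1000)  -- 1 sec

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect: ", err)
    return
end

red:init_pipeline()
-- queue commands locally; nothing is sent yet
red:incr("id1")
red:incr("id2")
red:incr("id3")

-- a single round trip to the server flushes all the queued commands
local results, err = red:commit_pipeline()
if not results then
    ngx.log(ngx.ERR, "failed to commit pipeline: ", err)
    return
end

red:set_keepalive(10000, 100)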

BTW, this discussion is already OT for this memc-nginx-module project.

agentzh commented 9 years ago

@Profforgr Oh, I forgot to mention that you cannot call ngx.say after the ngx.eof call; that just does not make sense. I do understand you copy&pasted the sample config from lua-resty-memcached's documentation, but it's apparently for a different context.