jsierles opened this issue 4 years ago
This appears to be happening because the worker connection times out, so I'll look into passing a connection timeout there.
This turned out not to be the case: setting a high timeout for both connections does not help. So I ended up proxying Redis through an OpenResty stream block locally. That shouldn't be necessary, though. How can I debug this issue further?
Can you post a minimal yet complete configuration that replicates the issue? Obviously, if something this fundamental weren't working we'd know about it, so it's most likely a configuration detail.
Specifically you're getting "connection refused" (not timed out), so I'd be looking into why OpenResty can't see your redis host. Are you using hostnames or literal IPs?
We've tried with direct IPs and with hostnames. It's definitely not a hostname issue, since we use these same variables elsewhere in the config with resolver local=on. If it weren't able to resolve, we'd see a different error (and we have tried invalid hostnames to test that).
Once we switch to the localhost proxy, it works fine. This is inside a Kubernetes cluster inside AWS EKS, if that matters.
user www-data;

# Automatically scale processes based on detected CPU count
worker_processes auto;

# Redis for ledge cache storage
env REDIS_HOST;
env RACK;

error_log stderr debug;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    access_log /dev/stdout;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;

    gzip on;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_vary off;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/rss+xml application/atom+xml text/javascript application/javascript application/json text/mathml;
    gzip_min_length 1000;
    gzip_disable "MSIE [1-6]\.";

    server_names_hash_bucket_size 64;
    types_hash_max_size 2048;
    types_hash_bucket_size 64;
    client_max_body_size 250m;

    lua_shared_dict my_locks 100k;
    lua_package_path "/etc/nginx/conf.d/?.lua;./lua/?.lua;$prefix/conf/?.lua;$prefix/conf.d/?.lua;/usr/local/lib/lua/ledge/?.lua;/usr/local/openresty/site/lualib/?.lua;/usr/local/openresty/site/lualib/resty/?.lua;/usr/local/openresty/site/lualib/resty/qless/?.lua;;";

    resolver local=on ipv6=off;
    resolver_timeout 5s;

    if_modified_since Off;
    lua_check_client_abort On;

    init_by_lua_block {
        local ledge = require "ledge"

        local upstream_host = "web.rails." .. os.getenv("RACK") .. ".local"
        local redis_host = os.getenv("REDIS_HOST")

        ledge.configure({
            redis_connector_params = {
                url = "redis://" .. redis_host .. ":6379",
                connect_timeout = 1000
            }
        })

        ledge.set_handler_defaults({
            upstream_host = upstream_host,
            upstream_port = 80
        })
    }

    init_worker_by_lua_block {
        require("ledge").create_worker():run()
    }

    server {
        listen 80;

        location / {
            content_by_lua_block {
                local handler = require("ledge").create_handler()
                handler:run()
            }
        }
    }
}
And you're getting "connection refused" only on the background worker connections, not the in-flight ones?
On both connections. You can't really tell from the logs, but even removing the background worker leads to this error.
Here's the config we use for proxying to Redis. This works, but of course is an extra step we'd love to avoid.
stream {
    resolver local=on ipv6=off;
    resolver_timeout 5s;

    lua_add_variable $redis_host;
    preread_by_lua_block { ngx.var.redis_host = os.getenv("REDIS_HOST") }

    server {
        listen 6379;
        proxy_pass $redis_host:6379;
    }
}
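For completeness, routing ledge through that proxy only requires changing the connector URL to the loopback address. A minimal sketch, assuming the stream block above is listening on 6379 on the same box:

```nginx
init_by_lua_block {
    -- Point ledge at the local stream proxy instead of the remote host;
    -- the stream block above forwards 127.0.0.1:6379 on to $REDIS_HOST:6379.
    require("ledge").configure({
        redis_connector_params = {
            url = "redis://127.0.0.1:6379",
            connect_timeout = 1000
        }
    })
}
```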
> Here's the config we use for proxying to Redis. This works, but of course is an extra step we'd love to avoid.
Yeah, it really shouldn't be necessary.
Are all connections failing, or is it in any way intermittent over time?
Nothing is jumping out at me from your config. But remember, there's no magic here: in the end, whatever you specify for host and port ends up in tcpsock:connect.
Can you try a super minimal content_by_lua_block that connects to your host and port manually?
Then the next layer up is lua-resty-redis-connector; again, a quick manual experiment with your config should show where it's failing.
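A quick way to test that layer in isolation, sketched here against the documented lua-resty-redis-connector API (the /redis-test location name is illustrative; REDIS_HOST is already exported via env in the config above):

```nginx
location /redis-test {
    content_by_lua_block {
        -- Hand the same params straight to lua-resty-redis-connector
        -- and report exactly what comes back.
        local rc = require("resty.redis.connector").new({
            url = "redis://" .. os.getenv("REDIS_HOST") .. ":6379",
            connect_timeout = 1000
        })
        local redis, err = rc:connect()
        if not redis then
            ngx.say("connector failed: ", err)
            return
        end
        ngx.say("connector OK: ", redis:ping())
        rc:set_keepalive(redis)
    }
}
```

If this fails with the same "connection refused" while the bare cosocket test succeeds, the problem sits in the connector configuration rather than the network.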
Bottom line: if your connection is being refused, it's because tcpsock:connect(host, port) is returning nil and an error string.
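A minimal sketch of that experiment, assuming it's dropped into the existing server block (the /socket-test location name is illustrative):

```nginx
location /socket-test {
    content_by_lua_block {
        -- Bare cosocket connect: if this fails, the problem sits below
        -- ledge and lua-resty-redis-connector entirely.
        local sock = ngx.socket.tcp()
        sock:settimeout(1000)  -- ms
        local ok, err = sock:connect(os.getenv("REDIS_HOST"), 6379)
        if not ok then
            ngx.say("connect failed: ", err)
            return
        end
        ngx.say("connect OK")
        sock:close()
    }
}
```

Note that hostname resolution here goes through the resolver directive, so this also exercises the DNS path the connector uses.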
Fair enough, we will test that. Meanwhile, when connecting through the proxy: I believe keepalive is not supported by default, so we may be creating more connections than we should, and we're seeing Lua Redis connect timeouts there as well. Any tips on improving that situation?
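For what it's worth, lua-resty-redis-connector does accept connection-pooling options in its params table. A hedged sketch (parameter names per its README; the values are illustrative and should be tuned to your traffic):

```nginx
init_by_lua_block {
    require("ledge").configure({
        redis_connector_params = {
            url = "redis://127.0.0.1:6379",
            connect_timeout = 1000,
            -- Reuse connections instead of opening a new one per request.
            keepalive_timeout = 60000,  -- ms to hold an idle connection
            keepalive_poolsize = 30     -- idle connections kept per worker
        }
    })
}
```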
I can reproduce this as well, either with the inline stream method or with a local haproxy proxy. That's the only way I can get it working on a Docker Compose network or in Kubernetes. In Kubernetes even the upstream hosts have to be FQDNs; Docker Compose lets me get away with short host names. There are definitely DNS issues, and there seems to be a difference in name resolution between the settings used for the upstream hosts and those used for the Redis host.
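One thing that may be worth trying on Kubernetes: cosocket hostname lookups go through nginx's resolver directive, so pointing it explicitly at the cluster DNS service and forcing periodic re-resolution can behave differently from local=on. A hypothetical sketch; 10.96.0.10 is only an assumed kube-dns ClusterIP, so verify it with kubectl get svc -n kube-system kube-dns first:

```nginx
# Hypothetical: replace "resolver local=on" with the cluster DNS service IP.
# 10.96.0.10 is an assumption; substitute your cluster's kube-dns ClusterIP.
resolver 10.96.0.10 valid=30s ipv6=off;
resolver_timeout 5s;
```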
With the following config, I consistently see a "Connection refused" error. If I point CACHE_URL at a localhost Redis, it works. I've tried higher timeout settings, and I'm able to connect to Redis instantly from the command line with redis-cli.