Closed: pezholio closed this issue 5 years ago.
Have you read through https://github.com/mperham/sidekiq/wiki/Using-Redis and checked the AWS monitoring / stats for your Redis? If the workers can't get a response from Redis, you need to verify its health.
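If it helps, here's a minimal sketch (assuming a standard setup where your Sidekiq initializer is loaded, e.g. a Rails console) for checking Redis reachability through Sidekiq's own connection pool, so it exercises the same path your workers use:

# Run from a console or script that loads your Sidekiq configuration.
require "sidekiq"

Sidekiq.redis do |conn|
  puts conn.ping                        # expect "PONG"
  info = conn.info                      # same data as redis-cli INFO, returned as a Hash
  puts info["connected_clients"]
  puts info["rejected_connections"]     # non-zero means Redis is turning clients away
  puts info["maxmemory_policy"]
end

Run it a few times while the failures are happening; if ping itself raises, the problem is between your workers and ElastiCache rather than inside Redis.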
Here's the output from redis-cli info:
# Server
redis_version:4.0.10
redis_git_sha1:0
redis_git_dirty:0
redis_build_id:0
redis_mode:standalone
os:Amazon ElastiCache
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:0.0.0
process_id:1
run_id:c100a2c7acc54a18b2b64712dab7c9e3751cd02c
tcp_port:6379
uptime_in_seconds:18669098
uptime_in_days:216
hz:10
lru_clock:4756577
executable:-
config_file:-
# Clients
connected_clients:34
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:20
# Memory
used_memory:4809680
used_memory_human:4.59M
used_memory_rss:8261632
used_memory_rss_human:7.88M
used_memory_peak:13644120
used_memory_peak_human:13.01M
used_memory_peak_perc:35.25%
used_memory_overhead:4402604
used_memory_startup:3662152
used_memory_dataset:407076
used_memory_dataset_perc:35.47%
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:1248854016
maxmemory_human:1.16G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:1.72
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:77628914
rdb_bgsave_in_progress:0
rdb_last_save_time:1529591351
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
# Stats
total_connections_received:8732
total_commands_processed:343489932
instantaneous_ops_per_sec:10
total_net_input_bytes:44065304161
total_net_output_bytes:40742361864
instantaneous_input_kbps:3.67
instantaneous_output_kbps:0.05
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:652
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:15539604
keyspace_misses:5502167
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
# Replication
role:master
connected_slaves:0
master_replid:96590f99c4f1d269ccd3e82f3ab52c57cb45cfe9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:13074.49
used_cpu_user:14656.29
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
# SSL
ssl_enabled:no
ssl_connections_to_previous_certificate:0
ssl_connections_to_current_certificate:0
ssl_current_certificate_not_before_date:(null)
ssl_current_certificate_not_after_date:(null)
ssl_current_certificate_serial:0
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=332,expires=258,avg_ttl=151559734577
And the latency seems good:
min: 1, max: 12, avg: 1.25 (5198 samples)
We've got 34 connected clients and 20 blocked clients, and it's claiming there are no rejected connections, which is odd.
Aside from using an unsafe maxmemory_policy (the Sidekiq wiki recommends noeviction so Redis never silently evicts job data under memory pressure), I don't see any issues.
My guess is you got unlucky with an unstable Redis instance or bum networking somehow. Nothing else is out of the ordinary; the blocked clients are just workers waiting on BRPOP for new jobs, which is expected.
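If it does turn out to be flaky networking, one thing worth trying (a sketch only; the URL env var is a placeholder for your ElastiCache endpoint) is raising Sidekiq's Redis network timeout in the initializer so brief blips don't immediately surface as connection errors:

# config/initializers/sidekiq.rb
redis_options = { url: ENV["REDIS_URL"], network_timeout: 5 }

Sidekiq.configure_server do |config|
  config.redis = redis_options
end

Sidekiq.configure_client do |config|
  config.redis = redis_options
end

That only papers over short blips, though; it won't help if the instance or the network path is genuinely unhealthy.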
We've been happily using Sidekiq on a semi-mature project for a while now without any issues. We use AWS ElastiCache for Redis and have just bumped our worker memory allocation from 512 MB to 1028 MB, and we're now noticing Redis connections randomly failing. Is there anything we could be doing wrong?
Ruby version: 2.4.1
Sidekiq / Pro / Enterprise version(s): Sidekiq 5.2.1
Initializer is here:
Sidekiq.yml is here:
Error messages: