psyhomb / sensu-extension-flapjack

Flapjack Sensu extension (compatible with Sensu 0.25+ and Flapjack v1.6 and v2.0)
MIT License

Sensu crashes Flapjack Redis #2

Closed mdzidic closed 7 years ago

mdzidic commented 7 years ago

Hello,

My new Sensu 0.26 setup with this Flapjack extension keeps crashing the Flapjack Redis database. Every 5-6 hours I find more than 2 million events in the queue, which crashes Redis. My old setup with Sensu 0.22 and the old Flapjack handler worked like a charm.

Do you have any suggestions?

psyhomb commented 7 years ago

Hey,

Could you please provide some more information, such as the configuration you're using, the Flapjack version, etc.? I'm currently using this same extension with Sensu 0.25 and Flapjack 1.6, and everything is working fine.

Also, please check Internal Statistics and the event queue length. This looks to me like an issue on the Flapjack side; you may need to scale up by increasing the number of flapjack-processor instances. This is explained in the Performance part of the FAQ: http://flapjack.io/docs/1.0/usage/faq/

The only job of this extension (bridge) is to create an event and push it onto the events queue. Everything from that point on is Flapjack's job.

mdzidic commented 7 years ago

Hey @psyhomb,

I'm already watching Internal Statistics and everything looks fine for hours: Redis receives ~500-1000 events per second and Flapjack processes them without trouble. Then, all at once, Flapjack stops processing events and the Redis queue fills up with over 2 million events...

I understand that Flapjack may need to scale up, but what bothers me is that everything worked fine with Sensu 0.22 and the old Flapjack handler...
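As a rough check on these numbers, the sketch below estimates how long processing would have to be completely stalled for the queue to reach 2 million events at the reported inflow. It assumes a steady inflow and no consumption at all during the stall, which the thread does not confirm; the function name is made up for illustration.

```python
def minutes_to_reach(backlog_events, events_per_second):
    """Minutes needed to accumulate backlog_events at a steady inflow
    when nothing is consuming the queue (an assumption, not a measurement)."""
    return backlog_events / events_per_second / 60


BACKLOG = 2_000_000  # queue length reported in the thread

for rate in (500, 1000):  # inflow range reported in the thread
    print(f"{rate} events/s -> {minutes_to_reach(BACKLOG, rate):.1f} minutes")
```

Under those assumptions the backlog corresponds to somewhere between half an hour and a bit over an hour of stalled processing, so logs and statistics from that window before the crash are the ones worth collecting.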

image

psyhomb commented 7 years ago

Hey @mdzidic,

As I said in my previous post, this is most definitely an issue on the Flapjack/Redis side, but I can't tell you more unless you provide additional info.

From your description this happens repeatedly, which means you can easily track and collect data during that time frame. Have you checked the flapjack-processor log? What does it say? Is the Redis connection OK? Have you checked redis-cli info?

mdzidic commented 7 years ago

Current Redis info:

127.0.0.1:6380> info
# Server
redis_version:2.8.19
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:c0359e7aa3798aa2
redis_mode:standalone
os:Linux 4.7.0-x86_64-linode72 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.3
process_id:29
run_id:d6980c3ab45c4ed40c1eebcc9b5f2defc7589774
tcp_port:6380
uptime_in_seconds:12595
uptime_in_days:0
hz:10
lru_clock:16567818
config_file:/etc/redis.conf

# Clients
connected_clients:13
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:4

# Memory
used_memory:1970610392
used_memory_human:1.84G
used_memory_rss:2001489920
used_memory_peak:3201525472
used_memory_peak_human:2.98G
used_memory_lua:33792
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:167429
rdb_bgsave_in_progress:1
rdb_last_save_time:1476185540
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:10
rdb_current_bgsave_time_sec:9
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:272797
total_commands_processed:48009579
instantaneous_ops_per_sec:23
total_net_input_bytes:6125033759
total_net_output_bytes:1980541675
instantaneous_input_kbps:2.51
instantaneous_output_kbps:0.75
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:6
evicted_keys:0
keyspace_hits:14238073
keyspace_misses:3933222
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:70648

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:586.03
used_cpu_user:596.14
used_cpu_sys_children:124.71
used_cpu_user_children:813.55

# Keyspace
db0:keys=139553,expires=6389,avg_ttl=821341516

psyhomb commented 7 years ago

At first glance, memory usage on Redis looks a bit high. How many gigs of RAM do you have on that host? You should check all of this again during an event surge. You should also check what kind of events are flooding the events queue (redis-cli -n 0 lrange events -10 -1) and, of course, the flapjack-processor log.
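To see what kind of events are flooding the queue, the entries returned by the lrange command above could be grouped and counted. The sketch below is only an illustration: it assumes each queue entry is a JSON object with `entity`, `check` and `state` fields, and the sample entries are hypothetical, not taken from this setup. It works on a plain list of strings, so the output of redis-cli or of any Redis client can be fed into it.

```python
import json
from collections import Counter


def summarize_events(raw_events):
    """Count queued events per (entity, check, state).

    raw_events is an iterable of JSON strings, e.g. the entries
    returned by an LRANGE on the events list.
    """
    counts = Counter()
    for raw in raw_events:
        try:
            event = json.loads(raw)
        except (TypeError, ValueError):
            event = None
        if not isinstance(event, dict):
            # Keep track of entries that are not JSON objects.
            counts[("<unparseable>", "", "")] += 1
            continue
        key = (event.get("entity"), event.get("check"), event.get("state"))
        counts[key] += 1
    return counts


# Hypothetical sample entries, for illustration only.
sample = [
    '{"entity": "web01", "check": "cpu", "state": "ok"}',
    '{"entity": "web01", "check": "cpu", "state": "ok"}',
    '{"entity": "db01", "check": "disk", "state": "critical"}',
    'not json',
]

for key, count in summarize_events(sample).most_common():
    print(count, key)
```

If one entity or check dominates the counts during a surge, that points at a specific check flooding the queue rather than at overall load.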

mdzidic commented 7 years ago

@psyhomb I have 4GB of RAM and 2 CPU cores...

For now I've updated Redis to the latest version, 3.2.4, and I'll start a few additional Flapjack workers. I'll let you know if this helps :)
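The memory figures from the INFO output above can be compared with the 4GB of RAM on the host. The sketch below parses a few lines copied from that output and reports used and peak memory as a fraction of total RAM. It assumes 4GB means 4 * 1024^3 bytes and that Redis has the whole host to itself, neither of which is stated in the thread; the function names are made up for illustration.

```python
def parse_redis_info(text):
    """Parse the text output of Redis INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def memory_report(info, total_ram_bytes):
    """Relate Redis memory usage to the RAM available on the host."""
    return {
        "used_fraction": int(info["used_memory"]) / total_ram_bytes,
        "peak_fraction": int(info["used_memory_peak"]) / total_ram_bytes,
        "bgsave_running": info.get("rdb_bgsave_in_progress") == "1",
    }


# Lines copied from the INFO output posted in this thread.
SAMPLE = """
# Memory
used_memory:1970610392
used_memory_peak:3201525472
# Persistence
rdb_bgsave_in_progress:1
"""

TOTAL_RAM = 4 * 1024**3  # assumption: "4GB" taken as 4 GiB

report = memory_report(parse_redis_info(SAMPLE), TOTAL_RAM)
print(f"used: {report['used_fraction']:.0%}, peak: {report['peak_fraction']:.0%}, "
      f"bgsave running: {report['bgsave_running']}")
```

With these inputs, current usage is a little under half of RAM and the recorded peak is close to three quarters. Since an RDB background save forks the Redis process and relies on copy-on-write, heavy writes during a save can push memory use higher still, which may be worth watching during a surge.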

psyhomb commented 7 years ago

Just a heads-up: as I mentioned before, this issue definitely isn't related to the Flapjack Sensu extension... I'm closing it.