sensu / sensu-go

Simple. Scalable. Multi-cloud monitoring.
https://sensu.io
MIT License
sensu backend does not provide api port 8080 anymore #4201

Closed runningman84 closed 2 years ago

runningman84 commented 3 years ago

Expected Behavior

The sensu backend should run fine for days...

Current Behavior

After a few days of operation, the Sensu backend no longer exposes the API on port 8080. The logs look like this: https://pastebin.com/FjQ6Psc2
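One way to notice this state early is to probe the API port from outside the backend process. The snippet below is a minimal sketch, not part of Sensu: the function name `api_port_open` and the example hostname are hypothetical, and it only checks that a TCP connection to port 8080 can be opened, not that the API answers requests correctly.

```python
import socket


def api_port_open(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 8080 is the API port named in
    this issue, everything else is an assumption.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "sensu-backend.example.internal" is a placeholder hostname.
    host = "sensu-backend.example.internal"
    state = "reachable" if api_port_open(host) else "NOT reachable"
    print(f"API port 8080 on {host} is {state}")
```

A check like this could be run periodically, or wired into whatever liveness mechanism the deployment already uses, so that a backend that has stopped listening on 8080 is detected without waiting for clients to fail.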

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up Sensu in k8s
  2. Connect an external etcd store
  3. Publish a lot of events

Context

We have an existing Sensu Classic environment with a lot of clients, and we are trying to migrate them to Sensu Go OSS.

Your Environment

sensu-discourse commented 3 years ago

This issue has been mentioned on Sensu Community. There might be relevant details there:

https://discourse.sensu.io/t/sensu-classic-vs-sensu-go-scalability/2438/1

portertech commented 3 years ago

It seems etcd was overwhelmed by the load put onto the cluster, which sent the cluster into an unrecoverable crash loop. The Postgres store is a feature designed to address the scaling issues around etcd. Perhaps spin up a Postgres instance and try to reproduce the issue again?

runningman84 commented 3 years ago

Does the OSS version support the Postgres datastore at all? The etcd cluster running in k8s did not throw any errors.

I do not think that the free Sensu Go version, with its 100-entity limit, would run into any problem...

portertech commented 3 years ago

An OSS build does not contain the Postgres store.

acrawly commented 3 years ago

An OSS build does not contain the Postgres store.

We tried scaling etcd to 7 nodes (using the embedded etcd install) and this issue still persisted with 12k events / 600 entities.

echlebek commented 2 years ago

We recently found and fixed an issue where Sensu was crashing but deadlocking partway through the crash. That issue has now been patched, so overloaded Sensu instances should crash properly instead of deadlocking.

echlebek commented 2 years ago

For context, here is the paper trail: https://github.com/sensu/sensu-go/issues/4461