integr8ly / application-monitoring-operator

Operator for installing the Application Monitoring Stack on OpenShift (Prometheus, AlertManager, Grafana)
Apache License 2.0

Fix cluster/clean target & install script #76

Closed. grdryn closed this 4 years ago

grdryn commented 4 years ago

Locally in Minishift I've found that when the AMO starts and reacts to the CR, errors are thrown about the Prometheus kinds not being found. This appears to be because those kinds didn't exist yet when the operator started (they're created by the Prometheus Operator?), so the AMO probably can't find them in its cache.
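
To make the startup ordering concrete, here's a minimal sketch of the kind of guard an install script could add before creating the CR, assuming the usual Prometheus Operator CRD names; this isn't necessarily what install.sh ends up doing:

#!/usr/bin/env bash
set -euo pipefail

# Wait for the Prometheus Operator CRDs to be registered before the
# ApplicationMonitoring CR is created, so the AMO doesn't react to the CR
# while the Prometheus kinds still don't exist. CRD names assume the
# upstream Prometheus Operator defaults.
for crd in prometheuses.monitoring.coreos.com alertmanagers.monitoring.coreos.com; do
  until oc get crd "${crd}" >/dev/null 2>&1; do
    echo "Waiting for CRD ${crd}..."
    sleep 5
  done
done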

I also noticed errors about the Role/RoleBinding being created in the wrong order; those errors were being ignored and execution continued anyway. I've now created them explicitly in the right order, and added a simple check to make sure the ServiceAccount has the right access (see the sketch below).
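
For illustration, an access check along those lines could look something like this; the namespace and ServiceAccount names are placeholders, not necessarily what install.sh uses:

# Prints "yes" or "no" depending on whether the ServiceAccount is allowed to
# list Prometheus resources; a non-zero exit means the RoleBinding hasn't
# taken effect yet. Namespace and SA names are placeholders.
oc auth can-i list prometheuses.monitoring.coreos.com \
  --as="system:serviceaccount:application-monitoring:application-monitoring-operator" \
  -n application-monitoring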

I also updated the cluster/clean target, since it was broken in a couple of ways before: deleting the namespace would hang until you figured out which resource in it still had a finalizer and cleaned that up by hand. The cluster/prepare target was also referencing a file that didn't exist.
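
For context, the manual cleanup needed when the namespace got stuck looked roughly like this; the resource type and names below are illustrative rather than exact:

# List resources in the namespace that still carry finalizers, which is what
# blocks the namespace deletion from completing.
oc get applicationmonitoring -n application-monitoring \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'

# Clear the finalizers on the stuck resource so the delete can go through.
# The resource name here is a placeholder.
oc patch applicationmonitoring example-applicationmonitoring -n application-monitoring \
  --type=merge -p '{"metadata":{"finalizers":[]}}'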

grdryn commented 4 years ago

There's still a problem even after this PR, on the master branch at least. The prometheus container that gets created produces the following logs and then crashes:

level=warn ts=2019-08-13T14:29:01.106Z caller=main.go:282 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2019-08-13T14:29:01.106Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.0, branch=master, revision=ca080f0ac97f903b2ea45419eb2fb370c9d6a467)"
level=info ts=2019-08-13T14:29:01.106Z caller=main.go:330 build_context="(go=go1.11.10, user=root@prometheus-build, date=20190715-09:36:16)"
level=info ts=2019-08-13T14:29:01.106Z caller=main.go:331 host_details="(Linux 3.10.0-957.21.3.el7.x86_64 #1 SMP Fri Jun 14 02:54:29 EDT 2019 x86_64 prometheus-application-monitoring-0 (none))"
level=info ts=2019-08-13T14:29:01.106Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-08-13T14:29:01.106Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-08-13T14:29:01.107Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-08-13T14:29:01.107Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2019-08-13T14:29:01.110Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-08-13T14:29:01.110Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-08-13T14:29:01.110Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:722 msg="Notifier manager stopped"
level=info ts=2019-08-13T14:29:01.110Z caller=main.go:551 msg="Scrape manager stopped"
level=error ts=2019-08-13T14:29:01.110Z caller=main.go:731 err="opening storage failed: list block dirs in \"/prometheus\": open /prometheus: permission denied"
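
That "permission denied" opening /prometheus typically points at a mismatch between the UID/GID the container runs as and the ownership of the data volume (for example, no fsGroup being applied). A couple of diagnostic commands for that, where the pod name is taken from the host_details line above and the namespace and container name are assumptions:

# Show the pod-level and container-level security contexts (fsGroup, runAsUser)
# for the crashing Prometheus pod.
oc get pod prometheus-application-monitoring-0 -n application-monitoring \
  -o jsonpath='{.spec.securityContext}{"\n"}{.spec.containers[*].securityContext}{"\n"}'

# Check ownership and permissions of the data directory, if the container
# stays up long enough to exec into. "-c prometheus" assumes the usual
# container name in Prometheus Operator managed pods.
oc exec prometheus-application-monitoring-0 -n application-monitoring -c prometheus -- ls -ld /prometheus
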
davidkirwan commented 4 years ago

@grdryn I pushed a slight change to the ordering in script/install.sh.

This appears to be working without issue now with:

make cluster/clean
make cluster/install

I'm no longer seeing any of the issues you were encountering above. Can you confirm you see the same behaviour?

grdryn commented 4 years ago

@davidkirwan cool, thanks. I'll try it out and let you know! :+1:

grdryn commented 4 years ago

@davidkirwan I still see the same issue with the prometheus container in the prometheus pod that I mentioned in this comment. I don't think that has anything to do with your commit though, and that commit is fine, so feel free to leave it there :+1: