ibuziuk closed 5 years ago
Should we have the root epic openshiftio/openshift.io#4730 under this repository?
@slemeur having user-story under openshift.io works just fine IMO (I personally do not think we should have it under rh-che since the status service would be a separate repo)
Does this service allow you to feed metrics or programmatic configuration changes to it? e.g. can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does?
can you tell it to start monitoring a given route url that doesn't exist yet, and measure time until it does ?
@fche hmm.. why would you like to start monitoring non-existing route ? The main question we are currently having is if statuspage.io can support Prometheus format properly - https://github.com/redhat-developer/rh-che/issues/1237
why would you like to start monitoring non-existing route ?
Related to the other need to track openshift route-creation times. Notify service at oc api call start time, let it determine time taken for route to be actually accessible.
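The idea above (notify a service when the `oc` API call starts, then let it measure how long the route takes to become reachable) could be sketched roughly as follows. This is only an illustration, not an existing service: `http_probe` and `time_until_accessible` are hypothetical names, and treating any non-5xx response as "accessible" is an assumption about how a not-yet-ready route answers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a non-5xx HTTP status.

    Assumption: a route that is not ready yet either refuses the
    connection or answers with a 5xx error page.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before its parent class URLError.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        return False


def time_until_accessible(
    url: str,
    probe: Callable[[str], bool] = http_probe,
    interval: float = 1.0,
    deadline: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `url` until `probe` succeeds.

    Returns the elapsed seconds since the call started, or None if the
    deadline passes first. `clock` and `sleep` are injectable so the
    loop can be exercised without real waiting.
    """
    start = clock()
    while True:
        if probe(url):
            return clock() - start
        if clock() - start >= deadline:
            return None
        sleep(interval)
```

The returned duration is the kind of value that could then be exported as a metric; the polling interval bounds the measurement resolution.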
@fche AFAIK, it is planned to be done on che-server side and exposing via prometheus metric - https://github.com/eclipse/che/issues/12699
cc: @gorkem In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?
it is planned to be done on che-server side
OK, assuming it is in a position to reliably tell whether the routes are externally accessible. BTW, submitted this RFE for openshift to consider supplying this info itself: https://github.com/openshift/origin/issues/22107
In other than the short term, does this sound like the sort of tool we should provide for ourselves, as opposed to outsourcing it?
@fche if we opt for a custom dsaas service the major question is, who will be the primary owner / maintainer ?
who will be the primary owner / maintainer
aye, there is the rub
But independent of that question, one can work out in greater detail just what info you'd like to see there.
@fche I believe most of the details are covered in the following user-story - https://github.com/openshiftio/openshift.io/issues/4730
What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards? So, assume there is a queryable metric database near the rhche server. Assume it's been gathering the status/health metrics being discussed over at openshiftio/openshift.io#4730. Does the "system status" have to be anything other than a preconfigured dashboard, with some combination of graphical or textual forms we can generate?
What do you think the chances are that many or all of the datasets you are talking about could be rendered entirely as grafana (or perhaps pcp) dashboards?
I believe everything could be rendered entirely via grafana, but the goal of statuspage is to make it user-friendly, easy to update, easy to notify users, easy to create incidents, easy to schedule maintenance, etc. So, grafana and a status page are two different beasts.
Could we think about it as the public status-page being downstream of our internal status dashboards & machinery? i.e., not tightly coupled to che, but rather to a hypothetical dev-console health dashboard?
IMO, che.openshift.io is a very special case, not tightly related to the SaaS, which deserves its own status page
Understood, just trying to minimize number of bits of machinery and maximize reusability. Maybe think of it more like - a running copy of che should have its own health display for benefit of each of its users. Can the public dashboard be another consumer of that same data & maybe even some of the same renderings?
well, potentially it could, but ideally the status page should be deployed separately from the monitored service - if the service is down, the status page should still be up with the reported incident (if the status page is part of the service itself, it would be down together with the service during an incident / scheduled maintenance)
Yup, kind of like a reliable mirror.
As a prototype, before we do a full proper operator / openshift4 / prometheus flavoured thing, we could perhaps layer a small piece of new code on top of the existing osd-monitor-poc pcp-based infrastructure, to relay metric threshold crossing events to statuspage.io. We'd need to know a sample metric name and threshold predicate, and statuspage.io api credentials.
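A rough sketch of what such a relay could look like, under stated assumptions: `crossings` and `build_component_update` are hypothetical helpers, not part of osd-monitor-poc, and the endpoint path, the `OAuth` authorization header, and the status names are my reading of the statuspage.io REST API that should be checked against its documentation. The page id, component id, and API key are placeholders. The request is only built here, never sent.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def crossings(
    samples: Iterable[float], breached: Callable[[float], bool]
) -> Iterator[Tuple[int, bool]]:
    """Yield (index, breached_now) only when the predicate result flips.

    The stream is assumed to start in the healthy (not breached) state.
    """
    previous = False
    for i, value in enumerate(samples):
        current = breached(value)
        if current != previous:
            yield i, current
        previous = current


def build_component_update(
    page_id: str, component_id: str, api_key: str, is_breached: bool
) -> urllib.request.Request:
    """Build (but do not send) a component status update request.

    Assumption: endpoint shape and status values follow the
    statuspage.io API; verify before using against a real page.
    """
    status = "degraded_performance" if is_breached else "operational"
    body = json.dumps({"component": {"status": status}}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.statuspage.io/v1/pages/{page_id}/components/{component_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Relaying only the flips, rather than every sample, keeps the number of outbound API calls small, which matters if the plan in use is rate- or quota-limited.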
@fche will you be able to give a hand with implementing the push part in the next sprint? (First we need to figure out which metrics we are going to push - the hobby plan offers only 2 system metrics, so we need to be picky.)
Can indeed help with a quick prototype, presuming building on the present osd-monitor-poc machinery, not major new stuff. It's about as complicated as adding a new outbound zabbix relay.
Sounds good, I will reach out once I have more details about the params for the statuspage API
Closing this epic since https://che.statuspage.io/ is set up and we have a separate issue for contributing system metrics to statuspage (which is currently not a priority) - #1286
Currently there is no status page for che.openshift.io which would provide information about the state of the platform. There are many different online services that provide information about the state of their platform. It was decided to use an account on https://www.statuspage.io/ instead of creating a custom dsaas service.
sub-tasks:
Related openshift.io user-story - https://github.com/openshiftio/openshift.io/issues/4730