
AWS: monitoring does not seem to work / be usable out of the box #350

Closed · schu closed this 4 years ago

schu commented 4 years ago

lokoctl version v0.1.0-169-gef479dc9.

I installed the following components to set up monitoring:

component "metrics-server" {}

component "openebs-operator" {}

component "openebs-storage-class" {}

component "contour" {}

component "prometheus-operator" {
  grafana_admin_password = var.grafana_admin_password
}

component "cert-manager" {
  email = var.cert_manager_email
}

While that appears to work (all pods in the monitoring namespace are running), the existing documentation doesn't explain how to actually make use of it.
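
A quick way to double-check that with plain kubectl (the namespace name is taken from the output above; the second command is just a shortcut for spotting Pending pods):

# list all pods in the monitoring namespace
kubectl -n monitoring get pods

# list only pods that are not Running, e.g. stuck in Pending
kubectl -n monitoring get pods --field-selector=status.phase!=Running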

To access the Grafana UI, I added an Ingress object:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-production"
    kubernetes.io/ingress.class: contour
spec:
  tls:
  - secretName: monitoring.example.com-tls
    hosts:
    - monitoring.example.com
  rules:
  - host: monitoring.example.com
    http:
      paths:
      - backend:
          serviceName: prometheus-operator-grafana
          servicePort: 80
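
As a side note, while the Ingress route isn't working, Grafana can typically still be reached with a port-forward; the service name and port are the ones from the manifest above, the local port 8080 is an arbitrary choice:

# forward local port 8080 to port 80 of the Grafana service
kubectl -n monitoring port-forward svc/prometheus-operator-grafana 8080:80

Grafana is then reachable at http://localhost:8080.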

The default dashboards don't seem to work:

(screenshot: grafana-1)

It looks like the Prometheus server cannot be reached: pressing the "Test" button in the data source settings results in an "upstream request timeout" error.
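
A minimal in-cluster reachability check, assuming the kube-prometheus naming convention where the Prometheus service is called prometheus-operator-prometheus and listens on port 9090 (neither the name nor the port is confirmed in this issue):

# list the services in the namespace to confirm the actual name and port
kubectl -n monitoring get svc

# hit the Prometheus health endpoint from a throwaway pod
kubectl -n monitoring run curl-test --rm -i --restart=Never --image=curlimages/curl -- \
  curl -sS http://prometheus-operator-prometheus.monitoring.svc:9090/-/healthy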

What I was looking for:

schu commented 4 years ago

> I installed the following components to set up monitoring ... While that appears to work (all pods in the monitoring namespace are running)

Actually, I overlooked that both prometheus-prometheus-operator-prometheus-0 and alertmanager-prometheus-operator-alertmanager-0 were stuck in Pending due to:

<unknown>   Warning   FailedScheduling       pod/alertmanager-prometheus-operator-alertmanager-0                          running "VolumeBinding" filter plugin for pod "alertmanager-prometheus-operator-alertmanager-0": pod has unbound immediate PersistentVolumeClaims
4m7s        Normal    ExternalProvisioning   persistentvolumeclaim/data-alertmanager-prometheus-operator-alertmanager-0   waiting for a volume to be created, either by external provisioner "openebs.io/provisioner-iscsi" or manually created by system administrator
15m         Normal    Provisioning           persistentvolumeclaim/data-alertmanager-prometheus-operator-alertmanager-0   External provisioner is provisioning volume for claim "monitoring/data-alertmanager-prometheus-operator-alertmanager-0"
15m         Warning   ProvisioningFailed     persistentvolumeclaim/data-alertmanager-prometheus-operator-alertmanager-0   failed to provision volume with StorageClass "openebs-cstor-disk-replica-3": Internal Server Error: failed to create volume 'pvc-339c0c9f-0f1c-4c99-b652-6fe440e440ba': response: not enough pools available to create replicas
[...]
<unknown>   Warning   FailedScheduling       pod/prometheus-prometheus-operator-prometheus-0                              running "VolumeBinding" filter plugin for pod "prometheus-prometheus-operator-prometheus-0": pod has unbound immediate PersistentVolumeClaims
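
The claim status and the events above can be listed with plain kubectl:

# show whether the claims are Bound or Pending
kubectl -n monitoring get pvc

# show recent events, oldest first
kubectl -n monitoring get events --sort-by=.lastTimestamp
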
schu commented 4 years ago

I figured out that I need to explicitly define a default storage class. The pre-defined openebs-cstor-disk-replica-3 class apparently needs at least three storage pools, which my cluster doesn't have, so provisioning only succeeds with a lower replica count:

component "openebs-storage-class" {
  storage-class "openebs-default-sc" {
    replica_count = 1
    default       = true
  }
}
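
To verify the fix, the new class should show up as default and the claims should bind (the class name matches the lokocfg above):

# the new class should be marked "(default)"
kubectl get storageclass

# the monitoring claims should move from Pending to Bound
kubectl -n monitoring get pvc

One caveat, not from this issue: PVCs keep the storage class they were created with, so the already-Pending claims may need to be deleted so that the StatefulSet controller recreates them against the new default.
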
schu commented 4 years ago

Issue regarding storage: https://github.com/kinvolk/lokomotive/issues/351

surajssd commented 4 years ago