splunk / splunk-operator

Splunk Operator for Kubernetes

Splunk Operator: no way to utilize pre-created EBS #1269

Open yaroslav-nakonechnikov opened 10 months ago

yaroslav-nakonechnikov commented 10 months ago

Please select the type of request

Feature Request

Tell us more

Describe the request: At the moment storage is created dynamically, which is not good in a lot of cases: there is no way to set the correct tags, for example.

This would be especially useful for single-instance roles, like the standalone or the license manager.

Expected behavior: There should be a way to pass a volume ID (vol-id), perhaps as an array (or map), since there may be several replicas. In that case the "replicas" field could be omitted when several IDs are passed.
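For reference, a pre-created EBS volume can already be wired up by hand through a static PersistentVolume bound via the EBS CSI driver; the missing piece is an operator field that consumes the resulting claim. A minimal sketch (the volume ID, names, namespace and size are placeholders, not values from this issue):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: splunk-var-static
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                      # empty class: static binding, no dynamic provisioning
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0     # ID of the pre-created EBS volume (placeholder)
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: splunk-var-static
  namespace: splunk-operator                # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: splunk-var-static             # bind explicitly to the static PV above
  resources:
    requests:
      storage: 100Gi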

yaroslav-nakonechnikov commented 8 months ago

This also leads to one of the issues with the PVCs the operator creates: there is no way to extend the disks when needed, so you are forced to recreate the EBS volume (see the note after the StorageClass dump below).

If somebody tries to edit the PVC, they get the following error:

error: persistentvolumeclaims "pvc-var-splunk-prod-monitoring-console-0" could not be patched: persistentvolumeclaims "pvc-var-splunk-prod-monitoring-console-0" is forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize

storageclass:

$ kubectl get storageclass gp3 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp3"},"parameters":{"csi.storage.k8s.io/fstype":"ext4","encrypted":"true","kmsKeyId":"arn:aws:kms:eu-central-1:111111111111:alias/proj-prod","type":"gp3"},"provisioner":"ebs.csi.aws.com","volumeBindingMode":"WaitForFirstConsumer"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-12-05T17:40:23Z"
  name: gp3
  resourceVersion: "202093653"
  uid: 6cae9dd7-be8e-479a-9e5d-39c99948b502
parameters:
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  kmsKeyId: arn:aws:kms:eu-central-1:111111111111:alias/proj-prod
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
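The resize error above follows directly from this StorageClass: it has no allowVolumeExpansion field, which defaults to false, so even dynamically provisioned PVCs from this class cannot be grown. A sketch of the same class with expansion enabled (otherwise unchanged):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  kmsKeyId: arn:aws:kms:eu-central-1:111111111111:alias/proj-prod
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true    # permits editing spec.resources.requests.storage on PVCs provisioned by this class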

vivekr-splunk commented 6 months ago

@yaroslav-nakonechnikov we will look into this issue and get back to you. Have you tried pre-creating the PVCs with the same names before the CR is created?
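If I read the suggestion right, it relies on standard StatefulSet behavior: when a PVC named <volumeClaimTemplate name>-<statefulset name>-<ordinal> already exists, the StatefulSet adopts it instead of provisioning a new one. A minimal sketch, reusing the PVC name from the error above (namespace, class and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-var-splunk-prod-monitoring-console-0   # must exactly match the name the StatefulSet would generate
  namespace: splunk-operator                        # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 100Gi                                # placeholder size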

yaroslav-nakonechnikov commented 1 month ago

So, finally we can get back to this.

Now I tried to create the PVCs:

resource "kubernetes_persistent_volume_claim" "site1" {
  count = 3
  metadata {
    name      = "splunkdb-site1-${count.index}"
    namespace = local.splunk_operator_namespace
  }
  spec {
    access_modes       = ["ReadWriteMany"]
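    # note: EBS CSI volumes only support ReadWriteOnce for filesystem mounts, so ReadWriteMany is unlikely to bind here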
    storage_class_name = "${local.cluster_name}-splunk-indexer"
    resources {
      requests = {
        storage = local.private_env ? "100Gi" : "11000Gi"
      }
    }
  }
}

and tried to specify it, for test purposes, as an extra volume:

"persistentVolumeClaim" = {
          "claimName" = "splunkdb-site1"
        }
        "name" = "splunkdb"
        },

Since this is a cluster of indexers and one IndexerCluster resource creates several indexer pods, I expected it would look for the PVCs as splunkdb-site1-0, splunkdb-site1-1, splunkdb-site1-2.

But I see this error: Warning FailedScheduling 47s (x3 over 52s) default-scheduler 0/12 nodes are available: persistentvolumeclaim "splunkdb-site1" not found. preemption: 0/12 nodes are available: 12 Preemption is not helpful for scheduling

It tries to get "splunkdb-site1", which is strange, because then all 3 pods would get the same PVC. How is it possible to create a cluster with additional disks?
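For context: a plain persistentVolumeClaim volume in a pod template is resolved literally and identically for every replica, so each indexer pod looks for the single claim named splunkdb-site1. Per-ordinal claims (splunkdb-site1-0, -1, -2) only come from a StatefulSet volumeClaimTemplate, which is the mechanism the operator would need to expose. A minimal illustration of that mechanism (names, image and sizes are illustrative, not the operator's actual template):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: site1
spec:
  serviceName: site1
  replicas: 3
  selector:
    matchLabels:
      app: splunkdb-demo
  template:
    metadata:
      labels:
        app: splunkdb-demo
    spec:
      containers:
        - name: main
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: splunkdb
              mountPath: /opt/splunk/var/lib/splunk
  volumeClaimTemplates:
    - metadata:
        name: splunkdb                  # generates PVCs splunkdb-site1-0, splunkdb-site1-1, splunkdb-site1-2
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi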

yaroslav-nakonechnikov commented 1 month ago

The same question arises with other options, like awsElasticBlockStore.

yaroslav-nakonechnikov commented 1 month ago

csi also doesn't look like it will help:

Warning  FailedMount             22s (x8 over 86s)   kubelet                  MountVolume.SetUp failed for volume "splunkdb" : kubernetes.io/csi: mounter.SetupAt failed to check volume lifecycle mode: volumemode "Ephemeral" not supported by driver ebs.csi.aws.com (only supports ["Persistent"])
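That error is about the volume form rather than the driver: an inline csi: volume in a pod spec uses the "Ephemeral" lifecycle mode, and ebs.csi.aws.com only implements "Persistent", so the kubelet rejects it. Presumably the attempt looked roughly like the pod below (a hypothetical repro, not the actual manifest); the generic ephemeral volume used as the workaround later in this thread creates a real PVC per pod and therefore works with the EBS driver:

apiVersion: v1
kind: Pod
metadata:
  name: inline-csi-demo
spec:
  containers:
    - name: main
      image: registry.k8s.io/pause:3.9
      volumeMounts:
        - name: splunkdb
          mountPath: /data
  volumes:
    - name: splunkdb
      csi:                              # inline CSI volume == "Ephemeral" lifecycle; rejected by ebs.csi.aws.com
        driver: ebs.csi.aws.com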

yaroslav-nakonechnikov commented 1 month ago

@vivekr-splunk so, are there any options to provide this setting for clusters (indexers and search heads)?

For single-instance roles (license manager, monitoring console, standalone) there is no issue.

yaroslav-nakonechnikov commented 1 month ago

So, we have discovered an issue with SPLUNK_DB being placed on /opt/splunk/var.

In some cases a large deployment is spun up, so the var directory can fill up very quickly, which causes pod issues.

Adding support for pre-created EBS volumes would give more flexibility and allow moving SPLUNK_DB to a separate volume (see also the sizing sketch below).

reported: https://splunk.my.site.com/customer/5005a0000314mcdAAA
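Related: sizing /opt/splunk/var up front can at least delay the overfill. If I recall correctly, the operator CRDs expose etcVolumeStorageConfig and varVolumeStorageConfig for this; the sketch below is an assumption based on the operator's storage documentation (the apiVersion, names and sizes may differ by operator version), and it still does not put SPLUNK_DB on its own pre-created EBS volume:

apiVersion: enterprise.splunk.com/v4
kind: Standalone
metadata:
  name: prod
  namespace: splunk-operator            # placeholder namespace
spec:
  etcVolumeStorageConfig:
    storageClassName: gp3
    storageCapacity: 10Gi
  varVolumeStorageConfig:
    storageClassName: gp3
    storageCapacity: 300Gi              # size /opt/splunk/var generously up front; resizing later hits the PVC limitation above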

yaroslav-nakonechnikov commented 3 weeks ago

So, the workaround at the moment is to use an ephemeral volume:

{
  ephemeral = {
    volumeClaimTemplate = {
      spec = {
        accessModes      = ["ReadWriteOnce"]
        storageClassName = "gp3"
        resources = {
          requests = {
            storage = "300Gi"
          }
        }
      }
    }
  }
  name = "var-run"
}
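One caveat with this workaround: a generic ephemeral volume's PVC is owned by its pod (Kubernetes names it <pod name>-<volume name>) and is deleted together with the pod, so it works for scratch space under /opt/splunk/var, but the data does not outlive the pod the way an adopted pre-created EBS volume would.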