strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

kafka cluster Persistent storage Permission denied #3847

Closed lanzhiwang closed 4 years ago

lanzhiwang commented 4 years ago

Deploy the Kafka cluster:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.5.0
    replicas: 3
    jmxOptions: {}
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: '2.5'
    storage:
      type: persistent-claim
      size: 10Gi
      class: intceph
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: intceph
  entityOperator:
    topicOperator: {}
    userOperator: {}

Apply this YAML:

$ kubectl -n kafka get pods
NAME                                               READY   STATUS             RESTARTS   AGE
my-cluster-zookeeper-0                             0/1     CrashLoopBackOff   6          7m10s
my-cluster-zookeeper-1                             0/1     CrashLoopBackOff   6          7m10s
my-cluster-zookeeper-2                             0/1     CrashLoopBackOff   6          7m9s
strimzi-cluster-operator-v0.18.0-5586648b4-hh5rt   1/1     Running            0          5h35m

$ kubectl -n kafka logs my-cluster-zookeeper-0
Detected Zookeeper ID 1
mkdir: cannot create directory '/var/lib/zookeeper/data': Permission denied
/opt/kafka/zookeeper_run.sh: line 26: /var/lib/zookeeper/data/myid: No such file or directory
Preparing truststore
Adding /opt/kafka/cluster-ca-certs/ca.crt to truststore /tmp/zookeeper/cluster.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore is complete
Looking for the right CA
Found the right CA: /opt/kafka/cluster-ca-certs/ca.crt
Preparing keystore for client and quorum listeners
Preparing keystore for client and quorum listeners is complete
Starting Zookeeper with configuration:
# The directory where the snapshot is stored.
dataDir=/var/lib/zookeeper/data

# Other options
4lw.commands.whitelist=*
standaloneEnabled=false
reconfigEnabled=true
clientPort=12181
clientPortAddress=127.0.0.1

# TLS options
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.clientAuth=need
ssl.quorum.clientAuth=need
secureClientPort=2181
sslQuorum=true

ssl.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.trustStore.password=frAAnZ8AjaGlhuaYOtQwOAEvVhaAzSJb
ssl.trustStore.type=PKCS12
ssl.quorum.trustStore.location=/tmp/zookeeper/cluster.truststore.p12
ssl.quorum.trustStore.password=frAAnZ8AjaGlhuaYOtQwOAEvVhaAzSJb
ssl.quorum.trustStore.type=PKCS12

ssl.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.keyStore.password=frAAnZ8AjaGlhuaYOtQwOAEvVhaAzSJb
ssl.keyStore.type=PKCS12
ssl.quorum.keyStore.location=/tmp/zookeeper/cluster.keystore.p12
ssl.quorum.keyStore.password=frAAnZ8AjaGlhuaYOtQwOAEvVhaAzSJb
ssl.quorum.keyStore.type=PKCS12

# Provided configuration
tickTime=2000
initLimit=5
syncLimit=2
autopurge.purgeInterval=1

# Zookeeper nodes configuration
server.1=my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc:2888:3888:participant;127.0.0.1:12181
server.2=my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc:2888:3888:participant;127.0.0.1:12181
server.3=my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.kafka.svc:2888:3888:participant;127.0.0.1:12181

mkdir: cannot create directory '/var/lib/zookeeper/logs': Permission denied
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
2020-10-20 13:15:13,152 INFO Reading configuration from: /tmp/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2020-10-20 13:15:13,157 INFO clientPortAddress is 127.0.0.1:12181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2020-10-20 13:15:13,157 INFO secureClientPortAddress is 0.0.0.0:2181 (org.apache.zookeeper.server.quorum.QuorumPeerConfig) [main]
2020-10-20 13:15:13,159 INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util) [main]
2020-10-20 13:15:13,170 ERROR Invalid config, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain) [main]
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /tmp/zookeeper.properties
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:156)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:113)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
Caused by: java.lang.IllegalArgumentException: myid file is missing
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.checkValidity(QuorumPeerConfig.java:736)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.setupQuorumPeerConfig(QuorumPeerConfig.java:607)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:422)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:152)
    ... 2 more
Invalid config, exiting abnormally
$
$ kubectl -n kafka get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-cluster-zookeeper-0   Bound    pvc-b760837b-d1c7-4fc4-961b-d91010aad8f7   10Gi       RWO            intceph        8m43s
data-my-cluster-zookeeper-1   Bound    pvc-6e2ee86f-2424-4d4d-a076-7b99f3fe99ce   10Gi       RWO            intceph        8m43s
data-my-cluster-zookeeper-2   Bound    pvc-45941124-0716-4440-919e-28b5c8f72bbc   10Gi       RWO            intceph        8m43s
$
$ kubectl -n kafka describe pvc data-my-cluster-zookeeper-0
Name:          data-my-cluster-zookeeper-0
Namespace:     kafka
StorageClass:  intceph
Status:        Bound
Volume:        pvc-b760837b-d1c7-4fc4-961b-d91010aad8f7
Labels:        app.kubernetes.io/instance=my-cluster
               app.kubernetes.io/managed-by=strimzi-cluster-operator
               app.kubernetes.io/name=zookeeper
               app.kubernetes.io/part-of=strimzi-my-cluster
               strimzi.io/cluster=my-cluster
               strimzi.io/kind=Kafka
               strimzi.io/name=my-cluster-zookeeper
Annotations:   pv.kubernetes.io/bind-completed: yes
               strimzi.io/delete-claim: false
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    my-cluster-zookeeper-0
Events:
  Type    Reason                 Age                    From                                                                                      Message
  ----    ------                 ----                   ----                                                                                      -------
  Normal  ExternalProvisioning   9m11s (x2 over 9m11s)  persistentvolume-controller                                                               waiting for a volume to be created, either by external provisioner "ceph.com/cephfs" or manually created by system administrator
  Normal  Provisioning           9m11s                  ceph.com/cephfs_cephfs-provisioner-7478b8658c-l9zzh_8212d4b2-b65e-4c5d-99b1-8a6d44193bee  External provisioner is provisioning volume for claim "kafka/data-my-cluster-zookeeper-0"
  Normal  ProvisioningSucceeded  9m9s                   ceph.com/cephfs_cephfs-provisioner-7478b8658c-l9zzh_8212d4b2-b65e-4c5d-99b1-8a6d44193bee  Successfully provisioned volume pvc-b760837b-d1c7-4fc4-961b-d91010aad8f7
$
$ kubectl describe pv pvc-b760837b-d1c7-4fc4-961b-d91010aad8f7
Name:            pvc-b760837b-d1c7-4fc4-961b-d91010aad8f7
Labels:          <none>
Annotations:     cephFSProvisionerIdentity: cephfs-provisioner-1
                 cephShare: kubernetes-dynamic-pvc-5ce47e49-81b5-476e-aec1-0be526a7932a
                 pv.kubernetes.io/provisioned-by: ceph.com/cephfs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    intceph
Status:          Bound
Claim:           kafka/data-my-cluster-zookeeper-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Gi
Node Affinity:   <none>
Message:
Source:
    Type:        CephFS (a CephFS mount on the host that shares a pod's lifetime)
    Monitors:    [192.168.16.172:6789 192.168.16.173:6790 192.168.16.174:6791]
    Path:        /kubernetes/kubernetes/kubernetes/kubernetes-dynamic-pvc-5ce47e49-81b5-476e-aec1-0be526a7932a
    User:        kubernetes-dynamic-user-9a9d9797-0c87-4c9f-8740-93e8273d2912
    SecretFile:
    SecretRef:   &SecretReference{Name:ceph-kubernetes-dynamic-user-9a9d9797-0c87-4c9f-8740-93e8273d2912-secret,Namespace:cpaas-system,}
    ReadOnly:    false
Events:          <none>
$

Additionally, I tested the intceph StorageClass on its own, and it works fine:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-pvc
spec:
  resources:
    requests:
      storage: 10Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: intceph

---

apiVersion: v1
kind: Pod
metadata:
  name: fortune
spec:
  containers:
  - image: nginx
    name: web-server
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
    ports:
    - containerPort: 80
      protocol: TCP
  volumes:
  - name: html
    persistentVolumeClaim:
      claimName: kafka-pvc

What is the cause of this problem? How should I debug?

scholzj commented 4 years ago

You will need to configure the SecurityContext to match the ownership of the storage (or change the storage ownership to match it), so that ZooKeeper and Kafka can write to the storage. The Security Context can be set using https://strimzi.io/docs/operators/latest/full/using.html#type-PodTemplate-reference

Also, please keep in mind that block storage should be used, so you should probably double-check that your intceph storage class provides block storage.
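
For example, a minimal sketch of what this could look like in the Kafka custom resource (the runAsUser/fsGroup values below are placeholders and need to match whatever ownership your storage expects):

spec:
  kafka:
    # ...
    template:
      pod:
        securityContext:
          runAsUser: 1001   # placeholder: UID allowed to write to the volume
          fsGroup: 1001     # placeholder: group applied to the mounted volume
  zookeeper:
    # ...
    template:
      pod:
        securityContext:
          runAsUser: 1001
          fsGroup: 1001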

lanzhiwang commented 4 years ago

I configured the SecurityContext:

...
    template:
      pod:
        securityContext:
          runAsUser: 0
          fsGroup: 0
...

The Kafka operator creates the PVCs, and the Kafka cluster runs normally; it can produce and consume messages.

$ kubectl -n kafka get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-cluster-kafka-0       Bound    pvc-44fad51e-4d0c-46e7-a2be-7af511d2a989   10Gi       RWO            intceph        3h24m
data-my-cluster-kafka-1       Bound    pvc-6aa820da-de73-4614-93e1-ec012eab100e   10Gi       RWO            intceph        3h24m
data-my-cluster-kafka-2       Bound    pvc-433efcf6-3815-4d5f-8170-f535a528e275   10Gi       RWO            intceph        3h24m
data-my-cluster-zookeeper-0   Bound    pvc-16224431-b719-4b46-a6f0-c72f12b1f56a   10Gi       RWO            intceph        3h34m
data-my-cluster-zookeeper-1   Bound    pvc-d6ca1589-50fe-41c1-9c4b-b7cb03897eb6   10Gi       RWO            intceph        3h34m
data-my-cluster-zookeeper-2   Bound    pvc-2e5dd6bd-ae1b-429b-89a2-ea2714a1d4fe   10Gi       RWO            intceph        3h34m
$

I deleted the Kafka cluster and created a new Kafka cluster with the same configuration; the new cluster uses the same PVCs, but I can't consume the old messages. How should I verify that the data is persisted on the storage?

scholzj commented 4 years ago

I think you should first describe what exactly you are doing, for example how you delete the cluster. Depending on that, you might need to follow this guide to recover the cluster just from the volumes: https://strimzi.io/docs/operators/latest/full/using.html#cluster-recovery_str

lanzhiwang commented 4 years ago

I deploy the Kafka cluster, KafkaTopic, and KafkaUser:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.5.0
    replicas: 3
    jmxOptions: {}
    listeners:
      plain: {}
      tls: {}
      external:
        type: nodeport
        tls: true
        authentication:
          type: tls
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: '2.5'
    storage:
      type: persistent-claim
      size: 10Gi
      class: intceph
      deleteClaim: true
    template:
      pod:
        securityContext:
          runAsUser: 0
          fsGroup: 0
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: intceph
      deleteClaim: true
    template:
      pod:
        securityContext:
          runAsUser: 0
          fsGroup: 0
  entityOperator:
    topicOperator: {}
    userOperator: {}

---

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaUser
metadata:
  name: my-user
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Read
        host: '*'
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Describe
        host: '*'
      - resource:
          type: group
          name: my-group
          patternType: literal
        operation: Read
        host: '*'
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Write
        host: '*'
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Create
        host: '*'
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Describe
        host: '*'

---

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000
    segment.bytes: 1073741824

When all resources are created, everything works fine:

$ kubectl -n kafka get pvc -o wide
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE     VOLUMEMODE
data-my-cluster-kafka-0       Bound    pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf   10Gi       RWO            intceph        2m21s   Filesystem
data-my-cluster-kafka-1       Bound    pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7   10Gi       RWO            intceph        2m21s   Filesystem
data-my-cluster-kafka-2       Bound    pvc-e60fb453-a0ae-443b-9471-19cfea46c133   10Gi       RWO            intceph        2m21s   Filesystem
data-my-cluster-zookeeper-0   Bound    pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb   10Gi       RWO            intceph        3m15s   Filesystem
data-my-cluster-zookeeper-1   Bound    pvc-b58c074a-3ebd-4749-b8ab-6966276af34b   10Gi       RWO            intceph        3m15s   Filesystem
data-my-cluster-zookeeper-2   Bound    pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc   10Gi       RWO            intceph        3m15s   Filesystem
$
$ kubectl get pv pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7 pvc-e60fb453-a0ae-443b-9471-19cfea46c133 pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb pvc-b58c074a-3ebd-4749-b8ab-6966276af34b pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE     VOLUMEMODE
pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-kafka-0       intceph                 5m      Filesystem
pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-kafka-1       intceph                 5m      Filesystem
pvc-e60fb453-a0ae-443b-9471-19cfea46c133   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-kafka-2       intceph                 5m      Filesystem
pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-0   intceph                 5m54s   Filesystem
pvc-b58c074a-3ebd-4749-b8ab-6966276af34b   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-1   intceph                 5m54s   Filesystem
pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc   10Gi       RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-2   intceph                 5m54s   Filesystem

Edit all PVs to set persistentVolumeReclaimPolicy to Retain.
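
One way to set this, per PV (a sketch using one of the PV names listed above):

$ kubectl patch pv pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'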

$ kubectl get pv pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7 pvc-e60fb453-a0ae-443b-9471-19cfea46c133 pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb pvc-b58c074a-3ebd-4749-b8ab-6966276af34b pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE     VOLUMEMODE
pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-kafka-0       intceph                 9m2s    Filesystem
pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-kafka-1       intceph                 9m2s    Filesystem
pvc-e60fb453-a0ae-443b-9471-19cfea46c133   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-kafka-2       intceph                 9m2s    Filesystem
pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-zookeeper-0   intceph                 9m56s   Filesystem
pvc-b58c074a-3ebd-4749-b8ab-6966276af34b   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-zookeeper-1   intceph                 9m56s   Filesystem
pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc   10Gi       RWO            Retain           Bound    kafka/data-my-cluster-zookeeper-2   intceph                 9m56s   Filesystem
$

Back up all the resource YAML:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-kafka-0
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-kafka-1
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-kafka-2
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-e60fb453-a0ae-443b-9471-19cfea46c133

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-zookeeper-0
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-zookeeper-1
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-b58c074a-3ebd-4749-b8ab-6966276af34b

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-cluster-zookeeper-2
  namespace: kafka
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: intceph
  volumeMode: Filesystem
  volumeName: pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc

Verify the topic:

$ kafka-producer-perf-test.sh --num-records 500 --topic my-topic --throughput -1 --record-size 1000 --producer-props bootstrap.servers=10.0.128.237:30609 --producer.config ./client-ssl.properties
500 records sent, 551.876380 records/sec (0.53 MB/sec), 320.75 ms avg latency, 613.00 ms max latency, 319 ms 50th, 438 ms 95th, 440 ms 99th, 613 ms 99.9th.

$ kafka-console-consumer.sh --bootstrap-server 10.0.128.237:30647 --topic my-topic --consumer.config ./client-ssl.properties --from-beginning --group my-group

$ kafka-consumer-groups.sh --bootstrap-server 10.0.128.237:30647 --command-config ./client-ssl.properties --describe --group my-group
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                              HOST            CLIENT-ID
my-group        my-topic        0          -               176             -               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1
my-group        my-topic        1          -               192             -               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1
my-group        my-topic        2          -               132             -               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1

$ kafka-consumer-groups.sh --bootstrap-server 10.0.128.237:30609 --command-config ./client-ssl.properties --describe --group my-group
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                              HOST            CLIENT-ID
my-group        my-topic        0          176             176             0               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1
my-group        my-topic        1          192             192             0               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1
my-group        my-topic        2          132             132             0               consumer-my-group-1-4b067430-fd88-4cfb-b4a2-6345298f33cb /10.0.128.64    consumer-my-group-1
$
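
For reference, the client-ssl.properties used above contains roughly the standard Kafka client SSL settings (paths and passwords are placeholders; the truststore holds the cluster CA certificate and the keystore holds the my-user client certificate):

security.protocol=SSL
ssl.truststore.location=/path/to/cluster-ca.truststore.p12
ssl.truststore.password=<truststore password>
ssl.truststore.type=PKCS12
ssl.keystore.location=/path/to/my-user.keystore.p12
ssl.keystore.password=<keystore password>
ssl.keystore.type=PKCS12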

Back up the topic YAML:

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  labels:
    strimzi.io/cluster: my-cluster
  name: my-topic
  namespace: kafka
spec:
  config:
    retention.ms: 604800000
    segment.bytes: 1073741824
  partitions: 3
  replicas: 3

---

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  labels:
    strimzi.io/cluster: my-cluster
  name: consumer-offsets---84e7a678d08f4bd226872e5cdd4eb527fadc1c6a
  namespace: kafka
spec:
  config:
    cleanup.policy: compact
    compression.type: producer
    segment.bytes: "104857600"
  partitions: 50
  replicas: 3
  topicName: __consumer_offsets

Delete the Kafka cluster, KafkaTopic, and KafkaUser:

$ kubectl -n kafka delete kafkatopic consumer-offsets---84e7a678d08f4bd226872e5cdd4eb527fadc1c6a my-topic

$ kubectl -n kafka delete kafkauser my-user

$ kubectl -n kafka delete kafka my-cluster

The PVs are not deleted:

$ kubectl -n kafka get pvc
No resources found in kafka namespace.

$ kubectl get pv pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7 pvc-e60fb453-a0ae-443b-9471-19cfea46c133 pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb pvc-b58c074a-3ebd-4749-b8ab-6966276af34b pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                               STORAGECLASS   REASON   AGE   VOLUMEMODE
pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf   10Gi       RWO            Retain           Released   kafka/data-my-cluster-kafka-0       intceph                 35m   Filesystem
pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7   10Gi       RWO            Retain           Released   kafka/data-my-cluster-kafka-1       intceph                 35m   Filesystem
pvc-e60fb453-a0ae-443b-9471-19cfea46c133   10Gi       RWO            Retain           Released   kafka/data-my-cluster-kafka-2       intceph                 35m   Filesystem
pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb   10Gi       RWO            Retain           Released   kafka/data-my-cluster-zookeeper-0   intceph                 36m   Filesystem
pvc-b58c074a-3ebd-4749-b8ab-6966276af34b   10Gi       RWO            Retain           Released   kafka/data-my-cluster-zookeeper-1   intceph                 36m   Filesystem
pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc   10Gi       RWO            Retain           Released   kafka/data-my-cluster-zookeeper-2   intceph                 36m   Filesystem

Recreate the original PVCs.

Edit the PV specifications to delete the claimRef properties
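
For example (a sketch; repeat for each of the six PVs):

$ kubectl patch pv pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'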

$ kubectl -n kafka get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-cluster-kafka-0       Bound    pvc-2e59d519-7b60-41c5-8f7d-364100a14eaf   10Gi       RWO            intceph        7m46s
data-my-cluster-kafka-1       Bound    pvc-a9849088-1acd-48e3-bdcf-1c0240a861b7   10Gi       RWO            intceph        7m46s
data-my-cluster-kafka-2       Bound    pvc-e60fb453-a0ae-443b-9471-19cfea46c133   10Gi       RWO            intceph        7m46s
data-my-cluster-zookeeper-0   Bound    pvc-2d8edad7-cef1-46e0-99e0-24e4c2fbcfbb   10Gi       RWO            intceph        7m46s
data-my-cluster-zookeeper-1   Bound    pvc-b58c074a-3ebd-4749-b8ab-6966276af34b   10Gi       RWO            intceph        7m46s
data-my-cluster-zookeeper-2   Bound    pvc-415cb469-1de5-4ca5-9622-8f251b6ce1fc   10Gi       RWO            intceph        7m46s

Recreate all KafkaUser resources.

Recreate all KafkaTopic resources.

Deploy the Kafka cluster

(After deploying the cluster, the Topic Operator deleted the topics, so I had to recreate all KafkaTopic resources again.)

Recreate all resources using the backed-up YAML.
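
That is, roughly this apply order (the file names are placeholders for the YAML backed up above):

$ kubectl -n kafka apply -f pvc-backup.yaml        # the six backed-up PVCs
$ kubectl -n kafka apply -f kafkauser-backup.yaml  # KafkaUser my-user
$ kubectl -n kafka apply -f kafkatopic-backup.yaml # my-topic and __consumer_offsets
$ kubectl -n kafka apply -f kafka-backup.yaml      # the Kafka cluster itself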

Verify the data:

$ kafka-consumer-groups.sh --bootstrap-server 10.0.128.237:30647 --command-config ./client-ssl.properties --describe --group my-group
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                              HOST            CLIENT-ID
my-group        my-topic        0          0               0               0               consumer-my-group-1-ce861df6-e32e-4ced-ae3f-e49f5a303b3e /10.0.129.171   consumer-my-group-1
my-group        my-topic        1          0               0               0               consumer-my-group-1-ce861df6-e32e-4ced-ae3f-e49f5a303b3e /10.0.129.171   consumer-my-group-1
my-group        my-topic        2          0               0               0               consumer-my-group-1-ce861df6-e32e-4ced-ae3f-e49f5a303b3e /10.0.129.171   consumer-my-group-1

The previous 500 records are lost.

What's wrong with my steps?

scholzj commented 4 years ago

I don't know. Probably something went wrong in this step and the Topic Operator deleted your topics?

Recreate all KafkaTopic resources.

lanzhiwang commented 4 years ago

Recovering a cluster from persistent volumes

Option 1: If you have all the KafkaTopic resources that existed before you lost your cluster, including internal topics such as committed offsets from __consumer_offsets:
It is essential that you recreate the resources before deploying the cluster, or the Topic Operator will delete the topics.

Is the document wrong?

lanzhiwang commented 4 years ago

Should I create the cluster first or the topics first?

scholzj commented 4 years ago

The KafkaTopic resources should ideally be created first. I think the guide is correct and works. But I'm not sure why it did not work for you ... so, you know, maybe it doesn't.

lanzhiwang commented 4 years ago

The difference between my steps and the guide's steps is: the guide deletes the namespace, then recreates the namespace, deploys the operator, and recreates the topics. I did not delete the namespace; I only deleted the Kafka cluster and the topics. Could this difference cause the problem?

scholzj commented 4 years ago

So how exactly do you delete only the Kafka cluster and the topics? Can it be that the topics are deleted at this phase?

The procedure is an emergency procedure, right? So it doesn't really expect you to delete anything. It expects you to try to recover your cluster after you, for example, lost the whole Kube cluster. You can of course emulate it by deleting things. But you need to delete them in the right way.

lanzhiwang commented 4 years ago

I deleted the namespace, and all resources in the namespace were deleted.

I recreated the namespace and all resources. The data on the PVs is visible:

$ kafka-consumer-groups.sh --bootstrap-server 10.0.128.237:31995 --command-config ./client-ssl.properties --describe --group my-group
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                              HOST            CLIENT-ID
my-group        my-topic        0          176             176             0               consumer-my-group-1-bc7092f8-d531-498c-a971-d8c33c080f41 /10.0.129.171   consumer-my-group-1
my-group        my-topic        1          176             176             0               consumer-my-group-1-bc7092f8-d531-498c-a971-d8c33c080f41 /10.0.129.171   consumer-my-group-1
my-group        my-topic        2          148             148             0               consumer-my-group-1-bc7092f8-d531-498c-a971-d8c33c080f41 /10.0.129.171   consumer-my-group-1
$

If I only delete the Kafka, KafkaTopic, and KafkaUser resources, the data is lost. Why? Is this a bug?

scholzj commented 4 years ago

So deleting a namespace can have tricky results, right -> you do not know what is deleted when and in which order. So it can produce different things.

lanzhiwang commented 4 years ago

OK, I understand. I will test deleting the resources in a different order.

Thanks

If you have no further guidance, you can close this issue.

Thanks again!