karimra / gnmic

gNMIc is a gNMI CLI client and collector
Apache License 2.0

Add kubernetes type clustering option #560

Open melkypie opened 2 years ago

melkypie commented 2 years ago

Currently the only KV store we can use for clustering is Consul. A nice feature would be to add a Kubernetes type and store all of the key/value information in Kubernetes objects, similar to how argocd does it. This would spare the user from maintaining another KV storage solution. I know this is quite a big ask, but I have already managed to deploy gnmic clustering on Kubernetes with Consul, and having this would let me stop worrying about an extra KV store. If needed, I could write a guide on how to deploy it to Kubernetes and help with the ServiceAccount/RoleBinding/Role objects and other Kubernetes-related things.

karimra commented 2 years ago

I'm not sure I understand correctly, but this sounds like it needs a separate component acting as a k8s controller for gnmic, responsible for managing the state of the cluster. It could be that I'm overthinking this. How does argocd store instance state in k8s? Do the objects have a TTL? Can the TTL be refreshed?

A guide to deploying gnmic on k8s would be very helpful; it would fit nicely in the docs.

melkypie commented 2 years ago

Argocd stores most of its configuration in Secrets (but I am sure ConfigMaps would also be fine for gnmic) and Custom Resource Definitions (which would be overkill for the simple use case in gnmic), which are basically key-value stores. They don't have a built-in way to set a TTL, but you could simply store the TTL value as an entry in the relevant ConfigMap if that is needed. From what I can see, the only things currently stored in Consul are the cluster leader and which instance each target belongs to, and ConfigMaps can easily hold that. The service availability checking that Consul provides also exists in k8s.

For the guide, I will start working on it right away.

karimra commented 2 years ago

Thanks for working on the guide and thanks for the details about argocd.

Consul does a little more than just storage. By TTL I meant a way for a key (leader or target ownership) to be deleted after a certain duration if its owner does not refresh it. Consul handles this natively, and the key TTL mechanism makes leader election/re-election as well as target ownership locking/transfer easy. Consul also supports long-running (blocking) requests that notify the client about service changes, which removes the need for periodic polling to discover the instances of a given service.

About using k8s as a KV store for clustering, I think ownerReference can be used for leader election and target ownership:

I believe this should work. I'm open to comments and suggestions; I might have missed something or expected a piece to work differently from how it really behaves. I will give this a try and get back to you.

karimra commented 2 years ago

@melkypie if you give the 0.25.0-beta release a try, you will be able to test k8s-based clustering. It uses Leases as the locking mechanism.

The deployment method is similar to what you already did with Consul except:

I did some tests on my side; it seems to be stable, even when shrinking the StatefulSet (SS).

karim@kss:~/github.com/karimra/gnmic$ kubectl get leases
NAME                                  HOLDER       AGE
gnmic-cluster1-leader                 gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-1   2d4h
gnmic-cluster1-targets-   gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-2   2d4h
gnmic-cluster1-targets-   gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-2   2d4h
gnmic-cluster1-targets-   gnmic-ss-2   2d4h
gnmic-cluster1-targets-   gnmic-ss-2   2d4h
gnmic-cluster1-targets-   gnmic-ss-1   2d4h
gnmic-cluster1-targets-   gnmic-ss-1   2d4h
gnmic-cluster1-targets-   gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-1   2d4h
gnmic-cluster1-targets-   gnmic-ss-0   2d4h
gnmic-cluster1-targets-   gnmic-ss-2   2d4h
gnmic-cluster1-targets-   gnmic-ss-1   2d4h

There is no mechanism to redistribute the targets when growing the SS

It would be helpful if you could give it a go to see if it fits your needs.

melkypie commented 2 years ago

Will do. I won't be able to get back to you until Tuesday, as I don't have access to a cluster where I can test gNMI during the Easter holidays.

melkypie commented 2 years ago

I gave it a try. In my experience, it was only able to assign a target to the cluster leader, because the non-leader instances seem to fail to acquire locks for the targets assigned to them. So it manages to assign one target (the one the leader assigns to itself after failing to assign it to the other instances) and then keeps failing to assign the other targets because their locks are not acquired, even though the lease shows up when you list the leases. I am testing this on an RKE2 cluster with 3 masters and 2 workers, Kubernetes version v1.22.5+rke2r1.

melkypie:~/projects/kubernetes$ kubectl get leases -n gnmic
NAME                                    HOLDER       AGE
gnmic-ip-net-monit1-leader              gnmic-ss-2   30m
gnmic-ip-net-monit1-targets-device1    gnmic-ss-2   29m
gnmic-ip-net-monit1-targets-device2   gnmic-ss-0   11s


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gnmic-ss
  namespace: gnmic
  labels:
    app: gnmic
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gnmic
  serviceName: gnmic-svc
  template:
    metadata:
      labels:
        app: gnmic
        version: 0.25.0-beta
    spec:
      containers:
        - args:
            - subscribe
            - --config
            - /app/config.yaml
          image: ghcr.io/karimra/gnmic:0.25.0-beta-scratch
          imagePullPolicy: IfNotPresent
          name: gnmic
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - all
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
          ports:
            - containerPort: 9804
              name: prom-output
              protocol: TCP
            - containerPort: 7890
              name: gnmic-api
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 400Mi
            requests:
              cpu: 50m
              memory: 200Mi
          envFrom:
            - secretRef:
                name: gnmic-login
          env:
            - name: GNMIC_API
              value: :7890
            - name: GNMIC_CLUSTERING_INSTANCE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GNMIC_CLUSTERING_SERVICE_ADDRESS
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local"
            - name: GNMIC_OUTPUTS_PROM_LISTEN
              value: "$(GNMIC_CLUSTERING_INSTANCE_NAME).gnmic-svc.gnmic.svc.cluster.local:9804"
          volumeMounts:
            - mountPath: /app/config.yaml
              name: config
              subPath: config.yaml
      serviceAccountName: gnmic-user
      volumes:
        - configMap:
            defaultMode: 420
            name: gnmic-config
          name: config


apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: gnmic
  name: svc-pod-lease-reader
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gnmic-user
  namespace: gnmic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-leases
  namespace: gnmic
subjects:
- kind: ServiceAccount
  name: gnmic-user
roleRef:
  kind: Role
  name: svc-pod-lease-reader
  apiGroup: rbac.authorization.k8s.io


apiVersion: v1
kind: Service
metadata:
  name: gnmic-svc
  namespace: gnmic
  labels:
    app: gnmic
spec:
  ports:
  - name: http
    port: 9804
    protocol: TCP
    targetPort: 9804
  selector:
    app: gnmic
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: cluster1-gnmic-api
  namespace: gnmic
spec:
  ports:
  - name: http
    port: 7890
    protocol: TCP
    targetPort: 7890
  selector:
    app: gnmic
  clusterIP: None


apiVersion: v1
kind: ConfigMap
metadata:
  name: gnmic-config
  namespace: gnmic
data:
  config.yaml: |
    insecure: true
    encoding: json_ietf
    log: true

    clustering:
      cluster-name: cluster1
      targets-watch-timer: 30s
      leader-wait-timer: 30s
      locker:
        type: k8s
        namespace: gnmic

    targets:
      device1:
        address: device1:6030
        subscriptions:
          - general
      device2:
        address: device2:6030
        subscriptions:
          - general
      device3:
        address: device3:6030
        subscriptions:
          - general
      device4:
        address: device4:6030
        subscriptions:
          - general

    subscriptions:
      general:
        paths:
          - /interfaces/interface/state/counters
        stream-mode: sample
        sample-interval: 5s

    outputs:
      prom:
        type: prometheus
        strings-as-labels: true

I'm also attaching sanitized log files: gnmic-ss-1.log, gnmic-ss-0.log, gnmic-ss-2.log. (I also noticed that gnmic seems to log plaintext passwords; it would be great if it did not.)

The logs are from a second attempt, so they don't show the device1 lease being created.

karimra commented 2 years ago

I'm not sure what is going wrong here. I retested with a single node as well as with 1 control-plane and 2 worker nodes (Kubernetes 1.23.4 and 1.22.7), using kind clusters. The leader timing out and reassigning the target to another node means that the selected instance was not able to create the lease and/or maintain it.

melkypie commented 2 years ago

I understood why the leader assigns the target to itself, but the most interesting part is that the leader does not recognize the lock/lease even though it shows up when you list the leases. My other thought was that something was wrong with RBAC, but when I run a pod with kubectl using the same ServiceAccount that gnmic uses, it can access all of the leases, so I'm not sure what is going on there. I will give it another try tomorrow, deleting the whole namespace first.

melkypie commented 2 years ago

I finally got around to testing it and I found the error! My cluster name contained a hyphen (-). When it lists the leases, it replaces the - in the cluster name with / here: https://github.com/karimra/gnmic/blob/3caa03e83020b0a11ea8fc88b71ca0048a90a196/lockers/k8s_locker/k8s_registration.go#L106 It is my fault for not providing the exact configs I used to deploy; that might have made it easier to debug.

EDIT: The same seems to happen with target names that contain a hyphen.

karimra commented 2 years ago

That part actually replaces / with -. But I think you put your finger on the problem: the leader won't be able to retrieve a lock if the cluster name or the target name contains a -. Thanks for sharing your findings.

The leader keeps a mapping from the transformed key (/ --> -) to the original key so it can revert it, but it can only map back the keys it locked itself (silly me), which is why only the leader's locks succeed. I was hoping to get away with this to stay compatible with the Consul locker without having to rewrite the global clustering code.
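The root cause can be reproduced in a few lines. Kubernetes object names cannot contain "/", so the key is flattened by replacing "/" with "-"; the obvious inverse ("-" back to "/") is only correct when no key segment contains a hyphen, and two different keys can even collapse to the same lease name. The function names below are illustrative, not gnmic's.

```go
package main

import (
	"fmt"
	"strings"
)

// leaseName flattens a locker key into a Kubernetes-safe object name.
func leaseName(key string) string {
	return strings.ReplaceAll(key, "/", "-")
}

// naiveKey is the tempting inverse; it breaks as soon as a segment contains "-".
func naiveKey(name string) string {
	return strings.ReplaceAll(name, "-", "/")
}

func main() {
	for _, key := range []string{
		"gnmic/cluster1/leader",  // round-trips fine
		"gnmic/cluster-1/leader", // comes back as gnmic/cluster/1/leader
	} {
		name := leaseName(key)
		back := naiveKey(name)
		fmt.Printf("%s -> %s -> %s (round-trip ok: %t)\n", key, name, back, back == key)
	}
	// Two distinct keys can even collide on the same lease name:
	fmt.Println(leaseName("gnmic/cluster-1/leader") == leaseName("gnmic/cluster/1/leader"))
}
```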

I got rid of the key mapping and added the original key as an annotation on the lease; that is how the List function can return the list of original keys for a given prefix. I did some tests with cluster name cluster-1 and it seems to be fine. A target lease looks like this:

Name:         gnmic-cluster-1-targets-
Namespace:    gnmic
Labels:       app=gnmic
Annotations:  original-key: gnmic/cluster-1/targets/
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2022-04-26T05:31:05Z
  Managed Fields:
    API Version:  coordination.k8s.io/v1
    Fields Type:  FieldsV1
    Manager:         gnmic
    Operation:       Update
    Time:            2022-04-26T05:31:05Z
  Resource Version:  1876693
  UID:               ea0e4259-b39a-47f2-a62a-60dfb64cccb1
Spec:
  Acquire Time:            2022-04-26T05:39:53.085031Z
  Holder Identity:         gnmic-ss-2
  Lease Duration Seconds:  10
  Renew Time:              2022-04-26T05:39:53.085031Z
Events:                    <none>
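The annotation-based fix can be sketched as a prefix filter over the `original-key` annotation shown on the lease above. The `lease` struct and `list` function are hypothetical simplifications (a real implementation would presumably list Lease objects with client-go, for instance filtered by the `app=gnmic` label), and the target name `device-1` is made up to include a hyphen.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const originalKeyAnnotation = "original-key" // annotation name as seen on the lease above

// lease is a hypothetical, minimal view of a Lease object.
type lease struct {
	Name        string
	Annotations map[string]string
	Holder      string
}

// list returns original-key -> holder for every lease whose annotated key
// starts with prefix. No reverse mapping of the object name is needed, so any
// instance (not only the one that created the lease) can resolve the key.
func list(leases []lease, prefix string) map[string]string {
	out := make(map[string]string)
	for _, l := range leases {
		key, ok := l.Annotations[originalKeyAnnotation]
		if !ok || !strings.HasPrefix(key, prefix) {
			continue
		}
		out[key] = l.Holder
	}
	return out
}

func main() {
	leases := []lease{
		{Name: "gnmic-cluster-1-leader", Annotations: map[string]string{originalKeyAnnotation: "gnmic/cluster-1/leader"}, Holder: "gnmic-ss-0"},
		{Name: "gnmic-cluster-1-targets-device-1", Annotations: map[string]string{originalKeyAnnotation: "gnmic/cluster-1/targets/device-1"}, Holder: "gnmic-ss-2"},
	}
	got := list(leases, "gnmic/cluster-1/targets/")
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, "->", got[k])
	}
}
```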

I will issue a release shortly with this code so you can test it (if you don't mind).

melkypie commented 2 years ago

Seems to be fine. Works with both cluster name and targets having - in them.

Redistributing the targets when the StatefulSet is scaled up, which as you said does not currently work, is quite an important feature, but it is out of scope for this issue.

karimra commented 2 years ago

Thanks for testing it, I will write some docs about k8s based clustering before releasing.

Concerning redistribution, I think this can be done periodically (enabled via a knob, e.g. redistribution-interval: 5m) or triggered by an API request to the leader. If you are interested in this, please open another issue and we can follow up there.
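A redistribution pass, whether run on a hypothetical `redistribution-interval` timer or triggered through the leader's API, essentially computes a move plan. The sketch below uses a simple greedy rule (repeatedly move one target from the most-loaded to the least-loaded instance until their loads differ by at most one). It is an illustration rather than gnmic's design, and it assumes every instance named in the assignment map also appears in the instance list. Carrying out a move would presumably mean the current owner releasing the target's lease so the leader can re-assign it.

```go
package main

import (
	"fmt"
	"sort"
)

// Move says that Target should be released by From and taken by To.
type Move struct{ Target, From, To string }

// rebalance returns moves that bring every instance's target count within one
// of every other instance's, using a greedy most-loaded to least-loaded rule.
func rebalance(assign map[string]string, instances []string) []Move {
	insts := append([]string(nil), instances...)
	sort.Strings(insts) // deterministic tie-breaking
	load := make(map[string][]string, len(insts))
	for target, inst := range assign {
		load[inst] = append(load[inst], target)
	}
	for _, inst := range insts {
		sort.Strings(load[inst]) // deterministic choice of which target moves
	}
	var moves []Move
	for len(insts) > 0 {
		hi, lo := insts[0], insts[0]
		for _, inst := range insts {
			if len(load[inst]) > len(load[hi]) {
				hi = inst
			}
			if len(load[inst]) < len(load[lo]) {
				lo = inst
			}
		}
		if len(load[hi])-len(load[lo]) <= 1 {
			break // balanced: loads differ by at most one
		}
		last := len(load[hi]) - 1
		t := load[hi][last]
		load[hi] = load[hi][:last]
		load[lo] = append(load[lo], t)
		moves = append(moves, Move{Target: t, From: hi, To: lo})
	}
	return moves
}

func main() {
	// six targets spread over two instances, then the StatefulSet grows to three
	assign := map[string]string{
		"device1": "gnmic-ss-0", "device2": "gnmic-ss-0", "device3": "gnmic-ss-0",
		"device4": "gnmic-ss-1", "device5": "gnmic-ss-1", "device6": "gnmic-ss-1",
	}
	for _, m := range rebalance(assign, []string{"gnmic-ss-0", "gnmic-ss-1", "gnmic-ss-2"}) {
		fmt.Printf("move %s: %s -> %s\n", m.Target, m.From, m.To)
	}
}
```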