samba-in-kubernetes / samba-operator

An operator for a Samba as a service on PVCs in kubernetes
Apache License 2.0

Is it possible to scale down a clustered samba server instance? #318

Open FTS152 opened 7 months ago

FTS152 commented 7 months ago

Hi, I have deployed a Samba server with clustering feature enabled. According to the documentation, I can change the number of Samba servers by configuring scaling.minClusterSize in smbshare CRD, and this will directly impact the number of replicas in the Samba Statefulset.

  • minClusterSize: The minimum number of Samba server "nodes" to host the share. The operator is permitted to run more servers in some conditions.

Based on the documentation and my own testing, it appears that on a running Samba instance I can only increase the number of replicas by raising scaling.minClusterSize (scale out); lowering the parameter does not decrease the number of replicas (scale down).

I would like to understand why this parameter is designed to represent the "minimum quantity" rather than being "exactly equal to the current number of replicas in the StatefulSet." What is the difficulty in reducing the cluster size? I also want to know the recommended approach for decreasing the number of replicas manually. (Do I need to do anything besides modifying the replica count if I want to scale down?)

Thanks!

phlogistonjohn commented 7 months ago

You may remember that you had to enable this experimental feature; it's experimental precisely because of items like this. To try and summarize the main issue: kubernetes pods are dynamic and easy to create and destroy, whereas Samba's CTDB is more oriented towards managing actual physical nodes. The CTDB configuration is not very flexible and expects the config file to retain references to nodes that have been removed, but we try to generate the config file based on what pods are present. Therefore scaling down was simply not implemented (yet) due to the complexity.

The minimum quantity question is related but not fully overlapping. There was a desire to not force the operator to prematurely shrink a working cluster. Additionally, we have the ability to host multiple shares on one "instance" (one smbd or a cluster of smbds). It permits the system to host a share with a minimum of 2 pods on a cluster with 3 (or more) actual pods for instance.
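To illustrate the "minimum, not exact" semantics described above, here is a minimal sketch in Python. It is not the operator's actual code (the operator is written in Go); the function name `desired_replicas` and the idea of passing one `minClusterSize` value per hosted share are assumptions made for the example.

```python
def desired_replicas(current_replicas: int, min_cluster_sizes: list[int]) -> int:
    """Grow-only sizing sketch (hypothetical helper, not operator code).

    Each share hosted on the instance contributes its own minClusterSize.
    The instance must satisfy the largest of those minimums, but a cluster
    that is already bigger is left alone rather than shrunk.
    """
    # Largest minimum requested by any share; 1 if no share asks for anything.
    required = max(min_cluster_sizes, default=1)
    # Never go below what is already running.
    return max(current_replicas, required)


if __name__ == "__main__":
    # A share with a minimum of 2 on a cluster that already has 3 pods.
    print(desired_replicas(3, [2]))
    # A second share raises the minimum to 4, so the cluster grows.
    print(desired_replicas(3, [2, 4]))
```

Under this reading, lowering `minClusterSize` only lowers the floor, which matches the behavior reported in the first comment.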

FTS152 commented 7 months ago

Thanks for the reply. It seems that scaling down involves considerations related to CTDB configuration. To the best of my knowledge, CTDB uses a config file (/etc/ctdb/nodes) to maintain the relationship between IPs and PNNs (node numbers). When removing a node, one should follow the CTDB procedure for removing nodes so that the line numbers stay correct. Specifically, the config file still stores the information about removed nodes (by commenting out their lines). I agree that keeping the CTDB configuration consistent while the operator scales pods up or down is a complex issue.
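The point about line numbers can be shown with a small sketch in Python. It assumes a nodes file with one address per line, where a node's PNN is its zero-based line position; the helper names `comment_out_node` and `pnn_map` are made up for this example and are not part of CTDB or sambacc. Blank lines are simply skipped here, which is a simplification.

```python
def comment_out_node(nodes_text: str, address: str) -> str:
    """Return the nodes-file text with `address` commented out.

    The line is kept in place (prefixed with '#') instead of being removed,
    so every later line keeps its position and therefore its PNN.
    """
    out = []
    found = False
    for line in nodes_text.splitlines():
        if line.strip() == address:
            out.append("#" + address)
            found = True
        else:
            out.append(line)
    if not found:
        raise ValueError(f"address not present in nodes file: {address}")
    return "\n".join(out) + "\n"


def pnn_map(nodes_text: str) -> dict[int, str]:
    """Map PNN (zero-based line index) to address, skipping commented lines."""
    result = {}
    for pnn, line in enumerate(nodes_text.splitlines()):
        line = line.strip()
        if line and not line.startswith("#"):
            result[pnn] = line
    return result


if __name__ == "__main__":
    before = "10.0.0.1\n10.0.0.2\n10.0.0.3\n"
    after = comment_out_node(before, "10.0.0.2")
    print(after, end="")
    # The third node is still PNN 2 even though the second node is gone.
    print(pnn_map(after))
```

Had the middle line been deleted outright, the third node would have shifted to PNN 1, which is the inconsistency the removal procedure is meant to avoid.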

phlogistonjohn commented 7 months ago

That's exactly the issue. I don't think it's insurmountable but what I think would be better would be to work with the Samba team to support alternative means of configuration that work better with k8s or other orchestration frameworks. This is a long term thing IMO. Please continue to try out the samba-operator but be aware that the CTDB mode will have more sharp edges - bug reports and assistance there are very welcome - but I've been pulled in other directions for now and don't have time for large projects improving the operator at the moment.

FTS152 commented 7 months ago

Okay. Still thanks for the advice. I think I can try investigating this issue and see if I can do something.

FTS152 commented 5 months ago

Hi, I recently had more time to study this issue. To scale down the replicas of a clustered Samba instance, I think several steps are needed:

  1. Modify updateStatefulSetSize() function in samba-operator to deal with scaling down case.
  2. Once replica number of StatefulSet is modified, samba-operator should inform the remaining pods that there is a deletion.
  3. The remaining pods update CTDB configuration and reload. (follow node remove steps in ctdb docs)

For the implementation, my initial idea is to use a preStop hook in k8s. With the hook, one can update the CTDB configuration when a pod is terminated by the k8s API server. The change should also include sambacc, which would provide a method to update the state file and the nodes file in the shared storage and then reload ctdb. For example, the modified Samba server pod in the StatefulSet would look like:

apiVersion: v1
kind: Pod
metadata:
  name: samba-server-0
spec:
  containers:
  - name: ctdb-delete-node #or put it in current existing containers
    image: quay.io/samba.org/samba-server:v0.4
    lifecycle:
      preStop:
        exec:
          command: ["samba-container", "ctdb-delete-node"]

So this is my current rough idea, not sure if there’s anything I thought wrong or didn’t take into consideration. If everything is fine, I can try to implement this feature and send a PR. Thanks for any advice!

Ref:
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
https://ctdb.samba.org/manpages/ctdb.1.html#:~:text=also%20%22ctdb%20getcapabilities%22-,reloadnodes,-This%20command%20is

FTS152 commented 4 months ago

@phlogistonjohn So this is my current plan for scaling down a clustered Samba server. The scenario is a user reducing the replica count by editing minClusterSize in smbshare. The detailed steps are as follows:

  1. updateStatefulSetSize() sets the replica count of the StatefulSet to the new size.
  2. K8s automatically deletes the pod with the highest ordinal index in the StatefulSet.
  3. While the pod is terminating, the preStop hook is triggered and executes the following command:
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 1 && samba-container ctdb-delete-node --hostname=$HOSTNAME --take-node-number-from-hostname=after-last-dash 
  4. samba-container ctdb-delete-node changes the state of the deleted node in ctdb-nodes.json from ready to deleted. (I delay this operation by 1 second because ctdb reloadnodes must be executed after the node is actually disconnected. I'm not sure this delay is necessary.)
  5. When ctdb.manage_nodes detects the state change of the deleted node, it comments out the IP in /var/lib/ctdb/shared/nodes and reloads ctdb.
  6. After manage_nodes applies the change, the deleted node is removed from the ctdb cluster:
    
    [root@samba-service-smbserver-0 /]# cat /var/lib/ctdb/shared/ctdb-nodes.json
    {"nodes": [{"identity": "samba-service-smbserver-0", "node": "10.42.2.184", "pnn": 0, "state": "ready"}, {"identity": "samba-service-smbserver-1", "node": "10.42.1.95", "pnn": 1, "state": "deleted"}]}

    [root@samba-service-smbserver-0 /]# cat /var/lib/ctdb/shared/nodes
    10.42.2.184
    10.42.1.95

    [root@samba-service-smbserver-0 /]# ctdb status
    Number of nodes:2 (including 1 deleted nodes)
    pnn:0 10.42.2.184 OK (THIS NODE)
    Generation:887371738
    Size:1
    hash:0 lmaster:0
    Recovery mode:NORMAL (0)
    Leader:0
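Steps 3 to 5 of the plan can be sketched roughly as follows in Python. This is only an illustration of the idea, not the sambacc implementation: the function names are hypothetical, the JSON field names are taken from the ctdb-nodes.json output shown in this comment, and the sketch assumes every PNN from 0 upward has exactly one entry in the state file.

```python
def node_number_from_hostname(hostname: str) -> int:
    """Take the node number from the text after the last dash.

    Mirrors the 'after-last-dash' idea: 'samba-service-smbserver-1' -> 1.
    """
    _, sep, suffix = hostname.rpartition("-")
    if not sep or not suffix.isdigit():
        raise ValueError(f"no numeric suffix in hostname: {hostname!r}")
    return int(suffix)


def mark_deleted(state: dict, pnn: int) -> dict:
    """Return a copy of the state with the node at `pnn` set to 'deleted'."""
    nodes = [dict(node) for node in state["nodes"]]
    matched = False
    for node in nodes:
        if node["pnn"] == pnn:
            node["state"] = "deleted"
            matched = True
    if not matched:
        raise KeyError(f"no node with pnn {pnn}")
    return {**state, "nodes": nodes}


def render_nodes_file(state: dict) -> str:
    """Render one line per node in PNN order.

    Deleted nodes are commented out rather than dropped, so the remaining
    nodes keep their line positions.
    """
    lines = []
    for node in sorted(state["nodes"], key=lambda n: n["pnn"]):
        prefix = "#" if node["state"] == "deleted" else ""
        lines.append(prefix + node["node"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    state = {"nodes": [
        {"identity": "samba-service-smbserver-0", "node": "10.42.2.184",
         "pnn": 0, "state": "ready"},
        {"identity": "samba-service-smbserver-1", "node": "10.42.1.95",
         "pnn": 1, "state": "ready"},
    ]}
    pnn = node_number_from_hostname("samba-service-smbserver-1")
    print(render_nodes_file(mark_deleted(state, pnn)), end="")
```

In the actual proposal these pieces would live in different places: the preStop hook would trigger the state change, and ctdb.manage_nodes would regenerate the nodes file and reload ctdb on a surviving pod.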



I have implemented and tested this in my local environment. I would like to ask for advice on whether there are any mistakes, and then maybe send a PR to samba-operator and sambacc. Thanks.