lni / dragonboat

A feature complete and high performance multi-group Raft library in Go.
Apache License 2.0

the confusion of k8s-deployment #332

Closed wh-afra closed 11 months ago

wh-afra commented 11 months ago

I have tried Dragonboat in host-network mode, where the node address is the host's IP:port. It runs fine, great! Thanks for sharing such a useful library.

Now I would like to deploy it on k8s. However, the pod IP address is recreated on every reboot. I tried to use "servicename:serviceport" in place of "ip:port", but it does not work. Following https://github.com/lni/dragonboat/issues/140, I use NodeHostConfig.AddressByNodeHostID and the gossip config, but I still have some problems that confuse me. When I deploy the Dragonboat server in k8s, I cannot get the IP of a remote pod. So how should I set the value of config.GossipConfig.Seed? The reference doc says: "Seed is a list of AdvertiseAddress of remote NodeHost instances. Local NodeHost instance will try to contact all of them to bootstrap the gossip service. At least one reachable NodeHost instance is required to successfully bootstrap the gossip service. Each seed address is in the format of IP:Port, Hostname:Port or DNS Name:Port." So an empty value is not allowed. Can you give me some advice? Thank you, I am looking forward to your reply.

lni commented 11 months ago

@wh-afra

As described in the docs, you should put a list of AdvertiseAddress values in the Seed field. AdvertiseAddress is the Hostname:Port or IP:Port pair at which your node can be reached; it must be specified every time you start a NodeHost instance.

It is your application's responsibility to figure out the correct Hostname:Port or IP:Port pair. You also need to implement a mechanism that makes the AdvertiseAddress values of at least a few (not all) of these NodeHost instances available to all of your NodeHost instances, so that they can bootstrap their gossip process.
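One possible mechanism, sketched here under assumptions rather than taken from Dragonboat itself: the deployment hands every instance a short, comma-separated seed list through an environment variable, and the application validates it before building its gossip config. The variable name `GOSSIP_SEEDS`, the helper `parseSeeds`, and the fallback host names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// parseSeeds splits a comma-separated list of host:port pairs, trims
// whitespace, skips empty entries, and rejects entries that do not have
// both a host and a port. It returns an error if no seed remains, because
// at least one seed is required to bootstrap gossip.
func parseSeeds(raw string) ([]string, error) {
	var seeds []string
	for _, part := range strings.Split(raw, ",") {
		addr := strings.TrimSpace(part)
		if addr == "" {
			continue
		}
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("invalid seed %q: %w", addr, err)
		}
		if host == "" || port == "" {
			return nil, fmt.Errorf("invalid seed %q: empty host or port", addr)
		}
		seeds = append(seeds, addr)
	}
	if len(seeds) == 0 {
		return nil, fmt.Errorf("at least one seed address is required")
	}
	return seeds, nil
}

func main() {
	// GOSSIP_SEEDS is a hypothetical variable name chosen for this sketch.
	raw := os.Getenv("GOSSIP_SEEDS")
	if raw == "" {
		// Placeholder addresses, used only so the sketch runs on its own.
		raw = "raft-a.example.internal:63001, raft-b.example.internal:63001"
	}
	seeds, err := parseSeeds(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// The resulting slice is what would be assigned to the Seed field.
	fmt.Println(seeds)
}
```

The seed list only needs to name a few reachable instances, so the same value can be given to every instance regardless of cluster size.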

wh-afra commented 11 months ago

@lni
Thank you, but I did not get your point. For example, I have 3 hosts as Raft nodes A, B and C, and I deploy the Raft server on them in k8s. Of course, in pod A I can get the local hostname/host IP, so the AdvertiseAddress can be figured out. However, I cannot get the host IP of pod B and pod C, so how can I set the seed value?

I am not sure whether using a Kubernetes-related library would allow me to get the host IPs of pod B and pod C. Even so, the pod might not succeed, and the problem might become more complex. Have I misunderstood your intention? Or have I missed any important information? Thank you for your patience.

kevburnsjr commented 11 months ago

Best to use a StatefulSet.
Kubernetes will automatically name the pods in your StatefulSet using a Stable Network ID. So if the name of your StatefulSet is dragonboat-example, then the pods in the set will be named dragonboat-example-0, dragonboat-example-1, dragonboat-example-2, and so on.

Those hostnames are addressable and can be used in your Dragonboat gossip config or shard member list. Even if your cluster has 100 nodes, you can still use pods 0, 1 and 2 as the gossip seeds.

cfg := NodeHostConfig{
    Gossip: GossipConfig{
        // AdvertiseAddress belongs to GossipConfig, next to Seed.
        AdvertiseAddress: "dragonboat-example-99:63001",
        Seed: []string{
            "dragonboat-example-0:63001",
            "dragonboat-example-1:63001",
            "dragonboat-example-2:63001",
        },
    },
}

StatefulSets have some other critical features, like Volume Claim Templates, which enable Stable Storage, making StatefulSets basically mandatory when running any scalable persistent workload in Kubernetes.
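As a rough sketch of how an application could derive these values at startup instead of hard-coding them, the following Go program builds the seed list from the StatefulSet name and the advertise address from the pod's own hostname. It relies only on the pod naming scheme described above. The helpers `seedAddresses` and `setNameFromHostname`, the constant `gossipPort`, and the fallback names are assumptions made for illustration; none of them come from Dragonboat.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// gossipPort is an assumed port, matching the 63001 used in this thread.
const gossipPort = 63001

// seedAddresses returns host:port addresses for the first n pods of a
// StatefulSet, using the <statefulset-name>-<ordinal> naming scheme.
func seedAddresses(setName string, n, port int) []string {
	seeds := make([]string, 0, n)
	for i := 0; i < n; i++ {
		seeds = append(seeds, fmt.Sprintf("%s-%d:%d", setName, i, port))
	}
	return seeds
}

// setNameFromHostname strips the trailing "-<ordinal>" from a pod hostname
// and returns the StatefulSet name. It fails if there is no numeric suffix.
func setNameFromHostname(hostname string) (string, error) {
	i := strings.LastIndex(hostname, "-")
	if i <= 0 || i == len(hostname)-1 {
		return "", fmt.Errorf("hostname %q has no ordinal suffix", hostname)
	}
	if _, err := strconv.Atoi(hostname[i+1:]); err != nil {
		return "", fmt.Errorf("hostname %q has no numeric ordinal: %w", hostname, err)
	}
	return hostname[:i], nil
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	setName, err := setNameFromHostname(hostname)
	if err != nil {
		// Outside a StatefulSet pod: fall back to the names used in this thread.
		hostname, setName = "dragonboat-example-99", "dragonboat-example"
	}
	// These two values correspond to AdvertiseAddress and Seed respectively.
	fmt.Printf("advertise: %s:%d\n", hostname, gossipPort)
	fmt.Println("seeds:", seedAddresses(setName, 3, gossipPort))
}
```

Depending on how DNS is set up in the cluster, the short pod name may not resolve from other pods; in that case the addresses would need the fully qualified form provided by the StatefulSet's headless Service.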

kevburnsjr commented 11 months ago

If a pod crashes, it will be restarted with the same Stable Network ID (hostname), same local disk and same persistent disk.

If a node crashes, a new pod will be created with the same hostname on a different node. Any changes to the local disk will likely be lost, but any persistent disks (block storage) will be attached automatically to the new pod. In this way, the Host ID can survive a machine failure.

Alternatively, if the Host ID is stored on the local disk and it is lost during a crash, then the new pod will have a new Host ID but the same AdvertiseAddress as the previous Pod/Host. This should not cause any issues, because Gossip and Raft are separate and Kubernetes will always route network traffic to either the new pod or the old pod, never both.

wh-afra commented 11 months ago

@kevburnsjr Thank you for your detailed explanation. In our scenario, a business service exchanges data with a Raft node, and I am using Dragonboat's on-disk mode. Will this be affected? Thank you.

kevburnsjr commented 11 months ago

Will what be affected? I don't understand the question.

wh-afra commented 11 months ago

@kevburnsjr
Perhaps I did not express myself clearly.

In my scenario, PVCs currently only support the local storage class, and some nodehostid-related information is saved on the local disk because I am using Dragonboat's on-disk mode (which uses Pebble to store key-value pairs).
I'm not sure whether this is feasible with k8s PVCs.

Also, how can I associate the physical machine with the pod that ends up running? For example, I have 3 machines, A, B and C, on which to deploy the server in k8s. I then deploy a StatefulSet with 3 replicas, so it starts 3 pods (pod-0, pod-1, pod-2) with the related PVCs pvc-0, pvc-1 and pvc-2.

When a pod crashes, it can restart. When a node crashes, pod-0 no longer runs on A but on B, so pvc-0 will be associated with machine B. The nodehostid on machine B will not match the storage class, which will cause an error.

Maybe we should extend the storage class?

kevburnsjr commented 11 months ago

You will need to use anti-affinity. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity

Configuration of the Kubernetes scheduler is well outside the scope of this repository.
I suggest you read a book on Kubernetes before seeking further advice here on this topic. I highly recommend this book by Brendan Burns:

Kubernetes: Up and Running https://www.amazon.com/dp/109811020X