Open porscheme opened 1 year ago
Thanks @MegaByte875 for the reply. The cluster version we are using is v3.3.0.
@porscheme I have some questions about your scenario:
- How are the graphd, metad, and storaged pods distributed across the k8s nodes?
[Porsche] Each component has its own separate K8s node pool and subnet. We are using the Azure Standard_L16as_v3 SKU. Autoscaling is disabled.
- Do you set a PDB to guarantee service availability?
[Porsche] No, we are not using any PDB, since the official Nebula docs don't mention it. Should we use a PDB? Can you point me to any Nebula docs?
- Do you use SSD NVMe disks for storaged?
[Porsche] Yes, the Azure Standard_L16as_v3 SKU comes with two 2 TB SSD NVMe disks attached to the VM.
- Did partition leaders exist on the storage node before upgrading?
[Porsche] Yes. Before the upgrade, partition leaders existed on the storage node and were balanced with the other storage nodes. But during the upgrade, the leader count on that storage node drops to zero (SHOW HOSTS).
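On the PDB question above: a PodDisruptionBudget is a standard Kubernetes object (`policy/v1`), not something Nebula-specific, and it limits how many matching pods a voluntary disruption such as a node drain may evict at once. Below is a minimal sketch that generates one for the storaged pods. The PDB name, the namespace, and the label selector are assumptions for illustration; check the labels actually present on your pods (`kubectl get pods --show-labels`) before using anything like this.

```python
import json


def build_pdb(name, namespace, match_labels, max_unavailable=1):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Allow at most this many matching pods to be evicted at once.
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": match_labels},
        },
    }


# Hypothetical name, namespace, and label: adjust to your deployment.
storaged_pdb = build_pdb(
    name="nebula-storaged-pdb",
    namespace="nebula",
    match_labels={"app.kubernetes.io/component": "storaged"},
)

# kubectl accepts JSON manifests, so the output can be saved and applied.
print(json.dumps(storaged_pdb, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would create the budget. With `maxUnavailable` set to 1, a drain should evict only one storaged pod at a time, though a PDB does not protect against involuntary disruptions such as a node crash.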
Azure patched our K8S cluster today and the Nebula cluster was down. Nebula reliability metrics went down!!!
Here is an implementation plan that I hope will help you:
Cluster config
- metad: 3
- Storaged: 3
- metad: 5
- Each storage node has 2 x 2 TB SSD NVMe disks
Space Config
- VID: String (Length 20)
- Partition Number: 200
- Replica Factor: 3
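To make the implications of this space config concrete, here is a rough sketch of the replica and quorum arithmetic, assuming the standard Raft majority rule that storaged uses for each partition. The space name `example_space` is hypothetical, and the nGQL string is only an illustration of how such a space is typically created, not the statement actually used for this cluster.

```python
def raft_quorum(replica_factor):
    """Smallest majority of a Raft group with this many replicas."""
    return replica_factor // 2 + 1


def tolerated_failures(replica_factor):
    """Replicas that can be lost while a majority still remains."""
    return replica_factor - raft_quorum(replica_factor)


partition_num = 200
replica_factor = 3
storaged_nodes = 3

total_replicas = partition_num * replica_factor        # 600
replicas_per_node = total_replicas // storaged_nodes   # 200
balanced_leaders_per_node = partition_num / storaged_nodes

print("replicas per storaged node:", replicas_per_node)
print("balanced leaders per node: about", round(balanced_leaders_per_node))
print("quorum per partition:", raft_quorum(replica_factor))
print("storaged nodes that may be down:", tolerated_failures(replica_factor))

# Illustrative nGQL for a space with the same settings (hypothetical name).
create_space = (
    "CREATE SPACE IF NOT EXISTS example_space "
    f"(partition_num = {partition_num}, "
    f"replica_factor = {replica_factor}, "
    "vid_type = FIXED_STRING(20));"
)
print(create_space)
```

With three storaged nodes and a replica factor of 3, every node holds a replica of every partition. One node going down during an upgrade leaves each partition with exactly the 2 of 3 replicas needed for a majority, so there is no margin left; if a second storaged node becomes unavailable at the same time, every partition loses quorum.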
Our cluster is running in Azure. We enabled auto patching and auto upgrade (Kubernetes upgrade). Manual intervention is often required when a VM gets stuck during an upgrade.