projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0
Calico fails on the new nodes added to the cluster #9284

Open vatsalmoradiya opened 2 months ago

vatsalmoradiya commented 2 months ago

When I add new nodes to the cluster, the calico-node pods on them keep failing. They still fail after I delete the pods or restart the calico-node daemonset. The pods on the existing nodes, however, come back up normally after a rollout restart or deletion.

Context

This may have been a problem for some time, but I did not encounter it until I tried to add a new node to the cluster. In the meantime, I had been setting up an SR-IOV network configuration on the system and had started a few services such as Multus and Whereabouts.

Also, this is a production environment, so I cannot run kubeadm reset.

The difference I see between the nodes where the pods work and the nodes where they do not is in the calico.kubeconfig file. On the working nodes the server entry is server: https://10.150.0.1:443, while on the failing nodes it is server: https://[10.150.0.1]:443.

I tried removing the brackets manually, but the file is re-created whenever the daemonset is restarted or the pod is deleted.
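The comparison described above can be checked mechanically. The following is a minimal sketch, not anything taken from Calico itself: it pulls the server entry out of kubeconfig text and reports whether the host is wrapped in brackets and which IP version it is. The function name classify_server and the returned dictionary keys are made up for illustration, and the regular expressions only cover the simple scheme://host:port form quoted in this issue.

```python
import ipaddress
import re

# Matches a line such as "    server: https://10.150.0.1:443" in kubeconfig text.
SERVER_LINE = re.compile(r"^\s*server:\s*(\S+)\s*$", re.MULTILINE)
# Splits scheme://host[:port]; the host may be wrapped in square brackets.
HOST_PORT = re.compile(r"^https?://(\[[^\]]+\]|[^/:]+)(?::(\d+))?")


def classify_server(kubeconfig_text):
    """Return details about the first 'server:' entry, or None if absent.

    Hypothetical helper for illustration only.
    """
    line = SERVER_LINE.search(kubeconfig_text)
    if line is None:
        return None
    url = line.group(1)
    parts = HOST_PORT.match(url)
    if parts is None:
        return {"url": url, "host": None, "bracketed": False, "ip_version": None}
    raw_host = parts.group(1)
    bracketed = raw_host.startswith("[")
    host = raw_host.strip("[]")
    try:
        ip_version = ipaddress.ip_address(host).version
    except ValueError:
        ip_version = None  # a DNS name rather than an IP literal
    return {"url": url, "host": host, "bracketed": bracketed, "ip_version": ip_version}


working = classify_server("    server: https://10.150.0.1:443\n")
failing = classify_server("    server: https://[10.150.0.1]:443\n")
print(working)
print(failing)
```

In URL syntax (RFC 3986), square brackets are reserved for IPv6 and IPvFuture literals, so a result with bracketed set to True and ip_version set to 4 is at least unusual. This sketch only flags that difference. It does not show that the brackets are what makes the pods fail.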

coutinhop commented 1 month ago

@vatsalmoradiya I don't think the [ ] around the IP should be a problem. Could you post the logs and the kubectl describe output for the calico-node pods that are failing? The same output for the working pods would also be useful for comparison.
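For anyone gathering the requested information, here is a rough sketch of the kubectl invocations involved, built as argument lists so they can be printed or passed to subprocess. It rests on several assumptions: the kube-system namespace and the k8s-app=calico-node label match a manifest-based install (operator-based installs typically use calico-system), the container is named calico-node, and the pod name is a placeholder. The helper diagnostic_commands is hypothetical.

```python
import shlex


def diagnostic_commands(pod, namespace="kube-system"):
    """Build kubectl commands for inspecting one calico-node pod.

    The namespace default and label selector are assumptions; adjust
    them to match how Calico was installed on the cluster.
    """
    base = ["kubectl", "-n", namespace]
    return [
        # List calico-node pods together with the node each one runs on.
        base + ["get", "pods", "-l", "k8s-app=calico-node", "-o", "wide"],
        # Events, container states and restart counts for the pod.
        base + ["describe", "pod", pod],
        # Logs from the current container instance.
        base + ["logs", pod, "-c", "calico-node"],
        # Logs from the previous instance, useful when the pod is crash-looping.
        base + ["logs", pod, "-c", "calico-node", "--previous"],
    ]


# "calico-node-xxxxx" is a placeholder, not a real pod name.
for cmd in diagnostic_commands("calico-node-xxxxx"):
    print(shlex.join(cmd))
```

Running the same commands against one failing pod and one working pod gives the side-by-side comparison asked for above.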

coutinhop commented 3 weeks ago

@vatsalmoradiya did you have time to gather logs and describe output?

vatsalmoradiya commented 3 weeks ago

Hey, the system was in production and the issue was not going to be resolved soon, so we had to tear down the cluster and install it again. The error no longer occurs.

Thanks for your support.

Regards, Vatsal Moradiya
