projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Deploying calico fails when default podCIDR is set to 33:177:177::/112 #7560

Closed: lyyao09 closed this issue 3 months ago

lyyao09 commented 1 year ago

Expected Behavior

Deployment should succeed with the podCIDR 33:177:177::/112, or the documentation should explain why this podCIDR cannot be used.

Current Behavior

Deploying Calico fails; calico-node and calico-kube-controllers crash-loop:

[root@node2 calico]# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-574f876f4d-xczm9   0/1     CrashLoopBackOff    2          25h
kube-system   calico-node-ktps9                          0/1     CrashLoopBackOff    2          25h
kube-system   coredns-59c7f645df-wf5qx                   0/1     ContainerCreating   0          25h
kube-system   kube-apiserver-node2                       1/1     Running             0          25h
kube-system   kube-controller-manager-node2              1/1     Running             2          25h
kube-system   kube-multus-ds-amd64-8cz6p                 1/1     Running             0          25h
kube-system   kube-proxy-wrj5w                           1/1     Running             0          25h
kube-system   kube-scheduler-node2                       1/1     Running             2          25h

Steps to Reproduce (for bugs)

  1. Deploy a k8s cluster with the IPv6 or dual-stack protocol;
  2. Prepare the calico-node yaml manifest;
  3. Modify the value of the CALICO_IPV6POOL_CIDR field in the yaml to 33:177:177::/112;
  4. Deploy Calico using calico-node.yaml.

Context

  1. cat the felix and calico-kube-controllers logs (the panic below is from calico-kube-controllers):
    
    2023-04-18 15:00:32.587 [INFO][1] ipam.go 45: Synchronizing IPAM data
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x137b4dc]

goroutine 151 [running]:
github.com/projectcalico/libcalico-go/lib/backend/model.BlockListOptions.KeyFromDefaultPath(0x0, 0xc0007de060, 0x2b, 0x2b, 0xc0007de060)
        /go/pkg/mod/github.com/projectcalico/libcalico-go@v1.7.2-0.20200616235705-7bb88b19faec/lib/backend/model/block.go:96 +0x17c
github.com/projectcalico/libcalico-go/lib/backend/etcdv3.convertListResponse(0xc00013f270, 0x1af26a0, 0x2751320, 0x1)
        /go/pkg/mod/github.com/projectcalico/libcalico-go@v1.7.2-0.20200616235705-7bb88b19faec/lib/backend/etcdv3/conversion.go:35 +0x16e
github.com/projectcalico/libcalico-go/lib/backend/etcdv3.(*etcdV3Client).List(0xc000696638, 0x1b1f520, 0xc0001d62c0, 0x1af26a0, 0x2751320, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/projectcalico/libcalico-go@v1.7.2-0.20200616235705-7bb88b19faec/lib/backend/etcdv3/etcdv3.go:424 +0x65d
github.com/projectcalico/kube-controllers/pkg/controllers/node.(*NodeController).syncIPAMCleanup(0xc0000d03c0, 0x0, 0x0)
        /go/src/github.com/projectcalico/kube-controllers/pkg/controllers/node/ipam.go:49 +0x180
github.com/projectcalico/kube-controllers/pkg/controllers/node.(*NodeController).syncDelete(0xc0000d03c0, 0xc0007ecf68, 0x2)
        /go/src/github.com/projectcalico/kube-controllers/pkg/controllers/node/node_controller.go:186 +0x2f
github.com/projectcalico/kube-controllers/pkg/controllers/node.(*NodeController).acceptScheduleRequests(0xc0000d03c0, 0xc0000b82a0)
        /go/src/github.com/projectcalico/kube-controllers/pkg/controllers/node/node_controller.go:168 +0xbf
created by github.com/projectcalico/kube-controllers/pkg/controllers/node.(*NodeController).Run
        /go/src/github.com/projectcalico/kube-controllers/pkg/controllers/node/node_controller.go:149 +0x20d

2. cat the related code in block.go (a sketch reproducing the nil dereference appears after this list):

func (options BlockListOptions) KeyFromDefaultPath(path string) Key {
    log.Debugf("Get Block key from %s", path)
    r := matchBlock.FindAllStringSubmatch(path, -1)
    if len(r) != 1 {
        log.Debugf("%s didn't match regex", path)
        return nil
    }
    cidrStr := strings.Replace(r[0][1], "-", "/", 1)
    _, cidr, _ := net.ParseCIDR(cidrStr)
    return BlockKey{CIDR: *cidr}
}

type BlockKey struct {
    CIDR net.IPNet `json:"-" validate:"required,name"`
}

func (key BlockKey) defaultPath() (string, error) {
    if key.CIDR.IP == nil {
        return "", errors.ErrorInsufficientIdentifiers{}
    }
    c := strings.Replace(key.CIDR.String(), "/", "-", 1)
    e := fmt.Sprintf("/calico/ipam/v2/assignment/ipv%d/block/%s", key.CIDR.Version(), c)
    return e, nil
}

// Version returns the IP version for an IPNet, or 0 if not a valid IP net.
func (i *IPNet) Version() int {
    if i.IP.To4() != nil {
        return 4
    } else if len(i.IP) == net.IPv6len {
        return 6
    }
    return 0
}


5. cat the /calico/ipam/v2/assignment key and value in etcd:
![image](https://user-images.githubusercontent.com/24425476/232701961-f5034926-ede6-4181-a8c0-17f048e4c72f.png)
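
The panic is consistent with the ignored error in KeyFromDefaultPath above: if the CIDR string recovered from the stored path fails to parse, net.ParseCIDR returns a nil *net.IPNet, and the *cidr dereference in the return statement (block.go:96 in the trace) hits that nil pointer. Below is a minimal, self-contained sketch of that failure mode (simplified, hypothetical names; not the actual Calico code):

package main

import (
    "fmt"
    "net"
    "strings"
)

// blockKey and keyFromDefaultPath are simplified stand-ins for the BlockKey and
// BlockListOptions.KeyFromDefaultPath code quoted above.
type blockKey struct {
    CIDR net.IPNet
}

func keyFromDefaultPath(path string) blockKey {
    parts := strings.Split(path, "/")
    cidrStr := strings.Replace(parts[len(parts)-1], "-", "/", 1)
    _, cidr, _ := net.ParseCIDR(cidrStr) // parse error ignored, as in block.go
    return blockKey{CIDR: *cidr}         // nil dereference when the parse failed
}

func main() {
    // A well-formed block path round-trips cleanly.
    k := keyFromDefaultPath("/calico/ipam/v2/assignment/ipv6/block/33:177:177::-122")
    fmt.Println(k.CIDR.String()) // 33:177:177::/122

    // A path whose last segment does not convert back to a valid CIDR makes
    // net.ParseCIDR return nil, and *cidr panics with the same nil pointer
    // SIGSEGV that appears in the kube-controllers log.
    keyFromDefaultPath("/calico/ipam/v2/assignment/ipv0/block/garbage")
}

Checking the error from net.ParseCIDR (and returning nil, as the regex-mismatch branch already does) would turn the panic into a skipped key.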

Your Environment
* Calico version: v3.15.1 and latest (v3.26.0-0.dev-403-gf8c46d4273ba)
* Orchestrator version: Kubernetes v1.21.14
* Operating System and version: CentOS 7.6

mgleung commented 1 year ago

Hey @lyyao09, just for clarification, could I ask you to describe how you did this step from your reproduction steps?

2. Deploy calico and change default ipv6 podCIDR to 33:177:177::/112;

More specifically, I'm trying to understand how you changed the default IPv6 CIDR and what state the cluster was in before the change in default IPv6 CIDR.

lyyao09 commented 1 year ago

@mgleung, I'm sorry that I didn't describe it clearly. This step means that before deploying Calico, change the CALICO_IPV6POOL_CIDR value defined in the calico-node yaml as below and then apply the yaml:

cat calico-node.yaml
           ...
            - name: CALICO_IPV4POOL_CIDR
              value: "177.177.0.0/16"
            - name: CALICO_DISABLE_FILE_LOGGING
              value: "false"
            - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
              value: "ACCEPT"
            - name: FELIX_IPV6SUPPORT
              value: "true"
            - name: CALICO_IPV6POOL_CIDR
              value: "33:177:177::/64"
            - name: CALICO_IPV6POOL_NAT_OUTGOING
              value: "true"
            - name: FELIX_LOGSEVERITYSCREEN
              value: "debug"
            - name: FELIX_HEALTHENABLED
              value: "true"
            - name: FELIX_IPTABLESBACKEND
              value: Auto
              ...

mgleung commented 1 year ago

@lyyao09 thanks for the clarification. Could I ask you to check if the default IPv6 pool is created properly in your cluster? There's something obviously wrong since you're seeing a panic, but I'm just trying to narrow the issue down.

lyyao09 commented 1 year ago

@mgleung yes, the IPv6 pool is created properly:

[root@node2 calico]# calicoctl get ippool
NAME                  CIDR              SELECTOR
default-ipv4-ippool   177.177.0.0/16    all()
default-ipv6-ippool   33:177:177::/64   all()

From the ipv0 segment of the assignment key in etcd, it looks like the block assignment code treats 33:177:177::/64 as an invalid IPv6 address.
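
For reference, the pool CIDRs themselves classify correctly when parsed with the standard library, so the ipv0 key suggests the IPNet written into the block key carried an IP slice that is neither 4 nor 16 bytes long. A small sketch of that mechanism, using the Version() and defaultPath() logic quoted earlier (simplified, hypothetical code; the hand-built 6-byte IP only illustrates one way an ipv0 path can arise, not the confirmed root cause):

package main

import (
    "fmt"
    "net"
    "strings"
)

// version mirrors IPNet.Version() from the issue body: 4 for IPv4, 6 for a
// 16-byte IPv6 address, and 0 for anything else.
func version(ip net.IP) int {
    if ip.To4() != nil {
        return 4
    } else if len(ip) == net.IPv6len {
        return 6
    }
    return 0
}

// defaultPath mirrors BlockKey.defaultPath() from the issue body.
func defaultPath(cidr net.IPNet) string {
    c := strings.Replace(cidr.String(), "/", "-", 1)
    return fmt.Sprintf("/calico/ipam/v2/assignment/ipv%d/block/%s", version(cidr.IP), c)
}

func main() {
    // Parsed normally, both configured pools land under ipv4/ipv6 as expected.
    for _, s := range []string{"177.177.0.0/16", "33:177:177::/64"} {
        ip, ipnet, _ := net.ParseCIDR(s)
        fmt.Printf("%-16s -> ipv%d (%s)\n", s, version(ip), ipnet)
    }

    // An IPNet whose IP slice has a non-standard length (here 6 bytes) makes
    // version() return 0, so its key is written under .../ipv0/..., which is
    // the segment visible in the etcd screenshot above.
    broken := net.IPNet{
        IP:   net.IP{0x00, 0x33, 0x01, 0x77, 0x01, 0x77},
        Mask: net.CIDRMask(122, 128),
    }
    fmt.Println(defaultPath(broken))
}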

caseydavenport commented 3 months ago

I've been unable to reproduce this on modern versions of Calico, so I'm assuming it has been fixed since.

lyyao09 commented 1 month ago

Yes, I also tested version v3.23.5 and it deploys successfully.