Open MuhtasimTanmoy opened 1 year ago
If you're getting what looks like a functional cluster from make cluster-create
then I think you're on the right path. After I run that command I get
> export KUBECONFIG=kubeconfig.yaml
> kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-558bd4d5db-4s6jh 0/1 Pending 0 54s
kube-system coredns-558bd4d5db-j26ws 0/1 Pending 0 54s
kube-system etcd-kind-control-plane 1/1 Running 0 68s
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 68s
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 68s
kube-system kube-proxy-67d9l 1/1 Running 0 35s
kube-system kube-proxy-8h8v4 1/1 Running 0 35s
kube-system kube-proxy-rvw7f 1/1 Running 0 54s
kube-system kube-proxy-z8b9p 1/1 Running 0 35s
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 68s
local-path-storage local-path-provisioner-5545dd49d7-wvj9w 0/1 Pending 0 54s
Also if I look at the crds in the cluster I see the following. I'm including this because I'm wondering if they were not created as they should have been based on the error you've received.
> kubectl get crds | grep operator
amazoncloudintegrations.operator.tigera.io 2023-10-17T13:28:27Z
apiservers.operator.tigera.io 2023-10-17T13:28:27Z
applicationlayers.operator.tigera.io 2023-10-17T13:28:27Z
authentications.operator.tigera.io 2023-10-17T13:28:27Z
compliances.operator.tigera.io 2023-10-17T13:28:27Z
egressgateways.operator.tigera.io 2023-10-17T13:28:27Z
imagesets.operator.tigera.io 2023-10-17T13:28:27Z
installations.operator.tigera.io 2023-10-17T13:28:27Z
intrusiondetections.operator.tigera.io 2023-10-17T13:28:27Z
logcollectors.operator.tigera.io 2023-10-17T13:28:27Z
logstorages.operator.tigera.io 2023-10-17T13:28:27Z
managementclusterconnections.operator.tigera.io 2023-10-17T13:28:27Z
managementclusters.operator.tigera.io 2023-10-17T13:28:27Z
managers.operator.tigera.io 2023-10-17T13:28:27Z
monitors.operator.tigera.io 2023-10-17T13:28:27Z
policyrecommendations.operator.tigera.io 2023-10-17T13:28:27Z
tenants.operator.tigera.io 2023-10-17T13:28:27Z
tigerastatuses.operator.tigera.io 2023-10-17T13:28:27Z
Hello @tmjd. I was able to fix the previous error and get to the exact state that you are in.
At that moment, the cluster nodes were in a NotReady state, and the coredns pods were Pending because, with no CNI installed, they could not get an IP address from the pod network.
So, after installing the default custom resource with
kubectl create -f ./config/samples/operator_v1_installation.yaml
I have the following state.
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-6c6d97c87b-4bcdx 0/1 ContainerCreating 0 22m
calico-system calico-node-2dtfx 0/1 ImagePullBackOff 0 22m
calico-system calico-node-j24j9 0/1 ImagePullBackOff 0 22m
calico-system calico-node-qz6xg 0/1 ImagePullBackOff 0 22m
calico-system calico-node-smnjn 0/1 ImagePullBackOff 0 23m
calico-system calico-typha-66cdfb85cf-qgw79 1/1 Running 0 23m
calico-system calico-typha-66cdfb85cf-qks8w 1/1 Running 0 22m
calico-system csi-node-driver-d2zsn 0/2 ContainerCreating 0 22m
calico-system csi-node-driver-lzqkb 0/2 ContainerCreating 0 22m
calico-system csi-node-driver-tcnlm 0/2 ContainerCreating 0 22m
calico-system csi-node-driver-z5ml9 0/2 ContainerCreating 0 22m
kube-system coredns-558bd4d5db-bpt8f 0/1 ContainerCreating 0 27m
kube-system coredns-558bd4d5db-cqvjj 0/1 ContainerCreating 0 27m
kube-system etcd-kind-control-plane 1/1 Running 0 27m
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 27m
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 27m
kube-system kube-proxy-952lc 1/1 Running 0 27m
kube-system kube-proxy-jsphk 1/1 Running 0 27m
kube-system kube-proxy-nldmc 1/1 Running 0 27m
kube-system kube-proxy-vq5vm 1/1 Running 0 27m
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 27m
local-path-storage local-path-provisioner-778f7d66bf-dmknx 0/1 ContainerCreating 0 27m
Here the csi-node-driver, calico-kube-controllers and local-path-provisioner pods are waiting for calico-node to be up and running. However, calico-node is stuck with the ImagePullBackOff error.
Events from the pods show
> kubectl get events -A | grep -i calico-node-qz6xg
calico-system 27m Normal Scheduled pod/calico-node-qz6xg Successfully assigned calico-system/calico-node-qz6xg to kind-worker3
calico-system 27m Normal Pulling pod/calico-node-qz6xg Pulling image "docker.io/calico/pod2daemon-flexvol:master"
calico-system 26m Normal Pulled pod/calico-node-qz6xg Successfully pulled image "docker.io/calico/pod2daemon-flexvol:master" in 17.0363043s
calico-system 26m Normal Created pod/calico-node-qz6xg Created container flexvol-driver
calico-system 26m Normal Started pod/calico-node-qz6xg Started container flexvol-driver
calico-system 26m Normal Pulling pod/calico-node-qz6xg Pulling image "docker.io/calico/cni:master"
calico-system 24m Normal Pulled pod/calico-node-qz6xg Successfully pulled image "docker.io/calico/cni:master" in 2m26.246357483s
calico-system 24m Normal Created pod/calico-node-qz6xg Created container install-cni
calico-system 24m Normal Started pod/calico-node-qz6xg Started container install-cni
calico-system 22m Normal Pulling pod/calico-node-qz6xg Pulling image "docker.io/calico/node:master"
calico-system 22m Warning Failed pod/calico-node-qz6xg Failed to pull image "docker.io/calico/node:master": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/calico/node:master": no match for platform in manifest: not found
calico-system 23m Warning Failed pod/calico-node-qz6xg Error: ErrImagePull
calico-system 3m37s Normal BackOff pod/calico-node-qz6xg Back-off pulling image "docker.io/calico/node:master"
calico-system 22m Warning Failed pod/calico-node-qz6xg Error: ImagePullBackOff
calico-system 27m Normal SuccessfulCreate daemonset/calico-node Created pod: calico-node-qz6xg
Specifically this error
Failed to pull image "docker.io/calico/node:master": rpc error: code = NotFound desc = failed to pull
and unpack image "docker.io/calico/node:master": no match for platform in manifest: not found
So, what needs to be done to fix this when it is trying to fetch docker.io/calico/node:master?
Note that docker pull docker.io/calico/node:latest works whereas node:master does not.
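The "no match for platform in manifest" message suggests that the master tag publishes no entry for the platform the kind nodes run on (linux/arm64 on Apple Silicon). A minimal sketch for checking that by hand follows. manifest_arch and list_platforms are hypothetical helper names, and the sketch assumes a Docker CLI that has the manifest subcommand plus network access.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the operator repository.

# Translate a `uname -m` value into the architecture name used in image manifests.
manifest_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "$1" ;;
  esac
}

# Print the os/architecture fields a multi-platform tag publishes.
# Needs network access; may print nothing for a single-platform image.
list_platforms() {
  docker manifest inspect "$1" | grep -E '"(architecture|os)"'
}

# Example, run by hand:
#   list_platforms docker.io/calico/node:master
#   echo "kind nodes on this host pull linux/$(manifest_arch "$(uname -m)")"
```

If the architecture printed by the last line is missing from the list_platforms output, the pull will fail in the way shown above.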
I was able to resolve the ImagePullBackOff error with a small change to package/components/calico.go: replacing the version with "latest".
So currently everything is up and running:
> kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-5c6d8778f5-btvp6 1/1 Running 0 17m
calico-system calico-node-22lc7 1/1 Running 0 17m
calico-system calico-node-jvn5n 1/1 Running 0 17m
calico-system calico-node-lvm22 1/1 Running 0 17m
calico-system calico-node-qffqg 1/1 Running 0 17m
calico-system calico-typha-745f498dff-dn26b 1/1 Running 0 17m
calico-system calico-typha-745f498dff-flbw6 1/1 Running 0 17m
calico-system csi-node-driver-4h6w5 2/2 Running 0 60s
calico-system csi-node-driver-59gb8 2/2 Running 0 17m
calico-system csi-node-driver-b5c29 2/2 Running 0 17m
calico-system csi-node-driver-gzvtg 2/2 Running 0 17m
kube-system coredns-558bd4d5db-pz5fc 1/1 Running 0 21m
kube-system coredns-558bd4d5db-wrc7c 1/1 Running 0 21m
kube-system etcd-kind-control-plane 1/1 Running 0 21m
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 21m
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 21m
kube-system kube-proxy-mkdbs 1/1 Running 0 21m
kube-system kube-proxy-pd7b2 1/1 Running 0 20m
kube-system kube-proxy-q4hq8 1/1 Running 0 20m
kube-system kube-proxy-sgx4l 1/1 Running 0 20m
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 21m
local-path-storage local-path-provisioner-778f7d66bf-44cw5 1/1 Running 0 21m
So, in summary, to set up the cluster on Apple Silicon (M1) I needed to make the following three changes.
1. Use $(BUILDOS)/$(ARCH)/kubectl in the kubectl download path to make it OS-independent. https://github.com/tigera/operator/blob/077f7483633898b43050b7cfcca9935eea34ebf0/Makefile#L271
2. Change the command for the kind binary to sh -c "GOBIN=$(CURDIR)/$(BINDIR) go install sigs.k8s.io/kind" only. https://github.com/tigera/operator/blob/077f7483633898b43050b7cfcca9935eea34ebf0/Makefile#L277
3. In the package/components/calico.go file, change the version as described above.
Could you advise whether these changes should be reflected in the source via a pull request to support local development on M1, or whether this issue needs a different approach because of side effects?
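Change 1 amounts to deriving the OS segment of the kubectl download URL from the host instead of hardcoding linux. A rough shell sketch of that idea follows, assuming the URL layout of the curl command quoted in this issue. host_os, host_arch and kubectl_url are hypothetical names, and how the Makefile actually computes BUILDOS and ARCH is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helpers; the Makefile's own BUILDOS/ARCH logic may differ.

# Lowercased kernel name, e.g. "darwin" or "linux".
host_os() {
  uname -s | tr '[:upper:]' '[:lower:]'
}

# Normalise `uname -m` to the architecture names used in the download path.
host_arch() {
  case "$(uname -m)" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             uname -m ;;
  esac
}

# Build the download URL for a given version, OS and architecture.
kubectl_url() {
  echo "https://storage.googleapis.com/kubernetes-release/release/$1/bin/$2/$3/kubectl"
}

# Example, run by hand:
#   curl -L "$(kubectl_url v1.25.6 "$(host_os)" "$(host_arch)")" -o build/_output/bin/kubectl
```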
I expect 1 and 2 would be fine. 3 wouldn't be ideal because I think latest is probably the 'latest' released images which would not be the same as using master images. I'm guessing the issue is that only the amd64 images are built and pushed for master builds so the arm images are not available.
For 3, yes, the 'latest' released image may cause issues compared to stable and tested 'master' images. But since the docker.io/calico/node:master image is unavailable for arm, what should the workaround be? This is a blocker for creating a cluster.
Additionally, is a pull request needed with the changes made in 1 and 2?
Sorry there hasn't been any response here for a while.
For 1: If you want to make the suggested change that would be good.
For 2: I think you are suggesting switching to
sh -c "GOBIN=$(CURDIR)/$(BINDIR) go install sigs.k8s.io/kind"
I don't think that is something we would want in general, since it would no longer be containerized which is something we want to maintain. I'd be ok with a conditional based on BUILDOS, perhaps if BUILDOS != linux then instruct user to copy a functional kind binary to $(BINDIR)/kind
For 3: You could request the projectcalico/calico to push the node image for arm on master builds. Another option would be to have a make target that switches the versions to ease creating a build with latest (or some other tag). Maybe something like the following
set-calico-version:
	sed -i -e "s/version: .*$$/version: $(VERSION)/" config/calico_versions.yml
	make gen-versions-calico
For 1: ok
For 2:
I don't think that is something we would want in general, since it would no longer be containerized which is something we want to maintain. I'd be ok with a conditional based on BUILDOS, perhaps if BUILDOS != linux then instruct user to copy a functional kind binary to $(BINDIR)/kind
Since the kind binary is used to create the local cluster on the host machine rather than in a container, as shown below, should the binary be built in a container at all?
I might be missing some corner cases, though.
## Create a local kind dual stack cluster.
KIND_KUBECONFIG?=./kubeconfig.yaml
K8S_VERSION?=v1.21.14
cluster-create: $(BINDIR)/kubectl $(BINDIR)/kind
	# First make sure any previous cluster is deleted
	make cluster-destroy
	# Create a kind cluster.
	$(BINDIR)/kind create cluster \
		--config ./deploy/kind-config.yaml \
		--kubeconfig $(KIND_KUBECONFIG) \
		--image kindest/node:$(K8S_VERSION)
Does this look OK for the conditional based on BUILDOS? (tested on darwin)
$(BINDIR)/kind:
ifeq ($(BUILDOS), darwin)
	sh -c "GOBIN=/go/src/$(PACKAGE_NAME)/$(BINDIR) go install sigs.k8s.io/kind"
else
	$(CONTAINERIZED) $(CALICO_BUILD) sh -c "GOBIN=/go/src/$(PACKAGE_NAME)/$(BINDIR) go install sigs.k8s.io/kind"
endif
For 3: Added the following due to this issue with sed.
# https://stackoverflow.com/questions/4247068/sed-command-with-i-option-failing-on-mac-but-works-on-linux/4247319#4247319
set-calico-version:
ifeq ($(BUILDOS), darwin)
	sed -i '' -e 's/version: .*/version: $(VERSION)/' config/calico_versions.yml
else
	sed -i -e 's/version: .*/version: $(VERSION)/' config/calico_versions.yml
endif
	make gen-versions-calico
Should I go ahead with these changes?
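The darwin branch in set-calico-version exists only because BSD sed and GNU sed disagree on the argument that -i takes. One way to sidestep the difference, sketched here as an assumption rather than what the repository does, is to write the result to a temporary file and copy it back. set_version is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every "version: ..." line without `sed -i`,
# so the same command works with both BSD (macOS) and GNU sed.
set_version() {
  file="$1"
  version="$2"
  tmp="$(mktemp)" || return 1
  # Note: a version containing "/" or "&" would need escaping for sed.
  # Copying back with cat keeps the original file's permissions.
  sed -e "s/version: .*/version: ${version}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example, run by hand:
#   set_version config/calico_versions.yml latest
```

In a Makefile this would replace both sed lines with a single call, at the cost of an extra temporary file.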
I'm good with what you're suggesting For 2, though I'll point out that I don't think you should include GOBIN in the command.
Seems reasonable for 3 also.
On another thought, wouldn't adopting nix solve the compatibility issues altogether? Reference: Using Nix with Dockerfiles
I'll point out that I don't think you should include GOBIN in the command
$(BINDIR)/kind:
ifeq ($(BUILDOS), darwin)
	sh -c go install sigs.k8s.io/kind"
else
	$(CONTAINERIZED) $(CALICO_BUILD) sh -c go install sigs.k8s.io/kind"
endif
Like this?
I will open a PR with these fixes then.
I'd guess there is probably no need for the sh -c either.
(you've got a trailing " that you'll need to get rid of too)
On another thought, wouldn't adopting nix solve the compatibility issues altogether?
I'm not sure, we still need to build kind that can work on darwin or linux, does nix help with that?
Does this look ok?
$(BINDIR)/kind:
ifeq ($(BUILDOS), darwin)
	go install sigs.k8s.io/kind
else
	$(CONTAINERIZED) $(CALICO_BUILD) go install sigs.k8s.io/kind
endif
I'm not sure, we still need to build kind that can work on darwin or linux, does nix help with that?
Being universal, it should. I have used it to get a consistent environment for building Docker images.
That does not look ok. I didn't notice you were modifying the "non-darwin" command; it should remain what it has been.
Have you tried what you're suggesting for the "darwin" option? It doesn't look like it would work to me. The commands should produce a kind binary (that works on the host system) at $(BINDIR)/kind. You probably do need a GOBIN but it would be different from the "non-darwin" command.
Please put up a PR that you've tested, and ensure you run make clean before testing to make sure that you don't have any leftover binaries that would make it look like everything is working.
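For the darwin case, the GOBIN value has to point at the bin directory on the host rather than at the /go/src/... path used inside the build container, and go install expects it to be an absolute path. A small sketch of that, assuming $(BINDIR) resolves to build/_output/bin as in the error messages in this issue; abs_bindir is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: turn a repo-relative bin directory into the absolute
# path that `go install` expects in GOBIN.
abs_bindir() {
  printf '%s/%s\n' "$(pwd)" "$1"
}

# Example, run by hand from the repository root on the host (not in a container):
#   make clean
#   GOBIN="$(abs_bindir build/_output/bin)" go install sigs.k8s.io/kind
#   build/_output/bin/kind version   # should run without "exec format error"
```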
Being universal, it should. I have used it to get a consistent environment for building Docker images.
But this is not building a Docker image, we're installing a binary that is used. So I don't understand how using nix would help us fetch a darwin binary on darwin and a linux binary on linux.
I am currently following the steps outlined in this guide to create a local cluster for development purposes. However, I'm encountering errors when executing kind and kubectl, which are essential for creating and managing the cluster.
The errors I'm facing include:
build/_output/bin/kind: cannot execute binary file
exec format error: ./build/_output/bin/kubectl
Expected Behavior
The cluster should be created when I run make cluster-create, followed by interacting with it through kubectl.
Current Behavior
The first step, make cluster-create, fails with the aforementioned errors.
Possible Solution
I suspect the issue might be related to compatibility with Apple Silicon (M1). In an attempt to resolve this, I made changes to the Makefile. That got the cluster more or less up and running, though an error followed in a later step.
Here is the rationale behind my changes.
The make cluster-create command depends on kubectl and kind, so those OS-specific binaries are fetched beforehand.
kubectl
The download uses curl -L https://storage.googleapis.com/kubernetes-release/release/v1.25.6/bin/linux/$(ARCH)/kubectl where linux is hardcoded.
A BUILDOS variable is already present in our Makefile but it is not used here. So I replaced linux/$(ARCH)/kubectl with $(BUILDOS)/$(ARCH)/kubectl to make it OS-independent.
kind
I changed the command to sh -c "GOBIN=$(CURDIR)/$(BINDIR) go install sigs.k8s.io/kind", ensuring that I had a macOS-compatible kind binary available.
With the original command, $(CONTAINERIZED) $(CALICO_BUILD) sh -c "GOBIN=/go/src/$(PACKAGE_NAME)/$(BINDIR) go install sigs.k8s.io/kind", the kind binary is downloaded inside the Linux-based Docker container and then stored in the build/_output/bin directory, which is mounted inside the container with read/write permissions. This setup makes the binary accessible both inside and outside the container, essentially persisting the Linux build for later use.
So when kind create cluster is invoked, I have a kind binary for Linux, which is not compatible with the host.
kind control plane somehow, as I am getting this error when I run the operator against the local cluster described below?
This KUBECONFIG=./kubeconfig.yaml go run ./ --enable-leader-election=false command is not working as expected and gives the following error.
Am I heading in the right direction, or what should I do to run the cluster on Apple Silicon (M1)?
Steps to Reproduce (for bugs)
Context
I am trying to make a local cluster for development purposes.
Your Environment