Closed: eugen-nw closed this issue 1 year ago.
@eugen-nw Thanks for contacting us. Would you please share a sample of your pod spec and the virtual kubelet log?
What is the pod spec, the Dockerfile?
Could you please instruct on how I can obtain the virtual kubelet log?
> What is the pod spec, the Dockerfile?

The pod specification YAML file that you used to deploy the workload on the virtual kubelet.
> Could you please instruct on how I can obtain the virtual kubelet log?

- If you are using virtual kubelet as an addon via AKS, you will find a pod whose name starts with aci-connector-linux running in the kube-system namespace:

  kubectl get pod -n kube-system
  kubectl logs <aci-connector-linux-POD_NAME> -n kube-system

- If you installed virtual kubelet yourself, the pod runs in the default namespace:

  kubectl get pod -n default
  kubectl logs <virtual-kubelet-POD_NAME> -n default
Certainly, please find the files attached. container-deployment.yaml.txt VK.zip
> Certainly, please find the files attached. container-deployment.yaml.txt VK.zip

Would you please upload the required info to https://gist.github.com/ and share the link with us?
Please see if you can use this gist: https://gist.github.com/eugen-nw/fd4526514a4ab6c3b9d995b0e76e9475 The vk.log files should have 50478 lines.
@eugen-nw I'd like to understand your workload a little more since you mentioned that it was working before. Do you need an IP for the workload running inside windows container? If yes, how was the IP shown before? ACI currently does not support vNet private IP for windows container so only public IP works there.
We recently added a check such that, from VK's perspective, if ACI does not return an IP for the container instance, it treats the Pod as not ready, because in native K8s a Pod has to have an IP by the time its state becomes Ready. That being said, to be K8s API compatible, VK would have to set the public IP request for windows containers explicitly (which is not done today).
While thinking about the solution, I am wondering about the IP requirements of your use case.
@Fei-Guo Below is the behavior that I am experiencing across two AKS instances, which shows that it is possible for VK to obtain the "Running" status of my Container from ACI. Can't you guys repro this "Pending" status issue on your side?
I have little experience with VK, AKS, Containers, etc. I have set up only 6 AKS environments over the past 3 years, all of which use VK to run in ACI. This is the first AKS where my running Pods display this "Pending" status. In another AKS instance that I set up on May 20, 2022, they display the "Running" status. The Container code is identical. The two AKS instances run within the same Azure Subscription, so they share ACI.
The characteristics of the VK instance that runs there are below:
Name: virtual-kubelet-virtual-kubelet-aci-for-aks-5f9b8ccbcf-5tn4g
Namespace: default
Priority: 0
Node: aks-agentpool-13538704-vmss000000/10.240.0.4
Start Time: Fri, 20 May 2022 15:41:39 -0700
Labels: app=virtual-kubelet-virtual-kubelet-aci-for-aks
pod-template-hash=5f9b8ccbcf
Annotations: checksum/secret: a9105fe650e6dea3914921605d3181a60b310e4d6bfa58f3625a1aa97742fe9f
Status: Running
IP: 10.240.0.14
IPs:
IP: 10.240.0.14
Controlled By: ReplicaSet/virtual-kubelet-virtual-kubelet-aci-for-aks-5f9b8ccbcf
Containers:
virtual-kubelet-virtual-kubelet-aci-for-aks:
Container ID: containerd://444785f9b5575a9eaa39e38f65709b2947ab6d17ec586da206d5af14cbad72a1
Image: mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.2
Image ID: mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet@sha256:04761be99f594b109825e50b3fd324bf3f7820f28c1b09c916b64d122ecd29bc
Port: <none>
Host Port: <none>
Command:
virtual-kubelet
Args:
--provider
azure
--namespace
--nodename
virtual-kubelet
--authentication-token-webhook=true
--client-verify-ca
/etc/kubernetes/certs/ca.crt
--no-verify-clients=false
--os
Windows
State: Running
Started: Fri, 20 May 2022 15:41:44 -0700
Ready: True
Restart Count: 0
Environment:
KUBELET_PORT: 10250
APISERVER_CERT_LOCATION: /etc/virtual-kubelet/cert.pem
APISERVER_KEY_LOCATION: /etc/virtual-kubelet/key.pem
VKUBELET_POD_IP: (v1:status.podIP)
VKUBELET_TAINT_KEY: virtual-kubelet.io/provider
VKUBELET_TAINT_VALUE: azure
VKUBELET_TAINT_EFFECT: NoSchedule
ACS_CREDENTIAL_LOCATION: /etc/acs/azure.json
AZURE_TENANT_ID:
AZURE_SUBSCRIPTION_ID:
AZURE_CLIENT_ID:
AZURE_CLIENT_SECRET: <set to the key 'clientSecret' in secret 'virtual-kubelet-virtual-kubelet-aci-for-aks'> Optional: false
ACI_RESOURCE_GROUP:
ACI_REGION:
ACI_EXTRA_USER_AGENT: helm-chart/aks/virtual-kubelet-aci-for-aks/1.4.0
MASTER_URI: https://aks-logistics-dns-ee15e054.hcp.westus.azmk8s.io:443
Mounts:
/etc/acs/azure.json from acs-credential (rw)
/etc/kubernetes/certs from certificates (ro)
/etc/virtual-kubelet from credentials (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4mrtj (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
credentials:
Type: Secret (a volume populated by a Secret)
SecretName: virtual-kubelet-virtual-kubelet-aci-for-aks
Optional: false
certificates:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/certs
HostPathType:
acs-credential:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/azure.json
HostPathType: File
kube-api-access-4mrtj:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
@eugen-nw Thanks for the reply. Actually, my question is what type of workload is running in the ACI windows instance, given that the ACI instance is not configured with any vNIC or IP. Does your workload need any inbound/outbound network traffic?
I rarely see a workload that does not need networking in a container deployment. Hence the question.
Our Container gets its input from a Service Bus Queue.
> Our Container gets its input from a Service Bus Queue.
How can the application get its input from a Service Bus Queue when the windows ACI instance does not have any networking configured? Am I missing something?
I'm very sorry, but I just do not know. Could it even access storage if it did not have any networking?
It works fine on 5 other AKS clusters; only on this one does it not. It is the exact same code, and it had been operating this way since day 1.
I am starting to guess where your questions originate from. I see messages like the following in VK's log:

time="2022-11-30T20:58:39Z" level=error msg="failed to retrieve pod aks-aci-boldiq-workforce-gozen-678d4cf57f-c8hjv status from provider" error="IPAddress cannot be nil for container group default-aks-aci-boldiq-workforce-gozen-678d4cf57f-c8hjv" method=PodsTracker.processPodUpdates node=virtual-kubelet operatingSystem=Windows provider=azure watchedNamespace=
Maybe the status retrieval functionality changed in this version of VK? Would you consider looking at that log in the other AKS where Helm chart 1.4.0 works properly? As I said, it is the exact same Container running there as well.
One idea is for you to reach out to the ACI folks, but the error is not on their side since all other Containers display their status properly.
For a twist: it is the VK that starts up each Pod in ACI, right? Is there a slight possibility that in the past whenever VK was encountering Pods that did not have a public IP address, it was assigning each new Pod instance one, so the VK could maintain communications with it? Maybe that feature was eliminated from VK in order to save on the costs of public IP addresses?
Another guess: since the VK instance cannot communicate with the CG running in ACI, could it be that that fact is causing the behavior I reported in #378? VK has no way of telling the CG to die, so it leaves it behind running.
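A quick way to gauge how widespread this failure is in the vk.log from the gist is to count occurrences of the error line. A minimal sketch follows; the two-line sample log is fabricated here from the message quoted above purely so the command is self-contained, so point grep at the real vk.log instead:

```shell
# Fabricated sample standing in for the real vk.log (illustration only).
cat > /tmp/vk-sample.log <<'EOF'
time="2022-11-30T20:58:39Z" level=error msg="failed to retrieve pod status from provider" error="IPAddress cannot be nil for container group default-demo"
time="2022-11-30T20:58:40Z" level=info msg="Pod updated"
EOF

# Count the pod-status updates that failed on the missing-IP check.
grep -c "IPAddress cannot be nil" /tmp/vk-sample.log   # prints 1 for this sample
```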
@eugen-nw Yes, the behavior change is due to a recent change in VK to skip the pod update if the pod does not have an IP. In your case it seems to be OK for a running ACI instance to have no IP, but for broader k8s use cases we cannot mark a Pod Ready if it does not have an IP, because this would confuse other components such as the service/endpoint controllers. An empty Pod IP is not a problem for ACI itself, since the cg state only depends on the state of the running container.
The fix could be one of the following: 1) force allocation of a public IP for windows containers (it seems that ACI would not charge more for a public IP); 2) go back to the old behavior, without checking for IP existence.
I was leaning toward 1, but I am starting to wonder how your use case can work without an IP. Can you please check other running windows ACI instances with an older VK and see whether a public IP is configured there? Note that without an IP you cannot even run kubectl logs to retrieve logs from your ACI instance.
I checked our other Container instances running in ACI and none of them has a public IP address. We never needed one. Service Bus is not calling us; rather, their library running in our software establishes a long-lived connection with Service Bus through which all communications flow.
I think that option 2 would be the better scenario. The Producer-Consumer pattern that we're implementing (the Container is the Consumer that picks messages from a Queue) is well known, so VK will encounter other Container instances with no public IP.
We scale out several times a day to hundreds of Containers running on top of VK. It would be totally wasteful for us to maintain such a large count of public IPs that we do not need for communication purposes.
We can temporarily move back to the old behavior for windows containers and add the check back once ACI supports vNet private IPs.
Yes, PLEASE move back to the old behavior. Is there an ETA please?
All regions are reverted to 1.4.5, which should not have this issue. This problem will be addressed in 1.4.8, which will be released in Jan next year.
Does this mean that if I helm-deploy VK again it will deploy 1.4.5 instead of 1.4.7?
No. It means if you create an AKS cluster with VK enabled, the VK version is 1.4.5. For any existing clusters that enable the VK addon, the VK version is going to be 1.4.5. This does not affect any VK that you installed manually.
The 1.4.7 Windows VK that I installed manually does not work (hence this ticket), so there's no use for me to keep it around. I gather that if I uninstall 1.4.7 and helm-install the Windows VK again, I will get 1.4.5, right?
No, you have to use the 1.4.5 helm chart to install 1.4.5 manually, like the instructions mentioned here: https://github.com/virtual-kubelet/azure-aci/blob/master/docs/DOWNGRADE-README.md
@Fei-Guo Rolling back from 1.4.7 to 1.4.5 has removed a change that added a lot of regions for the ACI service: https://github.com/virtual-kubelet/azure-aci/commit/2263e89bbfcf3de4ab06b9976aa58683a419beb1 I can no longer use the ACI service in those regions since this rollback... Will I have to wait until January for this to be resolved as you indicated?
@mishnz you can follow this document to use 1.4.7 for now.
MANY THANKS for 1) providing instructions on how to install a version that does not have this problem and 2) removing this functionality in 1.4.8!
I made an attempt to use the Option 1 command line from https://github.com/virtual-kubelet/azure-aci/blob/master/docs/DOWNGRADE-README.md to install the Windows VK. My command line is:
helm install $CHART_NAME $CHART_URL \
  --set provider=azure \
  --set providers.azure.masterUri=$MASTER_URI \
  --set nodeName=$NODE_NAME \
  --set nodeOsType="Windows" \
  --set image.repository=$IMG_URL \
  --set image.name=$IMG_REPO \
  --set image.tag=$IMG_TAG \
  --set providers.azure.vnet.enabled=$ENABLE_VNET \
  --set providers.azure.vnet.subnetName=$VIRTUAL_NODE_SUBNET_NAME \
  --set providers.azure.vnet.subnetCidr=$VIRTUAL_NODE_SUBNET_RANGE \
  --set providers.azure.vnet.clusterCidr=$CLUSTER_SUBNET_RANGE \
  --set providers.azure.vnet.kubeDnsIp=$KUBE_DNS_IP \
  --set providers.azure.managedIdentityID=$VIRTUALNODE_USER_IDENTITY_CLIENTID
The Pod does not start; it appears that the image tag is not resolving correctly, and the "image.name" value seems to be missing. What should I do differently in order to install the 1.4.5 Windows VK? Is the --set image.name=$IMG_REPO option setting IMG_REPO to the right field of "image"?
Also, could the Error: InvalidImageName message mean that the Windows VK image is not available?
kdp virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-5m528
Name: virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-5m528
Namespace: default
Priority: 0
Node: aks-agentpool-30560331-vmss000000/10.240.0.4
Start Time: Wed, 07 Dec 2022 17:06:19 -0800
Labels: app=virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci
pod-template-hash=7f84fd5c59
Annotations: checksum/secret: cbc42ea74f18df2651df948690ceaab397825bdee31ab8760581014bd93ea5e2
Status: Pending
IP: 10.240.0.110
IPs:
IP: 10.240.0.110
Controlled By: ReplicaSet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci-7f84fd5c59
Containers:
virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci:
Container ID:
Image: mcr.microsoft.com/:1.4.5
Image ID:
Port: <none>
Host Port: <none>
Command:
virtual-kubelet
Args:
--provider
azure
--namespace
--nodename
virtual-kubelet
--authentication-token-webhook=true
--client-verify-ca
/etc/kubernetes/certs/ca.crt
--no-verify-clients=false
--os
Windows
State: Waiting
Reason: InvalidImageName
Ready: False
Restart Count: 0
Environment:
KUBELET_PORT: 10250
APISERVER_CERT_LOCATION: /etc/virtual-kubelet/cert.pem
APISERVER_KEY_LOCATION: /etc/virtual-kubelet/key.pem
VKUBELET_POD_IP: (v1:status.podIP)
VKUBELET_TAINT_KEY: virtual-kubelet.io/provider
VKUBELET_TAINT_VALUE: azure
VKUBELET_TAINT_EFFECT: NoSchedule
VIRTUALNODE_USER_IDENTITY_CLIENTID: e3f86d26-b2a5-4f9e-a4c3-b10c08cb4235
AKS_CREDENTIAL_LOCATION: /etc/aks/azure.json
AZURE_TENANT_ID:
AZURE_SUBSCRIPTION_ID:
AZURE_CLIENT_ID:
AZURE_CLIENT_SECRET: <set to the key 'clientSecret' in secret 'virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci'> Optional: false
ACI_RESOURCE_GROUP:
ACI_REGION:
ACI_EXTRA_USER_AGENT: helm-chart/aks/virtual-kubelet-azure-aci/1.4.5
ACI_VNET_SUBSCRIPTION_ID:
ACI_VNET_RESOURCE_GROUP:
ACI_VNET_NAME:
ACI_SUBNET_NAME: virtual-node-aci
ACI_SUBNET_CIDR: 10.241.0.0/16
MASTER_URI: https://aks-workforce-dns-fe064e62.hcp.westus.azmk8s.io:443
CLUSTER_CIDR: 10.0.0.0/16
KUBE_DNS_IP: 10.0.0.10
ENABLE_REAL_TIME_METRICS: true
USE_VK_VERSION_2: true
Mounts:
/etc/aks/azure.json from aks-credential (rw)
/etc/kubernetes/certs from certificates (ro)
/etc/virtual-kubelet from credentials (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-phfxk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
credentials:
Type: Secret (a volume populated by a Secret)
SecretName: virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci
Optional: false
certificates:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/certs
HostPathType:
aks-credential:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/azure.json
HostPathType: File
kube-api-access-phfxk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-5m528 to aks-agentpool-30560331-vmss000000
Warning InspectFailed 7s (x7 over 82s) kubelet, aks-agentpool-30560331-vmss000000 Failed to apply default image tag "mcr.microsoft.com/:1.4.5": couldn't parse image reference "mcr.microsoft.com/:1.4.5": invalid reference format
Warning Failed 7s (x7 over 82s) kubelet, aks-agentpool-30560331-vmss000000 Error: InvalidImageName
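The InspectFailed event above points at the root cause: the rendered image reference is "mcr.microsoft.com/:1.4.5", i.e. the registry host plus tag with nothing in between. This suggests the chart composes the image as <image.repository>/<image.name>:<image.tag> and the image.name value never made it through. A minimal sketch of that composition follows; it is an illustration under that assumption, not the chart's actual template, with variable names mirroring the helm command earlier in the thread:

```shell
IMG_URL="mcr.microsoft.com"   # image.repository
IMG_REPO=""                   # image.name -- empty, reproducing the failure
IMG_TAG="1.4.5"               # image.tag

# With an empty name the reference collapses to "mcr.microsoft.com/:1.4.5",
# which the kubelet cannot parse.
echo "${IMG_URL}/${IMG_REPO}:${IMG_TAG}"
```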
Could someone please help me install 1.4.5? As I mentioned above a week ago, the installation of 1.4.5 does not work using the instructions provided.
You can simply edit the deployment:

kubectl edit deployment virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure

and fix the image URL to:

image: virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5
Thanks very much! I did not know that one can edit a deployment. However, I still could not get it to work. What should I do now? My attempts:
image: virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5
Failed to pull image "virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": failed to resolve reference "docker.io/library/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": pull access denied, repository does not exist
image: mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5
Failed to pull image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": rpc error: code = NotFound desc = failed to pull and unpack image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": failed to resolve reference "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5: not found
image: mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5
Failed to pull image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": rpc error: code = NotFound desc = failed to pull and unpack image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": failed to resolve reference "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5": mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure:1.4.5: not found
This attempt found the image but it cannot run it:
image: mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned default/virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-qzv5w to aks-agentpool-30560331-vmss000000
Normal Pulled 10m kubelet, aks-agentpool-30560331-vmss000000 Successfully pulled image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5" in 144.967687ms
Normal Pulled 10m kubelet, aks-agentpool-30560331-vmss000000 Successfully pulled image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5" in 332.131485ms
Normal Pulled 10m kubelet, aks-agentpool-30560331-vmss000000 Successfully pulled image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5" in 139.368742ms
Normal Created 9m50s (x4 over 10m) kubelet, aks-agentpool-30560331-vmss000000 Created container virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci
Normal Started 9m50s (x4 over 10m) kubelet, aks-agentpool-30560331-vmss000000 Started container virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-aci
Normal Pulled 9m50s kubelet, aks-agentpool-30560331-vmss000000 Successfully pulled image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5" in 167.126528ms
Normal Pulling 9m1s (x5 over 10m) kubelet, aks-agentpool-30560331-vmss000000 Pulling image "mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5"
Warning BackOff 26s (x48 over 10m) kubelet, aks-agentpool-30560331-vmss000000 Back-off restarting failed container
@eugen-nw the image should be mcr.microsoft.com/oss/virtual-kubelet/virtual-kubelet:1.4.5
@helayoty I just showed above that that particular 1.4.5 image is not working. For extra info:
kubectl logs virtual-kubelet-azure-aci-downgrade-virtual-kubelet-azure-bxwqh
WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
A future release of golang/protobuf will delete this package,
which has long been excluded from the compatibility promise.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb0 pc=0x14d8591]
goroutine 1 [running]:
github.com/virtual-kubelet/azure-aci/provider.NewACIProvider(0x0, 0x0, 0xc00007cba0, 0x7ffe8ae23a22, 0xf, 0x7ffe8ae23aa5, 0x7, 0xc000042050, 0xb, 0x280a, ...)
/go/src/github.com/virtual-kubelet/azure-aci/provider/aci.go:241 +0x1f1
main.main.func1(0x0, 0x0, 0x7ffe8ae23a22, 0xf, 0x7ffe8ae23aa5, 0x7, 0xc000042050, 0xb, 0x280a, 0x1abf00e, ...)
/go/src/github.com/virtual-kubelet/azure-aci/cmd/virtual-kubelet/main.go:67 +0xc5
github.com/virtual-kubelet/node-cli/internal/commands/root.runRootCommandWithProviderAndClient(0x1d12358, 0xc0000b83c0, 0x1b7d808, 0x1d35bb8, 0xc0000782c0, 0xc0003b6780, 0x0, 0x0)
/go/pkg/mod/github.com/virtual-kubelet/node-cli@v0.7.0/internal/commands/root/root.go:163 +0x8f8
github.com/virtual-kubelet/node-cli/internal/commands/root.runRootCommand(0x1d12358, 0xc0000b83c0, 0xc0004ce830, 0xc0003b6780, 0x0, 0x0)
/go/pkg/mod/github.com/virtual-kubelet/node-cli@v0.7.0/internal/commands/root/root.go:81 +0xfe
github.com/virtual-kubelet/node-cli/internal/commands/root.NewCommand.func1(0xc00013c000, 0xc0002fa000, 0x0, 0xc, 0x0, 0x0)
/go/pkg/mod/github.com/virtual-kubelet/node-cli@v0.7.0/internal/commands/root/root.go:56 +0x50
github.com/spf13/cobra.(*Command).execute(0xc00013c000, 0xc00009ab70, 0xc, 0xc, 0xc00013c000, 0xc00009ab70)
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842 +0x472
github.com/spf13/cobra.(*Command).ExecuteC(0xc00013c000, 0xc00021e1b0, 0xc00013c2c0, 0xc00013cdc0)
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:880
github.com/virtual-kubelet/node-cli.(*Command).Run(0xc00021e1b0, 0x1d12358, 0xc0000b83c0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/github.com/virtual-kubelet/node-cli@v0.7.0/cli.go:170 +0x85
main.main()
/go/src/github.com/virtual-kubelet/azure-aci/cmd/virtual-kubelet/main.go:83 +0x5ff
I "helm uninstall"-ed both 1.4.7 and 1.4.5 and installed the 1.4.2 Windows VK. Using VK version 1.4.2,

kubectl get pods

shows the correct state of the Pods that run on the virtual Node that the VK creates.
For anyone wanting to install 1.4.2, please see below the commands I used.
$CHART_NAME="virtual-kubelet-azure-aci"
$NODE_NAME="virtual-kubelet"
$CHART_URL="https://github.com/virtual-kubelet/azure-aci/raw/gh-pages/charts/virtual-kubelet-1.4.2.tgz"
kubectl cluster-info
$MASTER_URI="(the Kubernetes master URI from above)" - looks like "https://<cluster name lowercase>-dns-<some identifier>.hcp.westus.azmk8s.io:443"
helm install $CHART_NAME $CHART_URL --set provider=azure --set providers.azure.masterUri=$MASTER_URI --set nodeName=$NODE_NAME --set nodeOsType="Windows"
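For anyone adapting these commands to a different release, the chart URL above appears to follow a simple naming pattern on the gh-pages branch. A sketch of deriving it (verified in this thread only for 1.4.2, so treat other version numbers as an assumption):

```shell
VK_VERSION="1.4.2"   # swap in another released version at your own risk
CHART_URL="https://github.com/virtual-kubelet/azure-aci/raw/gh-pages/charts/virtual-kubelet-${VK_VERSION}.tgz"
echo "$CHART_URL"
```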
It's not fair to close this issue, because the problem is present in 1.4.7 and needs to be addressed.
@eugen-nw Thanks for pointing that out. We figured the .tgz binaries were the issue and we fixed it for all releases. Would you kindly try to run either 1.4.5 or 1.4.7 again?
A new version, 1.4.8, has been released today that addresses this issue. You can use it by installing the helm chart in your cluster. This version will be available as the default addon for the Virtual Node by Jan 2023.
https://github.com/virtual-kubelet/azure-aci/releases/tag/1.4.8
Describe the Issue
After deployments, kubectl get pods displays the Pods indefinitely in the 'Pending' state. In the Azure Portal, ACI shows them 'Running' after minutes, and I do see activity being logged there. The Containers do produce the work we expect them to do.

Steps To Reproduce
Windows containers, running on the proper --set nodeOsType="Windows" VK.

Expected behavior
To display the Pods as Running once they're started up.

Virtual-kubelet version: helm-chart/aks/virtual-kubelet-azure-aci/1.4.7
azure-aci plugin version: Not certain what this is or how to obtain the information.
Kubernetes version: AKS 1.21.9

Additional context
I've used the VK since at least 2019 but have never had this experience.