rootless-containers / usernetes

Kubernetes without the root privileges
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless
Apache License 2.0

Question: support for multiple hosts? #281

Closed: vsoch closed this issue 1 year ago

vsoch commented 1 year ago

Hi! I have a simple question I didn't see obviously in the README or with a quick search - does usernetes support multiple hosts, or does it assume running on one host? I saw it is using slirp4netns, which seems to be the same (and thus might result in the same problem) as what I'm hitting with k3s: https://github.com/k3s-io/k3s/discussions/7615#discussioncomment-6016006. I also see that k3s uses usernetes? So maybe it's exactly the same problem!

Thanks for your help!

AkihiroSuda commented 1 year ago

Yes, although it is complicated. Here is an example: https://github.com/rootless-containers/usernetes/tree/v20230518.0#multi-node-docker-compose
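
For reference, a rough sketch of bringing that example up; the linked README is authoritative, and this only assumes the docker-compose.yml at the repo root (referenced later in this thread) and a working rootless Docker:

```bash
# Hedged sketch: bring up the multi-node docker-compose example from the tagged tree.
git clone https://github.com/rootless-containers/usernetes.git
cd usernetes
git checkout v20230518.0
# Assumption: docker-compose.yml at the repo root defines the master and node containers.
docker compose up -d
docker compose ps
```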

vsoch commented 1 year ago

Hey! Just wanted to give a quick update because it's been a long time. I was finally able to get over some VM hurdles on GCP (with the kernel modules loading and the uid/gid setup) and now I have a terraform build that can (on one node) install and run kubectl to get that node. I'm looking at https://github.com/rootless-containers/usernetes/blob/v20230518.0/docker-compose.yml and if I understand this, I think I need to generate the certificates (to be seen by all nodes) and then run different commands on different nodes. Hopefully will make some time this week!

vsoch commented 1 year ago

okay, trying to reproduce what I see in the docker-compose! Here is the batch script - basically this gets run under one job, and then you get usernetes running under an allocation.

#!/bin/bash

# Final steps to setting up usernetes
# These steps will vary based on the hostname

# We also need the node names - not ideal, but for now they are predictable so it works
node_master=gffw-compute-a-001
node_crio=gffw-compute-a-002
node_containerd=gffw-compute-a-003

# What node is running this?
nodename=$(hostname)

# Install usernetes on all nodes (fuse3 and wget are already installed)
wget https://github.com/rootless-containers/usernetes/releases/download/v20230518.0/usernetes-x86_64.tbz
tar xjvf usernetes-x86_64.tbz
cd usernetes

# Run this on the main login node - since it's shared we only need 
# to generate the certs once.
if [[ "$nodename" == *"001"* ]]; then

    echo "I am ${nodename} going to run the master stuff"
    /bin/bash ./common/cfssl.sh --dir=/home/$USER/.config/usernetes --master=${node_master} --node=${node_crio} --node=${node_containerd}

    # 2379/tcp: etcd, 6443/tcp: kube-apiserver
    /bin/bash ./install.sh --wait-init-certs --start=u7s-master-with-etcd.target --cidr=10.0.100.0/24 --publish=0.0.0.0:2379:2379/tcp --publish=0.0.0.0:6443:6443/tcp --cni=flannel --cri=crio

fi

# The first compute node runs crio
if [[ "$nodename" == *"${node_crio}"* ]]; then

    echo "I am compute node ${nodename} going to run crio"
    # 10250/tcp: kubelet, 8472/udp: flannel
    /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.101.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=crio

fi

# The second compute node runs containerd
if [[ "$nodename" == *"${node_containerd}"* ]]; then

    echo "I am compute node ${nodename} going to run containerd"

    # 10250/tcp: kubelet, 8472/udp: flannel
    /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.102.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=containerd

fi
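
For context, a hedged sketch of how the script above could be driven across the three hosts named in it (under a real allocation the scheduler launches it instead; this assumes passwordless SSH):

```bash
# Hedged driver sketch: run batch.sh on each of the three hosts hard-coded above.
for host in gffw-compute-a-001 gffw-compute-a-002 gffw-compute-a-003; do
    ssh "$host" 'bash -s' < batch.sh &
done
wait
```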

The first part (on the master) runs OK to generate the certs, and so does the second part (the .config in $HOME is on a shared NFS filesystem). I'm debugging the second node (crio), and the first issue I ran into is this line in install.sh:

- U7S_ROOTLESSKIT_PORTS=${publish}
+ U7S_ROOTLESSKIT_PORTS="${publish}"

As is, it results in invalid formatting:

- U7S_ROOTLESSKIT_PORTS= 0.0.0.0:10250:10250/tcp 0.0.0.0:8472:8472/udp
+ U7S_ROOTLESSKIT_PORTS=" 0.0.0.0:10250:10250/tcp 0.0.0.0:8472:8472/udp"
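
To illustrate why the quotes matter, a minimal repro, assuming that line is eventually parsed by a shell:

```bash
# Unquoted: the assignment ends at the first space (empty value) and the rest is run as a command.
cat > /tmp/u7s-env <<'EOF'
U7S_ROOTLESSKIT_PORTS= 0.0.0.0:10250:10250/tcp 0.0.0.0:8472:8472/udp
EOF
. /tmp/u7s-env            # -> "0.0.0.0:10250:10250/tcp: command not found"

# Quoted: the whole port list survives as a single value.
cat > /tmp/u7s-env <<'EOF'
U7S_ROOTLESSKIT_PORTS=" 0.0.0.0:10250:10250/tcp 0.0.0.0:8472:8472/udp"
EOF
. /tmp/u7s-env && echo "$U7S_ROOTLESSKIT_PORTS"
```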

When I add quotes around that (assuming it's OK) and try again, it fails again, and I trace it to this:

$ $HOME/usernetes/boot/kube-proxy.sh
[INFO] Entering RootlessKit namespaces: OK
E0717 20:40:10.637050     262 server.go:494] "Error running ProxyServer" err="stat $HOME/.config/usernetes/node/kube-proxy.kubeconfig: no such file or directory"
E0717 20:40:10.637097     262 run.go:74] "command failed" err="stat $HOME/.config/usernetes/node/kube-proxy.kubeconfig: no such file or directory"

Where is that kube proxy kubeconfig supposed to be generated? Does any of this look familiar / in the right (or wrong) direction and can you advise @AkihiroSuda ?

AkihiroSuda commented 1 year ago

👍

kube-proxy.kubeconfig

https://github.com/rootless-containers/usernetes/blob/58df6ea63cc4a00425b80a088889015eedc96320/common/cfssl.sh#L131 https://github.com/rootless-containers/usernetes/blob/58df6ea63cc4a00425b80a088889015eedc96320/common/cfssl.sh#L209-L211

vsoch commented 1 year ago

Ah this is helpful! So it seems the issue is that the setup is looking for the file to be under a "node" directory but it's generated under a name that is explicitly for the hostname:

$ find . -name kube-proxy*
./.config/usernetes/master/kube-proxy.pem
./.config/usernetes/master/kube-proxy-key.pem
./.config/usernetes/master/kube-proxy.csr
./.config/usernetes/master/kube-proxy.kubeconfig
./.config/usernetes/nodes.gffw-compute-a-002/kube-proxy.kubeconfig
./.config/usernetes/nodes.gffw-compute-a-003/kube-proxy.kubeconfig

The above script looks correct (it generates the files above), but the error must be in install.sh, which is assuming a directory called "node" that I don't see.

$ ls .config/usernetes/
containers  env     nodes.gffw-compute-a-002
crio        master  nodes.gffw-compute-a-003

vsoch commented 1 year ago

hey @AkihiroSuda I see the issue! The script always binds the appropriate certs directory (either the crio one or the containerd one) to "node" - could we please expose these paths as variables (each defaulting to "node", matching the docker-compose setup)? I think I'm getting close to this working - I was able to get all services started and at least list the main master node, and I'd like to test fresh with this fix.
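
Something like this is what I have in mind (hypothetical variable name, defaulting to the current behavior):

```bash
# Hypothetical sketch of the requested knob: keep "node" as the default so the
# docker-compose flow is unchanged, but let multi-host setups point elsewhere.
: "${U7S_NODE_DIR:=node}"   # U7S_NODE_DIR is a made-up name, not an existing variable
kubeconfig="$XDG_CONFIG_HOME/usernetes/${U7S_NODE_DIR}/kube-proxy.kubeconfig"
```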

vsoch commented 1 year ago

Here are the places I see it hard-coded (docker-compose matches excluded, since those should stay!)

[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ grep -R usernetes/node
boot/.nfs000000001700008d00000003:      --etcd-endpoints https://$(cat $XDG_CONFIG_HOME/usernetes/node/master):2379 \
boot/kube-proxy.sh:  kubeconfig: "$XDG_CONFIG_HOME/usernetes/node/kube-proxy.kubeconfig"
boot/kubelet.sh:    clientCAFile: "$XDG_CONFIG_HOME/usernetes/node/ca.pem"
boot/kubelet.sh:tlsCertFile: "$XDG_CONFIG_HOME/usernetes/node/node.pem"
boot/kubelet.sh:tlsPrivateKeyFile: "$XDG_CONFIG_HOME/usernetes/node/node-key.pem"
boot/kubelet.sh:        --kubeconfig $XDG_CONFIG_HOME/usernetes/node/node.kubeconfig \
boot/flanneld.sh:       --etcd-endpoints https://$(cat $XDG_CONFIG_HOME/usernetes/node/master):2379 \
install.sh:             if [[ -f ${config_dir}/usernetes/node/done || -f ${config_dir}/usernetes/master/done ]]; then
install.sh:     cp -r "${cfssldir}/nodes.$node" ${config_dir}/usernetes/node

The second-to-last line looks promising, but looking closer, this block is intended for a single-node cluster (so the condition isn't hit, and I suspect this is what the docker-compose setup might use?)

if [[ -n "$wait_init_certs" ]]; then
        max_trial=300
        INFO "Waiting for certs to be created.":
        for ((i = 0; i < max_trial; i++)); do
                if [[ -f ${config_dir}/usernetes/node/done || -f ${config_dir}/usernetes/master/done ]]; then
                        echo "OK"
                        break
                fi
                echo -n .
                sleep 5
        done
elif [[ ! -d ${config_dir}/usernetes/master ]]; then
        ### If the keys are not generated yet, generate them for the single-node cluster
        INFO "Generating single-node cluster TLS keys (${config_dir}/usernetes/{master,node})"
        cfssldir=$(mktemp -d /tmp/cfssl.XXXXXXXXX)
        master=127.0.0.1
        node=$(hostname)
        ${base}/common/cfssl.sh --dir=${cfssldir} --master=$master --node=$node,127.0.0.1
        rm -rf ${config_dir}/usernetes/{master,node}
        cp -r "${cfssldir}/master" ${config_dir}/usernetes/master
        cp -r "${cfssldir}/nodes.$node" ${config_dir}/usernetes/node
        rm -rf "${cfssldir}"
fi
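
For comparison, a hedged sketch of what the multi-host branch seems to be missing; this is not in install.sh today, it just mirrors the single-node `cp -r` above using this host's per-node certs dir on shared storage:

```bash
# Hedged sketch only: in the --wait-init-certs branch, copy this host's generated
# certs from the shared location into the hard-coded "node" directory.
node=$(hostname)
if [[ -d ${config_dir}/usernetes/nodes.$node && ! -d ${config_dir}/usernetes/node ]]; then
        cp -r "${config_dir}/usernetes/nodes.$node" "${config_dir}/usernetes/node"
fi
```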

Going to blow everything up and start from the beginning again tomorrow, just to try to reproduce at least the services all starting. I think the issue here is that I got it working for a single-node setup, but I actually have different machines.

vsoch commented 1 year ago

Tested again - it seems that the issue (aside from the "node" directory under .config and the missing quotes around ${publish} in install.sh) is that unless I start crio/containerd on that same master node, the cluster doesn't hook up. The setup I created is here: https://github.com/converged-computing/flux-terraform-gcp/tree/usernetes/examples/usernetes and, more specifically, after doing a clone and copying the .config directory, I am following the logic here: https://github.com/converged-computing/flux-terraform-gcp/blob/usernetes/examples/usernetes/scripts/batch.sh.

Note that the only reason the README logs seem to be working is that when I first ran them, I had manually started all the different services on the master node (and then I saw it come up under kubectl get nodes). When I try to follow the logic / commands from the docker-compose, I see the various K8s objects created, but it hangs and times out on the last part. Starting the other two nodes (crio and containerd) akin to the docker-compose doesn't produce obvious errors, but kubectl get nodes still doesn't work. And sometimes I see warnings:

[WARNING] Kernel module x_tables not loaded
[WARNING] Kernel module xt_MASQUERADE not loaded
[WARNING] Kernel module xt_tcpudp not loaded

But not always! I'm out of ideas, I hope a maintainer here can advise. Thank you!
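
In case it matters, a hedged way to load the modules those warnings mention (needs root, and module names can vary per kernel build):

```bash
# Try to load the modules the installer warns about; ignore ones that are built in.
for m in x_tables xt_MASQUERADE xt_tcpudp; do
    sudo modprobe "$m" || true
done
```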

vsoch commented 1 year ago

hey @AkihiroSuda we are really interested in this use case, and I can offer to help. Can we step back and talk about what would be needed to get this working on different nodes? Would you have time to look at the above (what we have closer to working) and talk about maybe a step 1 we can take to get this working? E.g., I'd say maybe we could start by tweaking the scripts so they aren't hard coded for docker compose. What do you think?

AkihiroSuda commented 1 year ago

I see the various K8s objects created, but it hangs and times out on the last part.

Any error in the logs of containerd, kubelet, kube-apiserver?

vsoch commented 1 year ago

I will bring this up and report back! Aside from the console, I'm guessing I can find logs by poking around the usernetes directory in the user home.

AkihiroSuda commented 1 year ago

journalctl --user --no-pager -f should be useful to get the logs
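
Per-unit logs should also work, e.g. (unit names as listed by `systemctl --user --all list-units 'u7s-*'`):

```bash
journalctl --user --no-pager -u u7s-kubelet-containerd.service
journalctl --user --no-pager -u u7s-kube-apiserver.service
```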

vsoch commented 1 year ago

okay, on node 001 (the first master node) I am running the first part of scripts/batch.sh in my linked PR. This part:

echo "I am ${nodename} going to run the master stuff"
/bin/bash ./common/cfssl.sh --dir=/home/$USER/.config/usernetes --master=${node_master} --node=${node_crio} --node=${node_containerd}

# The script /home/sochat1_llnl_gov/usernetes/boot/kube-proxy.sh is asking for a non-existent 
# "$XDG_CONFIG_HOME/usernetes/node/kube-proxy.kubeconfig", so we are going to arbitrarily make it
# I did a diff of the two kube-proxy.kubeconfig files and they are the same
cp -R ~/.config/usernetes/nodes.$node_crio ~/.config/usernetes/node

# 2379/tcp: etcd, 6443/tcp: kube-apiserver
# This first install will timeout because configs are missing, but we need to generate the first ones!
/bin/bash ./install.sh --wait-init-certs --start=u7s-master-with-etcd.target --cidr=10.0.100.0/24 --publish=0.0.0.0:2379:2379/tcp --publish=0.0.0.0:6443:6443/tcp --cni=flannel --cri=

And here is the first timeout:

[INFO] Installing CoreDNS
+ sleep 3
+ kubectl get nodes -o wide
No resources found
+ kubectl apply -f /home/sochat1_llnl_gov/usernetes/manifests/coredns.yaml
serviceaccount/coredns created
clusterrole.rbac.authorization.k8s.io/system:coredns created
clusterrolebinding.rbac.authorization.k8s.io/system:coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
+ set +x
[INFO] Waiting for CoreDNS pods to be available
+ sleep 3
+ kubectl -n kube-system wait --for=condition=ready pod -l k8s-app=kube-dns
timed out waiting for the condition on pods/coredns-8557665db-mb5wt
timed out waiting for the condition on pods/coredns-8557665db-qzjq9
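
For reference, the standard way to dig into why those pods never go Ready would be something like this (generic kubectl, nothing usernetes-specific):

```bash
# Hedged debugging of the CoreDNS timeout
kubectl -n kube-system get pods -o wide
kubectl -n kube-system describe pod -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --all-containers
```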

I don't see any logs with that command:

$ sudo journalctl --user --no-pager -f  
No journal files were found.

And without sudo there are not sufficient permissions.

vsoch commented 1 year ago

Here is what the systemctl shows:

$ systemctl --user --all --no-pager list-units 'u7s-*'
  UNIT                                LOAD      ACTIVE   SUB     DESCRIPTION                                      
  u7s-etcd.service                    loaded    active   running Usernetes etcd service                           
  u7s-kube-apiserver.service          loaded    active   running Usernetes kube-apiserver service                 
  u7s-kube-controller-manager.service loaded    active   running Usernetes kube-controller-manager service        
  u7s-kube-scheduler.service          loaded    active   running Usernetes kube-scheduler service                 
  u7s-rootlesskit.service             loaded    active   running Usernetes RootlessKit service                    
  u7s-etcd.target                     loaded    active   active  Usernetes target for etcd                        
  u7s-master-with-etcd.target         loaded    active   active  Usernetes target for Kubernetes master components
  u7s-master.target                   loaded    active   active  Usernetes target for Kubernetes master components
● u7s-node.target                     not-found inactive dead    u7s-node.target                                  

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

9 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
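
Everything on the master looks loaded and running; some hedged follow-up checks (generic kubectl / systemctl) that I can run next:

```bash
kubectl get nodes -o wide                         # workers should appear once their kubelets register
kubectl get events -A --sort-by=.lastTimestamp    # scheduling / network errors usually surface here
systemctl --user --no-pager status u7s-kube-apiserver.service
```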

vsoch commented 1 year ago

The containerd node:

$     /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.102.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=containerd
[INFO] Rootless cgroup (v2) is supported
[WARNING] Kernel module x_tables not loaded
[WARNING] Kernel module xt_MASQUERADE not loaded
[WARNING] Kernel module xt_tcpudp not loaded
[INFO] Waiting for certs to be created.:
OK
[INFO] Base dir: /home/sochat1_llnl_gov/usernetes
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-master-with-etcd.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-rootlesskit.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-etcd.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-etcd.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-master.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-apiserver.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-controller-manager.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-scheduler.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-node.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-containerd-fuse-overlayfs-grpc.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kubelet-containerd.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-proxy.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-flanneld.service
[INFO] Starting u7s-node.target
+ systemctl --user -T enable u7s-node.target
Created symlink /home/sochat1_llnl_gov/.config/systemd/user/u7s.target.wants/u7s-node.target → /home/sochat1_llnl_gov/.config/systemd/user/u7s-node.target.
+ systemctl --user -T start u7s-node.target
Enqueued anchor job 19 u7s-node.target/start.
Enqueued auxiliary job 32 u7s-kubelet-containerd.service/start.
Enqueued auxiliary job 29 u7s-rootlesskit.service/start.
Enqueued auxiliary job 20 u7s-containerd-fuse-overlayfs-grpc.service/start.
Enqueued auxiliary job 30 u7s-flanneld.service/start.
Enqueued auxiliary job 31 u7s-kube-proxy.service/start.

real    0m1.793s
user    0m0.001s
sys 0m0.003s
+ systemctl --user --all --no-pager list-units 'u7s-*'
UNIT                                       LOAD   ACTIVE   SUB     DESCRIPTION                                                       
u7s-containerd-fuse-overlayfs-grpc.service loaded active   running Usernetes containerd-fuse-overlayfs-grpc service                  
u7s-etcd.service                           loaded inactive dead    Usernetes etcd service                                            
u7s-flanneld.service                       loaded active   running Usernetes flanneld service                                        
u7s-kube-apiserver.service                 loaded inactive dead    Usernetes kube-apiserver service                                  
u7s-kube-controller-manager.service        loaded inactive dead    Usernetes kube-controller-manager service                         
u7s-kube-proxy.service                     loaded active   running Usernetes kube-proxy service                                      
u7s-kube-scheduler.service                 loaded inactive dead    Usernetes kube-scheduler service                                  
u7s-kubelet-containerd.service             loaded active   running Usernetes kubelet service (containerd)                            
u7s-rootlesskit.service                    loaded active   running Usernetes RootlessKit service (containerd)                        
u7s-etcd.target                            loaded inactive dead    Usernetes target for etcd                                         
u7s-master-with-etcd.target                loaded inactive dead    Usernetes target for Kubernetes master components (including etcd)
u7s-master.target                          loaded inactive dead    Usernetes target for Kubernetes master components                 
u7s-node.target                            loaded active   active  Usernetes target for Kubernetes node components (containerd)      

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

13 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
+ set +x
[INFO] Installation complete.
[INFO] Hint: `sudo loginctl enable-linger` to start user services automatically on the system start up.

And the node for crio:

$     /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.101.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=crio
[INFO] Rootless cgroup (v2) is supported
[WARNING] Kernel module x_tables not loaded
[WARNING] Kernel module xt_MASQUERADE not loaded
[WARNING] Kernel module xt_tcpudp not loaded
[INFO] Waiting for certs to be created.:
OK
[INFO] Base dir: /home/sochat1_llnl_gov/usernetes
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-master-with-etcd.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-rootlesskit.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-etcd.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-etcd.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-master.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-apiserver.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-controller-manager.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-scheduler.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-node.target
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kubelet-crio.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-kube-proxy.service
[INFO] Installing /home/sochat1_llnl_gov/.config/systemd/user/u7s-flanneld.service
[INFO] Starting u7s-node.target
+ systemctl --user -T enable u7s-node.target
+ systemctl --user -T start u7s-node.target
Enqueued anchor job 10 u7s-node.target/start.
Enqueued auxiliary job 22 u7s-flanneld.service/start.
Enqueued auxiliary job 21 u7s-rootlesskit.service/start.
Enqueued auxiliary job 11 u7s-kube-proxy.service/start.
Enqueued auxiliary job 13 u7s-kubelet-crio.service/start.

real    0m1.588s
user    0m0.003s
sys 0m0.002s
+ systemctl --user --all --no-pager list-units 'u7s-*'
UNIT                                LOAD   ACTIVE   SUB     DESCRIPTION                                                       
u7s-etcd.service                    loaded inactive dead    Usernetes etcd service                                            
u7s-flanneld.service                loaded active   running Usernetes flanneld service                                        
u7s-kube-apiserver.service          loaded inactive dead    Usernetes kube-apiserver service                                  
u7s-kube-controller-manager.service loaded inactive dead    Usernetes kube-controller-manager service                         
u7s-kube-proxy.service              loaded active   running Usernetes kube-proxy service                                      
u7s-kube-scheduler.service          loaded inactive dead    Usernetes kube-scheduler service                                  
u7s-kubelet-crio.service            loaded active   running Usernetes kubelet service (crio)                                  
u7s-rootlesskit.service             loaded active   running Usernetes RootlessKit service (crio)                              
u7s-etcd.target                     loaded inactive dead    Usernetes target for etcd                                         
u7s-master-with-etcd.target         loaded inactive dead    Usernetes target for Kubernetes master components (including etcd)
u7s-master.target                   loaded inactive dead    Usernetes target for Kubernetes master components                 
u7s-node.target                     loaded active   active  Usernetes target for Kubernetes node components (crio)            

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

12 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
+ set +x
[INFO] Installation complete.
[INFO] Hint: `sudo loginctl enable-linger` to start user services automatically on the system start up.
[sochat1_llnl_gov@gffw-compute-a-003 usernetes]$     sudo loginctl enable-linger

vsoch commented 1 year ago

I can leave this up a little bit if you want to tell me where to look!

AkihiroSuda commented 1 year ago

And without sudo there are not sufficient permissions.

What is the error and what is your distro?

vsoch commented 1 year ago

Here is the permissions error (why I added sudo):

$ journalctl --user --no-pager -f
Hint: You are currently not seeing messages from the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
No journal files were opened due to insufficient permissions.

This is Rocky Linux 8

NAME="Rocky Linux"
VERSION="8.8 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

AkihiroSuda commented 1 year ago

Users in the 'systemd-journal' group can see all messages.

Can you try adding yourself to this group?

vsoch commented 1 year ago

I already did, it doesn't change anything.

$ echo $USER
sochat1_llnl_gov
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ sudo usermod -a -G systemd-journal $USER
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ journalctl --user --no-pager -f  
Hint: You are currently not seeing messages from the system.
      Users in the 'systemd-journal' group can see all messages. Pass -q to
      turn off this notice.
No journal files were opened due to insufficient permissions.
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ 

vsoch commented 1 year ago

okay logged out and in! That message went away, but still no logs:

$ journalctl --user --no-pager -f  
No journal files were found.

vsoch commented 1 year ago

I think I at least found them, and a way to read a file:

$ journalctl --file /run/log/journal/$(cat /etc/machine-id)/system.journal
-- Logs begin at Fri 2023-08-04 02:24:17 UTC, end at Fri 2023-08-04 02:24:17 UTC. --
Aug 04 02:24:17 gffw-compute-a-001 systemd-journald[10040]: Journal started
Aug 04 02:24:17 gffw-compute-a-001 systemd-journald[10040]: Runtime journal (/run/log/journal/a699437c101cde4ba34d>
Aug 04 02:24:17 gffw-compute-a-001 audit[10040]: EVENT_LISTENER pid=10040 uid=0 auid=501043911 tty=pts0 ses=5 subj>
Aug 04 02:24:17 gffw-compute-a-001 audit[10040]: SYSCALL arch=c000003e syscall=49 success=yes exit=0 a0=8 a1=5572f>
Aug 04 02:24:17 gffw-compute-a-001 audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
Aug 04 02:24:17 gffw-compute-a-001 audit: CONFIG_CHANGE op=set audit_enabled=1 old=1 auid=501043911 ses=5 subj=unc>
Aug 04 02:24:17 gffw-compute-a-001 audit[10040]: SYSCALL arch=c000003e syscall=46 success=yes exit=60 a0=8 a1=7ffe>
Aug 04 02:24:17 gffw-compute-a-001 audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
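
A hedged alternative that reads every journal file in that directory at once (journalctl's --directory flag), instead of one file at a time:

```bash
journalctl --no-pager --directory "/run/log/journal/$(cat /etc/machine-id)"
```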

vsoch commented 1 year ago

Here is one of the weirdly named files:

```console
-- Logs begin at Fri 2023-08-04 01:51:48 UTC, end at Fri 2023-08-04 02:24:28 UTC. --
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Linux version 4.18.0-477.15.1.el8_8.cloud.x86_64 (mockbuild@iad1-prod-build001.bld.e>
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-4.18.0-477.15.1.el8_8.cloud.x86_64 >
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted'>
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: signal: max sigframe size: 3632
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: BIOS-provided physical RAM map:
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: BIOS-e820: [mem 0x0000000000001000-0x0000000000054fff] usable
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: BIOS-e820: [mem 0x0000000000055000-0x000000000005ffff] reserved
```

So this one is just kernel boot messages from the packer image - nothing from the u7s services.
table memory at [mem 0xbfbf2000-0xbfbf203f] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving SSDT table memory at [mem 0xbfb7c000-0xbfb7c315] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving TPM2 table memory at [mem 0xbfb7b000-0xbfb7b033] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving SRAT table memory at [mem 0xbfb77000-0xbfb77127] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving APIC table memory at [mem 0xbfb76000-0xbfb760a5] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving SSDT table memory at [mem 0xbfb75000-0xbfb75bc5] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Reserving WAET table memory at [mem 0xbfb74000-0xbfb74027] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Local APIC address 0xfee00000 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x01 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x02 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x03 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x04 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x05 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x06 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: SRAT: PXM 0 -> APIC 0x07 -> Node 0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x83fffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x000> Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x83fffffff] -> [mem 0x0> Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: NODE_DATA(0) allocated [mem 0x83ffd5000-0x83fffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Using crashkernel=auto, the size chosen is a best effort estimation. 
Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Reserving 256MB of memory at 2736MB for crashkernel (System RAM: 32764MB) Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Zone ranges: Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA [mem 0x0000000000001000-0x0000000000ffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA32 [mem 0x0000000001000000-0x00000000ffffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Normal [mem 0x0000000100000000-0x000000083fffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Device empty Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Movable zone start for each node Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Early memory node ranges Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: node 0: [mem 0x0000000000001000-0x0000000000054fff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: node 0: [mem 0x0000000000060000-0x0000000000097fff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: node 0: [mem 0x0000000000100000-0x00000000bf8ecfff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: node 0: [mem 0x00000000bfbff000-0x00000000bffdffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: node 0: [mem 0x0000000100000000-0x000000083fffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Zeroed struct page in unavailable ranges: 934 pages Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Initmem setup node 0 [mem 0x0000000000001000-0x000000083fffffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: On node 0 totalpages: 8387674 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA zone: 64 pages used for memmap Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA zone: 3182 pages reserved Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA zone: 3980 pages, LIFO batch:0 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA32 zone: 12212 pages used for memmap Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: DMA32 zone: 781518 pages, LIFO batch:63 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Normal zone: 118784 pages used for memmap Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Normal zone: 7602176 pages, LIFO batch:63 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: PM-Timer IO Port: 0xb008 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: Local APIC address 0xfee00000 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23 Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) Aug 04 01:51:48 
packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: IRQ5 used by override. Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: IRQ9 used by override. Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: IRQ10 used by override. Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: ACPI: IRQ11 used by override. Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: Using ACPI (MADT) for SMP configuration information Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: smpboot: Allowing 8 CPUs, 0 hotplug CPUs Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: PM: Registered nosave memory: [mem 0x00000000-0x00000fff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: PM: Registered nosave memory: [mem 0x00055000-0x0005ffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: PM: Registered nosave memory: [mem 0x00098000-0x0009ffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] Aug 04 01:51:48 packer-64ab657b-254d-6f13-2c7f-ca2a888d1813 kernel: PM: Registered nosave memory: [mem 0xbf8ed000-0xbfb6cfff] [sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ journalctl --file /run/log/journal/a699437c101cde4ba34df680d8a9df5f/system\@0006020f8f6b6086-bcc53f0a609314af.journal~ --no-pager -f -- Logs begin at Fri 2023-08-04 01:51:48 UTC. -- Aug 04 02:22:06 gffw-compute-a-001 systemd[8268]: Reloading. Aug 04 02:22:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:22:27.956078Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":1581} Aug 04 02:22:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:22:27.95665Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":1581,"took":"448.696µs","hash":192400766} Aug 04 02:22:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:22:27.956672Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":192400766,"revision":1581,"compact-revision":1222} Aug 04 02:24:17 gffw-compute-a-001 rsyslogd[2989]: imjournal: journal files changed, reloading... [v8.2102.0-13.el8 try https://www.rsyslog.com/e/0 ] Aug 04 02:24:28 gffw-compute-a-001 systemd[8268]: Reloading. Aug 04 02:24:28 gffw-compute-a-001 systemd[8268]: Reloading. Aug 04 02:27:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:27:27.968146Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":1940} Aug 04 02:27:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:27:27.968725Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":1940,"took":"429.789µs","hash":3908641438} Aug 04 02:27:27 gffw-compute-a-001 etcd.sh[8997]: {"level":"info","ts":"2023-08-04T02:27:27.968754Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":3908641438,"revision":1940,"compact-revision":1581} ```

And the other one:

```console
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ journalctl --file /run/log/journal/a699437c101cde4ba34df680d8a9df5f/system\@0006020f --no-pager -f
system@0006020f8f6b6086-bcc53f0a609314af.journal~  system@0006020f97fb9574-56775ff6dbb63353.journal~
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ journalctl --file /run/log/journal/a699437c101cde4ba34df680d8a9df5f/system\@0006020f97fb9574-56775ff6dbb63353.journal~ --no-pager -f
-- Logs begin at Fri 2023-08-04 02:21:54 UTC. --
Aug 04 02:24:17 gffw-compute-a-001 audit: CONFIG_CHANGE op=set audit_enabled=1 old=1 auid=501043911 ses=5 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 res=1
Aug 04 02:24:17 gffw-compute-a-001 audit[10040]: SYSCALL arch=c000003e syscall=46 success=yes exit=60 a0=8 a1=7ffe41e1fdf0 a2=4000 a3=7ffe41e1fe9c items=0 ppid=10039 pid=10040 auid=501043911 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=5 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
Aug 04 02:24:17 gffw-compute-a-001 audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRYPTO_KEY_USER pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=session fp=? direction=both spid=8342 suid=501043911 rport=35925 laddr=10.10.0.4 lport=22 exe="/usr/sbin/sshd" hostname=? addr=35.235.244.32 terminal=? res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRYPTO_KEY_USER pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:ef:c0:c1:19:d0:be:97:69:18:65:86:b6:7a:de:93:eb:4f:7a:36:05:e1:0e:bf:46:08:63:54:b7:e1:5f:44:c2 direction=? spid=8342 suid=501043911 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: USER_END pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog,pam_mkhomedir acct="sochat1_llnl_gov" exe="/usr/sbin/sshd" hostname=35.235.244.32 addr=35.235.244.32 terminal=ssh res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRED_DISP pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="sochat1_llnl_gov" exe="/usr/sbin/sshd" hostname=35.235.244.32 addr=35.235.244.32 terminal=ssh res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRYPTO_KEY_USER pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:cb:24:99:1a:fd:7e:22:02:83:a7:9d:8f:3f:f4:f2:74:3d:92:a6:e9:f1:bc:3a:bb:da:12:e0:d6:a3:ad:44:32 direction=? spid=8338 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRYPTO_KEY_USER pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:aa:c8:22:32:ae:81:f0:1c:9c:6c:a6:24:c0:51:0e:ec:d0:e1:c6:dc:39:f4:82:9c:22:b4:e4:a1:d7:5e:d0:6f direction=? spid=8338 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
Aug 04 02:28:17 gffw-compute-a-001 audit[8338]: CRYPTO_KEY_USER pid=8338 uid=0 auid=501043911 ses=4 subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 msg='op=destroy kind=server fp=SHA256:ef:c0:c1:19:d0:be:97:69:18:65:86:b6:7a:de:93:eb:4f:7a:36:05:e1:0e:bf:46:08:63:54:b7:e1:5f:44:c2 direction=? spid=8338 suid=0 exe="/usr/sbin/sshd" hostname=? addr=? terminal=? res=success'
```
vsoch commented 1 year ago

The way I was debugging this before was with `systemctl status`, but I'm not sure what I'm looking for, so it was hard to know where to look.
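
For reference, something along these lines should at least show which of the rootless units came up and what they logged; I'm assuming all the relevant units carry the `u7s-` prefix, so treat it as a sketch:

```console
# list the Usernetes (u7s-*) units managed by the user systemd instance
systemctl --user list-units 'u7s-*'

# recent log lines from all u7s-* user units (if the user journal is available)
journalctl --user -u 'u7s-*' --no-pager -n 100
```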

AkihiroSuda commented 1 year ago

Using Ubuntu (23.04 or 22.04) might be easier

vsoch commented 1 year ago

I think I agree. Okay, so here is a plan: I'll rebuild the cluster (it has a base VM) using Ubuntu, then bring it up and see if the problem reproduces (and if we can get logs!). My suggestion after that is to try making a PR that doesn't have the node directories hard-coded as "node", since I suspect the error might be coming from there. E.g., this line: https://github.com/converged-computing/flux-terraform-gcp/blob/26822feed8435f27d184c7a2cd4200614228824e/examples/usernetes/scripts/batch.sh#L28 is something I have to do because it's looking for the actual node name (but docker-compose hard-codes them all as "node").
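
Roughly what I have in mind, as a sketch only (the `nodes.*` directory names and paths below are my assumptions for illustration, not necessarily the real usernetes layout):

```bash
#!/bin/bash
# Hypothetical: derive the per-node directory from the real hostname instead of
# a hard-coded "node" name, falling back to the generic one if it doesn't exist.
config_dir="${HOME}/.config/usernetes"
node_name="$(hostname)"

if [ -d "${config_dir}/nodes.${node_name}" ]; then
  node_dir="${config_dir}/nodes.${node_name}"   # host-specific directory
else
  node_dir="${config_dir}/nodes.node"           # generic name from docker-compose
fi
echo "Using node directory: ${node_dir}"
```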

What do you think? After I create the Ubuntu cluster and try again, I can report back and then we can figure out the next step.

aojea commented 1 year ago

/cc

vsoch commented 1 year ago

Thank you again to you both! I almost have the Debian setup done, although I'm not exactly a morning person (I got up to chat with you!), so I'll probably go back to sleep for a bit and be in touch later today or this weekend, and of course this means we can chat more next week. Happy Friday and have an amazing weekend!

vsoch commented 1 year ago

@AkihiroSuda I struggled all day trying to get a Debian/Ubuntu setup working (yes, pathetic; GCP with Terraform and the foundation network setup doesn't seem to create eth0, and I need to ping some Google colleagues to ask about this), BUT I went back to Rocky and (I think?) found a way to view the logs! We can just look at /var/log/messages, and I think I'm seeing everything in there? Here you go!
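
In case it helps to filter, the usernetes entries can be pulled out of the syslog with a plain grep (service names taken from the log below):

```console
# show only the Usernetes-related lines from the system log
sudo grep -E 'rootlesskit\.sh|etcd\.sh|kube-apiserver\.sh' /var/log/messages | tail -n 100
```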

Content of /var/log/messages ```console compute-a-001 systemd[1]: man-db-cache-update.service: Succeeded. Aug 5 03:11:59 gffw-compute-a-001 systemd[1]: Started man-db-cache-update.service. Aug 5 03:11:59 gffw-compute-a-001 systemd[1]: run-r58b87819f091433b951107f6f77b255b.service: Succeeded. Aug 5 03:11:59 gffw-compute-a-001 kernel: nfs: Deprecated parameter 'intr' Aug 5 03:12:03 gffw-compute-a-001 kernel: nfs: Deprecated parameter 'intr' Aug 5 03:12:03 gffw-compute-a-001 systemd[1]: systemd-hostnamed.service: Succeeded. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: Stopping User Manager for UID 0... Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Stopped target Default. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Stopped target Basic System. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Stopped target Timers. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Stopped target Paths. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Stopped target Sockets. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Closed D-Bus User Message Bus Socket. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Reached target Shutdown. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Started Exit the Session. Aug 5 03:12:06 gffw-compute-a-001 systemd[5370]: Reached target Exit the Session. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: user@0.service: Succeeded. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: Stopped User Manager for UID 0. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: Stopping User runtime directory /run/user/0... Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: run-user-0.mount: Succeeded. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: user-runtime-dir@0.service: Succeeded. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: Stopped User runtime directory /run/user/0. Aug 5 03:12:06 gffw-compute-a-001 systemd[1]: Removed slice User Slice of UID 0. Aug 5 03:12:11 gffw-compute-a-001 kernel: nfs: Deprecated parameter 'intr' Aug 5 03:12:11 gffw-compute-a-001 nfsrahead[8225]: setting /home readahead to 128 Aug 5 03:12:11 gffw-compute-a-001 google_metadata_script_runner[2976]: startup-script exit status 0 Aug 5 03:12:11 gffw-compute-a-001 google_metadata_script_runner[2976]: Finished running startup scripts. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: google-startup-scripts.service: Succeeded. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: Started Google Compute Engine Startup Scripts. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: Reached target Multi-User System. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: Starting Update UTMP about System Runlevel Changes... Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: systemd-update-utmp-runlevel.service: Succeeded. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: Started Update UTMP about System Runlevel Changes. Aug 5 03:12:11 gffw-compute-a-001 systemd[1]: Startup finished in 1.076s (kernel) + 4.504s (initrd) + 46.633s (userspace) = 52.214s. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Created slice User Slice of UID 501043911. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Starting User runtime directory /run/user/501043911... Aug 5 03:12:13 gffw-compute-a-001 systemd-logind[2720]: New session 2 of user sochat1_llnl_gov. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Started User runtime directory /run/user/501043911. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Starting User Manager for UID 501043911... Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Started Mark boot as successful after the user session has run 2 minutes. 
Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Reached target Timers. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Starting D-Bus User Message Bus Socket. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Reached target Paths. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Listening on D-Bus User Message Bus Socket. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Reached target Sockets. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Reached target Basic System. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Reached target Default. Aug 5 03:12:13 gffw-compute-a-001 systemd[8234]: Startup finished in 25ms. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Started User Manager for UID 501043911. Aug 5 03:12:13 gffw-compute-a-001 systemd[1]: Started Session 2 of user sochat1_llnl_gov. Aug 5 03:12:47 gffw-compute-a-001 systemd-logind[2720]: New session 4 of user sochat1_llnl_gov. Aug 5 03:12:47 gffw-compute-a-001 systemd[1]: Started Session 4 of user sochat1_llnl_gov. Aug 5 03:13:02 gffw-compute-a-001 flux[5553]: sched-fluxion-qmanager.debug[0]: alloc success (queue=default id=1104242802688) Aug 5 03:13:03 gffw-compute-a-001 flux[5553]: sched-fluxion-qmanager.debug[0]: free succeeded (queue=default id=1104242802688) Aug 5 03:13:07 gffw-compute-a-001 flux[5553]: job-ingest.debug[0]: job-validator[0]: inactivity timeout Aug 5 03:14:04 gffw-compute-a-001 systemd[8234]: Reloading. Aug 5 03:14:04 gffw-compute-a-001 systemd[8234]: Reloading. Aug 5 03:14:04 gffw-compute-a-001 systemd[8234]: Started Usernetes RootlessKit service. Aug 5 03:14:04 gffw-compute-a-001 systemd[8234]: Starting Usernetes etcd service... Aug 5 03:14:04 gffw-compute-a-001 kernel: IPv6: ADDRCONF(NETDEV_UP): tap0: link is not ready Aug 5 03:14:05 gffw-compute-a-001 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): tap0: link becomes ready Aug 5 03:14:05 gffw-compute-a-001 rootlesskit.sh[8917]: #033[104m#033[97m[INFO]#033[49m#033[39m RootlessKit ready, PID=8892, state directory=/run/user/501043911/usernetes/rootlesskit . Aug 5 03:14:05 gffw-compute-a-001 rootlesskit.sh[8917]: #033[104m#033[97m[INFO]#033[49m#033[39m Hint: You can enter RootlessKit namespaces by running `nsenter -U --preserve-credential -n -m -t 8892`. Aug 5 03:14:05 gffw-compute-a-001 rootlesskit.sh[8956]: 1 Aug 5 03:14:05 gffw-compute-a-001 rootlesskit.sh[8956]: 2 Aug 5 03:14:05 gffw-compute-a-001 etcd.sh[8868]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 03:14:05 gffw-compute-a-001 etcd.sh[8968]: OK Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"warn","ts":"2023-08-05T03:14:06.001647Z","caller":"embed/config.go:673","msg":"Running http and grpc server on single port. 
This is not recommended for production."} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.002347Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--data-dir","/home/sochat1_llnl_gov/.local/share/usernetes/etcd","--enable-v2=true","--name","gffw-compute-a-001","--cert-file=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem","--key-file=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem","--peer-cert-file=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem","--peer-key-file=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem","--trusted-ca-file=/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem","--peer-trusted-ca-file=/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem","--peer-client-cert-auth","--client-cert-auth","--listen-client-urls","https://0.0.0.0:2379","--listen-peer-urls","https://0.0.0.0:2380","--advertise-client-urls","https://127.0.0.1:2379","--initial-advertise-peer-urls","https://127.0.0.1:2380"]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"warn","ts":"2023-08-05T03:14:06.002594Z","caller":"embed/config.go:673","msg":"Running http and grpc server on single port. This is not recommended for production."} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.002604Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.002631Z","caller":"embed/etcd.go:495","msg":"starting with peer TLS","tls-info":"cert = /home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem, key = /home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem, client-cert=, client-key=, trusted-ca = /home/sochat1_llnl_gov/.config/usernetes/master/ca.pem, client-cert-auth = true, crl-file = ","cipher-suites":[]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.00432Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.004403Z","caller":"embed/etcd.go:309","msg":"starting an etcd 
server","etcd-version":"3.5.9","git-sha":"bdbbde998","go-version":"go1.19.9","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"gffw-compute-a-001","data-dir":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://127.0.0.1:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://127.0.0.1:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"gffw-compute-a-001=https://127.0.0.1:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.028953Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd/member/snap/db","took":"13.62958ms"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.068895Z","caller":"etcdserver/raft.go:495","msg":"starting local member","local-member-id":"a874c87fd42044f","cluster-id":"c9be114fc2da2776"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.069267Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f switched to configuration voters=()"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.069296Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f became follower at term 0"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.069305Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft a874c87fd42044f [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.069311Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f became follower at term 1"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.069345Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f switched to configuration voters=(758659209188475983)"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"warn","ts":"2023-08-05T03:14:06.091224Z","caller":"auth/store.go:1238","msg":"simple token is not cryptographically signed"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.097361Z","caller":"mvcc/kvstore.go:393","msg":"kvstore restored","current-rev":1} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.10127Z","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default 
value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.106517Z","caller":"etcdserver/server.go:854","msg":"starting etcd server","local-member-id":"a874c87fd42044f","local-server-version":"3.5.9","cluster-version":"to_be_decided"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.106804Z","caller":"etcdserver/server.go:738","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"a874c87fd42044f","forward-ticks":9,"forward-duration":"900ms","election-ticks":10,"election-timeout":"1s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.107312Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd/member/snap","suffix":"snap.db","max":5,"interval":"30s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.107715Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd/member/snap","suffix":"snap","max":5,"interval":"30s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.107737Z","caller":"fileutil/purge.go:44","msg":"started to purge file","dir":"/home/sochat1_llnl_gov/.local/share/usernetes/etcd/member/wal","suffix":"wal","max":5,"interval":"30s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.109017Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f switched to configuration voters=(758659209188475983)"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.109153Z","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"c9be114fc2da2776","local-member-id":"a874c87fd42044f","added-peer-id":"a874c87fd42044f","added-peer-peer-urls":["https://127.0.0.1:2380"]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.109962Z","caller":"embed/etcd.go:726","msg":"starting with client TLS","tls-info":"cert = /home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem, key = /home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem, client-cert=, client-key=, trusted-ca = /home/sochat1_llnl_gov/.config/usernetes/master/ca.pem, client-cert-auth = true, crl-file = ","cipher-suites":[]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"warn","ts":"2023-08-05T03:14:06.109988Z","caller":"embed/etcd.go:739","msg":"Flag `enable-v2` is deprecated and will get removed in etcd 3.6."} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.110039Z","caller":"embed/etcd.go:597","msg":"serving peer traffic","address":"[::]:2380"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.110065Z","caller":"embed/etcd.go:569","msg":"cmux::serve","address":"[::]:2380"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.110068Z","caller":"embed/etcd.go:278","msg":"now serving peer/client/metrics","local-member-id":"a874c87fd42044f","initial-advertise-peer-urls":["https://127.0.0.1:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://127.0.0.1:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[]} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: 
{"level":"info","ts":"2023-08-05T03:14:06.673846Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f is starting a new election at term 1"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673907Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f became pre-candidate at term 1"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673955Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f received MsgPreVoteResp from a874c87fd42044f at term 1"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673971Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f became candidate at term 2"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673977Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f received MsgVoteResp from a874c87fd42044f at term 2"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673985Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"a874c87fd42044f became leader at term 2"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.673992Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: a874c87fd42044f elected leader a874c87fd42044f at term 2"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.676041Z","caller":"etcdserver/server.go:2571","msg":"setting up initial cluster version using v2 API","cluster-version":"3.5"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.677649Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.677648Z","caller":"etcdserver/server.go:2062","msg":"published local member to cluster through raft","local-member-id":"a874c87fd42044f","local-member-attributes":"{Name:gffw-compute-a-001 ClientURLs:[https://127.0.0.1:2379]}","request-path":"/0/members/a874c87fd42044f/attributes","cluster-id":"c9be114fc2da2776","publish-timeout":"7s"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.67786Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.677963Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.679515Z","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"c9be114fc2da2776","local-member-id":"a874c87fd42044f","cluster-version":"3.5"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.679594Z","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.679522Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc+http","address":"[::]:2379"} Aug 5 03:14:06 gffw-compute-a-001 etcd.sh[8973]: {"level":"info","ts":"2023-08-05T03:14:06.679622Z","caller":"etcdserver/server.go:2595","msg":"cluster version is updated","cluster-version":"3.5"} Aug 5 03:14:06 gffw-compute-a-001 
etcd-init-data.sh[8991]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 03:14:06 gffw-compute-a-001 etcd-init-data.sh[8999]: OK Aug 5 03:14:06 gffw-compute-a-001 etcd-init-data.sh[9004]: + timeout 60 sh -c 'until cat /home/sochat1_llnl_gov/usernetes/config/flannel/etcd/coreos.com_network_config | ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cacert=/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem --cert=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem --key=/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem put /coreos.com/network/config; do sleep 1; done' Aug 5 03:14:06 gffw-compute-a-001 etcd-init-data.sh[9011]: OK Aug 5 03:14:06 gffw-compute-a-001 systemd[8234]: Started Usernetes etcd service. Aug 5 03:14:06 gffw-compute-a-001 systemd[8234]: Starting Usernetes kube-apiserver service... Aug 5 03:14:06 gffw-compute-a-001 systemd[8234]: Reached target Usernetes target for etcd. Aug 5 03:14:06 gffw-compute-a-001 kube-apiserver.sh[9021]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 03:14:06 gffw-compute-a-001 kube-apiserver.sh[9034]: OK Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.013508 100 server.go:551] external host was not specified, using 10.10.0.4 Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.013875 100 authentication.go:525] AnonymousAuth is not allowed with the AlwaysAllow authorizer. Resetting AnonymousAuth to false. You should use a different authorizer Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.015009 100 server.go:165] Version: v1.27.2 Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.015037 100 server.go:167] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.518222 100 plugins.go:158] Loaded 11 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.518253 100 plugins.go:161] Loaded 13 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,ClusterTrustBundleAttest,CertificateSubjectRestriction,ValidatingAdmissionPolicy,ValidatingAdmissionWebhook,ResourceQuota. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.543555 100 handler.go:232] Adding GroupVersion apiextensions.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.543584 100 genericapiserver.go:752] Skipping API apiextensions.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.546078 100 instance.go:282] Using reconciler: lease Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: E0805 03:14:07.554361 100 instance.go:388] Could not construct pre-rendered responses for ServiceAccountIssuerDiscovery endpoints. Endpoints will not be enabled. 
Error: issuer URL must use https scheme, got: kubernetes.default.svc Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.678197 100 handler.go:232] Adding GroupVersion v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.678442 100 instance.go:651] API group "internal.apiserver.k8s.io" is not enabled, skipping. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.888478 100 instance.go:651] API group "resource.k8s.io" is not enabled, skipping. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.899810 100 handler.go:232] Adding GroupVersion authentication.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.899835 100 genericapiserver.go:752] Skipping API authentication.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.899842 100 genericapiserver.go:752] Skipping API authentication.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.901626 100 handler.go:232] Adding GroupVersion authorization.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.901647 100 genericapiserver.go:752] Skipping API authorization.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.903527 100 handler.go:232] Adding GroupVersion autoscaling v2 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.904058 100 handler.go:232] Adding GroupVersion autoscaling v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.904075 100 genericapiserver.go:752] Skipping API autoscaling/v2beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.904081 100 genericapiserver.go:752] Skipping API autoscaling/v2beta2 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.906397 100 handler.go:232] Adding GroupVersion batch v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.906416 100 genericapiserver.go:752] Skipping API batch/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.908294 100 handler.go:232] Adding GroupVersion certificates.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.908314 100 genericapiserver.go:752] Skipping API certificates.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.908321 100 genericapiserver.go:752] Skipping API certificates.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.910105 100 handler.go:232] Adding GroupVersion coordination.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.910125 100 genericapiserver.go:752] Skipping API coordination.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.911425 100 genericapiserver.go:752] Skipping API discovery.k8s.io/v1beta1 because it has no resources. 
Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.911846 100 handler.go:232] Adding GroupVersion discovery.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.914343 100 handler.go:232] Adding GroupVersion networking.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.914364 100 genericapiserver.go:752] Skipping API networking.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.914371 100 genericapiserver.go:752] Skipping API networking.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.915968 100 handler.go:232] Adding GroupVersion node.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.915986 100 genericapiserver.go:752] Skipping API node.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.915993 100 genericapiserver.go:752] Skipping API node.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.918006 100 handler.go:232] Adding GroupVersion policy v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.918023 100 genericapiserver.go:752] Skipping API policy/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.920995 100 handler.go:232] Adding GroupVersion rbac.authorization.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.921012 100 genericapiserver.go:752] Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.921019 100 genericapiserver.go:752] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.922296 100 handler.go:232] Adding GroupVersion scheduling.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.922318 100 genericapiserver.go:752] Skipping API scheduling.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.922326 100 genericapiserver.go:752] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.925005 100 handler.go:232] Adding GroupVersion storage.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.925025 100 genericapiserver.go:752] Skipping API storage.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.925032 100 genericapiserver.go:752] Skipping API storage.k8s.io/v1alpha1 because it has no resources. 
Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.926743 100 handler.go:232] Adding GroupVersion flowcontrol.apiserver.k8s.io v1beta3 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.927465 100 handler.go:232] Adding GroupVersion flowcontrol.apiserver.k8s.io v1beta2 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.927485 100 genericapiserver.go:752] Skipping API flowcontrol.apiserver.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.927491 100 genericapiserver.go:752] Skipping API flowcontrol.apiserver.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.931143 100 handler.go:232] Adding GroupVersion apps v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.931161 100 genericapiserver.go:752] Skipping API apps/v1beta2 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.931167 100 genericapiserver.go:752] Skipping API apps/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.933176 100 handler.go:232] Adding GroupVersion admissionregistration.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.933194 100 genericapiserver.go:752] Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.933200 100 genericapiserver.go:752] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.934590 100 handler.go:232] Adding GroupVersion events.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.934609 100 genericapiserver.go:752] Skipping API events.k8s.io/v1beta1 because it has no resources. Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.944529 100 handler.go:232] Adding GroupVersion apiregistration.k8s.io v1 to ResourceManager Aug 5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:07.944552 100 genericapiserver.go:752] Skipping API apiregistration.k8s.io/v1beta1 because it has no resources. 
Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.367422 100 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem" Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.367878 100 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem::/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem" Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368499 100 secure_serving.go:210] Serving securely on [::]:6443 Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368577 100 apiservice_controller.go:97] Starting APIServiceRegistrationController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368595 100 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368706 100 controller.go:83] Starting OpenAPI AggregationController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368728 100 handler_discovery.go:392] Starting ResourceDiscoveryManager Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368743 100 controller.go:80] Starting OpenAPI V3 AggregationController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368987 100 gc_controller.go:78] Starting apiserver lease garbage collector Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.368987 100 autoregister_controller.go:141] Starting autoregister controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369025 100 cache.go:32] Waiting for caches to sync for autoregister controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369034 100 controller.go:121] Starting legacy_token_tracking_controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369054 100 shared_informer.go:311] Waiting for caches to sync for configmaps Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369077 100 apf_controller.go:361] Starting API Priority and Fairness config controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369090 100 tlsconfig.go:240] "Starting DynamicServingCertificateController" Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369101 100 available_controller.go:423] Starting AvailableConditionController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369110 100 cache.go:32] Waiting for caches to sync for AvailableConditionController controller Aug 5 03:14:08 gffw-compute-a-001 systemd[8234]: Started Usernetes kube-apiserver service. 
Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369122 100 gc_controller.go:78] Starting apiserver lease garbage collector Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369138 100 system_namespaces_controller.go:67] Starting system namespaces controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369126 100 crdregistration_controller.go:111] Starting crd-autoregister controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369186 100 shared_informer.go:311] Waiting for caches to sync for crd-autoregister Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369693 100 cluster_authentication_trust_controller.go:440] Starting cluster_authentication_trust_controller controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369703 100 customresource_discovery_controller.go:289] Starting DiscoveryController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369908 100 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem" Aug 5 03:14:08 gffw-compute-a-001 systemd[8234]: Started Usernetes kube-controller-manager service. Aug 5 03:14:08 gffw-compute-a-001 systemd[8234]: Started Usernetes kube-scheduler service. Aug 5 03:14:08 gffw-compute-a-001 systemd[8234]: Reached target Usernetes target for Kubernetes master components. Aug 5 03:14:08 gffw-compute-a-001 systemd[8234]: Reached target Usernetes target for Kubernetes master components (including etcd). Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.374237 100 controller.go:85] Starting OpenAPI controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.374400 100 controller.go:85] Starting OpenAPI V3 controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.374447 100 naming_controller.go:291] Starting NamingConditionController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.374462 100 establishing_controller.go:76] Starting EstablishingController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.375287 100 nonstructuralschema_controller.go:192] Starting NonStructuralSchemaConditionController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.375323 100 apiapproval_controller.go:186] Starting KubernetesAPIApprovalPolicyConformantConditionController Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.375363 100 crd_finalizer.go:266] Starting CRDFinalizer Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.369706 100 shared_informer.go:311] Waiting for caches to sync for cluster_authentication_trust_controller Aug 5 03:14:08 gffw-compute-a-001 kube-controller-manager.sh[9055]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 03:14:08 gffw-compute-a-001 kube-controller-manager.sh[9082]: OK Aug 5 03:14:08 gffw-compute-a-001 kube-scheduler.sh[9056]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 03:14:08 gffw-compute-a-001 kube-scheduler.sh[9083]: OK Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: E0805 03:14:08.442969 100 controller.go:146] "Failed to ensure lease exists, will retry" err="namespaces \"kube-system\" not found" interval="200ms" Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469163 100 
shared_informer.go:318] Caches are synced for configmaps Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469183 100 cache.go:39] Caches are synced for AvailableConditionController controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469207 100 apf_controller.go:366] Running API Priority and Fairness config worker Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469190 100 cache.go:39] Caches are synced for autoregister controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469241 100 cache.go:39] Caches are synced for APIServiceRegistrationController controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469212 100 shared_informer.go:318] Caches are synced for crd-autoregister Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.469224 100 apf_controller.go:369] Running API Priority and Fairness periodic rebalancing process Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.471424 100 controller.go:624] quota admission added evaluator for: namespaces Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.475423 100 shared_informer.go:318] Caches are synced for cluster_authentication_trust_controller Aug 5 03:14:08 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:08.646052 100 controller.go:624] quota admission added evaluator for: leases.coordination.k8s.io Aug 5 03:14:08 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:08.862262 119 serving.go:348] Generated self-signed cert in-memory Aug 5 03:14:08 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:08.881114 118 serving.go:348] Generated self-signed cert in-memory Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:09.119012 118 authentication.go:339] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:09.119040 118 authentication.go:363] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:09.119055 118 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.119323 118 controllermanager.go:187] "Starting" version="v1.27.2" Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.119344 118 controllermanager.go:189] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.122239 118 secure_serving.go:210] Serving securely on [::]:10257 Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.122335 118 tlsconfig.go:240] "Starting DynamicServingCertificateController" Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.122866 118 leaderelection.go:245] attempting to acquire leader lease kube-system/kube-controller-manager... 
Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: W0805 03:14:09.132733 119 authentication.go:339] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: W0805 03:14:09.132754 119 authentication.go:363] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: W0805 03:14:09.132769 119 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work. Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.138221 118 leaderelection.go:255] successfully acquired lease kube-system/kube-controller-manager Aug 5 03:14:09 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:09.138342 118 event.go:307] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="gffw-compute-a-001_b5faff78-e1cb-432b-a17c-a97518224420 became leader" Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.155403 119 server.go:154] "Starting Kubernetes Scheduler" version="v1.27.2" Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.155429 119 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.157894 119 secure_serving.go:210] Serving securely on [::]:10259 Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.159313 119 tlsconfig.go:240] "Starting DynamicServingCertificateController" Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.175772 100 controller.go:132] OpenAPI AggregationController: action for item k8s_internal_local_delegation_chain_0000000000: Nothing (removed from the queue). Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.258571 119 leaderelection.go:245] attempting to acquire leader lease kube-system/kube-scheduler... Aug 5 03:14:09 gffw-compute-a-001 kube-scheduler.sh[9093]: I0805 03:14:09.265858 119 leaderelection.go:255] successfully acquired lease kube-system/kube-scheduler Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.375092 100 storage_scheduling.go:95] created PriorityClass system-node-critical with value 2000001000 Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.379407 100 storage_scheduling.go:95] created PriorityClass system-cluster-critical with value 2000000000 Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.379427 100 storage_scheduling.go:111] all system priority classes are created successfully or already exist. 
Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.394655 100 alloc.go:330] "allocated clusterIPs" service="default/kubernetes" clusterIPs=map[IPv4:10.0.0.1] Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: W0805 03:14:09.401455 100 lease.go:251] Resetting endpoints for master service "kubernetes" to [10.10.0.4] Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.402197 100 controller.go:624] quota admission added evaluator for: endpoints Aug 5 03:14:09 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:09.406154 100 controller.go:624] quota admission added evaluator for: endpointslices.discovery.k8s.io Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.152735 118 shared_informer.go:311] Waiting for caches to sync for tokens Aug 5 03:14:10 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:10.157531 100 controller.go:624] quota admission added evaluator for: serviceaccounts Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.160729 118 controllermanager.go:638] "Started controller" controller="deployment" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.160859 118 deployment_controller.go:168] "Starting controller" controller="deployment" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.160875 118 shared_informer.go:311] Waiting for caches to sync for deployment Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.169994 118 controllermanager.go:638] "Started controller" controller="horizontalpodautoscaling" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.170064 118 horizontal.go:200] "Starting HPA controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.170078 118 shared_informer.go:311] Waiting for caches to sync for HPA Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.176274 118 controllermanager.go:638] "Started controller" controller="cronjob" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.176391 118 cronjob_controllerv2.go:139] "Starting cronjob controller v2" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.176407 118 shared_informer.go:311] Waiting for caches to sync for cronjob Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194036 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="daemonsets.apps" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194084 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="csistoragecapacities.storage.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194366 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="rolebindings.rbac.authorization.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194396 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="endpointslices.discovery.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:10.194419 118 shared_informer.go:592] resyncPeriod 18h9m10.538246446s is smaller than resyncCheckPeriod 18h45m34.202617713s and the informer has already started. 
Changing it to 18h45m34.202617713s Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194449 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="serviceaccounts" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194481 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="poddisruptionbudgets.policy" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194512 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="statefulsets.apps" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194530 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="jobs.batch" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194545 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="leases.coordination.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194569 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="endpoints" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194583 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="limitranges" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194598 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="roles.rbac.authorization.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194618 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="deployments.apps" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194633 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="controllerrevisions.apps" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:10.194640 118 shared_informer.go:592] resyncPeriod 14h47m48.805273649s is smaller than resyncCheckPeriod 18h45m34.202617713s and the informer has already started. 
Changing it to 18h45m34.202617713s Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194675 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="replicasets.apps" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194689 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="horizontalpodautoscalers.autoscaling" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194964 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="networkpolicies.networking.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194982 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="ingresses.networking.k8s.io" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.194996 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="podtemplates" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.195008 118 resource_quota_monitor.go:223] "QuotaMonitor created object count evaluator" resource="cronjobs.batch" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.195028 118 controllermanager.go:638] "Started controller" controller="resourcequota" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.195047 118 resource_quota_controller.go:295] "Starting resource quota controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.195062 118 shared_informer.go:311] Waiting for caches to sync for resource quota Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.195080 118 resource_quota_monitor.go:304] "QuotaMonitor running" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.207848 118 controllermanager.go:638] "Started controller" controller="garbagecollector" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.207979 118 garbagecollector.go:155] "Starting controller" controller="garbagecollector" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.208086 118 shared_informer.go:311] Waiting for caches to sync for garbage collector Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.208119 118 graph_builder.go:294] "Running" component="GraphBuilder" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.214068 118 controllermanager.go:638] "Started controller" controller="statefulset" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.214181 118 stateful_set.go:161] "Starting stateful set controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.214202 118 shared_informer.go:311] Waiting for caches to sync for stateful set Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.219987 118 controllermanager.go:638] "Started controller" controller="ttl-after-finished" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.220023 118 ttlafterfinished_controller.go:109] "Starting TTL after finished controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.220029 118 shared_informer.go:311] Waiting for caches to sync for TTL 
after finished Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.252976 118 shared_informer.go:318] Caches are synced for tokens Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.359056 118 controllermanager.go:638] "Started controller" controller="ephemeral-volume" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.360887 118 controller.go:169] "Starting ephemeral volume controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.360914 118 shared_informer.go:311] Waiting for caches to sync for ephemeral Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.509249 118 controllermanager.go:638] "Started controller" controller="job" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.509274 118 controllermanager.go:603] "Warning: controller is disabled" controller="tokencleaner" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.509283 118 controllermanager.go:616] "Warning: skipping controller" controller="nodeipam" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.509356 118 job_controller.go:202] Starting job controller Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.509367 118 shared_informer.go:311] Waiting for caches to sync for job Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:10.660277 118 probe.go:268] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating. Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: E0805 03:14:10.660347 118 plugins.go:609] "Error initializing dynamic plugin prober" err="error (re-)creating driver directory: mkdir /usr/libexec/kubernetes: permission denied" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.660785 118 controllermanager.go:638] "Started controller" controller="attachdetach" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.660871 118 attach_detach_controller.go:343] "Starting attach detach controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.660882 118 shared_informer.go:311] Waiting for caches to sync for attach detach Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.808500 118 controllermanager.go:638] "Started controller" controller="pvc-protection" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.808577 118 pvc_protection_controller.go:102] "Starting PVC protection controller" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.808594 118 shared_informer.go:311] Waiting for caches to sync for PVC protection Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.958629 118 controllermanager.go:638] "Started controller" controller="root-ca-cert-publisher" Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.958681 118 publisher.go:101] Starting root CA certificate configmap publisher Aug 5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:10.958689 118 shared_informer.go:311] Waiting for caches to sync for crt configmap Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.108669 118 controllermanager.go:638] "Started controller" 
controller="daemonset" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.108750 118 daemon_controller.go:291] "Starting daemon sets controller" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.108759 118 shared_informer.go:311] Waiting for caches to sync for daemon sets Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.258084 118 controllermanager.go:638] "Started controller" controller="csrcleaner" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.258138 118 cleaner.go:82] Starting CSR cleaner controller Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.410618 118 node_lifecycle_controller.go:431] "Controller will reconcile labels" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.410674 118 controllermanager.go:638] "Started controller" controller="nodelifecycle" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.410739 118 node_lifecycle_controller.go:465] "Sending events to api server" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.410763 118 node_lifecycle_controller.go:476] "Starting node controller" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.410771 118 shared_informer.go:311] Waiting for caches to sync for taint Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: E0805 03:14:11.455692 118 core.go:213] "Failed to start cloud node lifecycle controller" err="no cloud provider provided" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.455718 118 controllermanager.go:616] "Warning: skipping controller" controller="cloud-node-lifecycle" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.610486 118 controllermanager.go:638] "Started controller" controller="persistentvolume-binder" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.610573 118 pv_controller_base.go:323] "Starting persistent volume controller" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.610581 118 shared_informer.go:311] Waiting for caches to sync for persistent volume Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.765117 118 controllermanager.go:638] "Started controller" controller="clusterrole-aggregation" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.765176 118 clusterroleaggregation_controller.go:189] "Starting ClusterRoleAggregator controller" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.765186 118 shared_informer.go:311] Waiting for caches to sync for ClusterRoleAggregator Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.807592 118 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-serving" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.807617 118 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-serving Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.807657 118 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem::/home/sochat1_llnl_gov/.config/usernetes/master/ca-key.pem" Aug 5 03:14:11 gffw-compute-a-001 
kube-controller-manager.sh[9092]: I0805 03:14:11.808713 118 certificate_controller.go:112] Starting certificate controller "csrsigning-kubelet-client" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.808730 118 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kubelet-client Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.808795 118 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem::/home/sochat1_llnl_gov/.config/usernetes/master/ca-key.pem" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.809619 118 certificate_controller.go:112] Starting certificate controller "csrsigning-kube-apiserver-client" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.809653 118 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-kube-apiserver-client Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.809627 118 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem::/home/sochat1_llnl_gov/.config/usernetes/master/ca-key.pem" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.810242 118 controllermanager.go:638] "Started controller" controller="csrsigning" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.810264 118 controllermanager.go:603] "Warning: controller is disabled" controller="bootstrapsigner" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.810299 118 certificate_controller.go:112] Starting certificate controller "csrsigning-legacy-unknown" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.810305 118 shared_informer.go:311] Waiting for caches to sync for certificate-csrsigning-legacy-unknown Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.810340 118 dynamic_serving_content.go:132] "Starting controller" name="csr-controller::/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem::/home/sochat1_llnl_gov/.config/usernetes/master/ca-key.pem" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.959693 118 controllermanager.go:638] "Started controller" controller="serviceaccount" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.959738 118 serviceaccounts_controller.go:111] "Starting service account controller" Aug 5 03:14:11 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:11.959757 118 shared_informer.go:311] Waiting for caches to sync for service account Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.108811 118 controllermanager.go:638] "Started controller" controller="replicaset" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.108890 118 replica_set.go:201] "Starting controller" name="replicaset" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.108897 118 shared_informer.go:311] Waiting for caches to sync for ReplicaSet Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.156293 118 controllermanager.go:638] "Started controller" controller="csrapproving" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.156353 118 
certificate_controller.go:112] Starting certificate controller "csrapproving" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.156364 118 shared_informer.go:311] Waiting for caches to sync for certificate-csrapproving Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.309667 118 controllermanager.go:638] "Started controller" controller="endpoint" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.309797 118 endpoints_controller.go:172] Starting endpoint controller Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.309811 118 shared_informer.go:311] Waiting for caches to sync for endpoint Aug 5 03:14:12 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:12.417448 100 controller.go:624] quota admission added evaluator for: deployments.apps Aug 5 03:14:12 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:12.434986 100 alloc.go:330] "allocated clusterIPs" service="kube-system/kube-dns" clusterIPs=map[IPv4:10.0.0.53] Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.458748 118 controllermanager.go:638] "Started controller" controller="podgc" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.459318 118 gc_controller.go:103] Starting GC controller Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.459330 118 shared_informer.go:311] Waiting for caches to sync for GC Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.609157 118 controllermanager.go:638] "Started controller" controller="ttl" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.609231 118 ttl_controller.go:124] "Starting TTL controller" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.609250 118 shared_informer.go:311] Waiting for caches to sync for TTL Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: E0805 03:14:12.759095 118 core.go:92] "Failed to start service controller" err="WARNING: no cloud provider provided, services of type LoadBalancer will fail" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.759121 118 controllermanager.go:616] "Warning: skipping controller" controller="service" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.759138 118 core.go:224] "Will not configure cloud provider routes for allocate-node-cidrs" CIDRs=false routes=true Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.759146 118 controllermanager.go:616] "Warning: skipping controller" controller="route" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.909067 118 controllermanager.go:638] "Started controller" controller="pv-protection" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.909125 118 pv_protection_controller.go:78] "Starting PV protection controller" Aug 5 03:14:12 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:12.909133 118 shared_informer.go:311] Waiting for caches to sync for PV protection Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.058896 118 controllermanager.go:638] "Started controller" controller="endpointslice" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.059014 118 endpointslice_controller.go:252] Starting endpoint slice 
controller Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.059027 118 shared_informer.go:311] Waiting for caches to sync for endpoint_slice Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.209152 118 controllermanager.go:638] "Started controller" controller="endpointslicemirroring" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.209231 118 endpointslicemirroring_controller.go:211] Starting EndpointSliceMirroring controller Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.209238 118 shared_informer.go:311] Waiting for caches to sync for endpoint_slice_mirroring Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.405664 118 controllermanager.go:638] "Started controller" controller="disruption" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.405725 118 disruption.go:423] Sending events to api server. Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.405765 118 disruption.go:434] Starting disruption controller Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.405772 118 shared_informer.go:311] Waiting for caches to sync for disruption Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.558575 118 controllermanager.go:638] "Started controller" controller="persistentvolume-expander" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.558624 118 expand_controller.go:339] "Starting expand controller" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.558632 118 shared_informer.go:311] Waiting for caches to sync for expand Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.708735 118 controllermanager.go:638] "Started controller" controller="replicationcontroller" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.708799 118 replica_set.go:201] "Starting controller" name="replicationcontroller" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.708821 118 shared_informer.go:311] Waiting for caches to sync for ReplicationController Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.961057 118 controllermanager.go:638] "Started controller" controller="namespace" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.961148 118 namespace_controller.go:197] "Starting namespace controller" Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.961281 118 shared_informer.go:311] Waiting for caches to sync for namespace Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.964004 118 shared_informer.go:311] Waiting for caches to sync for resource quota Aug 5 03:14:13 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:13.976041 118 shared_informer.go:311] Waiting for caches to sync for garbage collector Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.006043 118 shared_informer.go:318] Caches are synced for disruption Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009341 118 shared_informer.go:318] Caches are synced for TTL Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009359 118 shared_informer.go:318] Caches are synced 
for ReplicaSet Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009391 118 shared_informer.go:318] Caches are synced for PV protection Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009389 118 shared_informer.go:318] Caches are synced for ReplicationController Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009410 118 shared_informer.go:318] Caches are synced for PVC protection Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.009438 118 shared_informer.go:318] Caches are synced for daemon sets Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.010529 118 shared_informer.go:318] Caches are synced for endpoint Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.010643 118 shared_informer.go:318] Caches are synced for persistent volume Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.010799 118 shared_informer.go:318] Caches are synced for taint Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.010900 118 taint_manager.go:206] "Starting NoExecuteTaintManager" Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.010953 118 taint_manager.go:211] "Sending events to api server" Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.014538 118 shared_informer.go:318] Caches are synced for stateful set Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.059273 118 shared_informer.go:318] Caches are synced for crt configmap Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.059284 118 shared_informer.go:318] Caches are synced for expand Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.059393 118 shared_informer.go:318] Caches are synced for GC Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.060473 118 shared_informer.go:318] Caches are synced for service account Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.061529 118 shared_informer.go:318] Caches are synced for ephemeral Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.061541 118 shared_informer.go:318] Caches are synced for attach detach Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.061557 118 shared_informer.go:318] Caches are synced for namespace Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.061574 118 shared_informer.go:318] Caches are synced for deployment Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.065494 118 shared_informer.go:318] Caches are synced for ClusterRoleAggregator Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.070729 118 shared_informer.go:318] Caches are synced for HPA Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.076969 118 shared_informer.go:318] Caches are synced for cronjob Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.110206 118 shared_informer.go:318] Caches are synced for job Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.120462 118 shared_informer.go:318] Caches are synced for TTL after finished Aug 5 03:14:14 gffw-compute-a-001 
kube-controller-manager.sh[9092]: I0805 03:14:14.159434 118 shared_informer.go:318] Caches are synced for endpoint_slice Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.164746 118 shared_informer.go:318] Caches are synced for resource quota Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.195257 118 shared_informer.go:318] Caches are synced for resource quota Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.208265 118 shared_informer.go:318] Caches are synced for certificate-csrsigning-kubelet-serving Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.209362 118 shared_informer.go:318] Caches are synced for certificate-csrsigning-kubelet-client Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.209382 118 shared_informer.go:318] Caches are synced for endpoint_slice_mirroring Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.210554 118 shared_informer.go:318] Caches are synced for certificate-csrsigning-kube-apiserver-client Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.210632 118 shared_informer.go:318] Caches are synced for certificate-csrsigning-legacy-unknown Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.257051 118 shared_informer.go:318] Caches are synced for certificate-csrapproving Aug 5 03:14:14 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:14.410481 100 controller.go:624] quota admission added evaluator for: replicasets.apps Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.414243 118 event.go:307] "Event occurred" object="kube-system/coredns" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set coredns-8557665db to 2" Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.565153 118 event.go:307] "Event occurred" object="kube-system/coredns-8557665db" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-8557665db-ckfzd" Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.576527 118 shared_informer.go:318] Caches are synced for garbage collector Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.577719 118 event.go:307] "Event occurred" object="kube-system/coredns-8557665db" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-8557665db-9rzc4" Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.608960 118 shared_informer.go:318] Caches are synced for garbage collector Aug 5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.608985 118 garbagecollector.go:166] "All resource monitors have synced. Proceeding to collect garbage" Aug 5 03:15:39 gffw-compute-a-001 systemd-journald[571]: Time spent on flushing to /var is 12.919ms for 1839 entries. Aug 5 03:15:39 gffw-compute-a-001 rsyslogd[2971]: imjournal: journal files changed, reloading... [v8.2102.0-13.el8 try https://www.rsyslog.com/e/0 ] ```

I haven't looked closely yet - I need to eat dinner, but I'll read through these while I do!

vsoch commented 1 year ago

okay, here are some log snippets that stand out to me (and might be useful for debugging):

This one seems like it could be more of a warning than a real error:

Aug  5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: E0805 03:14:07.554361     100 instance.go:388] Could not construct pre-rendered responses for ServiceAccountIssuerDiscovery endpoints. Endpoints will not be enabled. Error: issuer URL must use https scheme, got: kubernetes.default.svc

This looks like an issue - should this directory be created in advance for the rootless use case?

Aug  5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: W0805 03:14:10.660277     118 probe.go:268] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
Aug  5 03:14:10 gffw-compute-a-001 kube-controller-manager.sh[9092]: E0805 03:14:10.660347     118 plugins.go:609] "Error initializing dynamic plugin prober" err="error (re-)creating driver directory: mkdir /usr/libexec/kubernetes: permission denied"

I'll see if I can look up what that directory is for (and try creating it). If CoreDNS (or other plugins) need to write anything there, that could be an issue. And then this is the last explicit error:

Aug  5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.546078     100 instance.go:282] Using reconciler: lease
Aug  5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: E0805 03:14:07.554361     100 instance.go:388] Could not construct pre-rendered responses for ServiceAccountIssuerDiscovery endpoints. Endpoints will not be enabled. Error: issuer URL must use https scheme, got: kubernetes.default.svc
Aug  5 03:14:07 gffw-compute-a-001 kube-apiserver.sh[9039]: I0805 03:14:07.678197     100 handler.go:232] Adding GroupVersion  v1 to ResourceManager

If that is just for the ServiceAccountIssuerDiscovery endpoints (which we haven't cared about yet), it's probably not the issue.
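
Side note while I'm here: that message seems to be driven by whatever issuer value gets passed to kube-apiserver - the flag itself (`--service-account-issuer`) is standard, but where usernetes sets it is an assumption on my part. A quick sketch of how I'd hunt it down and what a scheme-qualified issuer would look like:

```console
# find where the issuer is configured (directory names are a guess, adjust to the tarball layout)
$ grep -rn "service-account-issuer" boot/ config/ 2>/dev/null
# the discovery endpoints are only built when the issuer URL has an https scheme, e.g.
#   --service-account-issuer=https://kubernetes.default.svc
```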

vsoch commented 1 year ago

okay, debugging - kube-controller-manager.sh is here: https://github.com/rootless-containers/usernetes/blob/58df6ea63cc4a00425b80a088889015eedc96320/boot/kube-controller-manager.sh#L4. It calls nsenter.sh, but that seems to be just a wrapper, and it in turn calls kube-controller-manager. Searching around for the files in those log messages, the probe.go warning seems to come from here:

https://github.com/kubernetes/kubernetes/blob/2c6c4566eff972d6c1320b5f8ad795f88c822d09/pkg/volume/flexvolume/probe.go#L265-L274

And then the plugins.go is here:

https://github.com/kubernetes/kubernetes/blob/2c6c4566eff972d6c1320b5f8ad795f88c822d09/pkg/volume/plugins.go#L603-L607

But it looks like it falls back to initializing a dummy plugin prober, so maybe this isn't an actual error after all?

pm.prober = &dummyPluginProber{}
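
If we did want that mkdir permission-denied to go away (instead of relying on the dummy prober fallback), kube-controller-manager has a `--flex-volume-plugin-dir` flag whose default is exactly the /usr/libexec path in the log, so pointing it at a user-writable directory is one option. Just a sketch - I haven't checked how usernetes threads extra flags through its boot scripts:

```console
# a user-writable flexvolume dir (path is arbitrary, my own choice)
$ mkdir -p ~/.local/share/usernetes/kubelet-plugins/volume/exec
# then the controller manager invocation would need:
#   --flex-volume-plugin-dir=$HOME/.local/share/usernetes/kubelet-plugins/volume/exec
```
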
vsoch commented 1 year ago

I see mention of CoreDNS at the bottom but nothing looks terribly off?

Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.414243     118 event.go:307] "Event occurred" object="kube-system/coredns" fieldPath="" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set coredns-8557665db to 2"
Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.565153     118 event.go:307] "Event occurred" object="kube-system/coredns-8557665db" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-8557665db-ckfzd"
Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.576527     118 shared_informer.go:318] Caches are synced for garbage collector
Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.577719     118 event.go:307] "Event occurred" object="kube-system/coredns-8557665db" fieldPath="" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: coredns-8557665db-9rzc4"
Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.608960     118 shared_informer.go:318] Caches are synced for garbage collector
Aug  5 03:14:14 gffw-compute-a-001 kube-controller-manager.sh[9092]: I0805 03:14:14.608985     118 garbagecollector.go:166] "All resource monitors have synced. Proceeding to collect garbage"

If we are seeing successful creates there, it could be that whatever wait logic is checking for them has a bug.
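
One way to test that theory is to re-run the same kind of readiness check by hand and see what it reports - this assumes the installer is waiting on coredns readiness, and uses only stock kubectl:

```console
# KUBECONFIG already exported; assuming the wait is on coredns readiness
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
$ kubectl -n kube-system wait --for=condition=ready pod -l k8s-app=kube-dns --timeout=60s
# if that times out, the pod events usually say why
$ kubectl -n kube-system describe pod -l k8s-app=kube-dns | grep -A 5 Events:
```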

aojea commented 1 year ago

But what is the error? Which component is failing to start?

vsoch commented 1 year ago

The error is the timeout shown in this comment: https://github.com/rootless-containers/usernetes/issues/281#issuecomment-1664866664

Given the log above showing that it started, I'm wondering if the script that decides there is a timeout is issuing the wrong command or testing with the wrong permissions. In other testing, when I bring up the other services on the same node, I see this master run to completion (with instructions to export my config path, etc.).

vsoch commented 1 year ago

That part of the script is here: https://github.com/rootless-containers/usernetes/blob/58df6ea63cc4a00425b80a088889015eedc96320/install.sh#L457 I'll see if I can do some debugging around that - it's after midnight here, so I'll turn into a pumpkin soon for sure! 🎃
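
For the record, a low-tech way to see exactly which command that part of install.sh runs (plain shell tracing, nothing usernetes-specific; the placeholder stands for whatever flags I used for the master):

```console
# re-run the installer under tracing so the exact command behind the timeout is printed
$ bash -x ./install.sh <same flags as before> 2>&1 | tee /tmp/u7s-install-trace.log
# or just read the wait logic directly
$ grep -n "wait" install.sh | head
```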

vsoch commented 1 year ago

okay, I added some more debugging - the pods do seem to be created (when I list all namespaces), but they are pending:

+ kubectl get pods --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-8557665db-t257d   0/1     Pending   0          8s
kube-system   coredns-8557665db-wbj64   0/1     Pending   0          8s

okay, so it's failing because there are no nodes available to schedule the pods on:

Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-sqv94:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  112s  default-scheduler  no nodes available to schedule pods
+ kubectl -n kube-system wait --for=condition=ready pod -l k8s-app=kube-dns

I assume this is a bit of a race - the worker nodes need to be started and connected - so let's try to start them. Now I can see the KUBECONFIG path and that the pods are still pending:

$ export KUBECONFIG=/home/sochat1_llnl_gov/.config/usernetes/master/admin-localhost.kubeconfig
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ bin/kubectl get pods --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-8557665db-t257d   0/1     Pending   0          3m20s
kube-system   coredns-8557665db-wbj64   0/1     Pending   0          3m20s

But also note that it's probably concerning that the master isn't registering as a node (should it?). I think it should, unless it's just serving as an empty sort of control plane?

$ bin/kubectl get nodes
No resources found
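
While the workers come up, a few stock kubectl checks from the master should show whether they ever try to register. (My current understanding - an assumption, not verified - is that the docker-compose master only runs the control plane with no kubelet, so it never showing up in `get nodes` may actually be expected.)

```console
$ export KUBECONFIG=/home/sochat1_llnl_gov/.config/usernetes/master/admin-localhost.kubeconfig
# watch for worker nodes registering (Ctrl-C to stop)
$ bin/kubectl get nodes -w
# kubelet certificate signing requests, if any, land here
$ bin/kubectl get csr
# recent cluster events sometimes explain a failed registration
$ bin/kubectl get events -A --sort-by='.lastTimestamp' | tail -n 20
```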

Let's try starting the other two services (from the docker-compose config) on the other nodes. I tried this before, but maybe I can get more debug output this time from the system logs on those nodes. Here is the command block for the second node, which should start crio:

    echo "I am compute node ${nodename} going to run crio"
    # 10250/tcp: kubelet, 8472/udp: flannel
    /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.101.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=crio
    sudo loginctl enable-linger

That doesn't change the state on the master node - still no nodes, and the pods are pending. The third node runs containerd, and again there are no obvious errors here:

    # 10250/tcp: kubelet, 8472/udp: flannel
    /bin/bash ./install.sh --wait-init-certs --start=u7s-node.target --cidr=10.0.102.0/24 --publish=0.0.0.0:10250:10250/tcp --publish=0.0.0.0:8472:8472/udp --cni=flannel --cri=containerd
    sudo loginctl enable-linger

Still no change in status on the master - no nodes, and the pods are pending. But the logs might tell a different story - here are the logs from the containerd node:

Logs for containerd node (003) ```console Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8514]: #033[104m#033[97m[INFO]#033[49m#033[39m RootlessKit ready, PID=8488, state directory=/run/user/501043911/usernetes/rootlesskit . Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8514]: #033[104m#033[97m[INFO]#033[49m#033[39m Hint: You can enter RootlessKit namespaces by running `nsenter -U --preserve-credential -n -m -t 8488`. Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8553]: 1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8553]: 2 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.142234096Z" level=info msg="starting containerd" revision=1677a17964311325ed1c31e2c0a3589ce6d5c30d version=v1.7.1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.158081729Z" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.161464233Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.164346245Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1 Aug 5 06:35:47 gffw-compute-a-003 kernel: overlayfs: unrecognized mount option "ro" or missing value Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.211113271Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.fuse-overlayfs\"..." type=io.containerd.snapshotter.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.213775959Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.215499913Z" level=info msg="metadata content store policy set" policy=shared Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225727190Z" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225752787Z" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225765149Z" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225801537Z" level=info msg="loading plugin \"io.containerd.lease.v1.manager\"..." type=io.containerd.lease.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225816556Z" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225825838Z" level=info msg="NRI interface is disabled by configuration." Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.225834205Z" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227531296Z" level=info msg="loading plugin \"io.containerd.runtime.v2.shim\"..." 
type=io.containerd.runtime.v2 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227549174Z" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227567792Z" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227588207Z" level=info msg="loading plugin \"io.containerd.streaming.v1.manager\"..." type=io.containerd.streaming.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227602087Z" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227613683Z" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227629885Z" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227642274Z" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227654154Z" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227664679Z" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227674633Z" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.227684217Z" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.229384041Z" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.230487829Z" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.230520183Z" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.230532517Z" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.230553097Z" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231553469Z" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." 
type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231571799Z" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231582603Z" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231593774Z" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231606312Z" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231626786Z" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231642346Z" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231672243Z" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231683915Z" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231726874Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231738011Z" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231753696Z" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231766988Z" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231777260Z" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231788895Z" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231805366Z" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.231849271Z" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." 
type=io.containerd.grpc.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232063571Z" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:fuse-overlayfs DefaultRuntimeName:crun DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} Runtimes:map[crun:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName:crun] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:podsandbox}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreBlockIONotEnabledErrors:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginSetupSerially:false NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.8 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:true RestrictOOMScoreAdj:true MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false EnableCDI:false CDISpecDirs:[/etc/cdi /var/run/cdi] ImagePullProgressTimeout:1m0s DrainExecSyncIOTimeout:0s} ContainerdRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/containerd ContainerdEndpoint:/run/user/501043911/usernetes/containerd/containerd.sock RootDir:/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.grpc.v1.cri StateDir:/run/user/501043911/usernetes/containerd/io.containerd.grpc.v1.cri}" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232113383Z" level=info msg="Connect containerd service" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232137039Z" level=info msg="using legacy CRI server" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232142578Z" level=info msg="using experimental NRI integration - disable nri plugin to prevent this" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232164957Z" level=info msg="Get image filesystem path \"/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs\"" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.232172647Z" level=warning msg="Running containerd in a user namespace 
typically requires disable_cgroup, disable_apparmor, restrict_oom_score_adj set to be true" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.233987791Z" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234013843Z" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234024337Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1 Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234035435Z" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234061271Z" level=info msg="Start subscribing containerd event" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234446663Z" level=info msg="Start recovering state" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234375229Z" level=info msg=serving... address=/run/user/501043911/usernetes/containerd/containerd.sock.ttrpc Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234564800Z" level=info msg=serving... address=/run/user/501043911/usernetes/containerd/containerd.sock Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234582173Z" level=info msg="containerd successfully booted in 0.097304s" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234745496Z" level=info msg="Start event monitor" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234768235Z" level=info msg="Start snapshots syncer" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234779810Z" level=info msg="Start cni network conf syncer for default" Aug 5 06:35:47 gffw-compute-a-003 rootlesskit.sh[8558]: time="2023-08-05T06:35:47.234794960Z" level=info msg="Start streaming server" Aug 5 06:35:47 gffw-compute-a-003 flanneld.sh[8448]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 06:35:47 gffw-compute-a-003 flanneld.sh[8592]: OK Aug 5 06:35:47 gffw-compute-a-003 containerd-fuse-overlayfs-grpc.sh[8449]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 06:35:47 gffw-compute-a-003 containerd-fuse-overlayfs-grpc.sh[8591]: OK Aug 5 06:35:47 gffw-compute-a-003 kubelet-containerd.sh[8450]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 06:35:47 gffw-compute-a-003 kubelet-containerd.sh[8593]: OK Aug 5 06:35:48 gffw-compute-a-003 containerd-fuse-overlayfs-grpc.sh[8607]: time="2023-08-05T06:35:48Z" level=info msg="containerd-fuse-overlayfs-grpc Version=\"v1.0.6\" Revision=\"a705ae6f22850358821ec1e7d968bc79003934ef\"" Aug 5 06:35:48 gffw-compute-a-003 systemd[8295]: Started Usernetes containerd-fuse-overlayfs-grpc service. 
Aug 5 06:35:48 gffw-compute-a-003 flanneld.sh[8606]: I0805 06:35:48.266905 82 main.go:211] CLI flags config: {etcdEndpoints:https://gffw-compute-a-001:2379 etcdPrefix:/coreos.com/network etcdKeyfile:/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem etcdCertfile:/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem etcdCAFile:/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem etcdUsername: etcdPassword: version:false kubeSubnetMgr:false kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[tap0] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP:10.10.0.3 publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false} Aug 5 06:35:48 gffw-compute-a-003 flanneld.sh[8606]: W0805 06:35:48.267273 82 main.go:595] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env Aug 5 06:35:48 gffw-compute-a-003 flanneld.sh[8606]: W0805 06:35:48.267302 82 main.go:630] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env Aug 5 06:35:48 gffw-compute-a-003 flanneld.sh[8606]: I0805 06:35:48.272397 82 main.go:231] Created subnet manager: Etcd Local Manager with Previous Subnet: None Aug 5 06:35:48 gffw-compute-a-003 flanneld.sh[8606]: I0805 06:35:48.272422 82 main.go:234] Installing signal handlers Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information. Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.365522 84 server.go:415] "Kubelet version" kubeletVersion="v1.27.2" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.365568 84 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.373188 84 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/node/ca.pem" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.385098 84 server.go:662] "--cgroups-per-qos enabled, but --cgroup-root was not specified. 
defaulting to /" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.385802 84 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[] Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.385896 84 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity: Percentage:0.03} GracePeriod:0s MinReclaim:}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]} Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.385917 84 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.385930 84 container_manager_linux.go:302] "Creating device plugin manager" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.386267 84 state_mem.go:36] "Initialized new in-memory state store" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.586855 84 server.go:776] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.597943 84 kubelet.go:405] "Attempting to sync node with API server" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.597981 84 kubelet.go:309] "Adding apiserver pod source" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.598613 84 apiserver.go:42] "Waiting for node sync before watching apiserver pods" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:48.599520 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.599583 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:48.599541 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host 
Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.599621 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.600142 84 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.1" apiVersion="v1" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.605927 84 server.go:1157] "Failed to set rlimit on max file handles" err="operation not permitted" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.605950 84 server.go:1168] "Started kubelet" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.606123 84 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10 Aug 5 06:35:48 gffw-compute-a-003 systemd[8295]: Started Usernetes kubelet service (containerd). Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.606707 84 server.go:162] "Starting to listen" address="0.0.0.0" port=10250 Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.606932 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.606963 84 kubelet.go:1400] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.607476 84 server.go:461] "Adding debug handlers to kubelet server" Aug 5 06:35:48 gffw-compute-a-003 systemd[8295]: Started Usernetes kube-proxy service. Aug 5 06:35:48 gffw-compute-a-003 systemd[8295]: Reached target Usernetes target for Kubernetes node components (containerd). 
Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.609546 84 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.609619 84 volume_manager.go:284] "Starting Kubelet Volume Manager" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.609698 84 kubelet_node_status.go:458] "Error getting the current node from lister" err="node \"gffw-compute-a-003\" not found" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.609627 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.609760 84 desired_state_of_world_populator.go:145] "Desired state populator starts to run" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:48.610470 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.610533 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.611356 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="200ms" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:48.612458 84 manager.go:286] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 
06:35:48.618342 84 cpu_manager.go:214] "Starting CPU manager" policy="none" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.618360 84 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.618386 84 state_mem.go:36] "Initialized new in-memory state store" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.619678 84 state_mem.go:88] "Updated default CPUSet" cpuSet="" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.619702 84 state_mem.go:96] "Updated CPUSet assignments" assignments=map[] Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.619734 84 policy_none.go:49] "None policy: Start" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.620255 84 memory_manager.go:169] "Starting memorymanager" policy="None" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.620278 84 state_mem.go:35] "Initializing new in-memory state store" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.621085 84 state_mem.go:75] "Updated machine memory state" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.624264 84 manager.go:455] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.628257 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_containers\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.628687 84 plugin_manager.go:118] "Starting Kubelet Plugin Manager" Aug 5 06:35:48 gffw-compute-a-003 kube-proxy.sh[8654]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 06:35:48 gffw-compute-a-003 kube-proxy.sh[8677]: OK Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.711860 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.712759 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.734680 84 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4 Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.736842 84 kubelet_network_linux.go:63] "Initialized iptables rules." 
protocol=IPv6 Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.736883 84 status_manager.go:207] "Starting to sync pod status with apiserver" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.736906 84 kubelet.go:2257] "Starting kubelet main sync loop" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.737024 84 kubelet.go:2281] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:48.737958 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.738011 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:35:48.780659 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.813077 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="400ms" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.824083 84 container_manager_linux.go:510] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 84: write /proc/84/oom_score_adj: permission denied" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:48.916005 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:48 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:48.916910 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:49.215022 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="800ms" Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:49.318752 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:49.319627 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:49 gffw-compute-a-003 
kubelet-containerd.sh[8608]: W0805 06:35:49.638139 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:49.638178 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:49.899073 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:49 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:49.899112 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:49 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:35:49.942758 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:50.016549 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="1.6s" Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:50.021197 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:50.021238 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:50.121515 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:50.122363 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:50.248767 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get 
"https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:50.248801 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:50 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:50.542983 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:51.383144 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:51.383178 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:51.617770 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="3.2s" Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:51.724251 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:51.725124 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: 
lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:51.805003 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:51 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:51.805039 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:52 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:35:52.098101 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:52 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:52.139608 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:52 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:52.139641 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:53 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:53.012779 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:53 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:53.012817 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:54 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:54.819766 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="6.4s" Aug 5 06:35:54 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:35:54.926354 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:35:54 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:54.927679 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:35:55 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:55.576123 84 reflector.go:533] 
vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:55 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:55.576175 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:56 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:35:56.104314 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:56 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:56.620010 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:56 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:56.620061 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:57 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:57.627762 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:57 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:57.627819 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:58 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:58.629384 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:35:58 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:35:58.829261 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:35:58 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:35:58.829322 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:00 
gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:00.544761 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:36:01 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:01.221179 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="7s" Aug 5 06:36:01 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:36:01.329525 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:36:01 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:01.330400 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:36:03 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:03.983049 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:03 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:03.983084 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:05 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:05.042370 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:05 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:05.243298 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get 
"https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:05 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:05.243336 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:07 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:07.715387 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:07 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:07.715420 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:08.222603 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="7s" Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:36:08.331897 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:08.332808 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:08.431517 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:08.431551 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:08 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:08.631230 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:36:10 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:10.545895 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", 
GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:36:15 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:15.224409 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="7s" Aug 5 06:36:15 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:36:15.334736 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:36:15 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:15.335628 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:36:18 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:18.633002 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:36:20 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:20.547192 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), 
LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:22.050381 133 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-003": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.050418 133 server.go:822] "Can't determine this node's IP, assuming 127.0.0.1; if this is incorrect, please set the --bind-address flag" Aug 5 06:36:22 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:22.051556 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:22.051594 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.051588 133 server_others.go:110] "Detected node IP" address="127.0.0.1" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.058683 133 server_others.go:190] "Using iptables Proxier" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.058709 133 server_others.go:197] "kube-proxy running in dual-stack mode" ipFamily=IPv4 Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.058723 133 server_others.go:198] "Creating dualStackProxier for iptables" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.058731 133 server_others.go:465] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR defined" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.058739 133 server_others.go:521] "Defaulting to no-op detect-local" detectLocalMode="ClusterCIDR" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.059479 133 proxier.go:253] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.061612 133 server.go:657] "Version info" version="v1.27.2" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.061637 133 server.go:659] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 06:36:22 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:22.226040 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="7s" Aug 5 06:36:22 gffw-compute-a-003 
kube-proxy.sh[8683]: I0805 06:36:22.264232 133 config.go:188] "Starting service config controller" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.264231 133 config.go:97] "Starting endpoint slice config controller" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.264758 133 config.go:315] "Starting node config controller" Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.265236 133 shared_informer.go:311] Waiting for caches to sync for service config Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.265243 133 shared_informer.go:311] Waiting for caches to sync for node config Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: I0805 06:36:22.265241 133 shared_informer.go:311] Waiting for caches to sync for endpoint slice config Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:22.267703 133 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host' (may retry after sleeping) Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:22.268036 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:22.268094 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:22.268040 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:22.268115 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:22.268037 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:22.268136 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get 
"https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:22 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:36:22.337313 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:36:22 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:22.338269 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:23.324757 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:23.324790 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:23.506264 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:23.506303 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:23.665464 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:23 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:23.665498 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:25.006139 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get 
"https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:25.006174 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:25.156987 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:25.157025 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:25.338294 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:25.338329 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:25.352242 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:25.352274 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:25.925701 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:25 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:25.925753 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to 
watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:28 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:28.634076 84 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_containers\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" Aug 5 06:36:29 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:29.227265 84 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-003?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" interval="7s" Aug 5 06:36:29 gffw-compute-a-003 kubelet-containerd.sh[8608]: I0805 06:36:29.339503 84 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-003" Aug 5 06:36:29 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:29.340416 84 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host" node="gffw-compute-a-003" Aug 5 06:36:29 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:29.679020 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:29 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:29.679065 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:30 gffw-compute-a-003 kubelet-containerd.sh[8608]: W0805 06:36:30.166277 84 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:30 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:30.166313 84 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:30 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:30.328492 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get 
"https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:30 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:30.328527 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-003&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:30 gffw-compute-a-003 kubelet-containerd.sh[8608]: E0805 06:36:30.549054 84 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-003.17786937d5a1b852", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-003", UID:"gffw-compute-a-003", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-003"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 35, 48, 605884498, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host'(may retry after sleeping) Aug 5 06:36:31 gffw-compute-a-003 kube-proxy.sh[8683]: W0805 06:36:31.166517 133 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host Aug 5 06:36:31 gffw-compute-a-003 kube-proxy.sh[8683]: E0805 06:36:31.166564 133 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.102.3:53: no such host ```

What I see there is that the node can't resolve gffw-compute-a-001 - every request fails with a DNS lookup against 10.0.102.3:53, which looks like the resolver inside the RootlessKit/slirp4netns namespace. Note that from the host I can at least ping the node:

```console
$ ping gffw-compute-a-001
PING gffw-compute-a-001.c.llnl-flux.internal (10.10.0.5) 56(84) bytes of data.
64 bytes from gffw-compute-a-001.c.llnl-flux.internal (10.10.0.5): icmp_seq=1 ttl=64 time=0.602 ms
```
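
To narrow this down, it might help to check resolution from inside the RootlessKit network namespace, where kubelet and kube-proxy actually run and where that 10.0.102.3 resolver presumably lives. A rough sketch, reusing the nsenter hint that RootlessKit prints in these logs (the PID is whatever the "RootlessKit ready, PID=..." line reports on that node; 8459 below is just the value from the CRI-O node's log):

```console
# Enter the RootlessKit namespaces of the usernetes node (PID taken from the
# "RootlessKit ready, PID=..." log line) and inspect DNS from in there.
$ nsenter -U --preserve-credentials -n -m -t 8459 -- cat /etc/resolv.conf
$ nsenter -U --preserve-credentials -n -m -t 8459 -- getent hosts gffw-compute-a-001
```

If the getent lookup fails inside the namespace while ping works on the host, then the in-namespace resolver simply doesn't know the cluster hostnames.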

But hmm - if that name is supposed to resolve as a service, it possibly ties back to the error about the service we saw above:

```console
100 instance.go:388] Could not construct pre-rendered responses for ServiceAccountIssuerDiscovery endpoints. Endpoints will not be enabled. Error: issuer URL must use https scheme, got: kubernetes.default.svc
```

If that is the "discovery endpoint", then indeed it cannot be discovered! For completeness, let's also inspect the CRI-O node (002) - I'll put it in another comment because this one is too long :)
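
For what it's worth, that message looks like the API server being handed a bare name instead of an https URL for the service-account issuer, so it skips setting up the ServiceAccountIssuerDiscovery endpoints. A hedged way to poke at it (assuming the value is wired through kube-apiserver's --service-account-issuer flag; I haven't confirmed how usernetes sets it):

```console
# The discovery endpoints want an https issuer URL, e.g. (hypothetical value,
# not a verified usernetes default):
#   --service-account-issuer=https://kubernetes.default.svc
# rather than the bare "kubernetes.default.svc" shown in the error above.
#
# If discovery were enabled, the apiserver would serve an OIDC discovery doc:
$ kubectl get --raw /.well-known/openid-configuration
```

With the issuer left as a bare name I'd expect that request to 404, since the log says the endpoints "will not be enabled".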

vsoch commented 1 year ago
Logs for CRIO node (002) ```console Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8484]: #033[104m#033[97m[INFO]#033[49m#033[39m RootlessKit ready, PID=8459, state directory=/run/user/501043911/usernetes/rootlesskit . Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8484]: #033[104m#033[97m[INFO]#033[49m#033[39m Hint: You can enter RootlessKit namespaces by running `nsenter -U --preserve-credential -n -m -t 8459`. Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8524]: 1 Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8524]: 2 Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.714655421Z" level=info msg="Starting CRI-O, version: 1.27.0, git: 11d8079ee81fb928b37fdef01882bd6977d68d3d(clean)" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.718447496Z" level=info msg="Node configuration value for hugetlb cgroup is false" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.718469239Z" level=info msg="Node configuration value for pid cgroup is true" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.718521907Z" level=info msg="Node configuration value for memoryswap cgroup is true" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.718527796Z" level=info msg="Node configuration value for cgroup v2 is true" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.723512748Z" level=warning msg="node configuration validation for systemd CollectMode failed: check systemd CollectMode: exit status 1" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.723530389Z" level=info msg="Node configuration value for systemd CollectMode is false" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.727645844Z" level=warning msg="node configuration validation for systemd AllowedCPUs failed: check systemd AllowedCPUs: exit status 1" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.727658504Z" level=info msg="Node configuration value for systemd AllowedCPUs is false" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.742010561Z" level=warning msg="Network file system detected as backing store. Enforcing overlay option `force_mask=\"700\"`. 
Add it to storage.conf to silence this warning" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.764689995Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.764915346Z" level=warning msg="'runc is being ignored due to: \"\\\"runc\\\" not found in $PATH: exec: \\\"runc\\\": executable file not found in $PATH\"" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768852955Z" level=info msg="Checkpoint/restore support disabled" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768869209Z" level=info msg="Using seccomp default profile when unspecified: true" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768875201Z" level=info msg="Using the internal default seccomp profile" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768880423Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768885392Z" level=info msg="No blockio config file specified, blockio not configured" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.768889665Z" level=info msg="RDT not available in the host system" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.769119947Z" level=info msg="Using conmon executable: /home/sochat1_llnl_gov/usernetes/bin/conmon" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.772241507Z" level=info msg="Conmon does support the --sync option" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.772258631Z" level=info msg="Conmon does support the --log-global-size-max option" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.787861818Z" level=info msg="Found CNI network cbr0 (type=flannel) at /etc/cni/net.d/10-flannel.conflist" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.796882079Z" level=info msg="Found CNI network u7s-bridge (type=bridge) at /etc/cni/net.d/50-bridge.conf" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.804327120Z" level=info msg="Found CNI network 99-loopback.conf (type=loopback) at /etc/cni/net.d/99-loopback.conf" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.804347569Z" level=info msg="Updated default CNI network name to cbr0" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.806117953Z" level=info msg="Attempting to restore irqbalance config from /etc/sysconfig/orig_irq_banned_cpus" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.806506241Z" level=info msg="Restore irqbalance config: failed to get current CPU ban list, ignoring" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.811892343Z" level=warning msg="Error encountered when checking whether cri-o should wipe containers: open /run/user/501043911/usernetes/crio/version: no such file or directory" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.813507397Z" level=info msg="Starting seccomp notifier watcher" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: 
time="2023-08-05 06:33:50.813561075Z" level=info msg="Create NRI interface" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.813568356Z" level=info msg="NRI interface is disabled in the configuration." Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.813708595Z" level=error msg="Writing clean shutdown supported file: open /var/lib/crio/clean.shutdown.supported: no such file or directory" Aug 5 06:33:50 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:50.813725821Z" level=error msg="Failed to sync parent directory of clean shutdown file: open /var/lib/crio: no such file or directory" Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8425]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8582]: OK Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8426]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: . Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8583]: OK Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8592]: I0805 06:33:51.561781 108 main.go:211] CLI flags config: {etcdEndpoints:https://gffw-compute-a-001:2379 etcdPrefix:/coreos.com/network etcdKeyfile:/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes-key.pem etcdCertfile:/home/sochat1_llnl_gov/.config/usernetes/master/kubernetes.pem etcdCAFile:/home/sochat1_llnl_gov/.config/usernetes/master/ca.pem etcdUsername: etcdPassword: version:false kubeSubnetMgr:false kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[tap0] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP:10.10.0.4 publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false} Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8592]: W0805 06:33:51.562082 108 main.go:595] no subnet found for key: FLANNEL_SUBNET in file: /run/flannel/subnet.env Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8592]: W0805 06:33:51.562102 108 main.go:630] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8592]: I0805 06:33:51.568199 108 main.go:231] Created subnet manager: Etcd Local Manager with Previous Subnet: None Aug 5 06:33:51 gffw-compute-a-002 flanneld.sh[8592]: I0805 06:33:51.568215 108 main.go:234] Installing signal handlers Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information. Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.627207 109 server.go:415] "Kubelet version" kubeletVersion="v1.27.2" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.627256 109 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.633417 109 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/node/ca.pem" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.644904 109 server.go:662] "--cgroups-per-qos enabled, but --cgroup-root was not specified. 
defaulting to /" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.645860 109 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[] Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.645926 109 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity: Percentage:0.03} GracePeriod:0s MinReclaim:}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]} Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.645955 109 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.645965 109 container_manager_linux.go:302] "Creating device plugin manager" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.647493 109 state_mem.go:36] "Initialized new in-memory state store" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.847995 109 server.go:776] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.859454 109 kubelet.go:405] "Attempting to sync node with API server" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.859848 109 kubelet.go:309] "Adding apiserver pod source" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.860887 109 apiserver.go:42] "Waiting for node sync before watching apiserver pods" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.862224 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.862281 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.862306 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.862377 109 
reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.862631 109 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="cri-o" version="1.27.0" apiVersion="v1" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.867649 109 probe.go:268] Flexvolume plugin directory at /home/sochat1_llnl_gov/.local/share/usernetes/kubelet-plugins-exec does not exist. Recreating. Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.870089 109 server.go:1157] "Failed to set rlimit on max file handles" err="operation not permitted" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.870105 109 server.go:1168] "Started kubelet" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.870286 109 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10 Aug 5 06:33:51 gffw-compute-a-002 systemd[4997]: Started Usernetes kubelet service (crio). Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.870760 109 server.go:162] "Starting to listen" address="0.0.0.0" port=10250 Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.871590 109 server.go:461] "Adding debug handlers to kubelet server" Aug 5 06:33:51 gffw-compute-a-002 systemd[4997]: Started Usernetes kube-proxy service. Aug 5 06:33:51 gffw-compute-a-002 systemd[4997]: Reached target Usernetes target for Kubernetes node components (crio). Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.872588 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.872632 109 kubelet.go:1400] "Image garbage collection failed once. 
Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.872798 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.882062 109 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.882124 109 volume_manager.go:284] "Starting Kubelet Volume Manager" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.882146 109 kubelet_node_status.go:458] "Error getting the current node from lister" err="node \"gffw-compute-a-002\" not found" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.882233 109 desired_state_of_world_populator.go:145] "Desired state populator starts to run" Aug 5 06:33:51 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:51.882511597Z" level=info msg="Checking image status: registry.k8s.io/pause:3.9" id=d57adeff-5862-4567-8e5d-37042b6833b7 name=/runtime.v1.ImageService/ImageStatus Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.883407 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.883461 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.884235 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" 
interval="200ms" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.884502 109 manager.go:286] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted Aug 5 06:33:51 gffw-compute-a-002 kube-proxy.sh[8625]: #033[104m#033[97m[INFO]#033[49m#033[39m Entering RootlessKit namespaces: Aug 5 06:33:51 gffw-compute-a-002 kube-proxy.sh[8646]: OK Aug 5 06:33:51 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:33:51.892386822Z" level=info msg="Image registry.k8s.io/pause:3.9 not found" id=d57adeff-5862-4567-8e5d-37042b6833b7 name=/runtime.v1.ImageService/ImageStatus Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.893314 109 cpu_manager.go:214] "Starting CPU manager" policy="none" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.893331 109 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.893542 109 state_mem.go:36] "Initialized new in-memory state store" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.908419 109 policy_none.go:49] "None policy: Start" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.908904 109 memory_manager.go:169] "Starting memorymanager" policy="None" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.909073 109 state_mem.go:35] "Initializing new in-memory state store" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.917412 109 manager.go:455] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.919438 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.919873 109 plugin_manager.go:118] "Starting Kubelet Plugin Manager" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.962588 109 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv4 Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.964623 109 kubelet_network_linux.go:63] "Initialized iptables rules." 
protocol=IPv6 Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.964659 109 status_manager.go:207] "Starting to sync pod status with apiserver" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.964691 109 kubelet.go:2257] "Starting kubelet main sync loop" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.964890 109 kubelet.go:2281] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:51.966246 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.966300 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:51.983887 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:51.984679 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:52 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:33:52.023268 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:52.085916 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="400ms" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:52.117983 109 container_manager_linux.go:510] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 109: write /proc/109/oom_score_adj: permission denied" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:52.186293 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:52.186997 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:52.486938 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="800ms" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:52.588209 109 kubelet_node_status.go:70] 
"Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:52.588988 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:53.060380 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.060808 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:53.103490 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.103521 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:33:53.205319 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:53.229714 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.229758 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.288543 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="1.6s" Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:53.390702 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.391402 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post 
\"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:53.549677 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:53.549718 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:54 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:54.889592 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="3.2s" Aug 5 06:33:54 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:54.992650 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:54 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:54.993346 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:55.043218 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:55.043249 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:33:55.255393 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:55.316173 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", 
Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:55.349217 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:55.349261 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:55.711516 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:55 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:55.711572 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:56 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:56.125943 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:56 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:56.125998 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:58 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:58.091494 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="6.4s" Aug 5 06:33:58 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:33:58.194929 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:33:58 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:58.195669 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post 
\"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:33:59 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:33:59.770965 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:59 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:33:59.936955 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:33:59 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:33:59.937012 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:00.604003 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:00.604062 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:00.691381 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:00.691417 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:00.722129 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:00 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:00.722155 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:01.921212 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device 
\"10.10.0.2:/var/nfs/home\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:04 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:04.492431 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:04 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:04.596696 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:04 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:04.597477 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:05.317808 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:07 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:07.347662 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:07 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:07.347698 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:08 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:08.631285 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:08 
gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:08.631325 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:08 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:08.807015 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:10 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:10.749788 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:10 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:10.749826 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:11.493429 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:11.598702 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:11.599492 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:11.652612 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:11.652643 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:11.923255 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:15 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:15.319682 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", 
SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:18 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:18.495260 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:18 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:18.600437 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:18 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:18.601141 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:18 gffw-compute-a-002 systemd[1]: session-1.scope: Succeeded. Aug 5 06:34:18 gffw-compute-a-002 systemd-logind[1837]: Session 1 logged out. Waiting for processes to exit. Aug 5 06:34:18 gffw-compute-a-002 systemd-logind[1837]: Removed session 1. 
Aug 5 06:34:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:21.924133 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:25 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:25.321527 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:25 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:25.496816 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:25 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:25.603147 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:25 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:25.603981 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:26 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:26.058639 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:26 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:26.058674 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 
06:34:27.369740 147 node.go:130] Failed to retrieve node info: Get "https://gffw-compute-a-001:6443/api/v1/nodes/gffw-compute-a-002": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.369773 147 server.go:822] "Can't determine this node's IP, assuming 127.0.0.1; if this is incorrect, please set the --bind-address flag" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.369798 147 server_others.go:110] "Detected node IP" address="127.0.0.1" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.377874 147 server_others.go:190] "Using iptables Proxier" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.377902 147 server_others.go:197] "kube-proxy running in dual-stack mode" ipFamily=IPv4 Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.377911 147 server_others.go:198] "Creating dualStackProxier for iptables" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.377917 147 server_others.go:465] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR defined" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.377924 147 server_others.go:521] "Defaulting to no-op detect-local" detectLocalMode="ClusterCIDR" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.379701 147 proxier.go:253] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.380890 147 server.go:657] "Version info" version="v1.27.2" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.380913 147 server.go:659] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.583370 147 config.go:188] "Starting service config controller" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.583457 147 config.go:97] "Starting endpoint slice config controller" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.583905 147 config.go:315] "Starting node config controller" Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.584469 147 shared_informer.go:311] Waiting for caches to sync for node config Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.584466 147 shared_informer.go:311] Waiting for caches to sync for endpoint slice config Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: I0805 06:34:27.584472 147 shared_informer.go:311] Waiting for caches to sync for service config Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:27.586491 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:27.586510 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get 
"https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:27.586570 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:27.586586 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:27.586633 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:27.586663 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:27.587084 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:34:27 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:27.955056 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:27 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:27.955093 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:28 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:28.498259 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:28 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:28.498305 147 
reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:28 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:28.654522 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:28 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:28.654561 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:29 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:29.036265 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:29 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:29.036303 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:29 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:29.858971 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:29 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:29.859005 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:30 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:30.253768 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:30 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:30.253807 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get 
"https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:30 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:30.941103 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:30 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:30.941137 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:31.925776 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:32 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:32.066527 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:32 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:32.066564 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:32 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:32.480606 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:32 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:32.480644 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:32 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:32.498411 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:32 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:32.605702 109 kubelet_node_status.go:70] 
"Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:32 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:32.606478 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:35.322705 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:35 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:35.818016 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:35 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:35.818060 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:36 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:36.130576 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:36 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:36.130624 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get 
"https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:37 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:37.240896 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:37 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:37.240930 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:39 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:39.172889 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:34:39 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:39.500203 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:39 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:39.607527 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:39 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:39.608426 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:41.928581 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/usernetes/bin/cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:44 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:44.478799 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:44 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:44.478838 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:45 gffw-compute-a-002 
kubelet-crio.sh[8593]: E0805 06:34:45.323939 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:46 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:46.469278 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:46 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:46.469319 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:46 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:46.501554 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:46 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:46.609753 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:46 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:46.610500 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:49 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:34:49.117203 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:49 
gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:49.117238 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:51 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:34:51.730483 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:34:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:51.929530 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:34:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:53.502513 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:34:53 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:34:53.611805 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:34:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:53.612989 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:34:55 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:55.325265 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:34:58 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:34:58.163666 109 reflector.go:533] 
vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:34:58 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:34:58.163699 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:00 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:00.504078 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:00 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:00.614333 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:00 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:00.615128 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:01.931132 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_log\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:03 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:03.310481 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:35:04 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:04.834643 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:04 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:04.834689 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:05.326430 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, 
DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:05 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:05.502886 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:05.502920 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:07 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:07.507743 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:07 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:07.617035 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:07 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:07.617900 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:07 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:07.893848 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:07 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:07.893882 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:10 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:10.984449 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list 
*v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:10 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:10.984488 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:11.932542 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:12 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:12.932283 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:12.932317 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:14 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:14.509030 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:14 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:14.619360 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:14 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:14.620136 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:14 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:14.649045 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:35:14 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:14.728320 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:14 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:14.728354 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list 
*v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:15 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:15.328391 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:21.538760 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:21 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:21.622091 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:21.623894 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:21.933776 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:25 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:25.330208 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, 
InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:26 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:26.923940 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:35:28 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:28.540618 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:28 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:28.625906 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:28 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:28.626644 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:30 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:30.593133 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:30 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:30.593172 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:30 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:30.769282 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:30 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:30.769338 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get 
"https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:31.935360 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_log\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:33 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:33.992591 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:33 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:33.992627 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:35.331415 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:35.331493 109 event.go:228] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7a1ff89", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", 
UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 870033801, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!) Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:35.332093 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:35.541593 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:35.627817 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:35 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:35.628610 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:37 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:37.622285 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:35:37 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:35:37.793620 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get 
"https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:37 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:37.793655 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:41.936458 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:42 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:42.543511 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:42 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:42.630320 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:42 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:42.631167 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:42 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:42.840901 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:48 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:48.739348 147 
event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:35:49 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:49.544759 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:49 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:49.632888 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:49 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:49.633584 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:51.937184 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:35:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:52.842652 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:35:53 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:53.616157 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:53 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:53.616212 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get 
"https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:54 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:35:54.091547 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:54 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:54.091601 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:35:56 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:56.546064 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:35:56 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:35:56.635546 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:35:56 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:35:56.636302 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:35:59 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:35:59.994371 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:36:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:01.938944 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:36:02 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:02.843897 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, 
time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:36:03 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:03.547150 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:03 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:03.637543 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:03 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:03.638295 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:05 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:36:05.063999 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:05.064034 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:08 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:36:08.066683 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:08 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:08.066734 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:10 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:10.548444 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:10 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:10.639776 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:10 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:10.640509 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:11 
gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:11.396583 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:36:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:11.941762 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_log\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:36:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:12.845097 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:36:16 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:36:16.207562 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:16 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:16.207595 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:17 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:17.550315 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:17 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:17.641615 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:17 gffw-compute-a-002 kubelet-crio.sh[8593]: 
E0805 06:36:17.642335 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:21 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:21.661748 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:36:21 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:36:21.785971 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:21 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:21.786005 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:21.942612 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_containers\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:36:22 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:22.847191 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:36:24 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:24.551264 109 controller.go:146] "Failed to ensure lease exists, will 
retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:24 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:24.643624 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:24 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:24.644369 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:25 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:36:25.310909 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:25 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:25.310942 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:31 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:36:31.378763 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:31 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:31.378828 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:31.552477 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:31 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:31.646114 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:31.646917 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:31 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:31.664101 147 event_broadcaster.go:274] Unable to write event: 'Post "https://gffw-compute-a-001:6443/apis/events.k8s.io/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host' (may retry after sleeping) Aug 5 06:36:31 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:31.664122 147 
event_broadcaster.go:216] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.17786924f84fd3a2", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:time.Date(2023, time.August, 5, 6, 34, 27, 583340463, time.Local), Series:(*v1.EventSeries)(nil), ReportingController:"kube-proxy", ReportingInstance:"kube-proxy-gffw-compute-a-002", Action:"StartKubeProxy", Reason:"Starting", Regarding:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"", Type:"Normal", DeprecatedSource:v1.EventSource{Component:"", Host:""}, DeprecatedFirstTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeprecatedLastTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeprecatedCount:0}' (retry limit exceeded!) Aug 5 06:36:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:31.943558 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/usernetes/bin/cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:36:32 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:32.849205 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:36:38 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:38.553533 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:38 
gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:36:38.647833 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:36:38 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:38.648576 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:36:41 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:36:41.763220 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:41.763256 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:36:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:41.946199 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:36:42 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:42.851085 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:36:45 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:45.554988 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:36:45 gffw-compute-a-002 kubelet-crio.sh[8593]: 
I0805 06:36:45.650373 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002"
Aug 5 06:36:45 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:45.651097 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002"
Aug 5 06:36:46 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:36:46.521952 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:36:46 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:36:46.522007 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:36:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:51.947624 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images"
Aug 5 06:36:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:52.556804 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s"
Aug 5 06:36:52 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:36:52.714790 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:36:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:36:52.852020 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca7c948f0", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"InvalidDiskCapacity", Message:"invalid capacity 0 on image filesystem", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 872608496, time.Local), Count:1, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping)
[...]
Aug 5 06:37:01 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:37:01.474838 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:37:01 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:37:01.915200 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:37:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:37:11.952714 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/usernetes/bin/cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images"
Aug 5 06:37:20 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:37:20.716538 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
[...]
Aug 5 06:38:51 gffw-compute-a-002 systemd[1]: Starting Cleanup of Temporary Directories...
Aug 5 06:38:51 gffw-compute-a-002 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Aug 5 06:38:51 gffw-compute-a-002 systemd[1]: Started Cleanup of Temporary Directories.
Aug 5 06:38:51 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:38:51.690150 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002"
Aug 5 06:38:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:38:51.690998 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002"
Aug 5 06:38:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:38:51.874422 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images"
Aug 5 06:38:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:38:51.874451 109 kubelet.go:1396] "Image garbage collection failed multiple times in a row" err="invalid capacity 0 on image filesystem"
Aug 5 06:38:51 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:38:51.895466379Z" level=info msg="Checking image status: registry.k8s.io/pause:3.9" id=f800acef-b478-4a11-bda7-91b9153eb51c name=/runtime.v1.ImageService/ImageStatus
Aug 5 06:38:51 gffw-compute-a-002 rootlesskit.sh[8529]: time="2023-08-05 06:38:51.898673039Z" level=info msg="Image registry.k8s.io/pause:3.9 not found" id=f800acef-b478-4a11-bda7-91b9153eb51c name=/runtime.v1.ImageService/ImageStatus
Aug 5 06:38:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:38:51.970424 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images"
Aug 5 06:38:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:38:52.318903 109 container_manager_linux.go:510] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 109: write /proc/109/oom_score_adj: permission denied"
[...]
Aug 5 06:39:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:05.287770 109 event.go:228] Unable to write event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb3690", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node gffw-compute-a-002 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892657808, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892657808, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}' (retry limit exceeded!)
Aug 5 06:39:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:05.288274 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping)
Aug 5 06:39:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:05.661272 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s"
Aug 5 06:39:05 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:05.694225 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002"
Aug 5 06:39:05 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:05.694915 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002"
Aug 5 06:39:06 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:06.053923 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:39:06 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:06.053977 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:39:10 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:10.987561 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host
Aug 5 06:39:10 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:10.987615 109
reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:11.975127 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/usernetes/bin/cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:39:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:12.115863 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:39:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:12.662426 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:12 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:12.696616 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:12.697315 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:12 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:39:12.758572 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:12 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:39:12.758608 147 reflector.go:148] 
vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:19 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:19.663683 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:19 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:19.698747 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:19 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:19.699448 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:21.976603 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/usernetes/bin/cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:39:22 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:22.117212 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:39:26 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:26.665378 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:26 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:26.700374 109 kubelet_node_status.go:70] "Attempting to register node" 
node="gffw-compute-a-002" Aug 5 06:39:26 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:26.701054 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:31 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:31.978377 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:39:32 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:39:32.060986 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:32 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:39:32.061025 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:32 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:32.118804 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:39:33 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:39:33.206490 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup 
gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:33 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:39:33.206531 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:33 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:33.667236 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:33 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:33.702385 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:33 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:33.703019 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:39 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:39.041495 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:39 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:39.041528 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://gffw-compute-a-001:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:40 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:40.668227 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:40 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:40.704402 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:40 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:40.705079 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:41 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:41.162634 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:41.162675 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get 
"https://gffw-compute-a-001:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:41 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:41.980876 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_containers\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:39:42 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:42.120618 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:39:44 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:39:44.045340 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:44 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:39:44.045374 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:46 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:46.682433 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:46 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:46.682466 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get 
"https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:47 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:47.669198 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:47 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:47.706306 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:47 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:47.707097 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:51 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:51.982204 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:39:52 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:52.121753 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:39:54 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:54.670259 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:39:54 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:39:54.708264 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:39:54 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 
06:39:54.708928 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:39:57 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:39:57.062017 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:39:57 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:39:57.062069 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:01.671348 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:40:01 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:40:01.710516 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:40:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:01.711166 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:40:01 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:01.984715 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_cni\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:40:02 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:02.123244 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post 
"https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:40:05 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:40:05.008570 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:05 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:40:05.008607 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://gffw-compute-a-001:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:08 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:08.672934 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:40:08 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:40:08.712911 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:40:08 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:08.713625 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:40:11 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:11.986679 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:40:12 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:12.125053 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", 
Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:40:15 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:15.674852 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:40:15 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:40:15.714858 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:40:15 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:15.715487 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" Aug 5 06:40:16 gffw-compute-a-002 kube-proxy.sh[8653]: W0805 06:40:16.432281 147 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:16 gffw-compute-a-002 kube-proxy.sh[8653]: E0805 06:40:16.432318 147 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://gffw-compute-a-001:6443/apis/discovery.k8s.io/v1/endpointslices?labelSelector=%21service.kubernetes.io%2Fheadless%2C%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:19 gffw-compute-a-002 kubelet-crio.sh[8593]: W0805 06:40:19.798481 109 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:19 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:19.798532 109 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://gffw-compute-a-001:6443/api/v1/nodes?fieldSelector=metadata.name%3Dgffw-compute-a-002&limit=500&resourceVersion=0": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host Aug 5 06:40:21 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:21.987999 109 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="cannot find filesystem info for device \"10.10.0.2:/var/nfs/home/sochat1_llnl_gov/.local/share/usernetes/_var_lib_kubelet\"" mountpoint="/home/sochat1_llnl_gov/.local/share/usernetes/containers/storage/overlay-images" Aug 5 06:40:22 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:22.126457 109 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"gffw-compute-a-002.1778691ca8fb7d6c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, 
CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"gffw-compute-a-002", UID:"gffw-compute-a-002", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node gffw-compute-a-002 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"gffw-compute-a-002"}, FirstTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), LastTimestamp:time.Date(2023, time.August, 5, 6, 33, 51, 892675948, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://gffw-compute-a-001:6443/api/v1/namespaces/default/events": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host'(may retry after sleeping) Aug 5 06:40:22 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:22.676093 109 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://gffw-compute-a-001:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/gffw-compute-a-002?timeout=10s\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" interval="7s" Aug 5 06:40:22 gffw-compute-a-002 kubelet-crio.sh[8593]: I0805 06:40:22.717346 109 kubelet_node_status.go:70] "Attempting to register node" node="gffw-compute-a-002" Aug 5 06:40:22 gffw-compute-a-002 kubelet-crio.sh[8593]: E0805 06:40:22.717978 109 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://gffw-compute-a-001:6443/api/v1/nodes\": dial tcp: lookup gffw-compute-a-001 on 10.0.101.3:53: no such host" node="gffw-compute-a-002" ```

So, not really knowing how this all works: what I think is happening is that the pods are being created, but the different nodes don't seem to be discovering one another. I'm guessing the non-master nodes need to do some kind of handshake with the master, and perhaps there are other layers in there (auth / certificates) that are wonky. What should we look at next? Could this be an issue of ports / firewalls not being open?

For reference, here are the firewall rules - I thought the last one, for traffic between nodes, would be sufficient; it allows Flux to connect on port 8050 (for the workers to ping the broker). I'll read more into the errors above / look them up tomorrow!
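For what it's worth, every error in the log above is the same DNS failure ("lookup gffw-compute-a-001 ... no such host") against 10.0.101.3, which looks like a resolver inside the rootless network namespace rather than the host's resolver. A minimal sanity check from the worker node might look like the following (a sketch; it assumes getent and nc are installed, and reuses the hostnames/ports from this thread):

$ getent hosts gffw-compute-a-001      # can the host itself resolve the master's name?
$ nc -vz gffw-compute-a-001 6443       # is the kube-apiserver port reachable from this node?
$ cat /etc/resolv.conf                 # which resolver the host uses (the rootless namespace may use a different one)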

aojea commented 1 year ago

bin/kubectl get nodes
No resources found

This is because the kubelet is not registering with the apiserver; the next step is to check the kubelet logs.

I'm guessing the non-master nodes need to do some kind of handshake with the master

The Node object is created by the kubelet on each node: the kubelet connects to the apiserver and registers itself, creating the Node object to reflect the node where it is running and its capabilities.
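A quick way to cross-check this from the control-plane side is to ask the API server directly whether any Node objects or registration events exist (a sketch, reusing the admin kubeconfig path that appears elsewhere in this thread):

$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig get nodes -o wide
$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig get events -A --sort-by='.lastTimestamp'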

vsoch commented 1 year ago

Do you know where these are for usernetes? I did a find for anything named "log" or *.log and I don't see anything in /var/log (which makes sense if it's in user space). I'll keep looking in the usernetes directories under the user home - I'm thinking it should be somewhere in there.
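Since usernetes runs its components as systemd user units, the logs normally land in the user journal rather than in /var/log. A sketch of pulling them on a node (the u7s-* unit names here are assumptions based on the units mentioned later in this thread; check what systemctl actually reports):

$ systemctl --user list-units 'u7s-*'                      # which usernetes units exist on this node
$ journalctl --user -u u7s-kubelet-containerd.service -e   # kubelet logs on a containerd node (unit name assumed)
$ journalctl --user -u u7s-rootlesskit.service -e          # rootlesskit / container runtime logs (unit name assumed)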

vsoch commented 1 year ago

I found what could be logs, but the directories are empty:

$ echo $PWD
/home/sochat1_llnl_gov/.local/share/usernetes
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ tree .
.
├── etcd
│   └── member
│       ├── snap
│       │   └── db
│       └── wal
│           ├── 0000000000000000-0000000000000000.wal
│           └── 0.tmp
├── _var_cache
├── _var_lib_cni
├── _var_lib_containers
├── _var_lib_kubelet
└── _var_log

vsoch commented 1 year ago

In case these are helpful:

$ ./rootlessctl.sh list-ports
ID    PROTO    PARENTIP    PARENTPORT    CHILDIP    CHILDPORT    
1     tcp      0.0.0.0     2379                     2379         
2     tcp      0.0.0.0     6443                     6443  
$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:20:07Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:28Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ curl -k https://127.0.0.1:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig get endpoints
NAME         ENDPOINTS        AGE
kubernetes   10.10.0.5:6443   18m
[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ bin/kubectl --kubeconfig ~/.config/usernetes/master/admin-localhost.kubeconfig get endpoints -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: "2023-08-05T18:30:34Z"
    labels:
      endpointslice.kubernetes.io/skip-mirror: "true"
    name: kubernetes
    namespace: default
    resourceVersion: "74"
    uid: a6e04b81-f695-4811-b1ab-33360096d21f
  subsets:
  - addresses:
    - ip: 10.10.0.5
    ports:
    - name: https
      port: 6443
      protocol: TCP
kind: List
metadata:
  resourceVersion: ""
aojea commented 1 year ago

Just run kubelet manually with the configuration and paste the output

vsoch commented 1 year ago

okay I've never done that, but it's probably in the code somewhere and I can try to find it!

vsoch commented 1 year ago

Okay, boot/kubelet.sh seems to start that, but it also seems to only be run in the context of the containerd / crio nodes (the other two). This is a grep that includes the local usernetes and .config directory:

boot/kubelet-crio.sh:exec $(dirname $0)/kubelet.sh --container-runtime-endpoint unix://$XDG_RUNTIME_DIR/usernetes/crio/crio.sock $@
boot/kubelet-containerd.sh:exec $(dirname $0)/kubelet.sh --container-runtime-endpoint unix://$XDG_RUNTIME_DIR/usernetes/containerd/containerd.sock $@
Binary file bin/kubelet matches

If I naively run it (not knowing the args that go into it) I see:

STARTING KUBELET

[INFO] Entering RootlessKit namespaces: OK
I0805 22:30:04.701451     183 server.go:415] "Kubelet version" kubeletVersion="v1.27.2"
I0805 22:30:04.701494     183 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0805 22:30:04.703112     183 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/node/ca.pem"
I0805 22:30:04.711861     183 server.go:662] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
I0805 22:30:04.712014     183 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
I0805 22:30:04.712077     183 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.03} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
I0805 22:30:04.712103     183 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
I0805 22:30:04.712118     183 container_manager_linux.go:302] "Creating device plugin manager"
I0805 22:30:04.712340     183 state_mem.go:36] "Initialized new in-memory state store"
I0805 22:30:04.912829     183 server.go:776] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied"
W0805 22:30:04.913229     183 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "/run/containerd/containerd.sock",
  "ServerName": "/run/containerd/containerd.sock",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: no such file or directory"
E0805 22:30:04.913937     183 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: no such file or directory\""

But I suspect whatever args I need are not there, and it's also not clear to me that this is actually called anywhere for the master. Poking around install.sh, I think what gets called for the master is the set of services grouped under u7s-master.target, and that calls each of:

I think the API server is started - we confirmed that endpoint above. I tried all three of those, and it says the ports are already being used (and the previous logs showed they were active too).
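One way to see exactly which services that target pulls in, and whether any of them failed to start, is to ask systemd directly (a sketch, using the u7s-master.target name mentioned above):

$ systemctl --user list-dependencies u7s-master.target
$ systemctl --user status u7s-master.target
$ systemctl --user --failed                                # lists any user unit that crashed on startup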

vsoch commented 1 year ago

Unrelated, but I found the plugins directory that was checked for (and didn't exist). I'll add a note in my script to create it:

volumePluginDir: /home/sochat1_llnl_gov/.local/share/usernetes/kubelet-plugins-exec
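Creating it up front is a one-liner (path taken from the config line above):

$ mkdir -p ~/.local/share/usernetes/kubelet-plugins-exec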

I'm guessing containerd is expected to be running for the kubelet, if that error is correct? But for the docker-compose setup that variable is left empty, e.g.:

    /bin/bash ./install.sh --wait-init-certs --start=u7s-master-with-etcd.target --cidr=10.0.100.0/24 --publish=0.0.0.0:2379:2379/tcp --publish=0.0.0.0:6443:6443/tcp --cni=flannel --cri=

https://github.com/rootless-containers/usernetes/blob/58df6ea63cc4a00425b80a088889015eedc96320/docker-compose.yml#L31

From the output, it's not clear how we would run a kubelet without containerd, but I'm not experienced with this setup, so this is just speculation!

u7s-kubelet-containerd.service             loaded inactive dead    Usernetes kubelet service (containerd)

vsoch commented 1 year ago

I tried running the command that would use/create that socket:

$ mkdir -p /run/user/501043911/usernetes/containerd/
$ cd usernetes
$ U7S_BASE_DIR=$PWD
$ source $U7S_BASE_DIR/common/common.inc.sh
$ ./boot/kubelet.sh --container-runtime-endpoint unix://$XDG_RUNTIME_DIR/usernetes/containerd/containerd.sock
STARTING KUBELET
--container-runtime-endpoint unix:///run/user/501043911/usernetes/containerd/containerd.sock
[INFO] Entering RootlessKit namespaces: OK
Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
I0805 22:44:40.975865     401 server.go:415] "Kubelet version" kubeletVersion="v1.27.2"
I0805 22:44:40.975909     401 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0805 22:44:40.977780     401 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/home/sochat1_llnl_gov/.config/usernetes/node/ca.pem"
I0805 22:44:40.985759     401 server.go:662] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
I0805 22:44:40.985916     401 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
I0805 22:44:40.985964     401 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.03} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
I0805 22:44:40.985983     401 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
I0805 22:44:40.986001     401 container_manager_linux.go:302] "Creating device plugin manager"
I0805 22:44:40.986248     401 state_mem.go:36] "Initialized new in-memory state store"
I0805 22:44:41.186765     401 server.go:776] "Failed to ApplyOOMScoreAdj" err="write /proc/self/oom_score_adj: permission denied"
W0805 22:44:41.187310     401 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "/run/user/501043911/usernetes/containerd/containerd.sock",
  "ServerName": "/run/user/501043911/usernetes/containerd/containerd.sock",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial unix /run/user/501043911/usernetes/containerd/containerd.sock: connect: no such file or directory"
E0805 22:44:41.190001     401 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/user/501043911/usernetes/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/user/501043911/usernetes/containerd/containerd.sock: connect: no such file or directory\""
Connection to compute.1353515447032707346 closed.

Seems like the socket is expected to already be there? This might also be important:

Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.

vsoch commented 1 year ago

Okay, I added that variable to the kubelet config that boot/kubelet.sh generates at $XDG_RUNTIME_DIR/usernetes/kubelet-config.yaml:

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
volumePluginDir: $XDG_DATA_HOME/usernetes/kubelet-plugins-exec
authentication:
  anonymous: 
    enabled: false
  x509:
    clientCAFile: "$XDG_CONFIG_HOME/usernetes/node/ca.pem"
tlsCertFile: "$XDG_CONFIG_HOME/usernetes/node/node.pem"
tlsPrivateKeyFile: "$XDG_CONFIG_HOME/usernetes/node/node-key.pem"
clusterDomain: "cluster.local"
clusterDNS:
  - "10.0.0.53"
failSwapOn: false
featureGates:
  KubeletInUserNamespace: true
evictionHard:
  nodefs.available: "3%"
+ containerRuntimeEndpoint: "unix://$XDG_RUNTIME_DIR/usernetes/containerd/containerd.sock"
localStorageCapacityIsolation: false
cgroupDriver: "cgroupfs"
cgroupsPerQOS: true
enforceNodeAllocatable: []

That at least changes the error message to point at where we wanted the containerd socket to be - somewhere under the user's control. The error message now references that path:

W0805 22:51:04.038374     436 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "/run/user/501043911/usernetes/containerd/containerd.sock",
  "ServerName": "/run/user/501043911/usernetes/containerd/containerd.sock",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial unix /run/user/501043911/usernetes/containerd/containerd.sock: connect: no such file or directory"
E0805 22:51:04.039080     436 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/user/501043911/usernetes/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/user/501043911/usernetes/containerd/containerd.sock: connect: no such file or directory\""

Let me see if I can figure out how to generate that and why it's not there.

vsoch commented 1 year ago

It looks like the rootlesskit service is supposed to execute boot/containerd.sh:

.config/systemd/user/u7s-rootlesskit.service:ExecStart=/home/sochat1_llnl_gov/usernetes/boot/rootlesskit.sh /home/sochat1_llnl_gov/usernetes/boot/containerd.sh
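So rather than running boot/containerd.sh by hand, the intended path appears to be starting that user unit, which should create the containerd socket under $XDG_RUNTIME_DIR (a sketch, using the unit name from the ExecStart line above):

$ systemctl --user start u7s-rootlesskit.service
$ systemctl --user status u7s-rootlesskit.service
$ ls $XDG_RUNTIME_DIR/usernetes/containerd/containerd.sock   # should exist once containerd is up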

If I execute it on its own, it tries to use system paths (not user paths):

[sochat1_llnl_gov@gffw-compute-a-001 usernetes]$ ./boot/containerd.sh 
INFO[2023-08-05T22:55:18.941105134Z] starting containerd                           revision=1677a17964311325ed1c31e2c0a3589ce6d5c30d version=v1.7.1
INFO[2023-08-05T22:55:18.956472076Z] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2023-08-05T22:55:18.957045752Z] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2023-08-05T22:55:18.957521832Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2023-08-05T22:55:18.958340232Z] loading plugin "io.containerd.snapshotter.v1.fuse-overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2023-08-05T22:55:18.958579596Z] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
INFO[2023-08-05T22:55:18.958808670Z] metadata content store policy set             policy=shared
INFO[2023-08-05T22:55:18.960625994Z] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2023-08-05T22:55:18.960649370Z] loading plugin "io.containerd.event.v1.exchange"...  type=io.containerd.event.v1
INFO[2023-08-05T22:55:18.960660841Z] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2023-08-05T22:55:18.960680365Z] loading plugin "io.containerd.lease.v1.manager"...  type=io.containerd.lease.v1
INFO[2023-08-05T22:55:18.960692583Z] loading plugin "io.containerd.nri.v1.nri"...  type=io.containerd.nri.v1
INFO[2023-08-05T22:55:18.960703504Z] NRI interface is disabled by configuration.  
INFO[2023-08-05T22:55:18.960713086Z] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2023-08-05T22:55:18.960905346Z] loading plugin "io.containerd.runtime.v2.shim"...  type=io.containerd.runtime.v2
INFO[2023-08-05T22:55:18.960921039Z] loading plugin "io.containerd.sandbox.store.v1.local"...  type=io.containerd.sandbox.store.v1
INFO[2023-08-05T22:55:18.960932550Z] loading plugin "io.containerd.sandbox.controller.v1.local"...  type=io.containerd.sandbox.controller.v1
INFO[2023-08-05T22:55:18.960944352Z] loading plugin "io.containerd.streaming.v1.manager"...  type=io.containerd.streaming.v1
INFO[2023-08-05T22:55:18.960957261Z] loading plugin "io.containerd.service.v1.introspection-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.960969439Z] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.960983195Z] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.960994223Z] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.961006543Z] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.961018415Z] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.961029232Z] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.961039452Z] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2023-08-05T22:55:18.961226739Z] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2023-08-05T22:55:18.961495277Z] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2023-08-05T22:55:18.961521383Z] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961536741Z] loading plugin "io.containerd.transfer.v1.local"...  type=io.containerd.transfer.v1
INFO[2023-08-05T22:55:18.961556098Z] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2023-08-05T22:55:18.961601250Z] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961612893Z] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961624634Z] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961640201Z] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961656284Z] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961673180Z] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961696965Z] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961711651Z] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961727461Z] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
WARN[2023-08-05T22:55:18.961760550Z] failed to load plugin io.containerd.internal.v1.opt  error="mkdir /opt/containerd: permission denied"
INFO[2023-08-05T22:55:18.961775618Z] loading plugin "io.containerd.grpc.v1.sandbox-controllers"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961792023Z] loading plugin "io.containerd.grpc.v1.sandboxes"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961806697Z] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961821318Z] loading plugin "io.containerd.grpc.v1.streaming"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961837177Z] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961854777Z] loading plugin "io.containerd.grpc.v1.transfer"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961869432Z] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.961883475Z] loading plugin "io.containerd.grpc.v1.cri"...  type=io.containerd.grpc.v1
INFO[2023-08-05T22:55:18.962044746Z] Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:fuse-overlayfs DefaultRuntimeName:crun DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} Runtimes:map[crun:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName:crun] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:podsandbox}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreBlockIONotEnabledErrors:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginSetupSerially:false NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.8 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:true RestrictOOMScoreAdj:true MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false EnableCDI:false CDISpecDirs:[/etc/cdi /var/run/cdi] ImagePullProgressTimeout:1m0s DrainExecSyncIOTimeout:0s} ContainerdRootDir:/home/sochat1_llnl_gov/.local/share/usernetes/containerd ContainerdEndpoint:/run/user/501043911/usernetes/containerd/containerd.sock RootDir:/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.grpc.v1.cri StateDir:/run/user/501043911/usernetes/containerd/io.containerd.grpc.v1.cri} 
INFO[2023-08-05T22:55:18.962080032Z] Connect containerd service                   
INFO[2023-08-05T22:55:18.962104066Z] using legacy CRI server                      
INFO[2023-08-05T22:55:18.962112969Z] using experimental NRI integration - disable nri plugin to prevent this 
INFO[2023-08-05T22:55:18.962138243Z] Get image filesystem path "/home/sochat1_llnl_gov/.local/share/usernetes/containerd/io.containerd.snapshotter.v1.fuse-overlayfs" 
WARN[2023-08-05T22:55:18.962391475Z] failed to load plugin io.containerd.grpc.v1.cri  error="failed to create CRI service: failed to create cni conf monitor for default: failed to create the parent of the cni conf dir=/etc/cni: mkdir /etc/cni: permission denied"
INFO[2023-08-05T22:55:18.962408699Z] loading plugin "io.containerd.tracing.processor.v1.otlp"...  type=io.containerd.tracing.processor.v1
INFO[2023-08-05T22:55:18.962423577Z] skip loading plugin "io.containerd.tracing.processor.v1.otlp"...  error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
INFO[2023-08-05T22:55:18.962434748Z] loading plugin "io.containerd.internal.v1.tracing"...  type=io.containerd.internal.v1
INFO[2023-08-05T22:55:18.962444872Z] skipping tracing processor initialization (no tracing plugin)  error="no OpenTelemetry endpoint: skip plugin"
containerd: failed to get listener for main ttrpc endpoint: chown /run/user/501043911/usernetes/containerd/containerd.sock.ttrpc: operation not permitted

When I wrap it in rootlesskit.sh it doesn't work because it complains that there is already a lock file in my XDG home. But after running the above, at least there are some files in that containerd location:

$ tree /run/user/501043911/usernetes/containerd/
/run/user/501043911/usernetes/containerd/
├── io.containerd.runtime.v1.linux
└── io.containerd.runtime.v2.task
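
For what it's worth, the mkdir /opt/containerd permission denied and chown containerd.sock.ttrpc operation not permitted errors above look like what you would get when the script runs outside RootlessKit's user namespace, where /opt and /etc/cni are the real root-owned host paths. A quick way to check which context a shell is in (just a sketch):

# Inside a rootless user namespace, UID 0 maps to the unprivileged user;
# on the plain host the first line reads "0 0 4294967295".
cat /proc/self/uid_map
# Root-owned on the host; presumably only writable once rootlesskit.sh
# has set up its namespaces and mounts (an assumption about this setup).
ls -ld /opt /etc/cni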

Maybe that gives you some hints?

vsoch commented 1 year ago

Okay, going for a bike ride / run, shutting this down for now! Thanks for helping on a Saturday!

vsoch commented 1 year ago

Heyo! Wanted to ping for next week and see if @aojea or @AkihiroSuda had thoughts about the above. What should we try next? If you don't have ideas, I could take a shot at a PR to generalize the scripts so they aren't hard-coded for docker-compose (in case some small error in there is leading to the behavior here). Happy Sunday!