dyyfk closed this issue 1 year ago.
Hi @dyyfk, thanks for raising this issue. We currently spend most of our time on functionality, and yes, the docs are lacking in some areas.
As for network-aware scheduling, you can check https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/networkaware/README.md.
If anything does not work as expected, please feel free to raise a bug here.
/kind support
Mostly, I want to know whether the scheduler is behaving as expected, and I am looking for a way to test it. I would appreciate it if you could let me know how I could use the existing yaml files to test the scheduler/controller.
cc the author of the network-aware plugin, @jpedro1992
Hi @dyyfk, thank you for raising the issue and interest in the network-aware scheduler.
Which application are you trying to deploy, and which challenges are you facing? Are you using the provided yaml files? Here you can find most of them.
To see whether the scheduler and controllers behave as expected, please check the logs of both. The behavior depends on exactly which application you are deploying and on both the AppGroup and NetworkTopology CRs.
All components have been implemented. Some are hosted outside this repo, here.
For further information/documentation, please check our KEP.
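For example, something along these lines (the deployment and namespace names here are only illustrative; adjust them to your setup):
kubectl logs -n kube-system deploy/network-aware-scheduler
kubectl logs -n network-aware-controllers deploy/appgroup-controller
kubectl logs -n network-aware-controllers deploy/networktopology-controller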
Hi @jpedro1992, I want to test whether the scheduler always schedules the pods on the same node using the following scheduler config. Note that I changed the weight of NetworkOverhead to 100.
kubectl describe configmap network-aware-scheduler-config -n kube-system
Name: network-aware-scheduler-config
Namespace: kube-system
Labels: <none>
Annotations: <none>
Data
====
scheduler-config.yaml:
----
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
- schedulerName: network-aware-scheduler
  plugins:
    queueSort:
      enabled:
      - name: TopologicalSort
      disabled:
      - name: "*"
    preFilter:
      enabled:
      - name: NetworkOverhead
    filter:
      enabled:
      - name: NetworkOverhead
    score:
      disabled: # Preferably avoid the combination of NodeResourcesFit with NetworkOverhead
      - name: NodeResourcesFit
      enabled: # A higher weight is given to NetworkOverhead to favor allocation schemes with lower latency.
      - name: NetworkOverhead
        weight: 100
  pluginConfig:
  - name: TopologicalSort
    args:
      namespaces:
      - "default"
  - name: NetworkOverhead
    args:
      namespaces:
      - "default"
      weightsName: "UserDefined" # or "NetperfCosts"
      networkTopologyName: "net-topology-test"
BinaryData
====
Events: <none>
My AppGroup looks like the following:
Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-04-22T11:03:57Z
  Generation:          2
  Resource Version:    3695
  UID:                 c6b07ce0-a8ad-4c89-bba1-42e63ab04b71
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Status:
  Topology Calculation Time:  2023-04-22T11:03:57Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:  2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:  3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Events:  <none>
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-22T11:04:41Z
  Generation:          1
  Resource Version:    3759
  UID:                 c6081d0c-f991-42e7-803c-a6cf430c9ab1
Spec:
  Configmap Name:  netperfMetrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-east-1
          Network Cost:        20
        Origin:  us-west-1
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-west-1
          Network Cost:        20
        Origin:  us-east-1
      Topology Key:  topology.kubernetes.io/region
      Origin List:
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z2
          Network Cost:        5
        Origin:  z1
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z1
          Network Cost:        5
        Origin:  z2
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z4
          Network Cost:        10
        Origin:  z3
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z3
          Network Cost:        10
        Origin:  z4
      Topology Key:  topology.kubernetes.io/zone
Events:  <none>
I am testing the scheduler on an 8-node minikube cluster with the CNI network plugin (Calico) enabled.
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
minikube Ready control-plane,master 12h v1.25.7 192.168.49.2 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m02 Ready <none> 12h v1.25.7 192.168.49.3 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m03 Ready <none> 12h v1.25.7 192.168.49.4 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m04 Ready <none> 12h v1.25.7 192.168.49.5 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m05 Ready <none> 12h v1.25.7 192.168.49.6 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m06 Ready <none> 12h v1.25.7 192.168.49.7 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m07 Ready <none> 12h v1.25.7 192.168.49.8 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
minikube-m08 Ready <none> 12h v1.25.7 192.168.49.9 <none> Ubuntu 20.04.5 LTS 5.19.0-1022-aws docker://23.0.2
I labeled each node as follows:
kubectl describe nodes | grep topology.kubernetes.io/region
topology.kubernetes.io/region=us-west-1
topology.kubernetes.io/region=us-west-1
topology.kubernetes.io/region=us-west-1
topology.kubernetes.io/region=us-west-1
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/region=us-east-1
kubectl describe nodes | grep topology.kubernetes.io/zone
topology.kubernetes.io/zone=z1
topology.kubernetes.io/zone=z1
topology.kubernetes.io/zone=z2
topology.kubernetes.io/zone=z2
topology.kubernetes.io/zone=z3
topology.kubernetes.io/zone=z3
topology.kubernetes.io/zone=z4
topology.kubernetes.io/zone=z4
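For reference, I applied the labels roughly like this (a reconstruction; the node-to-zone mapping matches the listings above):
kubectl label nodes minikube minikube-m02 topology.kubernetes.io/region=us-west-1 topology.kubernetes.io/zone=z1
kubectl label nodes minikube-m03 minikube-m04 topology.kubernetes.io/region=us-west-1 topology.kubernetes.io/zone=z2
kubectl label nodes minikube-m05 minikube-m06 topology.kubernetes.io/region=us-east-1 topology.kubernetes.io/zone=z3
kubectl label nodes minikube-m07 minikube-m08 topology.kubernetes.io/region=us-east-1 topology.kubernetes.io/zone=z4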
My p1, p2, and p3 yaml files are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432
I expect all the pods to be placed in the same zone, but they are not; the scheduling appears random, as pods are spread across zones or even regions.
There are no logs in the controllers when I deploy the AppGroup and NetworkTopology CRs. Is this expected? There are also no logs in the scheduler when I deploy p1, p2, and p3. Is this expected?
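For completeness, this is how I collect the scheduler logs; adding a klog verbosity flag such as --v=4 to the scheduler args in deploy.yaml is my guess at the right knob for more detail:
kubectl logs -f -n kube-system deploy/network-aware-scheduler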
Snapshots of all the pods
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default p1-d4b878974-mfnm5 1/1 Running 0 114m
default p2-c8bf46f66-9gj24 1/1 Running 0 114m
default p3-9444dfcf9-gr9h8 1/1 Running 0 114m
kube-system calico-kube-controllers-798cc86c47-89g8h 1/1 Running 3 (12h ago) 12h
kube-system calico-node-2gnwz 1/1 Running 0 12h
kube-system calico-node-49wcq 1/1 Running 0 12h
kube-system calico-node-89h4b 1/1 Running 0 12h
kube-system calico-node-cp2fr 1/1 Running 0 12h
kube-system calico-node-dv2bv 1/1 Running 0 12h
kube-system calico-node-hhkpg 1/1 Running 0 12h
kube-system calico-node-lkshk 1/1 Running 0 12h
kube-system calico-node-lr4zk 1/1 Running 0 12h
kube-system coredns-565d847f94-ggkhh 1/1 Running 1 (12h ago) 12h
kube-system etcd-minikube 1/1 Running 0 12h
kube-system kube-apiserver-minikube 1/1 Running 0 12h
kube-system kube-controller-manager-minikube 1/1 Running 0 12h
kube-system kube-proxy-5zccp 1/1 Running 0 12h
kube-system kube-proxy-6p6dd 1/1 Running 0 12h
kube-system kube-proxy-9rhrs 1/1 Running 0 12h
kube-system kube-proxy-g5825 1/1 Running 0 12h
kube-system kube-proxy-kl4p7 1/1 Running 0 12h
kube-system kube-proxy-lr4rt 1/1 Running 0 12h
kube-system kube-proxy-sbh6q 1/1 Running 0 12h
kube-system kube-proxy-t9nqx 1/1 Running 0 12h
kube-system kube-scheduler-minikube 1/1 Running 0 12h
kube-system network-aware-scheduler-5ffc766dd9-tk88r 1/1 Running 0 116m
kube-system registry-gg4x8 1/1 Running 0 12h
kube-system registry-proxy-5k27s 1/1 Running 0 12h
kube-system registry-proxy-9bcps 1/1 Running 0 12h
kube-system registry-proxy-jhxbl 1/1 Running 0 12h
kube-system registry-proxy-m55rg 1/1 Running 0 12h
kube-system registry-proxy-n2952 1/1 Running 0 12h
kube-system registry-proxy-ncczr 1/1 Running 0 12h
kube-system registry-proxy-vpj2n 1/1 Running 0 12h
kube-system registry-proxy-vsrl5 1/1 Running 0 12h
kube-system storage-provisioner 1/1 Running 1 (12h ago) 12h
network-aware-controllers appgroup-controller-5fb544569c-l4856 1/1 Running 0 12h
network-aware-controllers networktopology-controller-67b5fc85bf-m4qqg 1/1 Running 0 12h
scheduler-plugins scheduler-plugins-controller-5d97947dd8-svvb8 1/1 Running 0 12h
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
p1-d4b878974-mfnm5 1/1 Running 0 115m 10.244.120.75 minikube <none> <none>
p2-c8bf46f66-9gj24 1/1 Running 0 115m 10.244.239.139 minikube-m08 <none> <none>
p3-9444dfcf9-gr9h8 1/1 Running 0 115m 10.244.151.6 minikube-m03 <none> <none>
kubectl logs -f network-aware-scheduler-5ffc766dd9-tk88r -n kube-system
I0422 21:03:01.999009 1 serving.go:348] Generated self-signed cert in-memory
I0422 21:03:03.141415 1 server.go:148] "Starting Kubernetes Scheduler" version="v0.25.7"
I0422 21:03:03.141449 1 server.go:150] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0422 21:03:03.146040 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0422 21:03:03.146059 1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0422 21:03:03.146096 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0422 21:03:03.146120 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0422 21:03:03.146144 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0422 21:03:03.146151 1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0422 21:03:03.146158 1 secure_serving.go:210] Serving securely on [::]:10259
I0422 21:03:03.146216 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0422 21:03:03.246357 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0422 21:03:03.246356 1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0422 21:03:03.246358 1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
kubectl logs -f scheduler-plugins-controller-5d97947dd8-svvb8 -n scheduler-plugins
W0422 10:58:22.601890 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0422 10:58:22.806294 1 logr.go:261] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0422 10:58:22.806763 1 logr.go:261] setup "msg"="starting manager"
I0422 10:58:22.806916 1 elasticquota.go:115] "Starting Elastic Quota control loop"
I0422 10:58:22.806990 1 elasticquota.go:117] "Waiting for informer caches to sync"
I0422 10:58:22.807112 1 internal.go:362] "msg"="Starting server" "addr"={"IP":"::","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0422 10:58:22.807219 1 controller.go:185] "msg"="Starting EventSource" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "source"="kind source: *v1alpha1.PodGroup"
I0422 10:58:22.807318 1 controller.go:185] "msg"="Starting EventSource" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "source"="kind source: *v1.Pod"
I0422 10:58:22.807239 1 internal.go:362] "msg"="Starting server" "addr"={"IP":"::","Port":8081,"Zone":""} "kind"="health probe"
I0422 10:58:22.807385 1 controller.go:193] "msg"="Starting Controller" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup"
I0422 10:58:22.907266 1 elasticquota.go:122] "Elastic Quota sync finished"
I0422 10:58:22.908053 1 controller.go:227] "msg"="Starting workers" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "worker count"=1
kubectl logs -f appgroup-controller-5fb544569c-l4856 -n network-aware-controllers
W0422 10:56:47.748474 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0422 10:56:47.750391 1 appgroup.go:104] "Starting App Group controller"
I0422 10:56:47.850601 1 appgroup.go:111] "App Group sync finished"
kubectl logs -f networktopology-controller-67b5fc85bf-m4qqg -n network-aware-controllers
W0422 10:57:51.562237 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0422 10:57:51.563802 1 networktopology.go:147] "Starting Network Topology controller"
I0422 10:57:51.664820 1 networktopology.go:155] "Network Topology sync finished"
Hi @dyyfk, thank you for the detailed description of your deployment.
I notice you are missing two important labels (appgroup.diktyo.x-k8s.io and appgroup.diktyo.x-k8s.io.workload) in the pod deployment files; the scheduler needs them to map pods to the AppGroup CRD. Please see the example below. It should work as expected after you include these labels. Please let me know if it worked! Regards
appgroup.diktyo.x-k8s.io: tells the scheduler which AppGroup the pod belongs to. It can also be appgroup.diktyo.k8s.io, depending on the API version you are using.
appgroup.diktyo.x-k8s.io.workload: tells the scheduler which workload the pod corresponds to. It can be appgroup.diktyo.x-k8s.io.workload OR workload if a previous version of the AppGroup API is deployed. Please check the API version you are using.
# online boutique example for adservice
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: adservice
  template:
    metadata:
      labels:
        app: adservice
        appgroup.diktyo.x-k8s.io: online-boutique
        appgroup.diktyo.x-k8s.io.workload: adservice
    spec:
      schedulerName: network-aware-scheduler
      initContainers:
      - name: sfx-instrumentation
        image: quay.io/signalfuse/sfx-zero-config-agent:latest
        # image: sfx-zero-config-agent
        # imagePullPolicy: Never
        volumeMounts:
        - mountPath: /opt/sfx/
          name: sfx-instrumentation
      containers:
      - name: server
        image: quay.io/signalfuse/microservices-demo-adservice:433c23881a
        ports:
        - containerPort: 9555
        env:
        - name: PORT
          value: '9555'
        - name: OTEL_EXPORTER_ZIPKIN_SERVICE_NAME
          value: adservice
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: OTEL_EXPORTER
          value: zipkin
        - name: JAVA_TOOL_OPTIONS
          value: -javaagent:/opt/sfx/splunk-otel-javaagent-all.jar
        - name: OTEL_EXPORTER_ZIPKIN_ENDPOINT
          value: 'http://$(NODE_IP):9411/v1/trace'
        volumeMounts:
        - mountPath: /opt/sfx
          name: sfx-instrumentation
        resources:
          requests:
            cpu: 200m
            memory: 180Mi
          limits:
            cpu: 300m
            memory: 300Mi
        readinessProbe:
          initialDelaySeconds: 60
          periodSeconds: 25
          exec:
            command: ['/bin/grpc_health_probe', '-addr=:9555']
        livenessProbe:
          initialDelaySeconds: 60
          periodSeconds: 30
          exec:
            command: ['/bin/grpc_health_probe', '-addr=:9555']
      volumes:
      - emptyDir: {}
        name: sfx-instrumentation
---
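Stripped down to what matters for the scheduler, a minimal sketch of the same deployment (probes, instrumentation, and resources omitted) would be:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: adservice
  template:
    metadata:
      labels:
        app: adservice
        appgroup.diktyo.x-k8s.io: online-boutique        # the AppGroup this pod belongs to
        appgroup.diktyo.x-k8s.io.workload: adservice     # the workload within that AppGroup
    spec:
      schedulerName: network-aware-scheduler             # route the pod to the network-aware profile
      containers:
      - name: server
        image: quay.io/signalfuse/microservices-demo-adservice:433c23881a
        ports:
        - containerPort: 9555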
Hi, it looks like I am using an older build.
Here are my updated yaml files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p1
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p2
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432
I ran into an error in the networktopology-controller, as it could not find the AppGroup. It looks like I have "appgroups.appgroup.diktyo.k8s.io", while the controller is trying to find "appgroup.appgroup.diktyo.k8s.io".
ubuntu@ip-172-31-3-80:~$ kubectl get crds
NAME CREATED AT
appgroups.appgroup.diktyo.k8s.io 2023-04-22T10:54:54Z
appgroups.appgroup.diktyo.x-k8s.io 2023-04-22T10:51:15Z
bgpconfigurations.crd.projectcalico.org 2023-04-22T10:37:08Z
bgppeers.crd.projectcalico.org 2023-04-22T10:37:08Z
blockaffinities.crd.projectcalico.org 2023-04-22T10:37:08Z
caliconodestatuses.crd.projectcalico.org 2023-04-22T10:37:08Z
clusterinformations.crd.projectcalico.org 2023-04-22T10:37:08Z
elasticquotas.scheduling.x-k8s.io 2023-04-22T10:51:15Z
felixconfigurations.crd.projectcalico.org 2023-04-22T10:37:08Z
globalnetworkpolicies.crd.projectcalico.org 2023-04-22T10:37:08Z
globalnetworksets.crd.projectcalico.org 2023-04-22T10:37:08Z
hostendpoints.crd.projectcalico.org 2023-04-22T10:37:08Z
ipamblocks.crd.projectcalico.org 2023-04-22T10:37:08Z
ipamconfigs.crd.projectcalico.org 2023-04-22T10:37:08Z
ipamhandles.crd.projectcalico.org 2023-04-22T10:37:08Z
ippools.crd.projectcalico.org 2023-04-22T10:37:08Z
ipreservations.crd.projectcalico.org 2023-04-22T10:37:08Z
kubecontrollersconfigurations.crd.projectcalico.org 2023-04-22T10:37:08Z
networkpolicies.crd.projectcalico.org 2023-04-22T10:37:08Z
networksets.crd.projectcalico.org 2023-04-22T10:37:08Z
networktopologies.networktopology.diktyo.k8s.io 2023-04-22T10:57:42Z
networktopologies.networktopology.diktyo.x-k8s.io 2023-04-22T10:51:15Z
noderesourcetopologies.topology.node.k8s.io 2023-04-22T10:51:15Z
podgroups.scheduling.x-k8s.io 2023-04-22T10:51:15Z
ubuntu@ip-172-31-3-80:~/scheduler-plugins/manifests/crds$ kubectl logs -f networktopology-controller-67b5fc85bf-m4qqg -n network-aware-controllers
W0422 10:57:51.562237 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0422 10:57:51.563802 1 networktopology.go:147] "Starting Network Topology controller"
I0422 10:57:51.664820 1 networktopology.go:155] "Network Topology sync finished"
E0424 21:31:11.301967 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.k8s.io \"a1\" not found"
E0424 21:31:11.322252 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.k8s.io \"a1\" not found"
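My guess at a cleanup, assuming the stale diktyo.k8s.io groups are safe to remove (CRD names taken from the listing above):
kubectl delete crd appgroups.appgroup.diktyo.k8s.io
kubectl delete crd networktopologies.networktopology.diktyo.k8s.io
// then re-create the a1 AppGroup under the x-k8s.io API group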
Hi @dyyfk,
Which version of the controller are you using?
The most updated versions are hosted here: AppGroup and NetworkTopology. The controllers can be built locally and deployment files for K8s are available in the manifests folder.
I just updated both APIs (AppGroup and networkTopology) to the most recent version: v1.0.3-alpha
Hi @jpedro1992, I have updated the appgroup-controller and networktopology-controller images to the latest versions, and I am also using the latest APIs, but I am still getting errors.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p1
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p2
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432
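To double-check that the labels actually landed on the pods, I look at:
kubectl get pods -n default --show-labels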
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl logs -f networktopology-controller-67b5fc85bf-l9wv9 -n network-aware-controllers
W0425 23:27:43.144138 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0425 23:27:43.145849 1 networktopology.go:147] "Starting Network Topology controller"
I0425 23:27:43.247119 1 networktopology.go:155] "Network Topology sync finished"
E0425 23:42:39.281890 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:39.287722 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:39.301695 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:40.035222 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
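As a sanity check on which API group the a1 AppGroup actually lives in, I ran (sketch):
kubectl api-resources | grep -i appgroup
kubectl get appgroups.appgroup.diktyo.x-k8s.io a1 -n default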
# Example AppGroup CRD spec
apiVersion: appgroup.diktyo.x-k8s.io/v1alpha1
kind: AppGroup
metadata:
  name: a1
spec:
  numMembers: 3
  topologySortingAlgorithm: KahnSort
  workloads:
  - workload:
      kind: Deployment
      name: p1
      selector: p1
      apiVersion: apps/v1
      namespace: default
    dependencies:
    - workload:
        kind: Deployment
        name: p2
        selector: p2
        apiVersion: apps/v1
        namespace: default
      minBandwidth: "100Mi"
      maxNetworkCost: 30
  - workload:
      kind: Deployment
      name: p2
      selector: p2
      apiVersion: apps/v1
      namespace: default
    dependencies:
    - workload:
        kind: Deployment
        name: p3
        selector: p3
        apiVersion: apps/v1
        namespace: default
      minBandwidth: "250Mi"
      maxNetworkCost: 20
  - workload:
      kind: Deployment
      name: P3-deployment
      selector: p3
      apiVersion: apps/v1
      namespace: default
# Example NetworkTopology CRD
apiVersion: networktopology.diktyo.x-k8s.io/v1alpha1
kind: NetworkTopology
metadata:
  name: net-topology-test
  namespace: default
spec:
  configmapName: "netperfMetrics"
  weights:
  # Region label: "topology.kubernetes.io/region"
  # Zone label:   "topology.kubernetes.io/zone"
  # 2 regions: us-west-1, us-east-1
  # 4 zones:   us-west-1: z1, z2
  #            us-east-1: z3, z4
  - name: "UserDefined"
    topologyList: # Define weights between regions or between zones
    - topologyKey: "topology.kubernetes.io/region" # region costs
      originList:
      - origin: "us-west-1"
        costList:
        - destination: "us-east-1"
          bandwidthCapacity: "10Gi"
          networkCost: 20
      - origin: "us-east-1"
        costList:
        - destination: "us-west-1"
          bandwidthCapacity: "10Gi"
          networkCost: 20
    - topologyKey: "topology.kubernetes.io/zone" # zone costs
      originList:
      - origin: "z1"
        costList:
        - destination: "z2"
          bandwidthCapacity: "1Gi"
          networkCost: 5
      - origin: "z2"
        costList:
        - destination: "z1"
          bandwidthCapacity: "1Gi"
          networkCost: 5
      - origin: "z3"
        costList:
        - destination: "z4"
          bandwidthCapacity: "1Gi"
          networkCost: 10
      - origin: "z4"
        costList:
        - destination: "z3"
          bandwidthCapacity: "1Gi"
          networkCost: 10
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe appgroups
Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.x-k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-04-25T23:30:33Z
  Generation:          9
  Resource Version:    5859
  UID:                 fa7720b8-298e-4f8d-a650-dee72767802f
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         P3-deployment
      Namespace:    default
      Selector:     p3
Status:
  Running Workloads:          3
  Topology Calculation Time:  2023-04-25T23:30:33Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:  2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:  3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         P3-deployment
      Namespace:    default
      Selector:     p3
Events:  <none>
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe networktopologies
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-25T23:33:14Z
  Generation:          1
  Resource Version:    4356
  UID:                 36218354-3371-4a36-a855-09159519dcec
Spec:
  Configmap Name:  netperfMetrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-east-1
          Network Cost:        20
        Origin:  us-west-1
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-west-1
          Network Cost:        20
        Origin:  us-east-1
      Topology Key:  topology.kubernetes.io/region
      Origin List:
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z2
          Network Cost:        5
        Origin:  z1
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z1
          Network Cost:        5
        Origin:  z2
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z4
          Network Cost:        10
        Origin:  z3
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z3
          Network Cost:        10
        Origin:  z4
      Topology Key:  topology.kubernetes.io/zone
Events:  <none>
kubectl get crds
NAME CREATED AT
appgroups.appgroup.diktyo.x-k8s.io 2023-04-25T23:13:48Z
bgpconfigurations.crd.projectcalico.org 2023-04-25T22:58:20Z
bgppeers.crd.projectcalico.org 2023-04-25T22:58:20Z
blockaffinities.crd.projectcalico.org 2023-04-25T22:58:20Z
caliconodestatuses.crd.projectcalico.org 2023-04-25T22:58:20Z
clusterinformations.crd.projectcalico.org 2023-04-25T22:58:20Z
elasticquotas.scheduling.x-k8s.io 2023-04-25T23:13:48Z
felixconfigurations.crd.projectcalico.org 2023-04-25T22:58:20Z
globalnetworkpolicies.crd.projectcalico.org 2023-04-25T22:58:20Z
globalnetworksets.crd.projectcalico.org 2023-04-25T22:58:20Z
hostendpoints.crd.projectcalico.org 2023-04-25T22:58:20Z
ipamblocks.crd.projectcalico.org 2023-04-25T22:58:20Z
ipamconfigs.crd.projectcalico.org 2023-04-25T22:58:20Z
ipamhandles.crd.projectcalico.org 2023-04-25T22:58:20Z
ippools.crd.projectcalico.org 2023-04-25T22:58:20Z
ipreservations.crd.projectcalico.org 2023-04-25T22:58:20Z
kubecontrollersconfigurations.crd.projectcalico.org 2023-04-25T22:58:20Z
networkpolicies.crd.projectcalico.org 2023-04-25T22:58:20Z
networksets.crd.projectcalico.org 2023-04-25T22:58:20Z
networktopologies.networktopology.diktyo.x-k8s.io 2023-04-25T23:13:48Z
noderesourcetopologies.topology.node.k8s.io 2023-04-25T23:13:48Z
podgroups.scheduling.x-k8s.io 2023-04-25T23:13:48Z
Hi @dyyfk,
Everything seems fine to me; I am not sure why the networkTopology controller is failing. The AppGroup controller seems to be working fine. Have you removed previous versions of the APIs before submitting the new version?
I will double check the deployment for the new version and see what might be happening.
Meanwhile, you could use AppGroup (v0.0.9-alpha) and NetworkTopology (v0.0.8-alpha) with these two images for the controllers: appGroup and networkTopology. This combination was working on Kubernetes v1.24.4 with scheduler-plugins v0.24.9.
Hi @jpedro1992, I appreciate your help, but I would prefer to use the latest version, since the intended use case is to deploy this scheduler on the latest Kubernetes release.
I also tried deploying the online-boutique example using the yaml file in this thread, but it hits the same problem: the controller says it cannot find the corresponding AppGroup.
I would appreciate it if you could try to reproduce the errors I am facing using the latest build. Thanks again for all your time and effort!
Here is the list of steps I used:
sudo usermod -aG docker $USER && newgrp docker
minikube start --network-plugin=cni --cni=calico --nodes=4 --insecure-registry "10.0.0.0/24" --kubernetes-version=v1.25.7
minikube addons enable registry
// start a local registry
docker run --rm -it --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):5000"
make local-image
docker push localhost:5000/appgroup-controller/controller:latest
docker push localhost:5000/network-topology-controller/controller:latest
kubectl apply -f crds/
// Uncomment the necessary section of network-aware-controllers in all-in-one.yaml first
kubectl apply -f install/all-in-one.yaml
kubectl apply -f networktopology/networktopology-controller-deployment.yaml
kubectl apply -f appgroup/appgroup-controller-deployment.yaml
// Change the image in deploy.yaml to
kubectl label nodes
// copy /etc/kubernetes/scheduler.conf
kubectl apply -f scheduler-configmap-v1beta3.yaml
kubectl apply -f deploy.yaml
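As a final sanity check after these steps, I verify that the configmap and scheduler are live:
kubectl -n kube-system get configmap network-aware-scheduler-config -o yaml
kubectl -n kube-system logs deploy/network-aware-scheduler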
Hi @jpedro1992, I would like to mention another issue I found after the update.
With the latest version, I could not get the scheduler running, even though kubectl reports that the deployment has been configured.
ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl apply -f scheduler-configmap-v1beta3.yaml
configmap/network-aware-scheduler-config configured
ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl apply -f deploy.yaml
deployment.apps/network-aware-scheduler configured
ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default p1-66994f6978-f99gv 0/1 Pending 0 23h
kube-system calico-kube-controllers-798cc86c47-7p4x8 1/1 Running 1 (24h ago) 24h
kube-system calico-node-hqh8r 1/1 Running 0 24h
kube-system calico-node-mc47f 1/1 Running 0 24h
kube-system calico-node-t5tx6 1/1 Running 0 24h
kube-system calico-node-wzl5l 1/1 Running 0 24h
kube-system coredns-565d847f94-bvm4z 1/1 Running 1 (24h ago) 24h
kube-system etcd-minikube 1/1 Running 0 24h
kube-system kube-apiserver-minikube 1/1 Running 0 24h
kube-system kube-controller-manager-minikube 1/1 Running 0 24h
kube-system kube-proxy-ddp8h 1/1 Running 0 24h
kube-system kube-proxy-gzxcw 1/1 Running 0 24h
kube-system kube-proxy-t64hs 1/1 Running 0 24h
kube-system kube-proxy-z2d6c 1/1 Running 0 24h
kube-system kube-scheduler-minikube 1/1 Running 0 24h
kube-system registry-gpjsn 1/1 Running 0 24h
kube-system registry-proxy-5684v 1/1 Running 0 24h
kube-system registry-proxy-7bwcg 1/1 Running 0 24h
kube-system registry-proxy-bvhj4 1/1 Running 0 24h
kube-system registry-proxy-cdlft 1/1 Running 0 24h
kube-system storage-provisioner 1/1 Running 1 (24h ago) 24h
network-aware-controllers appgroup-controller-5fb544569c-tnmpr 1/1 Running 0 23h
network-aware-controllers networktopology-controller-67b5fc85bf-rj5nw 1/1 Running 0 23h
scheduler-plugins scheduler-plugins-controller-5d97947dd8-hddfq 1/1 Running 4 (23h ago) 23h
default p1-fb8975b8d-vm9b8 1/1 Running 0 2d23h
default p2-66c74dbbdc-rvvf9 1/1 Running 0 2d23h
default p3-75f556cc47-dcdnj 1/1 Running 0 2d23h
kube-system calico-kube-controllers-798cc86c47-89g8h 1/1 Running 3 (5d11h ago) 5d11h
kube-system calico-node-2gnwz 1/1 Running 0 5d11h
kube-system calico-node-49wcq 1/1 Running 0 5d11h
kube-system calico-node-89h4b 1/1 Running 0 5d11h
kube-system calico-node-cp2fr 1/1 Running 0 5d11h
kube-system calico-node-dv2bv 1/1 Running 0 5d11h
kube-system calico-node-hhkpg 1/1 Running 0 5d11h
kube-system calico-node-lkshk 1/1 Running 0 5d11h
kube-system calico-node-lr4zk 1/1 Running 0 5d11h
kube-system coredns-565d847f94-ggkhh 1/1 Running 1 (5d11h ago) 5d11h
kube-system etcd-minikube 1/1 Running 0 5d11h
kube-system kube-apiserver-minikube 1/1 Running 0 5d11h
kube-system kube-controller-manager-minikube 1/1 Running 0 5d11h
kube-system kube-proxy-5zccp 1/1 Running 0 5d11h
kube-system kube-proxy-6p6dd 1/1 Running 0 5d11h
kube-system kube-proxy-9rhrs 1/1 Running 0 5d11h
kube-system kube-proxy-g5825 1/1 Running 0 5d11h
kube-system kube-proxy-kl4p7 1/1 Running 0 5d11h
kube-system kube-proxy-lr4rt 1/1 Running 0 5d11h
kube-system kube-proxy-sbh6q 1/1 Running 0 5d11h
kube-system kube-proxy-t9nqx 1/1 Running 0 5d11h
kube-system kube-scheduler-minikube 1/1 Running 0 5d11h
kube-system network-aware-scheduler-5ffc766dd9-tk88r 1/1 Running 0 5d
kube-system storage-provisioner 1/1 Running 1 (5d11h ago) 5d11h
network-aware-controllers appgroup-controller-5fb544569c-l4856 1/1 Running 0 5d10h
network-aware-controllers networktopology-controller-67b5fc85bf-m4qqg 1/1 Running 0 5d10h
scheduler-plugins scheduler-plugins-controller-5d97947dd8-svvb8 1/1 Running 0 5d10h
Hi @dyyfk, thank you for the detailed description!
I went over it, and indeed I get exactly the same errors as you do; however, I am able to deploy the pods successfully. The AppGroup controller seems fine, but the networkTopology controller fails when retrieving the AppGroup and when converting the costs; in my case, everything still works despite that. Please see the logs below.
I will investigate this further, but it may be an error caused by previous versions of the containers still running with a different API version. Please see here.
Please check your rbac rules, since I found a few manifests that need to be updated to x-k8s instead of k8s. For example, these ones need to be updated. I will create an issue and a PR for this.
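Roughly, the affected rules should point at the x-k8s groups, i.e. something like this sketch (not the exact manifest):
- apiGroups: ["appgroup.diktyo.x-k8s.io"]
  resources: ["appgroups"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["networktopology.diktyo.x-k8s.io"]
  resources: ["networktopologies"]
  verbs: ["get", "list", "watch", "update"]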
AppGroup Controller:
kubectl logs appgroup-controller-cd49d4546-9dlkh -n network-aware-controllers
W0428 12:00:15.364023 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0428 12:00:15.364751 1 appgroup.go:104] "Starting App Group controller"
I0428 12:00:15.465560 1 appgroup.go:111] "App Group sync finished"
NetworkTopology Controller:
kubectl logs networktopology-controller-5fb64c7769-wbhkf -n network-aware-controllers
W0428 11:57:40.141750 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0428 11:57:40.142516 1 networktopology.go:147] "Starting Network Topology controller"
I0428 11:57:40.242723 1 networktopology.go:155] "Network Topology sync finished"
I0428 11:57:40.242775 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 11:57:40.242788 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
E0428 11:57:40.242813 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.242869 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.242953 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243057 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243176 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243263 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243273 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243287 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243313 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243340 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243368 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243400 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243434 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243467 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243571 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243762 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243976 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.244084 1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
I0428 11:57:40.284531 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
NetworkTopology CRD:
kubectl describe networktopologies
Name: net-topology-test
Namespace: default
Labels: <none>
Annotations: <none>
API Version: networktopology.diktyo.x-k8s.io/v1alpha1
Kind: NetworkTopology
Metadata:
Creation Timestamp: 2023-04-28T11:41:08Z
Generation: 3
Managed Fields:
API Version: networktopology.diktyo.x-k8s.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:configmapName:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2023-04-28T11:41:08Z
API Version: networktopology.diktyo.x-k8s.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:weights:
f:status:
.:
f:nodeCount:
f:weightCalculationTime:
Manager: controller
Operation: Update
Time: 2023-04-28T11:41:09Z
Resource Version: 18425531
UID: b1b147ce-4aa7-43f0-8bb9-9ea4ed5eef50
Spec:
Configmap Name: netperf-metrics
Weights:
Name: UserDefined
Topology List:
Origin List:
Origin: cloud
Topology Key: topology.kubernetes.io/region
Origin List:
Origin: z1
Origin: z2
Origin: z3
Origin: z4
Origin: z5
Origin: z6
Origin: z7
Origin: z8
Origin: z9
Origin: z10
Topology Key: topology.kubernetes.io/zone
Name: NetperfCosts
Topology List:
Origin List:
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: cloud
Network Cost: 28
Origin:
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination:
Network Cost: 28
Origin: cloud
Topology Key: topology.kubernetes.io/region
Origin List:
Origin:
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 76
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 77
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 68
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 77
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 49
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 49
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 74
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 79
Origin: z1
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 76
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 55
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 46
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 55
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 27
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 27
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 52
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 57
Origin: z10
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 77
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 55
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 47
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 56
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 53
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 58
Origin: z2
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 68
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 46
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 47
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 47
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 19
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 19
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 44
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 49
Origin: z3
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 77
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 55
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 56
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 47
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 53
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 58
Origin: z4
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 49
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 27
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 19
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 25
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 30
Origin: z6
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 49
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 27
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 19
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 28
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 25
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 30
Origin: z7
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 74
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 52
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 53
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 44
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 53
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 25
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 25
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z9
Network Cost: 55
Origin: z8
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z1
Network Cost: 79
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z10
Network Cost: 57
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z2
Network Cost: 58
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z3
Network Cost: 49
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z4
Network Cost: 58
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z6
Network Cost: 30
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z7
Network Cost: 30
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: z8
Network Cost: 55
Origin: z9
Topology Key: topology.kubernetes.io/zone
Status:
Node Count: 10
Weight Calculation Time: 2023-04-28T11:57:40Z
Events: <none>
Small part of the scheduler logs:
I0428 12:04:50.043577 1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043630 1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043644 1 schedule_one.go:85] "Attempting to schedule pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043696 1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0428 12:04:50.043713 1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc0007910c8}
I0428 12:04:50.043723 1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0428 12:04:50.043736 1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000a08090}
I0428 12:04:50.044779 1 networkoverhead.go:242] "Node info" name="n1.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z1"
I0428 12:04:50.044845 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z1 Destination:z10}:76 {Origin:z1 Destination:z2}:77 {Origin:z1 Destination:z3}:68 {Origin:z1 Destination:z4}:77 {Origin:z1 Destination:z6}:49 {Origin:z1 Destination:z7}:49 {Origin:z1 Destination:z8}:74 {Origin:z1 Destination:z9}:79]
I0428 12:04:50.044876 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.044902 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.044922 1 networkoverhead.go:242] "Node info" name="n2.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z2"
I0428 12:04:50.044967 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z2 Destination:z1}:77 {Origin:z2 Destination:z10}:55 {Origin:z2 Destination:z3}:47 {Origin:z2 Destination:z4}:56 {Origin:z2 Destination:z6}:28 {Origin:z2 Destination:z7}:28 {Origin:z2 Destination:z8}:53 {Origin:z2 Destination:z9}:58]
I0428 12:04:50.044992 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045027 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045049 1 networkoverhead.go:242] "Node info" name="n4.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z4"
I0428 12:04:50.045087 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z4 Destination:z1}:77 {Origin:z4 Destination:z10}:55 {Origin:z4 Destination:z2}:56 {Origin:z4 Destination:z3}:47 {Origin:z4 Destination:z6}:28 {Origin:z4 Destination:z7}:28 {Origin:z4 Destination:z8}:53 {Origin:z4 Destination:z9}:58]
I0428 12:04:50.045114 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045137 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045155 1 networkoverhead.go:242] "Node info" name="n10.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z10"
I0428 12:04:50.045194 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z10 Destination:z1}:76 {Origin:z10 Destination:z2}:55 {Origin:z10 Destination:z3}:46 {Origin:z10 Destination:z4}:55 {Origin:z10 Destination:z6}:27 {Origin:z10 Destination:z7}:27 {Origin:z10 Destination:z8}:52 {Origin:z10 Destination:z9}:57]
I0428 12:04:50.045219 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045243 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045263 1 networkoverhead.go:242] "Node info" name="n6.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z6"
I0428 12:04:50.045302 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z6 Destination:z1}:49 {Origin:z6 Destination:z10}:27 {Origin:z6 Destination:z2}:28 {Origin:z6 Destination:z3}:19 {Origin:z6 Destination:z4}:28 {Origin:z6 Destination:z7}:0 {Origin:z6 Destination:z8}:25 {Origin:z6 Destination:z9}:30]
I0428 12:04:50.045329 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045350 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045367 1 networkoverhead.go:242] "Node info" name="n5.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="" zone=""
I0428 12:04:50.045384 1 networkoverhead.go:249] "Map" costMap=map[]
I0428 12:04:50.045407 1 networkoverhead.go:263] "Number of dependencies" satisfied=6 violated=0
I0428 12:04:50.045427 1 networkoverhead.go:270] "Node final cost" cost=0
I0428 12:04:50.045446 1 networkoverhead.go:242] "Node info" name="n7.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z7"
I0428 12:04:50.045482 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z7 Destination:z1}:49 {Origin:z7 Destination:z10}:27 {Origin:z7 Destination:z2}:28 {Origin:z7 Destination:z3}:19 {Origin:z7 Destination:z4}:28 {Origin:z7 Destination:z6}:0 {Origin:z7 Destination:z8}:25 {Origin:z7 Destination:z9}:30]
I0428 12:04:50.045505 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045526 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045544 1 networkoverhead.go:242] "Node info" name="n9.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z9"
I0428 12:04:50.045581 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z9 Destination:z1}:79 {Origin:z9 Destination:z10}:57 {Origin:z9 Destination:z2}:58 {Origin:z9 Destination:z3}:49 {Origin:z9 Destination:z4}:58 {Origin:z9 Destination:z6}:30 {Origin:z9 Destination:z7}:30 {Origin:z9 Destination:z8}:55]
I0428 12:04:50.045602 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045623 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045641 1 networkoverhead.go:242] "Node info" name="n8.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z8"
I0428 12:04:50.045688 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z8 Destination:z1}:74 {Origin:z8 Destination:z10}:52 {Origin:z8 Destination:z2}:53 {Origin:z8 Destination:z3}:44 {Origin:z8 Destination:z4}:53 {Origin:z8 Destination:z6}:25 {Origin:z8 Destination:z7}:25 {Origin:z8 Destination:z9}:55]
I0428 12:04:50.045716 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045736 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045756 1 networkoverhead.go:242] "Node info" name="n3.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z3"
I0428 12:04:50.045796 1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z3 Destination:z1}:68 {Origin:z3 Destination:z10}:46 {Origin:z3 Destination:z2}:47 {Origin:z3 Destination:z4}:47 {Origin:z3 Destination:z6}:19 {Origin:z3 Destination:z7}:19 {Origin:z3 Destination:z8}:44 {Origin:z3 Destination:z9}:49]
I0428 12:04:50.045815 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045836 1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.046202 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046289 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046328 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046384 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046421 1 networkoverhead.go:331] "Number of dependencies:" satisfied=6 violated=0
I0428 12:04:50.046475 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046500 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046511 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046564 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046389 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046827 1 default_binder.go:52] "Attempting to bind pod to node" pod="default/recommendationservice-84df96db59-6zz8q" node="n5.ml.ilabt-imec-be.wall2.ilabt.iminds.be"
I0428 12:04:50.046951 1 request.go:1170] Request Body:
00000000 6b 38 73 00 0a 0d 0a 02 76 31 12 07 42 69 6e 64 |k8s.....v1..Bind|
00000010 69 6e 67 12 a0 01 0a 61 0a 26 72 65 63 6f 6d 6d |ing....a.&recomm|
00000020 65 6e 64 61 74 69 6f 6e 73 65 72 76 69 63 65 2d |endationservice-|
00000030 38 34 64 66 39 36 64 62 35 39 2d 36 7a 7a 38 71 |84df96db59-6zz8q|
00000040 12 00 1a 07 64 65 66 61 75 6c 74 22 00 2a 24 62 |....default".*$b|
00000050 37 35 65 37 66 39 36 2d 34 33 35 32 2d 34 33 32 |75e7f96-4352-432|
00000060 34 2d 62 63 30 63 2d 30 63 32 64 37 66 34 63 34 |4-bc0c-0c2d7f4c4|
00000070 65 35 38 32 00 38 00 42 00 12 3b 0a 04 4e 6f 64 |e582.8.B..;..Nod|
00000080 65 12 00 1a 29 6e 35 2e 6d 6c 2e 69 6c 61 62 74 |e...)n5.ml.ilabt|
00000090 2d 69 6d 65 63 2d 62 65 2e 77 61 6c 6c 32 2e 69 |-imec-be.wall2.i|
000000a0 6c 61 62 74 2e 69 6d 69 6e 64 73 2e 62 65 22 00 |labt.iminds.be".|
000000b0 2a 00 32 00 3a 00 1a 00 22 00 |*.2.:...".|
I0428 12:04:50.047065 1 round_trippers.go:466] curl -v -XPOST -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://10.2.35.65:6443/api/v1/namespaces/default/pods/recommendationservice-84df96db59-6zz8q/binding'
I0428 12:04:50.050738 1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.050785 1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.051720 1 round_trippers.go:553] POST https://10.2.35.65:6443/api/v1/namespaces/default/pods/recommendationservice-84df96db59-6zz8q/binding 201 Created in 4 milliseconds
Deployed Pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
adservice-c4cbc7c8b-fl49q 0/1 Running 0 44s
adservice-c4cbc7c8b-m7gjs 0/1 Running 0 44s
adservice-c4cbc7c8b-sstc9 0/1 Running 0 44s
cartservice-6df55f79c4-2mt9h 1/1 Running 1 (35s ago) 44s
cartservice-6df55f79c4-874w7 1/1 Running 1 (37s ago) 44s
cartservice-6df55f79c4-cnlvw 1/1 Running 1 (37s ago) 44s
checkoutservice-784797b87-7pjnf 1/1 Running 0 44s
checkoutservice-784797b87-f98qd 1/1 Running 0 44s
checkoutservice-784797b87-fjvqt 1/1 Running 0 44s
currencyservice-8f567b44f-2lc5t 1/1 Running 0 44s
currencyservice-8f567b44f-gt79t 1/1 Running 0 44s
currencyservice-8f567b44f-xk864 1/1 Running 0 44s
emailservice-deployment-86885f8f64-bnlr9 1/1 Running 0 44s
emailservice-deployment-86885f8f64-n9nt7 1/1 Running 0 44s
emailservice-deployment-86885f8f64-v7tfq 1/1 Running 0 44s
frontend-788886bf4d-2sgkz 1/1 Running 0 44s
frontend-788886bf4d-9vgpw 1/1 Running 0 43s
frontend-788886bf4d-n4qqh 1/1 Running 0 43s
paymentservice-dcb658d-2fkbq 1/1 Running 0 44s
paymentservice-dcb658d-t6djk 1/1 Running 0 43s
paymentservice-dcb658d-xlpcr 1/1 Running 0 43s
productcatalogservice-5d4c7fc654-gw9mn 1/1 Running 0 43s
productcatalogservice-5d4c7fc654-qxrgj 1/1 Running 0 43s
productcatalogservice-5d4c7fc654-rfgks 1/1 Running 0 43s
recommendationservice-84df96db59-6zz8q 1/1 Running 0 42s
recommendationservice-84df96db59-fhbn2 1/1 Running 0 42s
recommendationservice-84df96db59-pv824 1/1 Running 0 43s
redis-cart-775cd7cb9d-9g6c9 1/1 Running 0 43s
redis-cart-775cd7cb9d-c4hsd 1/1 Running 0 42s
redis-cart-775cd7cb9d-jcpj8 1/1 Running 0 42s
shippingservice-5cc9965bd-bmzhr 1/1 Running 0 43s
shippingservice-5cc9965bd-c97ss 1/1 Running 0 42s
shippingservice-5cc9965bd-jw28f 1/1 Running 0 42s
Hi @dyyfk,
I restarted my cluster from scratch (kubeadm), and the errors disappeared after that.
Something gets broken in the cluster even when previous versions of the CRDs and containers are deleted; I am not exactly sure why. If you restart your minikube cluster, everything should be fine afterwards.
Sorry for the extra hassle!
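If it helps, a full minikube reset would look roughly like this (a minimal sketch; the node count and CRD manifest paths are placeholders for your setup):
minikube delete
minikube start --nodes 4
# re-apply the CRDs before redeploying the controllers and the scheduler
kubectl apply -f <path-to-appgroup-crd>.yaml
kubectl apply -f <path-to-networktopology-crd>.yaml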
kubectl logs networktopology-controller-5fb64c7769-b5j8n -n network-aware-controllers
W0428 12:58:25.009875 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0428 12:58:25.012995 1 networktopology.go:147] "Starting Network Topology controller"
I0428 12:58:25.113796 1 networktopology.go:155] "Network Topology sync finished"
I0428 13:16:22.869089 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:22.869173 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:16:22.877379 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:32.071381 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672740 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672791 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:22:10.733714 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
Hi @jpedro1992, thanks again for your time! I have tried starting from scratch using the latest deployment in my minikube cluster, but the same error persists.
I also want to let you know that my scheduler's log is different from yours. In particular, it does not show anything when I try to schedule a pod using the network-aware-scheduler, which I think is where the problem lies.
ubuntu@ip-172-31-11-54:~$ kubectl logs -f network-aware-scheduler-567b9b9b89-m4qf9 -n kube-system
I0428 21:13:46.885430 1 serving.go:348] Generated self-signed cert in-memory
I0428 21:13:46.886756 1 configfile.go:59] "KubeSchedulerConfiguration v1beta3 is deprecated in v1.26, will be removed in v1.29"
I0428 21:13:47.501657 1 server.go:152] "Starting Kubernetes Scheduler" version="v0.25.7"
I0428 21:13:47.501690 1 server.go:154] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0428 21:13:47.506596 1 secure_serving.go:210] Serving securely on [::]:10259
I0428 21:13:47.506694 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0428 21:13:47.507430 1 shared_informer.go:273] Waiting for caches to sync for RequestHeaderAuthRequestController
I0428 21:13:47.507031 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0428 21:13:47.507579 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0428 21:13:47.507758 1 shared_informer.go:273] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0428 21:13:47.507562 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0428 21:13:47.508169 1 shared_informer.go:273] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0428 21:13:47.607787 1 shared_informer.go:280] Caches are synced for RequestHeaderAuthRequestController
I0428 21:13:47.608219 1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0428 21:13:47.607902 1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I want to confirm: how did you start the scheduler? I start it as follows:
kubectl apply -f scheduler-plugins/manifests/appgroup/scheduler-configmap-v1beta3.yaml
kubectl apply -f scheduler-plugins/manifests/appgroup/deploy.yaml
In addition, my network topology controller does not log anything like the following:
I0428 13:16:22.869089 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:22.869173 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:16:22.877379 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:32.071381 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672740 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672791 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:22:10.733714 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I think it could be because I did not use the netperf component; instead, I used the example CRD with a hardcoded network topology. But in theory, that should not affect the networktopology-controller. Is my assumption correct?
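As a sanity check, both CRs can be listed directly (the resource names come from the CRDs):
kubectl get appgroups,networktopologies -n default
And here are the configmaps currently in my cluster: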
kubectl get configmap --all-namespaces
NAMESPACE NAME DATA AGE
default kube-root-ca.crt 1 10h
kube-node-lease kube-root-ca.crt 1 10h
kube-public cluster-info 8 10h
kube-public kube-root-ca.crt 1 10h
kube-system calico-config 4 10h
kube-system coredns 1 10h
kube-system extension-apiserver-authentication 6 10h
kube-system kube-proxy 2 10h
kube-system kube-root-ca.crt 1 10h
kube-system kubeadm-config 1 10h
kube-system kubelet-config 1 10h
kube-system network-aware-scheduler-config 1 10h
network-aware-controllers kube-root-ca.crt 1 10h
scheduler-plugins kube-root-ca.crt 1 10h
Hi @jpedro1992, I have now used the netperf component on an EKS cluster on AWS, and I am getting the same error as yours. However, my network topology's costs are all 0s.
ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ kubectl describe networktopologies
Name: net-topology-test
Namespace: default
Labels: <none>
Annotations: <none>
API Version: networktopology.diktyo.x-k8s.io/v1alpha1
Kind: NetworkTopology
Metadata:
Creation Timestamp: 2023-04-30T23:16:41Z
Generation: 2
Resource Version: 3509558
UID: f1165d12-4b36-4954-a89f-2b8af7af889a
Spec:
Configmap Name: netperf-metrics
Weights:
Name: UserDefined
Topology List:
Origin List:
Origin: us-west-2
Topology Key: topology.kubernetes.io/region
Origin List:
Origin: us-west-2b-1
Origin: us-west-2b-2
Origin: us-west-2b-3
Origin: us-west-2b-4
Topology Key: topology.kubernetes.io/zone
Name: NetperfCosts
Topology List:
Origin List:
Origin: us-west-2
Topology Key: topology.kubernetes.io/region
Origin List:
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-2
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-3
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-4
Network Cost: 0
Origin: us-west-2b-1
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-1
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-3
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-4
Network Cost: 0
Origin: us-west-2b-2
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-1
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-2
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-4
Network Cost: 0
Origin: us-west-2b-3
Cost List:
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-1
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-2
Network Cost: 0
Bandwidth Allocated: 0
Bandwidth Capacity: 1G
Destination: us-west-2b-3
Network Cost: 0
Origin: us-west-2b-4
Topology Key: topology.kubernetes.io/zone
Status:
Node Count: 4
Weight Calculation Time: 2023-04-30T23:16:41Z
Events: <none>
For the AppGroup controller, I am using image v1.0.3-alpha. For the network topology controller, I am using the debug image pulled from Docker Hub.
I0430 23:16:41.608153 1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0430 23:16:41.608197 1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0430 23:16:41.608222 1 networktopology.go:782] "N1: %v - N2: %v - Region1: %v - Region2: %v - Zone1: %v - Zone2: %v" ip-10-0-20-110.us-west-2.compute.internal="ip-10-0-20-229.us-west-2.compute.internal" us-west-2="us-west-2" us-west-2b-2="us-west-2b-3"
I0430 23:16:41.608258 1 networktopology.go:787] "Key: %v" netperf.p90.latency.milliseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal="(MISSING)"
I0430 23:16:41.608301 1 networktopology.go:788] "configmap.Data: %v" map[netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:116 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:81 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:260 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:78 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:77 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:106 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:75 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:92]="(MISSING)"
E0430 23:16:41.608322 1 networktopology.go:792] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
I0430 23:16:41.608334 1 networktopology.go:795] "Cost: %v" %!s(int=0)="(MISSING)"
I0430 23:16:41.608352 1 networktopology.go:782] "N1: %v - N2: %v - Region1: %v - Region2: %v - Zone1: %v - Zone2: %v" ip-10-0-20-110.us-west-2.compute.internal="ip-10-0-26-106.us-west-2.compute.internal" us-west-2="us-west-2" us-west-2b-2="us-west-2b-4"
I0430 23:16:41.608362 1 networktopology.go:787] "Key: %v" netperf.p90.latency.milliseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal="(MISSING)"
I0430 23:16:41.608386 1 networktopology.go:788] "configmap.Data: %v" map[netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:116 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:81 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:260 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:78 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:77 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:106 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:75 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:92]="(MISSING)"
E0430 23:16:41.608404 1 networktopology.go:792] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
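Comparing the log lines above, the controller appears to look up keys of the form netperf.p90.latency.milliseconds.origin.<node>.destination.<node>, while the ConfigMap stores netperf_p90_latency_microseconds.origin.<node>.destination.<node> (underscores, and microseconds instead of milliseconds). If that is the case, the lookup returns an empty string and strconv.Atoi fails, which would explain the zero costs. The stored value can be checked directly (a sketch; <origin> and <destination> are node names from the logs, with dots escaped for jsonpath):
kubectl get configmap netperf-metrics -o jsonpath="{.data['netperf_p90_latency_microseconds\.origin\.<origin>\.destination\.<destination>']}"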
However, it looks like the network metrics themselves have been recorded successfully.
ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ kubectl describe configmap netperf-metrics
Name: netperf-metrics
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
109
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
116
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
109
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
77
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
106
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
92
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
108
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
108
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
81
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
260
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
78
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
75
BinaryData
====
Events: <none>
Here is the CRD that I am using:
ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ cat example.yaml
# Example Network CRD
apiVersion: networktopology.diktyo.x-k8s.io/v1alpha1
kind: NetworkTopology
metadata:
  name: net-topology-test
  namespace: default
spec:
  configmapName: "netperf-metrics"
  weights:
    # Region label: "topology.kubernetes.io/region"
    # Zone label: "topology.kubernetes.io/zone"
    # 1 region: us-west-2
    # 4 zones: us-west-2b-1 .. us-west-2b-4
    - name: "UserDefined"
      topologyList: # Define weights between regions or between zones
        - topologyKey: "topology.kubernetes.io/region" # region costs
          originList:
            - origin: "us-west-2"
              costList:
        - topologyKey: "topology.kubernetes.io/zone" # zone costs
          originList:
            - origin: "us-west-2b-1"
              costList:
            - origin: "us-west-2b-2"
              costList:
            - origin: "us-west-2b-3"
              costList:
            - origin: "us-west-2b-4"
              costList:
If I use the example CRD like this, I get the following errors:
ubuntu@ip-172-31-6-60:~/test_deploy$ kubectl logs -f networktopology-controller-56948b9547-gxrwv -n network-aware-controllers
W0430 23:09:34.020217 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0430 23:09:34.020964 1 networktopology.go:147] "Starting Network Topology controller"
I0430 23:09:34.121500 1 networktopology.go:155] "Network Topology sync finished"
E0430 23:14:41.533989 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0430 23:14:41.534228 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0430 23:14:41.541700 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
Hi @dyyfk,
I am sharing my deployment files below. You can definitely use the NetworkTopology CR without the netperf component. I suspect you might be missing at least a label in your pod deployment files or a specific RBAC rule, since the scheduler does not recognize the pods.
na.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: network-aware-controllers
appgroup-controller.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: appgroup-controller
  namespace: network-aware-controllers
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: appgroup-controller
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["appgroup.diktyo.x-k8s.io"]
    resources: ["appgroups"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: appgroup-controller
subjects:
  - kind: ServiceAccount
    name: appgroup-controller
    namespace: network-aware-controllers
roleRef:
  kind: ClusterRole
  name: appgroup-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: appgroup-controller
  namespace: network-aware-controllers
  labels:
    app: appgroup-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: appgroup-controller
  template:
    metadata:
      labels:
        app: appgroup-controller
    spec:
      serviceAccountName: appgroup-controller
      containers:
        - name: appgroup-controller
          image: jpedro1992/appgroup-controller:v1.0.3-alpha
          command:
            - /bin/controller
          imagePullPolicy: IfNotPresent
networktopology-controller.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: networktopology-controller
  namespace: network-aware-controllers
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: networktopology-controller
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["appgroup.diktyo.x-k8s.io"]
    resources: ["appgroups"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["networktopology.diktyo.x-k8s.io"]
    resources: ["networktopologies"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: networktopology-controller
subjects:
  - kind: ServiceAccount
    name: networktopology-controller
    namespace: network-aware-controllers
roleRef:
  kind: ClusterRole
  name: networktopology-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: networktopology-controller
  namespace: network-aware-controllers
  labels:
    app: networktopology-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: networktopology-controller
  template:
    metadata:
      labels:
        app: networktopology-controller
    spec:
      serviceAccountName: networktopology-controller
      containers:
        - name: networktopology-controller
          image: jpedro1992/networktopology-controller:v1.0.3-alpha
          command:
            - /bin/controller
          imagePullPolicy: Always # IfNotPresent
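These can then be applied in order (file names as above):
kubectl apply -f na.yaml
kubectl apply -f appgroup-controller.yaml
kubectl apply -f networktopology-controller.yaml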
sched-cc.yaml:
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
  - schedulerName: network-aware-scheduler
    plugins:
      queueSort:
        enabled:
          - name: TopologicalSort
        disabled:
          - name: "*"
      preFilter:
        enabled:
          - name: NetworkOverhead
      filter:
        enabled:
          - name: NetworkOverhead
      score:
        disabled: # Preferably avoid the combination of NodeResourcesFit with NetworkOverhead
          - name: NodeResourcesFit
        enabled: # A higher weight is given to NetworkOverhead to favor allocation schemes with lower latency.
          - name: NetworkOverhead
            weight: 5
          # - name: BalancedAllocation
          #   weight: 1
    pluginConfig:
      - name: TopologicalSort
        args:
          namespaces:
            - "default"
      - name: NetworkOverhead
        args:
          namespaces:
            - "default"
          weightsName: "NetperfCosts" # or Dijkstra
          networkTopologyName: "net-topology-test"
Deployment of the scheduler and controller (sig-scheduling):
deploy.yaml:
# First part
# Apply extra privileges to system:kube-scheduler.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-scheduler:plugins
rules:
  - apiGroups: ["scheduling.sigs.x-k8s.io"]
    resources: ["podgroups", "elasticquotas", "podgroups/status", "elasticquotas/status"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["appgroup.diktyo.x-k8s.io"]
    resources: ["appgroups"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: ["networktopology.diktyo.x-k8s.io"]
    resources: ["networktopologies"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-scheduler:plugins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:plugins
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:kube-scheduler
---
# Second part
# Install the controller image.
apiVersion: v1
kind: Namespace
metadata:
  name: scheduler-plugins
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduler-plugins-controller
  namespace: scheduler-plugins
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduler-plugins-controller
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["scheduling.x-k8s.io"]
    resources: ["podgroups", "elasticquotas", "podgroups/status", "elasticquotas/status"]
    verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduler-plugins-controller
subjects:
  - kind: ServiceAccount
    name: scheduler-plugins-controller
    namespace: scheduler-plugins
roleRef:
  kind: ClusterRole
  name: scheduler-plugins-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: scheduler-plugins-controller
  namespace: scheduler-plugins
  labels:
    app: scheduler-plugins-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scheduler-plugins-controller
  template:
    metadata:
      labels:
        app: scheduler-plugins-controller
    spec:
      serviceAccountName: scheduler-plugins-controller
      containers:
        - name: scheduler-plugins-controller
          image: registry.k8s.io/scheduler-plugins/controller:v0.25.7
          imagePullPolicy: IfNotPresent
---
# Install the scheduler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scheduler-plugins-scheduler
  namespace: scheduler-plugins
spec:
  replicas: 1
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
    spec:
      nodeSelector:
        kubernetes.io/hostname: "master.n1" # Modify to your master node name! Unless you populate all nodes with sched-cc.yaml
      containers:
        - image: registry.k8s.io/scheduler-plugins/kube-scheduler:v0.25.7
          # imagePullPolicy: Never
          command:
            - /bin/kube-scheduler
            - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
            - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
            - --config=/etc/kubernetes/sched-cc.yaml
            - -v=9
          name: scheduler-plugins
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /etc/kubernetes
              name: etckubernetes
      hostNetwork: false
      hostPID: false
      volumes:
        - hostPath:
            path: /etc/kubernetes/
            type: Directory
          name: etckubernetes
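One note: the scheduler container loads --config=/etc/kubernetes/sched-cc.yaml through the hostPath volume above, so sched-cc.yaml has to be placed in /etc/kubernetes/ on the node matched by the nodeSelector before applying deploy.yaml, e.g. (assuming shell access to that node):
sudo cp sched-cc.yaml /etc/kubernetes/
kubectl apply -f deploy.yaml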
Hi @jpedro1992, thanks for sharing your yaml files. I was able to get the scheduler running successfully on my minikube.
Although the scheduler is running and scheduling pods, it keeps reporting that the final cost of every node is 0. It also seems that pod dependencies are not respected, since every pod logs "Number of dependencies" satisfied=0 violated=0. This should not be the case, as the pods do have dependencies.
Scheduler logs:
I0504 02:30:07.766408 1 round_trippers.go:466] curl -v -XGET -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/storage.k8s.io/v1/csidrivers?allowWatchBookmarks=true&resourceVersion=6810&timeout=5m32s&timeoutSeconds=332&watch=true'
I0504 02:30:07.774639 1 round_trippers.go:553] GET https://192.168.49.2:8443/apis/storage.k8s.io/v1/csidrivers?allowWatchBookmarks=true&resourceVersion=6810&timeout=5m32s&timeoutSeconds=332&watch=true 200 OK in 1 milliseconds
I0504 02:30:07.774668 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 0 ms Duration 1 ms
I0504 02:30:07.774678 1 round_trippers.go:577] Response Headers:
I0504 02:30:07.774691 1 round_trippers.go:580] Audit-Id: 3a0d8bcd-d0bb-4ec2-8256-0a78703ed19c
I0504 02:30:07.774700 1 round_trippers.go:580] Cache-Control: no-cache, private
I0504 02:30:07.774708 1 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf;stream=watch
I0504 02:30:07.774716 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:07.774724 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:07.774733 1 round_trippers.go:580] Date: Thu, 04 May 2023 02:30:07 GMT
I0504 02:30:15.901697 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:15.901889 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:15.927373 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.138097 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.176018 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.186971 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.870253 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.878755 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.887701 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.901057 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.916165 1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.916620 1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.940136 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.952460 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.957156 1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:18.403144 1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403217 1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403229 1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403311 1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.403339 1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.403348 1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.403359 1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.403413 1 networkoverhead.go:242] "Node info" name="minikube" region="us-west-1" zone="z1"
I0504 02:30:18.403613 1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z1 Destination:z2}:10]
I0504 02:30:18.403783 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.403832 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.403904 1 networkoverhead.go:242] "Node info" name="minikube-m02" region="us-west-1" zone="z2"
I0504 02:30:18.403954 1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z2 Destination:z1}:20]
I0504 02:30:18.403975 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404008 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404022 1 networkoverhead.go:242] "Node info" name="minikube-m03" region="us-east-1" zone="z3"
I0504 02:30:18.404037 1 networkoverhead.go:249] "Map" costMap=map[{Origin:z3 Destination:z4}:30]
I0504 02:30:18.404050 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404058 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404069 1 networkoverhead.go:242] "Node info" name="minikube-m04" region="us-east-1" zone="z4"
I0504 02:30:18.404081 1 networkoverhead.go:249] "Map" costMap=map[{Origin:z4 Destination:z3}:40]
I0504 02:30:18.404097 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404107 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404192 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404209 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404224 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404229 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404399 1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube" finalScore=0
I0504 02:30:18.404527 1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m02" finalScore=0
I0504 02:30:18.404550 1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m03" finalScore=0
I0504 02:30:18.404597 1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m04" finalScore=0
I0504 02:30:18.404739 1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.404899 1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p1-59bcb65f56-4rrfc" node="minikube-m04"
I0504 02:30:18.404974 1 request.go:1170] Request Body:
00000000 6b 38 73 00 0a 0d 0a 02 76 31 12 07 42 69 6e 64 |k8s.....v1..Bind|
00000010 69 6e 67 12 70 0a 4e 0a 13 70 31 2d 35 39 62 63 |ing.p.N..p1-59bc|
00000020 62 36 35 66 35 36 2d 34 72 72 66 63 12 00 1a 07 |b65f56-4rrfc....|
00000030 64 65 66 61 75 6c 74 22 00 2a 24 63 65 36 33 35 |default".*$ce635|
00000040 62 32 31 2d 63 66 62 35 2d 34 38 36 39 2d 61 38 |b21-cfb5-4869-a8|
00000050 35 39 2d 38 35 30 34 62 37 30 65 31 30 64 65 32 |59-8504b70e10de2|
00000060 00 38 00 42 00 12 1e 0a 04 4e 6f 64 65 12 00 1a |.8.B.....Node...|
00000070 0c 6d 69 6e 69 6b 75 62 65 2d 6d 30 34 22 00 2a |.minikube-m04".*|
00000080 00 32 00 3a 00 1a 00 22 00 |.2.:...".|
I0504 02:30:18.405059 1 round_trippers.go:466] curl -v -XPOST -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/api/v1/namespaces/default/pods/p1-59bcb65f56-4rrfc/binding'
I0504 02:30:18.412695 1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412787 1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412812 1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412883 1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.412907 1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.412918 1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.412928 1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.412966 1 networkoverhead.go:242] "Node info" name="minikube" region="us-west-1" zone="z1"
I0504 02:30:18.413328 1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z1 Destination:z2}:10]
I0504 02:30:18.413452 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.413496 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.413583 1 networkoverhead.go:242] "Node info" name="minikube-m02" region="us-west-1" zone="z2"
I0504 02:30:18.413715 1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z2 Destination:z1}:20]
I0504 02:30:18.413810 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.413869 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.413949 1 networkoverhead.go:242] "Node info" name="minikube-m03" region="us-east-1" zone="z3"
I0504 02:30:18.414011 1 networkoverhead.go:249] "Map" costMap=map[{Origin:z3 Destination:z4}:30]
I0504 02:30:18.414093 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.414148 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.414866 1 networkoverhead.go:242] "Node info" name="minikube-m04" region="us-east-1" zone="z4"
I0504 02:30:18.415103 1 networkoverhead.go:249] "Map" costMap=map[{Origin:z4 Destination:z3}:40]
I0504 02:30:18.415328 1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.415500 1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.416662 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.416843 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417015 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417182 1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417541 1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube" finalScore=0
I0504 02:30:18.417843 1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m03" finalScore=0
I0504 02:30:18.418091 1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m04" finalScore=0
I0504 02:30:18.418139 1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m02" finalScore=0
I0504 02:30:18.418678 1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.419126 1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p2-76867498b7-6lh62" node="minikube-m02"
I0504 02:30:18.419405 1 request.go:1170] Request Body:
00000000 6b 38 73 00 0a 0d 0a 02 76 31 12 07 42 69 6e 64 |k8s.....v1..Bind|
00000010 69 6e 67 12 70 0a 4e 0a 13 70 32 2d 37 36 38 36 |ing.p.N..p2-7686|
00000020 37 34 39 38 62 37 2d 36 6c 68 36 32 12 00 1a 07 |7498b7-6lh62....|
00000030 64 65 66 61 75 6c 74 22 00 2a 24 65 32 37 36 33 |default".*$e2763|
00000040 62 37 35 2d 64 37 39 35 2d 34 32 39 63 2d 61 34 |b75-d795-429c-a4|
00000050 33 32 2d 37 39 65 61 37 36 64 62 31 34 61 33 32 |32-79ea76db14a32|
00000060 00 38 00 42 00 12 1e 0a 04 4e 6f 64 65 12 00 1a |.8.B.....Node...|
00000070 0c 6d 69 6e 69 6b 75 62 65 2d 6d 30 32 22 00 2a |.minikube-m02".*|
00000080 00 32 00 3a 00 1a 00 22 00 |.2.:...".|
I0504 02:30:18.419707 1 round_trippers.go:466] curl -v -XPOST -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/api/v1/namespaces/default/pods/p2-76867498b7-6lh62/binding'
I0504 02:30:18.427998 1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.428140 1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.428881 1 round_trippers.go:553] POST https://192.168.49.2:8443/api/v1/namespaces/default/pods/p1-59bcb65f56-4rrfc/binding 201 Created in 19 milliseconds
I0504 02:30:18.428916 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 19 ms Duration 19 ms
I0504 02:30:18.428926 1 round_trippers.go:577] Response Headers:
I0504 02:30:18.428935 1 round_trippers.go:580] Audit-Id: b63b0746-f514-4ab7-92db-f580656d1bc2
I0504 02:30:18.428944 1 round_trippers.go:580] Cache-Control: no-cache, private
I0504 02:30:18.428953 1 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.428961 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.428969 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.428978 1 round_trippers.go:580] Content-Length: 48
I0504 02:30:18.428990 1 round_trippers.go:580] Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.429033 1 request.go:1170] Response Body:
00000000 6b 38 73 00 0a 0c 0a 02 76 31 12 06 53 74 61 74 |k8s.....v1..Stat|
00000010 75 73 12 18 0a 06 0a 00 12 00 1a 00 12 07 53 75 |us............Su|
00000020 63 63 65 73 73 1a 00 22 00 30 c9 01 1a 00 22 00 |ccess..".0....".|
I0504 02:30:18.429553 1 round_trippers.go:553] POST https://192.168.49.2:8443/api/v1/namespaces/default/pods/p2-76867498b7-6lh62/binding 201 Created in 9 milliseconds
I0504 02:30:18.433101 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 9 ms Duration 9 ms
I0504 02:30:18.433514 1 round_trippers.go:577] Response Headers:
I0504 02:30:18.433758 1 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.433819 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.433898 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.433939 1 round_trippers.go:580] Content-Length: 48
I0504 02:30:18.434000 1 round_trippers.go:580] Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.434075 1 round_trippers.go:580] Audit-Id: 726d26ba-6875-49c2-86b6-f46d3a10c5bd
I0504 02:30:18.434163 1 round_trippers.go:580] Cache-Control: no-cache, private
I0504 02:30:18.434264 1 request.go:1170] Response Body:
00000000 6b 38 73 00 0a 0c 0a 02 76 31 12 06 53 74 61 74 |k8s.....v1..Stat|
00000010 75 73 12 18 0a 06 0a 00 12 00 1a 00 12 07 53 75 |us............Su|
00000020 63 63 65 73 73 1a 00 22 00 30 c9 01 1a 00 22 00 |ccess..".0....".|
I0504 02:30:18.434418 1 cache.go:402] "Finished binding for pod, can be expired" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.434576 1 schedule_one.go:266] "Successfully bound pod to node" pod="default/p2-76867498b7-6lh62" node="minikube-m02" evaluatedNodes=4 feasibleNodes=4
I0504 02:30:18.434831 1 request.go:1170] Request Body:
00000000 6b 38 73 00 0a 19 0a 10 65 76 65 6e 74 73 2e 6b |k8s.....events.k|
00000010 38 73 2e 69 6f 2f 76 31 12 05 45 76 65 6e 74 12 |8s.io/v1..Event.|
00000020 ec 02 0a 3b 0a 24 70 32 2d 37 36 38 36 37 34 39 |...;.$p2-7686749|
00000030 38 62 37 2d 36 6c 68 36 32 2e 31 37 35 62 63 66 |8b7-6lh62.175bcf|
00000040 64 39 36 39 34 64 38 35 63 37 12 00 1a 07 64 65 |d9694d85c7....de|
00000050 66 61 75 6c 74 22 00 2a 00 32 00 38 00 42 00 12 |fault".*.2.8.B..|
00000060 0c 08 ba b1 cc a2 06 10 98 b3 9a cf 01 22 17 6e |.............".n|
00000070 65 74 77 6f 72 6b 2d 61 77 61 72 65 2d 73 63 68 |etwork-aware-sch|
00000080 65 64 75 6c 65 72 2a 44 6e 65 74 77 6f 72 6b 2d |eduler*Dnetwork-|
00000090 61 77 61 72 65 2d 73 63 68 65 64 75 6c 65 72 2d |aware-scheduler-|
000000a0 73 63 68 65 64 75 6c 65 72 2d 70 6c 75 67 69 6e |scheduler-plugin|
000000b0 73 2d 73 63 68 65 64 75 6c 65 72 2d 36 37 64 34 |s-scheduler-67d4|
000000c0 38 39 63 35 63 38 2d 70 72 32 77 66 32 07 42 69 |89c5c8-pr2wf2.Bi|
000000d0 6e 64 69 6e 67 3a 09 53 63 68 65 64 75 6c 65 64 |nding:.Scheduled|
000000e0 42 55 0a 03 50 6f 64 12 07 64 65 66 61 75 6c 74 |BU..Pod..default|
000000f0 1a 13 70 32 2d 37 36 38 36 37 34 39 38 62 37 2d |..p2-76867498b7-|
00000100 36 6c 68 36 32 22 24 65 32 37 36 33 62 37 35 2d |6lh62"$e2763b75-|
00000110 64 37 39 35 2d 34 32 39 63 2d 61 34 33 32 2d 37 |d795-429c-a432-7|
00000120 39 65 61 37 36 64 62 31 34 61 33 2a 02 76 31 32 |9ea76db14a3*.v12|
00000130 04 36 38 36 38 3a 00 52 41 53 75 63 63 65 73 73 |.6868:.RASuccess|
00000140 66 75 6c 6c 79 20 61 73 73 69 67 6e 65 64 20 64 |fully assigned d|
00000150 65 66 61 75 6c 74 2f 70 32 2d 37 36 38 36 37 34 |efault/p2-768674|
00000160 39 38 62 37 2d 36 6c 68 36 32 20 74 6f 20 6d 69 |98b7-6lh62 to mi|
00000170 6e 69 6b 75 62 65 2d 6d 30 32 5a 06 4e 6f 72 6d |nikube-m02Z.Norm|
00000180 61 6c 62 04 0a 00 12 00 6a 00 72 00 78 00 1a 00 |alb.....j.r.x...|
00000190 22 00 |".|
I0504 02:30:18.435001 1 round_trippers.go:466] curl -v -XPOST -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events'
I0504 02:30:18.429087 1 cache.go:402] "Finished binding for pod, can be expired" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.447976 1 schedule_one.go:266] "Successfully bound pod to node" pod="default/p1-59bcb65f56-4rrfc" node="minikube-m04" evaluatedNodes=4 feasibleNodes=4
I0504 02:30:18.448278 1 request.go:1170] Request Body:
00000000 6b 38 73 00 0a 19 0a 10 65 76 65 6e 74 73 2e 6b |k8s.....events.k|
00000010 38 73 2e 69 6f 2f 76 31 12 05 45 76 65 6e 74 12 |8s.io/v1..Event.|
00000020 ec 02 0a 3b 0a 24 70 31 2d 35 39 62 63 62 36 35 |...;.$p1-59bcb65|
00000030 66 35 36 2d 34 72 72 66 63 2e 31 37 35 62 63 66 |f56-4rrfc.175bcf|
00000040 64 39 36 61 31 36 32 64 34 32 12 00 1a 07 64 65 |d96a162d42....de|
00000050 66 61 75 6c 74 22 00 2a 00 32 00 38 00 42 00 12 |fault".*.2.8.B..|
00000060 0c 08 ba b1 cc a2 06 10 a8 e2 bc d5 01 22 17 6e |.............".n|
00000070 65 74 77 6f 72 6b 2d 61 77 61 72 65 2d 73 63 68 |etwork-aware-sch|
00000080 65 64 75 6c 65 72 2a 44 6e 65 74 77 6f 72 6b 2d |eduler*Dnetwork-|
00000090 61 77 61 72 65 2d 73 63 68 65 64 75 6c 65 72 2d |aware-scheduler-|
000000a0 73 63 68 65 64 75 6c 65 72 2d 70 6c 75 67 69 6e |scheduler-plugin|
000000b0 73 2d 73 63 68 65 64 75 6c 65 72 2d 36 37 64 34 |s-scheduler-67d4|
000000c0 38 39 63 35 63 38 2d 70 72 32 77 66 32 07 42 69 |89c5c8-pr2wf2.Bi|
000000d0 6e 64 69 6e 67 3a 09 53 63 68 65 64 75 6c 65 64 |nding:.Scheduled|
000000e0 42 55 0a 03 50 6f 64 12 07 64 65 66 61 75 6c 74 |BU..Pod..default|
000000f0 1a 13 70 31 2d 35 39 62 63 62 36 35 66 35 36 2d |..p1-59bcb65f56-|
00000100 34 72 72 66 63 22 24 63 65 36 33 35 62 32 31 2d |4rrfc"$ce635b21-|
00000110 63 66 62 35 2d 34 38 36 39 2d 61 38 35 39 2d 38 |cfb5-4869-a859-8|
00000120 35 30 34 62 37 30 65 31 30 64 65 2a 02 76 31 32 |504b70e10de*.v12|
00000130 04 36 38 36 36 3a 00 52 41 53 75 63 63 65 73 73 |.6866:.RASuccess|
00000140 66 75 6c 6c 79 20 61 73 73 69 67 6e 65 64 20 64 |fully assigned d|
00000150 65 66 61 75 6c 74 2f 70 31 2d 35 39 62 63 62 36 |efault/p1-59bcb6|
00000160 35 66 35 36 2d 34 72 72 66 63 20 74 6f 20 6d 69 |5f56-4rrfc to mi|
00000170 6e 69 6b 75 62 65 2d 6d 30 34 5a 06 4e 6f 72 6d |nikube-m04Z.Norm|
00000180 61 6c 62 04 0a 00 12 00 6a 00 72 00 78 00 1a 00 |alb.....j.r.x...|
00000190 22 00 |".|
I0504 02:30:18.448518 1 round_trippers.go:466] curl -v -XPOST -H "Content-Type: application/vnd.kubernetes.protobuf" -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events'
I0504 02:30:18.449015 1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.449625 1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.472008 1 round_trippers.go:553] POST https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events 201 Created in 35 milliseconds
I0504 02:30:18.472051 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 32 ms Duration 35 ms
I0504 02:30:18.472061 1 round_trippers.go:577] Response Headers:
I0504 02:30:18.472071 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.472080 1 round_trippers.go:580] Content-Length: 664
I0504 02:30:18.472649 1 round_trippers.go:580] Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.473389 1 round_trippers.go:580] Audit-Id: 9f3b48e8-4be0-489c-af79-a5f518cd8cbd
I0504 02:30:18.473555 1 round_trippers.go:580] Cache-Control: no-cache, private
I0504 02:30:18.473670 1 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.473723 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.473852 1 request.go:1170] Response Body:
00000000 6b 38 73 00 0a 19 0a 10 65 76 65 6e 74 73 2e 6b |k8s.....events.k|
00000010 38 73 2e 69 6f 2f 76 31 12 05 45 76 65 6e 74 12 |8s.io/v1..Event.|
00000020 f2 04 0a c0 02 0a 24 70 32 2d 37 36 38 36 37 34 |......$p2-768674|
00000030 39 38 62 37 2d 36 6c 68 36 32 2e 31 37 35 62 63 |98b7-6lh62.175bc|
00000040 66 64 39 36 39 34 64 38 35 63 37 12 00 1a 07 64 |fd9694d85c7....d|
00000050 65 66 61 75 6c 74 22 00 2a 24 64 31 33 36 39 31 |efault".*$d13691|
00000060 62 66 2d 61 66 62 31 2d 34 62 30 32 2d 62 35 65 |bf-afb1-4b02-b5e|
00000070 66 2d 32 62 64 37 38 35 34 31 30 62 64 61 32 04 |f-2bd785410bda2.|
00000080 36 38 38 38 38 00 42 08 08 ba b1 cc a2 06 10 00 |68888.B.........|
00000090 8a 01 d1 01 0a 0e 6b 75 62 65 2d 73 63 68 65 64 |......kube-sched|
000000a0 75 6c 65 72 12 06 55 70 64 61 74 65 1a 10 65 76 |uler..Update..ev|
000000b0 65 6e 74 73 2e 6b 38 73 2e 69 6f 2f 76 31 22 08 |ents.k8s.io/v1".|
000000c0 08 ba b1 cc a2 06 10 00 32 08 46 69 65 6c 64 73 |........2.Fields|
000000d0 56 31 3a 8e 01 0a 8b 01 7b 22 66 3a 61 63 74 69 |V1:.....{"f:acti|
000000e0 6f 6e 22 3a 7b 7d 2c 22 66 3a 65 76 65 6e 74 54 |on":{},"f:eventT|
000000f0 69 6d 65 22 3a 7b 7d 2c 22 66 3a 6e 6f 74 65 22 |ime":{},"f:note"|
00000100 3a 7b 7d 2c 22 66 3a 72 65 61 73 6f 6e 22 3a 7b |:{},"f:reason":{|
00000110 7d 2c 22 66 3a 72 65 67 61 72 64 69 6e 67 22 3a |},"f:regarding":|
00000120 7b 7d 2c 22 66 3a 72 65 70 6f 72 74 69 6e 67 43 |{},"f:reportingC|
00000130 6f 6e 74 72 6f 6c 6c 65 72 22 3a 7b 7d 2c 22 66 |ontroller":{},"f|
00000140 3a 72 65 70 6f 72 74 69 6e 67 49 6e 73 74 61 6e |:reportingInstan|
00000150 63 65 22 3a 7b 7d 2c 22 66 3a 74 79 70 65 22 3a |ce":{},"f:type":|
00000160 7b 7d 7d 42 00 12 0c 08 ba b1 cc a2 06 10 98 b3 |{}}B............|
00000170 9a cf 01 22 17 6e 65 74 77 6f 72 6b 2d 61 77 61 |...".network-awa|
00000180 72 65 2d 73 63 68 65 64 75 6c 65 72 2a 44 6e 65 |re-scheduler*Dne|
00000190 74 77 6f 72 6b 2d 61 77 61 72 65 2d 73 63 68 65 |twork-aware-sche|
000001a0 64 75 6c 65 72 2d 73 63 68 65 64 75 6c 65 72 2d |duler-scheduler-|
000001b0 70 6c 75 67 69 6e 73 2d 73 63 68 65 64 75 6c 65 |plugins-schedule|
000001c0 72 2d 36 37 64 34 38 39 63 35 63 38 2d 70 72 32 |r-67d489c5c8-pr2|
000001d0 77 66 32 07 42 69 6e 64 69 6e 67 3a 09 53 63 68 |wf2.Binding:.Sch|
000001e0 65 64 75 6c 65 64 42 55 0a 03 50 6f 64 12 07 64 |eduledBU..Pod..d|
000001f0 65 66 61 75 6c 74 1a 13 70 32 2d 37 36 38 36 37 |efault..p2-76867|
00000200 34 39 38 62 37 2d 36 6c 68 36 32 22 24 65 32 37 |498b7-6lh62"$e27|
00000210 36 33 62 37 35 2d 64 37 39 35 2d 34 32 39 63 2d |63b75-d795-429c-|
00000220 61 34 33 32 2d 37 39 65 61 37 36 64 62 31 34 61 |a432-79ea76db14a|
00000230 33 2a 02 76 31 32 04 36 38 36 38 3a 00 52 41 53 |3*.v12.6868:.RAS|
00000240 75 63 63 65 73 73 66 75 6c 6c 79 20 61 73 73 69 |uccessfully assi|
00000250 67 6e 65 64 20 64 65 66 61 75 6c 74 2f 70 32 2d |gned default/p2-|
00000260 37 36 38 36 37 34 39 38 62 37 2d 36 6c 68 36 32 |76867498b7-6lh62|
00000270 20 74 6f 20 6d 69 6e 69 6b 75 62 65 2d 6d 30 32 | to minikube-m02|
00000280 5a 06 4e 6f 72 6d 61 6c 62 04 0a 00 12 00 6a 00 |Z.Normalb.....j.|
00000290 72 00 78 00 1a 00 22 00 |r.x...".|
I0504 02:30:18.473973 1 round_trippers.go:553] POST https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events 201 Created in 25 milliseconds
I0504 02:30:18.474052 1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 20 ms Duration 25 ms
I0504 02:30:18.474100 1 round_trippers.go:577] Response Headers:
I0504 02:30:18.474150 1 round_trippers.go:580] Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.474211 1 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.474272 1 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.474313 1 round_trippers.go:580] Content-Length: 664
I0504 02:30:18.474383 1 round_trippers.go:580] Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.474435 1 round_trippers.go:580] Audit-Id: 8703dc2b-a143-41d9-9da7-f4285ea10ffc
I0504 02:30:18.474482 1 round_trippers.go:580] Cache-Control: no-cache, private
I0504 02:30:18.474573 1 request.go:1170] Response Body:
00000000 6b 38 73 00 0a 19 0a 10 65 76 65 6e 74 73 2e 6b |k8s.....events.k|
00000010 38 73 2e 69 6f 2f 76 31 12 05 45 76 65 6e 74 12 |8s.io/v1..Event.|
00000020 f2 04 0a c0 02 0a 24 70 31 2d 35 39 62 63 62 36 |......$p1-59bcb6|
00000030 35 66 35 36 2d 34 72 72 66 63 2e 31 37 35 62 63 |5f56-4rrfc.175bc|
00000040 66 64 39 36 61 31 36 32 64 34 32 12 00 1a 07 64 |fd96a162d42....d|
00000050 65 66 61 75 6c 74 22 00 2a 24 32 30 36 30 63 32 |efault".*$2060c2|
00000060 33 34 2d 32 65 65 31 2d 34 63 65 36 2d 61 38 61 |34-2ee1-4ce6-a8a|
00000070 35 2d 39 32 36 66 38 33 38 35 64 33 34 34 32 04 |5-926f8385d3442.|
00000080 36 38 38 39 38 00 42 08 08 ba b1 cc a2 06 10 00 |68898.B.........|
00000090 8a 01 d1 01 0a 0e 6b 75 62 65 2d 73 63 68 65 64 |......kube-sched|
000000a0 75 6c 65 72 12 06 55 70 64 61 74 65 1a 10 65 76 |uler..Update..ev|
000000b0 65 6e 74 73 2e 6b 38 73 2e 69 6f 2f 76 31 22 08 |ents.k8s.io/v1".|
000000c0 08 ba b1 cc a2 06 10 00 32 08 46 69 65 6c 64 73 |........2.Fields|
000000d0 56 31 3a 8e 01 0a 8b 01 7b 22 66 3a 61 63 74 69 |V1:.....{"f:acti|
000000e0 6f 6e 22 3a 7b 7d 2c 22 66 3a 65 76 65 6e 74 54 |on":{},"f:eventT|
000000f0 69 6d 65 22 3a 7b 7d 2c 22 66 3a 6e 6f 74 65 22 |ime":{},"f:note"|
00000100 3a 7b 7d 2c 22 66 3a 72 65 61 73 6f 6e 22 3a 7b |:{},"f:reason":{|
00000110 7d 2c 22 66 3a 72 65 67 61 72 64 69 6e 67 22 3a |},"f:regarding":|
00000120 7b 7d 2c 22 66 3a 72 65 70 6f 72 74 69 6e 67 43 |{},"f:reportingC|
00000130 6f 6e 74 72 6f 6c 6c 65 72 22 3a 7b 7d 2c 22 66 |ontroller":{},"f|
00000140 3a 72 65 70 6f 72 74 69 6e 67 49 6e 73 74 61 6e |:reportingInstan|
00000150 63 65 22 3a 7b 7d 2c 22 66 3a 74 79 70 65 22 3a |ce":{},"f:type":|
00000160 7b 7d 7d 42 00 12 0c 08 ba b1 cc a2 06 10 a8 e2 |{}}B............|
00000170 bc d5 01 22 17 6e 65 74 77 6f 72 6b 2d 61 77 61 |...".network-awa|
00000180 72 65 2d 73 63 68 65 64 75 6c 65 72 2a 44 6e 65 |re-scheduler*Dne|
00000190 74 77 6f 72 6b 2d 61 77 61 72 65 2d 73 63 68 65 |twork-aware-sche|
000001a0 64 75 6c 65 72 2d 73 63 68 65 64 75 6c 65 72 2d |duler-scheduler-|
000001b0 70 6c 75 67 69 6e 73 2d 73 63 68 65 64 75 6c 65 |plugins-schedule|
000001c0 72 2d 36 37 64 34 38 39 63 35 63 38 2d 70 72 32 |r-67d489c5c8-pr2|
000001d0 77 66 32 07 42 69 6e 64 69 6e 67 3a 09 53 63 68 |wf2.Binding:.Sch|
000001e0 65 64 75 6c 65 64 42 55 0a 03 50 6f 64 12 07 64 |eduledBU..Pod..d|
000001f0 65 66 61 75 6c 74 1a 13 70 31 2d 35 39 62 63 62 |efault..p1-59bcb|
00000200 36 35 66 35 36 2d 34 72 72 66 63 22 24 63 65 36 |65f56-4rrfc"$ce6|
00000210 33 35 62 32 31 2d 63 66 62 35 2d 34 38 36 39 2d |35b21-cfb5-4869-|
00000220 61 38 35 39 2d 38 35 30 34 62 37 30 65 31 30 64 |a859-8504b70e10d|
00000230 65 2a 02 76 31 32 04 36 38 36 36 3a 00 52 41 53 |e*.v12.6866:.RAS|
00000240 75 63 63 65 73 73 66 75 6c 6c 79 20 61 73 73 69 |uccessfully assi|
00000250 67 6e 65 64 20 64 65 66 61 75 6c 74 2f 70 31 2d |gned default/p1-|
00000260 35 39 62 63 62 36 35 66 35 36 2d 34 72 72 66 63 |59bcb65f56-4rrfc|
00000270 20 74 6f 20 6d 69 6e 69 6b 75 62 65 2d 6d 30 34 | to minikube-m04|
00000280 5a 06 4e 6f 72 6d 61 6c 62 04 0a 00 12 00 6a 00 |Z.Normalb.....j.|
00000290 72 00 78 00 1a 00 22 00 |r.x...".|
I0504 02:30:18.474868 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.475092 1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.475204 1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475316 1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475395 1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475518 1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.475582 1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.475605 1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.475617 1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.475721 1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475746 1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475758 1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475769 1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.476018 1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.476169 1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p3-5f458cf755-r4sc7" node="minikube-m04"
I am deploying the AppGroup CRD without the netperf controller; the following is my yaml file.
# Example App Group CRD spec
apiVersion: appgroup.diktyo.x-k8s.io/v1alpha1
kind: AppGroup
metadata:
  name: a1
spec:
  numMembers: 3
  topologySortingAlgorithm: KahnSort
  workloads:
    - workload:
        kind: Deployment
        name: p1
        selector: p1
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p2
            selector: p2
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "100Mi"
          maxNetworkCost: 30
    - workload:
        kind: Deployment
        name: p2
        selector: p2
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p3
            selector: p3
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "250Mi"
          maxNetworkCost: 20
    - workload:
        kind: Deployment
        name: p3
        selector: p3
        apiVersion: apps/v1
        namespace: default
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe appgroups a1
Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.x-k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-05-04T02:29:30Z
  Generation:          8
  Resource Version:    6937
  UID:                 4bb06824-1f6f-4b71-84f2-47f09cf52103
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Status:
  Running Workloads:          3
  Topology Calculation Time:  2023-05-04T02:29:30Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:  2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:  3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Events:  <none>
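(Aside: the Topology Order above is Kahn's topological sort applied to the dependency edges declared in the spec, p1 -> p2 and p2 -> p3. A minimal, self-contained Go sketch, not the plugin's actual code, reproduces the same indices:)

package main

import "fmt"

// kahnSort returns a topological order of nodes given edges
// from each workload to the workloads it depends on.
func kahnSort(nodes []string, edges map[string][]string) []string {
    inDegree := map[string]int{}
    for _, n := range nodes {
        inDegree[n] = 0
    }
    for _, deps := range edges {
        for _, d := range deps {
            inDegree[d]++
        }
    }
    var queue, order []string
    for _, n := range nodes { // start from workloads nothing depends on
        if inDegree[n] == 0 {
            queue = append(queue, n)
        }
    }
    for len(queue) > 0 {
        n := queue[0]
        queue = queue[1:]
        order = append(order, n)
        for _, d := range edges[n] { // emitting n unblocks its dependencies
            inDegree[d]--
            if inDegree[d] == 0 {
                queue = append(queue, d)
            }
        }
    }
    return order
}

func main() {
    // p1 depends on p2, p2 depends on p3, as in the AppGroup above
    edges := map[string][]string{"p1": {"p2"}, "p2": {"p3"}}
    fmt.Println(kahnSort([]string{"p1", "p2", "p3"}, edges)) // [p1 p2 p3]
}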
kubectl logs -f appgroup-controller-b9d5f9bb7-wrr98 -n network-aware-controllers
W0504 02:27:04.969172 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0504 02:27:04.970483 1 appgroup.go:104] "Starting App Group controller"
I0504 02:27:05.070875 1 appgroup.go:111] "App Group sync finished"
kubectl logs -f networktopology-controller-65b7b4b464-tw2n6 -n network-aware-controllers
W0504 02:27:10.274079 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0504 02:27:10.275322 1 networktopology.go:147] "Starting Network Topology controller"
E0504 02:27:10.318650 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0504 02:27:10.318914 1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
Hi @dyyfk,
Can you scale all deployments to, for instance, 3 replicas and see whether the same pattern occurs? Or try a different sorting algorithm? (Both are sketched below.)
Your app follows a sequential pattern (p1 -> p2 -> p3), and because you chose KahnSort the pods are deployed in exactly that order. When the pods are created sequentially, none of a pod's dependencies are running in the cluster yet at scheduling time, so all network costs are 0 and the Diktyo plugins do not affect the placement decision. If you scale all deployments, however, dependent pods are already running in the cluster, and different costs should be calculated for the different nodes.
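For instance (deployment names taken from your AppGroup spec; the exact set of supported sorting algorithms is listed in the KEP):

# scale every workload in the AppGroup to 3 replicas
kubectl scale --replicas=3 deployment/p1 deployment/p2 deployment/p3 -n default

# re-check placement once the new replicas are scheduled
kubectl get pods -n default -o wide

# or pick a different value for spec.topologySortingAlgorithm
kubectl edit appgroup a1 -n default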
The error in the networkTopology controller is Docker-related, I am afraid; I had to completely reset my setup to get rid of it.
I want to use this scheduler as a second scheduler, but I found the documentation incomplete. What is the current state of this project? Is the scheduler implementation complete? How do I verify that the scheduler is working as expected?
I also tried to reproduce the demo in this video, but I could not find all the yaml files it uses: https://www.youtube.com/watch?v=E4cP275_OCs
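(For context on the second-scheduler question: a workload opts into a non-default scheduler via the standard schedulerName pod field, matching the schedulerName of the profile shown earlier. The Deployment below is only an illustrative sketch, not one of the video's yaml files:)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
    spec:
      schedulerName: network-aware-scheduler  # route these pods to the second scheduler
      containers:
        - name: p1
          image: registry.k8s.io/pause:3.9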