kubernetes-sigs / scheduler-plugins

Repository for out-of-tree scheduler plugins based on scheduler framework.

[Network-Aware-Scheduling] Current state of the project? #579

Closed dyyfk closed 1 year ago

dyyfk commented 1 year ago

I want to use this scheduler as a second scheduler, but I found the documentation incomplete. What is the current state of this project? Is the scheduler implementation complete? How do I ensure the scheduler is working as expected?

I tried to reproduce the work shown in this video, but I could not find all of the yaml files it uses: https://www.youtube.com/watch?v=E4cP275_OCs

zwpaper commented 1 year ago

Hi @dyyfk, thanks for raising this issue. We currently spend most of our time on functionality, and yes, the docs have some gaps.

As for network-aware scheduling, you can check https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/networkaware/README.md.

If anything does not work as expected, please feel free to raise a bug here.

/kind support

dyyfk commented 1 year ago

Mostly, I want to know whether the scheduler is behaving as expected, and I am looking for a way to test it. I would appreciate it if you could let me know how I can use the existing yaml files to test the scheduler/controller.

Huang-Wei commented 1 year ago

cc the author of network aware plugin @jpedro1992

jpedro1992 commented 1 year ago

Hi @dyyfk, thank you for raising the issue and for your interest in the network-aware scheduler.

Which application are you trying to deploy, and which challenges are you facing? Are you using the provided yaml files? Here you can find most of them.

To see whether the scheduler and controller behave as expected, please check the logs of both. The exact behavior depends on which application you are deploying and on both the AppGroup and NetworkTopology CRs.
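
For example, checking both logs (the resource names here are assumptions based on the defaults that appear later in this thread; adjust them to your deployment):

kubectl logs -n kube-system deploy/network-aware-scheduler
kubectl logs -n network-aware-controllers deploy/appgroup-controller
kubectl logs -n network-aware-controllers deploy/networktopology-controller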

All components have been implemented. Some are hosted outside this repo, here.

For further information/documentation, please check our KEP.

dyyfk commented 1 year ago

Hi @jpedro1992. I want to test whether the scheduler always schedules the pods on the same node using the following scheduler config. Note that I changed the weight of NetworkOverhead to 100.

kubectl describe configmap network-aware-scheduler-config -n kube-system
Name:         network-aware-scheduler-config
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
scheduler-config.yaml:
----
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
  - schedulerName: network-aware-scheduler
    plugins:
      queueSort:
        enabled:
          - name: TopologicalSort
        disabled:
          - name: "*"
      preFilter:
        enabled:
          - name: NetworkOverhead
      filter:
        enabled:
          - name: NetworkOverhead
      score:
        disabled: # Preferably avoid the combination of NodeResourcesFit with NetworkOverhead
          - name: NodeResourcesFit
        enabled: # A higher weight is given to NetworkOverhead to favor allocation schemes with lower latency.
          - name: NetworkOverhead
            weight: 100
    pluginConfig:
      - name: TopologicalSort
        args:
          namespaces:
            - "default"
      - name: NetworkOverhead
        args:
          namespaces:
            - "default"
          weightsName: "UserDefined" # or "NetperfCosts"
          networkTopologyName: "net-topology-test"

BinaryData
====

Events:  <none>

My AppGroup and NetworkTopology CRs look like the following:

Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-04-22T11:03:57Z
  Generation:          2
  Resource Version:    3695
  UID:                 c6b07ce0-a8ad-4c89-bba1-42e63ab04b71
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Status:
  Topology Calculation Time:  2023-04-22T11:03:57Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:          2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:          3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Events:             <none>
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-22T11:04:41Z
  Generation:          1
  Resource Version:    3759
  UID:                 c6081d0c-f991-42e7-803c-a6cf430c9ab1
Spec:
  Configmap Name:  netperfMetrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-east-1
          Network Cost:        20
        Origin:                us-west-1
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-west-1
          Network Cost:        20
        Origin:                us-east-1
      Topology Key:            topology.kubernetes.io/region
      Origin List:
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z2
          Network Cost:        5
        Origin:                z1
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z1
          Network Cost:        5
        Origin:                z2
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z4
          Network Cost:        10
        Origin:                z3
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z3
          Network Cost:        10
        Origin:                z4
      Topology Key:            topology.kubernetes.io/zone
Events:                        <none>

I am testing the scheduler on an 8-node minikube cluster with the Calico network plugin enabled.

kubectl get nodes -o  wide
NAME           STATUS   ROLES                  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
minikube       Ready    control-plane,master   12h   v1.25.7   192.168.49.2   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m02   Ready    <none>                 12h   v1.25.7   192.168.49.3   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m03   Ready    <none>                 12h   v1.25.7   192.168.49.4   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m04   Ready    <none>                 12h   v1.25.7   192.168.49.5   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m05   Ready    <none>                 12h   v1.25.7   192.168.49.6   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m06   Ready    <none>                 12h   v1.25.7   192.168.49.7   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m07   Ready    <none>                 12h   v1.25.7   192.168.49.8   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2
minikube-m08   Ready    <none>                 12h   v1.25.7   192.168.49.9   <none>        Ubuntu 20.04.5 LTS   5.19.0-1022-aws   docker://23.0.2

I labeled each node as follows:

kubectl describe nodes | grep topology.kubernetes.io/region
                    topology.kubernetes.io/region=us-west-1
                    topology.kubernetes.io/region=us-west-1
                    topology.kubernetes.io/region=us-west-1
                    topology.kubernetes.io/region=us-west-1
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/region=us-east-1
kubectl describe nodes | grep topology.kubernetes.io/zone
                    topology.kubernetes.io/zone=z1
                    topology.kubernetes.io/zone=z1
                    topology.kubernetes.io/zone=z2
                    topology.kubernetes.io/zone=z2
                    topology.kubernetes.io/zone=z3
                    topology.kubernetes.io/zone=z3
                    topology.kubernetes.io/zone=z4
                    topology.kubernetes.io/zone=z4
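
For reference, labels like these can be applied per node with kubectl; a sketch of one possible mapping (the actual node-to-zone assignment is not visible in the grep output above, so these pairings are only an example):

kubectl label node minikube minikube-m02 topology.kubernetes.io/region=us-west-1 topology.kubernetes.io/zone=z1
kubectl label node minikube-m03 minikube-m04 topology.kubernetes.io/region=us-west-1 topology.kubernetes.io/zone=z2
kubectl label node minikube-m05 minikube-m06 topology.kubernetes.io/region=us-east-1 topology.kubernetes.io/zone=z3
kubectl label node minikube-m07 minikube-m08 topology.kubernetes.io/region=us-east-1 topology.kubernetes.io/zone=z4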

My p1, p2, p3 yaml files are the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432

Questions

I expect all the pods to end up in the same zone, but they do not; the scheduling appears random, as pods are placed across zones or even regions.

There are no logs in the controllers when I deploy the AppGroup and NetworkTopology CRs. Is this expected? There are also no logs in the scheduler when I deploy p1, p2, and p3. Is this expected?

Snapshots of all the pods

kubectl get pods --all-namespaces
NAMESPACE                   NAME                                            READY   STATUS    RESTARTS      AGE
default                     p1-d4b878974-mfnm5                              1/1     Running   0             114m
default                     p2-c8bf46f66-9gj24                              1/1     Running   0             114m
default                     p3-9444dfcf9-gr9h8                              1/1     Running   0             114m
kube-system                 calico-kube-controllers-798cc86c47-89g8h        1/1     Running   3 (12h ago)   12h
kube-system                 calico-node-2gnwz                               1/1     Running   0             12h
kube-system                 calico-node-49wcq                               1/1     Running   0             12h
kube-system                 calico-node-89h4b                               1/1     Running   0             12h
kube-system                 calico-node-cp2fr                               1/1     Running   0             12h
kube-system                 calico-node-dv2bv                               1/1     Running   0             12h
kube-system                 calico-node-hhkpg                               1/1     Running   0             12h
kube-system                 calico-node-lkshk                               1/1     Running   0             12h
kube-system                 calico-node-lr4zk                               1/1     Running   0             12h
kube-system                 coredns-565d847f94-ggkhh                        1/1     Running   1 (12h ago)   12h
kube-system                 etcd-minikube                                   1/1     Running   0             12h
kube-system                 kube-apiserver-minikube                         1/1     Running   0             12h
kube-system                 kube-controller-manager-minikube                1/1     Running   0             12h
kube-system                 kube-proxy-5zccp                                1/1     Running   0             12h
kube-system                 kube-proxy-6p6dd                                1/1     Running   0             12h
kube-system                 kube-proxy-9rhrs                                1/1     Running   0             12h
kube-system                 kube-proxy-g5825                                1/1     Running   0             12h
kube-system                 kube-proxy-kl4p7                                1/1     Running   0             12h
kube-system                 kube-proxy-lr4rt                                1/1     Running   0             12h
kube-system                 kube-proxy-sbh6q                                1/1     Running   0             12h
kube-system                 kube-proxy-t9nqx                                1/1     Running   0             12h
kube-system                 kube-scheduler-minikube                         1/1     Running   0             12h
kube-system                 network-aware-scheduler-5ffc766dd9-tk88r        1/1     Running   0             116m
kube-system                 registry-gg4x8                                  1/1     Running   0             12h
kube-system                 registry-proxy-5k27s                            1/1     Running   0             12h
kube-system                 registry-proxy-9bcps                            1/1     Running   0             12h
kube-system                 registry-proxy-jhxbl                            1/1     Running   0             12h
kube-system                 registry-proxy-m55rg                            1/1     Running   0             12h
kube-system                 registry-proxy-n2952                            1/1     Running   0             12h
kube-system                 registry-proxy-ncczr                            1/1     Running   0             12h
kube-system                 registry-proxy-vpj2n                            1/1     Running   0             12h
kube-system                 registry-proxy-vsrl5                            1/1     Running   0             12h
kube-system                 storage-provisioner                             1/1     Running   1 (12h ago)   12h
network-aware-controllers   appgroup-controller-5fb544569c-l4856            1/1     Running   0             12h
network-aware-controllers   networktopology-controller-67b5fc85bf-m4qqg     1/1     Running   0             12h
scheduler-plugins           scheduler-plugins-controller-5d97947dd8-svvb8   1/1     Running   0             12h
kubectl get pods -o wide
NAME                 READY   STATUS    RESTARTS   AGE    IP               NODE           NOMINATED NODE   READINESS GATES
p1-d4b878974-mfnm5   1/1     Running   0          115m   10.244.120.75    minikube       <none>           <none>
p2-c8bf46f66-9gj24   1/1     Running   0          115m   10.244.239.139   minikube-m08   <none>           <none>
p3-9444dfcf9-gr9h8   1/1     Running   0          115m   10.244.151.6     minikube-m03   <none>           <none>
kubectl logs -f network-aware-scheduler-5ffc766dd9-tk88r -n kube-system
I0422 21:03:01.999009       1 serving.go:348] Generated self-signed cert in-memory
I0422 21:03:03.141415       1 server.go:148] "Starting Kubernetes Scheduler" version="v0.25.7"
I0422 21:03:03.141449       1 server.go:150] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0422 21:03:03.146040       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0422 21:03:03.146059       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
I0422 21:03:03.146096       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0422 21:03:03.146120       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0422 21:03:03.146144       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0422 21:03:03.146151       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0422 21:03:03.146158       1 secure_serving.go:210] Serving securely on [::]:10259
I0422 21:03:03.146216       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0422 21:03:03.246357       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0422 21:03:03.246356       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0422 21:03:03.246358       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
kubectl logs -f scheduler-plugins-controller-5d97947dd8-svvb8 -n scheduler-plugins
W0422 10:58:22.601890       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0422 10:58:22.806294       1 logr.go:261] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8080"
I0422 10:58:22.806763       1 logr.go:261] setup "msg"="starting manager"
I0422 10:58:22.806916       1 elasticquota.go:115] "Starting Elastic Quota control loop"
I0422 10:58:22.806990       1 elasticquota.go:117] "Waiting for informer caches to sync"
I0422 10:58:22.807112       1 internal.go:362]  "msg"="Starting server" "addr"={"IP":"::","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0422 10:58:22.807219       1 controller.go:185]  "msg"="Starting EventSource" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "source"="kind source: *v1alpha1.PodGroup"
I0422 10:58:22.807318       1 controller.go:185]  "msg"="Starting EventSource" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "source"="kind source: *v1.Pod"
I0422 10:58:22.807239       1 internal.go:362]  "msg"="Starting server" "addr"={"IP":"::","Port":8081,"Zone":""} "kind"="health probe"
I0422 10:58:22.807385       1 controller.go:193]  "msg"="Starting Controller" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup"
I0422 10:58:22.907266       1 elasticquota.go:122] "Elastic Quota sync finished"
I0422 10:58:22.908053       1 controller.go:227]  "msg"="Starting workers" "controller"="podgroup" "controllerGroup"="scheduling.x-k8s.io" "controllerKind"="PodGroup" "worker count"=1
kubectl logs -f appgroup-controller-5fb544569c-l4856 -n network-aware-controllers
W0422 10:56:47.748474       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0422 10:56:47.750391       1 appgroup.go:104] "Starting App Group controller"
I0422 10:56:47.850601       1 appgroup.go:111] "App Group sync finished"
kubectl logs -f networktopology-controller-67b5fc85bf-m4qqg -n network-aware-controllers
W0422 10:57:51.562237       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0422 10:57:51.563802       1 networktopology.go:147] "Starting Network Topology controller"
I0422 10:57:51.664820       1 networktopology.go:155] "Network Topology sync finished"
jpedro1992 commented 1 year ago

Hi @dyyfk, thank you for the detailed description of your deployment.

I notice you are missing two important labels (appgroup.diktyo.x-k8s.io and appgroup.diktyo.x-k8s.io.workload) in the pod deployment files; these labels tie the pods to the AppGroup CRD. Please see the example below. It should work as expected after you include these labels. Please let me know if it worked! Regards

appgroup.diktyo.x-k8s.io: Tells the scheduler which AppGroup the pod belongs to. It can be appgroup.diktyo.k8s.io instead, depending on the API version you are using.

appgroup.diktyo.x-k8s.io.workload: Tells the scheduler which workload the pod corresponds to. It can be appgroup.diktyo.x-k8s.io.workload or just workload if a previous version of the AppGroup API is deployed. Please check the API version you are using.

# online boutique example for adservice
apiVersion: apps/v1
kind: Deployment
metadata:
  name: adservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: adservice
  template:
    metadata:
      labels:
        app: adservice
        appgroup.diktyo.x-k8s.io: online-boutique 
        appgroup.diktyo.x-k8s.io.workload: adservice
    spec:
      schedulerName: network-aware-scheduler
      initContainers:
        - name: sfx-instrumentation
          image: quay.io/signalfuse/sfx-zero-config-agent:latest
          # image: sfx-zero-config-agent
          # imagePullPolicy: Never
          volumeMounts:
            - mountPath: /opt/sfx/
              name: sfx-instrumentation
      containers:
        - name: server
          image: quay.io/signalfuse/microservices-demo-adservice:433c23881a
          ports:
            - containerPort: 9555
          env:
            - name: PORT
              value: '9555'
            - name: OTEL_EXPORTER_ZIPKIN_SERVICE_NAME
              value: adservice
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER
              value: zipkin
            - name: JAVA_TOOL_OPTIONS
              value: -javaagent:/opt/sfx/splunk-otel-javaagent-all.jar
            - name: OTEL_EXPORTER_ZIPKIN_ENDPOINT
              value: 'http://$(NODE_IP):9411/v1/trace'
          volumeMounts:
            - mountPath: /opt/sfx
              name: sfx-instrumentation
          resources:
            requests:
              cpu: 200m
              memory: 180Mi
            limits:
              cpu: 300m
              memory: 300Mi
          readinessProbe:
            initialDelaySeconds: 60
            periodSeconds: 25
            exec:
              command: ['/bin/grpc_health_probe', '-addr=:9555']
          livenessProbe:
            initialDelaySeconds: 60
            periodSeconds: 30
            exec:
              command: ['/bin/grpc_health_probe', '-addr=:9555']
      volumes:
        - emptyDir: {}
          name: sfx-instrumentation
---
dyyfk commented 1 year ago

Hi, it looks like I am using an older build.

Here is my updated yaml file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p1
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p2
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
        appgroup.diktyo.k8s.io: a1
        appgroup.diktyo.k8s.io.workload: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432

I ran into an error in the networktopology-controller: it could not find the AppGroup. It looks like I have "appgroups.appgroup.diktyo.k8s.io" installed, while it is trying to find "appgroup.appgroup.diktyo.k8s.io".
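
One way to confirm which API group each installed CRD actually serves is to query the CRD spec directly (the CRD names are taken from the listing below):

kubectl get crd appgroups.appgroup.diktyo.k8s.io -o jsonpath='{.spec.group}{"\n"}'
kubectl get crd appgroups.appgroup.diktyo.x-k8s.io -o jsonpath='{.spec.group}{"\n"}'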

ubuntu@ip-172-31-3-80:~$ kubectl get crds
NAME                                                  CREATED AT
appgroups.appgroup.diktyo.k8s.io                      2023-04-22T10:54:54Z
appgroups.appgroup.diktyo.x-k8s.io                    2023-04-22T10:51:15Z
bgpconfigurations.crd.projectcalico.org               2023-04-22T10:37:08Z
bgppeers.crd.projectcalico.org                        2023-04-22T10:37:08Z
blockaffinities.crd.projectcalico.org                 2023-04-22T10:37:08Z
caliconodestatuses.crd.projectcalico.org              2023-04-22T10:37:08Z
clusterinformations.crd.projectcalico.org             2023-04-22T10:37:08Z
elasticquotas.scheduling.x-k8s.io                     2023-04-22T10:51:15Z
felixconfigurations.crd.projectcalico.org             2023-04-22T10:37:08Z
globalnetworkpolicies.crd.projectcalico.org           2023-04-22T10:37:08Z
globalnetworksets.crd.projectcalico.org               2023-04-22T10:37:08Z
hostendpoints.crd.projectcalico.org                   2023-04-22T10:37:08Z
ipamblocks.crd.projectcalico.org                      2023-04-22T10:37:08Z
ipamconfigs.crd.projectcalico.org                     2023-04-22T10:37:08Z
ipamhandles.crd.projectcalico.org                     2023-04-22T10:37:08Z
ippools.crd.projectcalico.org                         2023-04-22T10:37:08Z
ipreservations.crd.projectcalico.org                  2023-04-22T10:37:08Z
kubecontrollersconfigurations.crd.projectcalico.org   2023-04-22T10:37:08Z
networkpolicies.crd.projectcalico.org                 2023-04-22T10:37:08Z
networksets.crd.projectcalico.org                     2023-04-22T10:37:08Z
networktopologies.networktopology.diktyo.k8s.io       2023-04-22T10:57:42Z
networktopologies.networktopology.diktyo.x-k8s.io     2023-04-22T10:51:15Z
noderesourcetopologies.topology.node.k8s.io           2023-04-22T10:51:15Z
podgroups.scheduling.x-k8s.io                         2023-04-22T10:51:15Z
ubuntu@ip-172-31-3-80:~/scheduler-plugins/manifests/crds$ kubectl logs -f networktopology-controller-67b5fc85bf-m4qqg -n network-aware-controllers
W0422 10:57:51.562237       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0422 10:57:51.563802       1 networktopology.go:147] "Starting Network Topology controller"
I0422 10:57:51.664820       1 networktopology.go:155] "Network Topology sync finished"
E0424 21:31:11.301967       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.k8s.io \"a1\" not found"
E0424 21:31:11.322252       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.k8s.io \"a1\" not found"
jpedro1992 commented 1 year ago

Hi @dyyfk,

Which version of the controller are you using?

The most up-to-date versions are hosted here: AppGroup and NetworkTopology. The controllers can be built locally, and deployment files for K8s are available in the manifests folder.

I just updated both APIs (AppGroup and networkTopology) to the most recent version: v1.0.3-alpha

dyyfk commented 1 year ago

Hi @jpedro1992, I have updated the appgroup-controller and networktopology-controller images to the latest versions, and I am also using the latest APIs, but I am still getting errors.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p1
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p1-container
        image: nginx
        ports:
        - containerPort: 80
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p2
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p2
  template:
    metadata:
      labels:
        app: p2
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p2
    spec:
      schedulerName:  network-aware-scheduler
      containers:
      - name: p2-container
        image: redis
        ports:
        - containerPort: 6379
apiVersion: apps/v1
kind: Deployment
metadata:
  name: p3
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p3
  template:
    metadata:
      labels:
        app: p3
        appgroup.diktyo.x-k8s.io: a1
        appgroup.diktyo.x-k8s.io.workload: p3
    spec:
      schedulerName: network-aware-scheduler
      containers:
      - name: p3-container
        image: postgres
        env:
        - name: POSTGRES_PASSWORD
          value: "password"
        ports:
        - containerPort: 5432
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl logs -f networktopology-controller-67b5fc85bf-l9wv9 -n network-aware-controllers
W0425 23:27:43.144138       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0425 23:27:43.145849       1 networktopology.go:147] "Starting Network Topology controller"
I0425 23:27:43.247119       1 networktopology.go:155] "Network Topology sync finished"
E0425 23:42:39.281890       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:39.287722       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:39.301695       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0425 23:42:40.035222       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
# Example App Group CRD spec
apiVersion: appgroup.diktyo.x-k8s.io/v1alpha1
kind: AppGroup
metadata:
  name: a1
spec:
  numMembers: 3
  topologySortingAlgorithm: KahnSort
  workloads:
    - workload:
        kind: Deployment
        name: p1
        selector: p1
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p2
            selector: p2
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "100Mi"
          maxNetworkCost: 30
    - workload:
        kind: Deployment
        name: p2
        selector: p2
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p3
            selector: p3
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "250Mi"
          maxNetworkCost: 20
    - workload:
        kind: Deployment
        name: P3-deployment
        selector: p3
        apiVersion: apps/v1
        namespace: default

# Example Network CRD
apiVersion: networktopology.diktyo.x-k8s.io/v1alpha1
kind: NetworkTopology
metadata:
  name: net-topology-test
  namespace: default
spec:
  configmapName: "netperfMetrics"
  weights:
    # Region label: "topology.kubernetes.io/region"
    # Zone Label:   "topology.kubernetes.io/zone"
    # 2 Regions:  us-west-1
    #             us-east-1
    # 4 Zones:    us-west-1: z1, z2
    #             us-east-1: z3, z4
    - name: "UserDefined"
      topologyList: # Define weights between regions or between zones
        - topologyKey: "topology.kubernetes.io/region" # region costs
          originList:
            - origin: "us-west-1"
              costList:
                - destination: "us-east-1"
                  bandwidthCapacity: "10Gi"
                  networkCost: 20
            - origin: "us-east-1"
              costList:
                - destination: "us-west-1"
                  bandwidthCapacity: "10Gi"
                  networkCost: 20
        - topologyKey: "topology.kubernetes.io/zone" # zone costs
          originList:
            - origin: "z1"
              costList:
                - destination: "z2"
                  bandwidthCapacity: "1Gi"
                  networkCost: 5
            - origin: "z2"
              costList:
                - destination: "z1"
                  bandwidthCapacity: "1Gi"
                  networkCost: 5
            - origin: "z3"
              costList:
                - destination: "z4"
                  bandwidthCapacity: "1Gi"
                  networkCost: 10
            - origin: "z4"
              costList:
                - destination: "z3"
                  bandwidthCapacity: "1Gi"
                  networkCost: 10
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe appgroups
Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.x-k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-04-25T23:30:33Z
  Generation:          9
  Resource Version:    5859
  UID:                 fa7720b8-298e-4f8d-a650-dee72767802f
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         P3-deployment
      Namespace:    default
      Selector:     p3
Status:
  Running Workloads:          3
  Topology Calculation Time:  2023-04-25T23:30:33Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:          2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:          3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         P3-deployment
      Namespace:    default
      Selector:     p3
Events:             <none>
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe networktopologies
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-25T23:33:14Z
  Generation:          1
  Resource Version:    4356
  UID:                 36218354-3371-4a36-a855-09159519dcec
Spec:
  Configmap Name:  netperfMetrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-east-1
          Network Cost:        20
        Origin:                us-west-1
        Cost List:
          Bandwidth Capacity:  10Gi
          Destination:         us-west-1
          Network Cost:        20
        Origin:                us-east-1
      Topology Key:            topology.kubernetes.io/region
      Origin List:
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z2
          Network Cost:        5
        Origin:                z1
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z1
          Network Cost:        5
        Origin:                z2
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z4
          Network Cost:        10
        Origin:                z3
        Cost List:
          Bandwidth Capacity:  1Gi
          Destination:         z3
          Network Cost:        10
        Origin:                z4
      Topology Key:            topology.kubernetes.io/zone
Events:                        <none>
kubectl get crds
NAME                                                  CREATED AT
appgroups.appgroup.diktyo.x-k8s.io                    2023-04-25T23:13:48Z
bgpconfigurations.crd.projectcalico.org               2023-04-25T22:58:20Z
bgppeers.crd.projectcalico.org                        2023-04-25T22:58:20Z
blockaffinities.crd.projectcalico.org                 2023-04-25T22:58:20Z
caliconodestatuses.crd.projectcalico.org              2023-04-25T22:58:20Z
clusterinformations.crd.projectcalico.org             2023-04-25T22:58:20Z
elasticquotas.scheduling.x-k8s.io                     2023-04-25T23:13:48Z
felixconfigurations.crd.projectcalico.org             2023-04-25T22:58:20Z
globalnetworkpolicies.crd.projectcalico.org           2023-04-25T22:58:20Z
globalnetworksets.crd.projectcalico.org               2023-04-25T22:58:20Z
hostendpoints.crd.projectcalico.org                   2023-04-25T22:58:20Z
ipamblocks.crd.projectcalico.org                      2023-04-25T22:58:20Z
ipamconfigs.crd.projectcalico.org                     2023-04-25T22:58:20Z
ipamhandles.crd.projectcalico.org                     2023-04-25T22:58:20Z
ippools.crd.projectcalico.org                         2023-04-25T22:58:20Z
ipreservations.crd.projectcalico.org                  2023-04-25T22:58:20Z
kubecontrollersconfigurations.crd.projectcalico.org   2023-04-25T22:58:20Z
networkpolicies.crd.projectcalico.org                 2023-04-25T22:58:20Z
networksets.crd.projectcalico.org                     2023-04-25T22:58:20Z
networktopologies.networktopology.diktyo.x-k8s.io     2023-04-25T23:13:48Z
noderesourcetopologies.topology.node.k8s.io           2023-04-25T23:13:48Z
podgroups.scheduling.x-k8s.io                         2023-04-25T23:13:48Z
jpedro1992 commented 1 year ago

Hi @dyyfk,

Everything seems fine to me; I am not sure why the networkTopology controller is failing. AppGroup seems to be working fine. Have you removed the previous versions of the APIs (CRDs) before applying the new version?
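
For example, the older-group CRDs left over from a previous install (the diktyo.k8s.io names shown earlier in this thread) could be removed, assuming nothing else still uses them:

kubectl delete crd appgroups.appgroup.diktyo.k8s.io
kubectl delete crd networktopologies.networktopology.diktyo.k8s.io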

I will double check the deployment for the new version and see what might be happening.

Meanwhile, you could use AppGroup (v0.0.9-alpha) and NetworkTopology (v0.0.8-alpha) with these two images for the controllers: appGroup and networkTopology. This combination was working on Kubernetes v1.24.4 with scheduler-plugins v0.24.9.

dyyfk commented 1 year ago

Hi @jpedro1992, I appreciate your help, but I would prefer to use the latest version, since the intended use case is to deploy this scheduler on the latest Kubernetes version.

I also tried deploying the online-boutique example using the yaml file in this thread, but it hits the same problem: the controller says it could not find the corresponding AppGroup.

I would appreciate it if you could try to reproduce the errors I am facing using the latest build. Thanks again for all your time and effort!

Here is a list of steps that I used:

Minikube

sudo usermod -aG docker $USER && newgrp docker
minikube start --network-plugin=cni --cni=calico --nodes=4 --insecure-registry "10.0.0.0/24" --kubernetes-version=v1.25.7
minikube addons enable registry

Docker

// start a local registry
docker run --rm -it --network=host alpine ash -c "apk add socat && socat TCP-LISTEN:5000,reuseaddr,fork TCP:$(minikube ip):5000"

make local-image
docker push localhost:5000/appgroup-controller/controller:latest
docker push localhost:5000/network-topology-controller/controller:latest

Kubectl (Inside scheduler-plugins/manifests)

CRD

kubectl apply -f crds/

// Uncomment the necessary section of network-aware-controllers in all-in-one.yaml first
kubectl apply -f install/all-in-one.yaml

Controller

kubectl apply -f networktopology/networktopology-controller-deployment.yaml
kubectl apply -f appgroup/appgroup-controller-deployment.yaml

// Change the image in deploy.yaml to

Scheduler

kubectl label nodes node-role.kubernetes.io/master=""
kubectl label nodes topology.kubernetes.io/region="us-west-1"
kubectl label nodes topology.kubernetes.io/zone="z1"

// copy /etc/kubernetes/scheduler.conf

kubectl apply -f scheduler-configmap-v1beta3.yaml
kubectl apply -f deploy.yaml

dyyfk commented 1 year ago

Hi, @jpedro1992. I would like to mention another issue I found after the update.

In the latest version, I could not get the scheduler pod running, even though kubectl reports that the deployment has been configured.
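
A quick way to see why the Deployment produced no pod (the name and namespace are assumed from the earlier pod listing; adjust if yours differ):

kubectl -n kube-system get deploy network-aware-scheduler
kubectl -n kube-system describe deploy network-aware-scheduler
kubectl -n kube-system get events --sort-by=.lastTimestamp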

Latest Version

ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl apply -f scheduler-configmap-v1beta3.yaml
configmap/network-aware-scheduler-config configured
ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl apply -f deploy.yaml
deployment.apps/network-aware-scheduler configured
ubuntu@ip-172-31-11-54:~/scheduler-plugins/manifests/networktopology$ kubectl get pods --all-namespaces
NAMESPACE                   NAME                                            READY   STATUS    RESTARTS      AGE
default                     p1-66994f6978-f99gv                             0/1     Pending   0             23h
kube-system                 calico-kube-controllers-798cc86c47-7p4x8        1/1     Running   1 (24h ago)   24h
kube-system                 calico-node-hqh8r                               1/1     Running   0             24h
kube-system                 calico-node-mc47f                               1/1     Running   0             24h
kube-system                 calico-node-t5tx6                               1/1     Running   0             24h
kube-system                 calico-node-wzl5l                               1/1     Running   0             24h
kube-system                 coredns-565d847f94-bvm4z                        1/1     Running   1 (24h ago)   24h
kube-system                 etcd-minikube                                   1/1     Running   0             24h
kube-system                 kube-apiserver-minikube                         1/1     Running   0             24h
kube-system                 kube-controller-manager-minikube                1/1     Running   0             24h
kube-system                 kube-proxy-ddp8h                                1/1     Running   0             24h
kube-system                 kube-proxy-gzxcw                                1/1     Running   0             24h
kube-system                 kube-proxy-t64hs                                1/1     Running   0             24h
kube-system                 kube-proxy-z2d6c                                1/1     Running   0             24h
kube-system                 kube-scheduler-minikube                         1/1     Running   0             24h
kube-system                 registry-gpjsn                                  1/1     Running   0             24h
kube-system                 registry-proxy-5684v                            1/1     Running   0             24h
kube-system                 registry-proxy-7bwcg                            1/1     Running   0             24h
kube-system                 registry-proxy-bvhj4                            1/1     Running   0             24h
kube-system                 registry-proxy-cdlft                            1/1     Running   0             24h
kube-system                 storage-provisioner                             1/1     Running   1 (24h ago)   24h
network-aware-controllers   appgroup-controller-5fb544569c-tnmpr            1/1     Running   0             23h
network-aware-controllers   networktopology-controller-67b5fc85bf-rj5nw     1/1     Running   0             23h
scheduler-plugins           scheduler-plugins-controller-5d97947dd8-hddfq   1/1     Running   4 (23h ago)   23h

Old Version

default                     p1-fb8975b8d-vm9b8                              1/1     Running   0               2d23h
default                     p2-66c74dbbdc-rvvf9                             1/1     Running   0               2d23h
default                     p3-75f556cc47-dcdnj                             1/1     Running   0               2d23h
kube-system                 calico-kube-controllers-798cc86c47-89g8h        1/1     Running   3 (5d11h ago)   5d11h
kube-system                 calico-node-2gnwz                               1/1     Running   0               5d11h
kube-system                 calico-node-49wcq                               1/1     Running   0               5d11h
kube-system                 calico-node-89h4b                               1/1     Running   0               5d11h
kube-system                 calico-node-cp2fr                               1/1     Running   0               5d11h
kube-system                 calico-node-dv2bv                               1/1     Running   0               5d11h
kube-system                 calico-node-hhkpg                               1/1     Running   0               5d11h
kube-system                 calico-node-lkshk                               1/1     Running   0               5d11h
kube-system                 calico-node-lr4zk                               1/1     Running   0               5d11h
kube-system                 coredns-565d847f94-ggkhh                        1/1     Running   1 (5d11h ago)   5d11h
kube-system                 etcd-minikube                                   1/1     Running   0               5d11h
kube-system                 kube-apiserver-minikube                         1/1     Running   0               5d11h
kube-system                 kube-controller-manager-minikube                1/1     Running   0               5d11h
kube-system                 kube-proxy-5zccp                                1/1     Running   0               5d11h
kube-system                 kube-proxy-6p6dd                                1/1     Running   0               5d11h
kube-system                 kube-proxy-9rhrs                                1/1     Running   0               5d11h
kube-system                 kube-proxy-g5825                                1/1     Running   0               5d11h
kube-system                 kube-proxy-kl4p7                                1/1     Running   0               5d11h
kube-system                 kube-proxy-lr4rt                                1/1     Running   0               5d11h
kube-system                 kube-proxy-sbh6q                                1/1     Running   0               5d11h
kube-system                 kube-proxy-t9nqx                                1/1     Running   0               5d11h
kube-system                 kube-scheduler-minikube                         1/1     Running   0               5d11h
kube-system                 network-aware-scheduler-5ffc766dd9-tk88r        1/1     Running   0               5d
kube-system                 storage-provisioner                             1/1     Running   1 (5d11h ago)   5d11h
network-aware-controllers   appgroup-controller-5fb544569c-l4856            1/1     Running   0               5d10h
network-aware-controllers   networktopology-controller-67b5fc85bf-m4qqg     1/1     Running   0               5d10h
scheduler-plugins           scheduler-plugins-controller-5d97947dd8-svvb8   1/1     Running   0               5d10h
jpedro1992 commented 1 year ago

Hi @dyyfk, thank you for the detailed description!

I went over it, and indeed I get the exact same errors as you do; however, I am able to deploy the pods successfully. AppGroup seems fine, but the networkTopology controller seems to fail when retrieving the AppGroup and converting the costs. In my case everything works despite that. Please see the logs below.

I will investigate this further, but it may be an error caused by previous versions of the containers still running with a different API version. Please see here.

Please also check your RBAC rules, since I found a few manifests that need to be updated to x-k8s instead of k8s. For example, these ones need to be updated. I will create an issue and a PR for this.
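
As a sketch of what the fix looks like, the controller ClusterRole rules need to reference the x-k8s.io API groups served by the installed CRDs (group and resource names taken from the CRD listing above; the verbs here are illustrative, not the exact manifest):

rules:
  - apiGroups: ["appgroup.diktyo.x-k8s.io"]
    resources: ["appgroups"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["networktopology.diktyo.x-k8s.io"]
    resources: ["networktopologies"]
    verbs: ["get", "list", "watch", "update"]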

AppGroup Controller:

kubectl logs appgroup-controller-cd49d4546-9dlkh -n network-aware-controllers
W0428 12:00:15.364023       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0428 12:00:15.364751       1 appgroup.go:104] "Starting App Group controller"
I0428 12:00:15.465560       1 appgroup.go:111] "App Group sync finished"

NetworkTopology Controller:

kubectl logs networktopology-controller-5fb64c7769-wbhkf -n network-aware-controllers
W0428 11:57:40.141750       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0428 11:57:40.142516       1 networktopology.go:147] "Starting Network Topology controller"
I0428 11:57:40.242723       1 networktopology.go:155] "Network Topology sync finished"
I0428 11:57:40.242775       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 11:57:40.242788       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
E0428 11:57:40.242813       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.242869       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.242953       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243057       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243176       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243263       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243273       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243287       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243313       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243340       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243368       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243400       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243434       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243467       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243571       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243762       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.243976       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
E0428 11:57:40.244084       1 networktopology.go:791] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
I0428 11:57:40.284531       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...

NetworkTopology CRD:

kubectl describe networktopologies
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-28T11:41:08Z
  Generation:          3
  Managed Fields:
    API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:configmapName:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-04-28T11:41:08Z
    API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:weights:
      f:status:
        .:
        f:nodeCount:
        f:weightCalculationTime:
    Manager:         controller
    Operation:       Update
    Time:            2023-04-28T11:41:09Z
  Resource Version:  18425531
  UID:               b1b147ce-4aa7-43f0-8bb9-9ea4ed5eef50
Spec:
  Configmap Name:  netperf-metrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Origin:      cloud
      Topology Key:  topology.kubernetes.io/region
      Origin List:
        Origin:      z1
        Origin:      z2
        Origin:      z3
        Origin:      z4
        Origin:      z5
        Origin:      z6
        Origin:      z7
        Origin:      z8
        Origin:      z9
        Origin:      z10
      Topology Key:  topology.kubernetes.io/zone
    Name:            NetperfCosts
    Topology List:
      Origin List:
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          cloud
          Network Cost:         28
        Origin:
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:
          Network Cost:         28
        Origin:                 cloud
      Topology Key:             topology.kubernetes.io/region
      Origin List:
        Origin:
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         76
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         77
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         68
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         77
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         49
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         49
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         74
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         79
        Origin:                 z1
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         76
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         55
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         46
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         55
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         27
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         27
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         52
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         57
        Origin:                 z10
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         77
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         55
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         47
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         56
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         53
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         58
        Origin:                 z2
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         68
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         46
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         47
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         47
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         19
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         19
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         44
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         49
        Origin:                 z3
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         77
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         55
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         56
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         47
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         53
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         58
        Origin:                 z4
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         49
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         27
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         19
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         25
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         30
        Origin:                 z6
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         49
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         27
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         19
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         28
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         25
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         30
        Origin:                 z7
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         74
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         52
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         53
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         44
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         53
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         25
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         25
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z9
          Network Cost:         55
        Origin:                 z8
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z1
          Network Cost:         79
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z10
          Network Cost:         57
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z2
          Network Cost:         58
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z3
          Network Cost:         49
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z4
          Network Cost:         58
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z6
          Network Cost:         30
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z7
          Network Cost:         30
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          z8
          Network Cost:         55
        Origin:                 z9
      Topology Key:             topology.kubernetes.io/zone
Status:
  Node Count:               10
  Weight Calculation Time:  2023-04-28T11:57:40Z
Events:                     <none>

Small part of the scheduler logs:

I0428 12:04:50.043577       1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043630       1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043644       1 schedule_one.go:85] "Attempting to schedule pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.043696       1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0428 12:04:50.043713       1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc0007910c8}
I0428 12:04:50.043723       1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0428 12:04:50.043736       1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000a08090}
I0428 12:04:50.044779       1 networkoverhead.go:242] "Node info" name="n1.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z1"
I0428 12:04:50.044845       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z1 Destination:z10}:76 {Origin:z1 Destination:z2}:77 {Origin:z1 Destination:z3}:68 {Origin:z1 Destination:z4}:77 {Origin:z1 Destination:z6}:49 {Origin:z1 Destination:z7}:49 {Origin:z1 Destination:z8}:74 {Origin:z1 Destination:z9}:79]
I0428 12:04:50.044876       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.044902       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.044922       1 networkoverhead.go:242] "Node info" name="n2.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z2"
I0428 12:04:50.044967       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z2 Destination:z1}:77 {Origin:z2 Destination:z10}:55 {Origin:z2 Destination:z3}:47 {Origin:z2 Destination:z4}:56 {Origin:z2 Destination:z6}:28 {Origin:z2 Destination:z7}:28 {Origin:z2 Destination:z8}:53 {Origin:z2 Destination:z9}:58]
I0428 12:04:50.044992       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045027       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045049       1 networkoverhead.go:242] "Node info" name="n4.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z4"
I0428 12:04:50.045087       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z4 Destination:z1}:77 {Origin:z4 Destination:z10}:55 {Origin:z4 Destination:z2}:56 {Origin:z4 Destination:z3}:47 {Origin:z4 Destination:z6}:28 {Origin:z4 Destination:z7}:28 {Origin:z4 Destination:z8}:53 {Origin:z4 Destination:z9}:58]
I0428 12:04:50.045114       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045137       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045155       1 networkoverhead.go:242] "Node info" name="n10.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z10"
I0428 12:04:50.045194       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z10 Destination:z1}:76 {Origin:z10 Destination:z2}:55 {Origin:z10 Destination:z3}:46 {Origin:z10 Destination:z4}:55 {Origin:z10 Destination:z6}:27 {Origin:z10 Destination:z7}:27 {Origin:z10 Destination:z8}:52 {Origin:z10 Destination:z9}:57]
I0428 12:04:50.045219       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045243       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045263       1 networkoverhead.go:242] "Node info" name="n6.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z6"
I0428 12:04:50.045302       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z6 Destination:z1}:49 {Origin:z6 Destination:z10}:27 {Origin:z6 Destination:z2}:28 {Origin:z6 Destination:z3}:19 {Origin:z6 Destination:z4}:28 {Origin:z6 Destination:z7}:0 {Origin:z6 Destination:z8}:25 {Origin:z6 Destination:z9}:30]
I0428 12:04:50.045329       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045350       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045367       1 networkoverhead.go:242] "Node info" name="n5.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="" zone=""
I0428 12:04:50.045384       1 networkoverhead.go:249] "Map" costMap=map[]
I0428 12:04:50.045407       1 networkoverhead.go:263] "Number of dependencies" satisfied=6 violated=0
I0428 12:04:50.045427       1 networkoverhead.go:270] "Node final cost" cost=0
I0428 12:04:50.045446       1 networkoverhead.go:242] "Node info" name="n7.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z7"
I0428 12:04:50.045482       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z7 Destination:z1}:49 {Origin:z7 Destination:z10}:27 {Origin:z7 Destination:z2}:28 {Origin:z7 Destination:z3}:19 {Origin:z7 Destination:z4}:28 {Origin:z7 Destination:z6}:0 {Origin:z7 Destination:z8}:25 {Origin:z7 Destination:z9}:30]
I0428 12:04:50.045505       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045526       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045544       1 networkoverhead.go:242] "Node info" name="n9.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z9"
I0428 12:04:50.045581       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z9 Destination:z1}:79 {Origin:z9 Destination:z10}:57 {Origin:z9 Destination:z2}:58 {Origin:z9 Destination:z3}:49 {Origin:z9 Destination:z4}:58 {Origin:z9 Destination:z6}:30 {Origin:z9 Destination:z7}:30 {Origin:z9 Destination:z8}:55]
I0428 12:04:50.045602       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045623       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045641       1 networkoverhead.go:242] "Node info" name="n8.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z8"
I0428 12:04:50.045688       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z8 Destination:z1}:74 {Origin:z8 Destination:z10}:52 {Origin:z8 Destination:z2}:53 {Origin:z8 Destination:z3}:44 {Origin:z8 Destination:z4}:53 {Origin:z8 Destination:z6}:25 {Origin:z8 Destination:z7}:25 {Origin:z8 Destination:z9}:55]
I0428 12:04:50.045716       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045736       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.045756       1 networkoverhead.go:242] "Node info" name="n3.ml.ilabt-imec-be.wall2.ilabt.iminds.be" region="cloud" zone="z3"
I0428 12:04:50.045796       1 networkoverhead.go:249] "Map" costMap=map[{Origin:cloud Destination:}:28 {Origin:z3 Destination:z1}:68 {Origin:z3 Destination:z10}:46 {Origin:z3 Destination:z2}:47 {Origin:z3 Destination:z4}:47 {Origin:z3 Destination:z6}:19 {Origin:z3 Destination:z7}:19 {Origin:z3 Destination:z8}:44 {Origin:z3 Destination:z9}:49]
I0428 12:04:50.045815       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=6
I0428 12:04:50.045836       1 networkoverhead.go:270] "Node final cost" cost=600
I0428 12:04:50.046202       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046289       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046328       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046384       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046421       1 networkoverhead.go:331] "Number of dependencies:" satisfied=6 violated=0
I0428 12:04:50.046475       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046500       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046511       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046564       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046389       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=6
I0428 12:04:50.046827       1 default_binder.go:52] "Attempting to bind pod to node" pod="default/recommendationservice-84df96db59-6zz8q" node="n5.ml.ilabt-imec-be.wall2.ilabt.iminds.be"
I0428 12:04:50.046951       1 request.go:1170] Request Body:
00000000  6b 38 73 00 0a 0d 0a 02  76 31 12 07 42 69 6e 64  |k8s.....v1..Bind|
00000010  69 6e 67 12 a0 01 0a 61  0a 26 72 65 63 6f 6d 6d  |ing....a.&recomm|
00000020  65 6e 64 61 74 69 6f 6e  73 65 72 76 69 63 65 2d  |endationservice-|
00000030  38 34 64 66 39 36 64 62  35 39 2d 36 7a 7a 38 71  |84df96db59-6zz8q|
00000040  12 00 1a 07 64 65 66 61  75 6c 74 22 00 2a 24 62  |....default".*$b|
00000050  37 35 65 37 66 39 36 2d  34 33 35 32 2d 34 33 32  |75e7f96-4352-432|
00000060  34 2d 62 63 30 63 2d 30  63 32 64 37 66 34 63 34  |4-bc0c-0c2d7f4c4|
00000070  65 35 38 32 00 38 00 42  00 12 3b 0a 04 4e 6f 64  |e582.8.B..;..Nod|
00000080  65 12 00 1a 29 6e 35 2e  6d 6c 2e 69 6c 61 62 74  |e...)n5.ml.ilabt|
00000090  2d 69 6d 65 63 2d 62 65  2e 77 61 6c 6c 32 2e 69  |-imec-be.wall2.i|
000000a0  6c 61 62 74 2e 69 6d 69  6e 64 73 2e 62 65 22 00  |labt.iminds.be".|
000000b0  2a 00 32 00 3a 00 1a 00  22 00                    |*.2.:...".|
I0428 12:04:50.047065       1 round_trippers.go:466] curl -v -XPOST  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://10.2.35.65:6443/api/v1/namespaces/default/pods/recommendationservice-84df96db59-6zz8q/binding'
I0428 12:04:50.050738       1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.050785       1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/recommendationservice-84df96db59-6zz8q"
I0428 12:04:50.051720       1 round_trippers.go:553] POST https://10.2.35.65:6443/api/v1/namespaces/default/pods/recommendationservice-84df96db59-6zz8q/binding 201 Created in 4 milliseconds

Deployed Pods:

kubectl get pods
NAME                                       READY   STATUS    RESTARTS      AGE
adservice-c4cbc7c8b-fl49q                  0/1     Running   0             44s
adservice-c4cbc7c8b-m7gjs                  0/1     Running   0             44s
adservice-c4cbc7c8b-sstc9                  0/1     Running   0             44s
cartservice-6df55f79c4-2mt9h               1/1     Running   1 (35s ago)   44s
cartservice-6df55f79c4-874w7               1/1     Running   1 (37s ago)   44s
cartservice-6df55f79c4-cnlvw               1/1     Running   1 (37s ago)   44s
checkoutservice-784797b87-7pjnf            1/1     Running   0             44s
checkoutservice-784797b87-f98qd            1/1     Running   0             44s
checkoutservice-784797b87-fjvqt            1/1     Running   0             44s
currencyservice-8f567b44f-2lc5t            1/1     Running   0             44s
currencyservice-8f567b44f-gt79t            1/1     Running   0             44s
currencyservice-8f567b44f-xk864            1/1     Running   0             44s
emailservice-deployment-86885f8f64-bnlr9   1/1     Running   0             44s
emailservice-deployment-86885f8f64-n9nt7   1/1     Running   0             44s
emailservice-deployment-86885f8f64-v7tfq   1/1     Running   0             44s
frontend-788886bf4d-2sgkz                  1/1     Running   0             44s
frontend-788886bf4d-9vgpw                  1/1     Running   0             43s
frontend-788886bf4d-n4qqh                  1/1     Running   0             43s
paymentservice-dcb658d-2fkbq               1/1     Running   0             44s
paymentservice-dcb658d-t6djk               1/1     Running   0             43s
paymentservice-dcb658d-xlpcr               1/1     Running   0             43s
productcatalogservice-5d4c7fc654-gw9mn     1/1     Running   0             43s
productcatalogservice-5d4c7fc654-qxrgj     1/1     Running   0             43s
productcatalogservice-5d4c7fc654-rfgks     1/1     Running   0             43s
recommendationservice-84df96db59-6zz8q     1/1     Running   0             42s
recommendationservice-84df96db59-fhbn2     1/1     Running   0             42s
recommendationservice-84df96db59-pv824     1/1     Running   0             43s
redis-cart-775cd7cb9d-9g6c9                1/1     Running   0             43s
redis-cart-775cd7cb9d-c4hsd                1/1     Running   0             42s
redis-cart-775cd7cb9d-jcpj8                1/1     Running   0             42s
shippingservice-5cc9965bd-bmzhr            1/1     Running   0             43s
shippingservice-5cc9965bd-c97ss            1/1     Running   0             42s
shippingservice-5cc9965bd-jw28f            1/1     Running   0             42s
jpedro1992 commented 1 year ago

Hi @dyyfk,

I restarted my cluster from scratch (kubeadm), and the errors disappeared after that.

Something gets broken in the cluster even when previous versions of the CRDs and containers are deleted; I'm not exactly sure why. If you restart your minikube cluster, everything should be fine after that.
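
For a local minikube setup, a full reset along these lines should be enough before re-applying the CRDs and controllers (the node count here is only an example and should match your own topology):

minikube delete
minikube start --nodes 4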

Sorry for the extra hassle!

kubectl logs networktopology-controller-5fb64c7769-b5j8n -n network-aware-controllers
W0428 12:58:25.009875       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0428 12:58:25.012995       1 networktopology.go:147] "Starting Network Topology controller"
I0428 12:58:25.113796       1 networktopology.go:155] "Network Topology sync finished"
I0428 13:16:22.869089       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:22.869173       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:16:22.877379       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:32.071381       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672740       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672791       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:22:10.733714       1 networktopology.go:635] ConfigMap netperf-metrics retrieved... 
dyyfk commented 1 year ago

Hi @jpedro1992, thanks again for your time! I have tried starting from scratch using the latest deployment in my minikube cluster, but the same error still persists.

I also want to let you know that my scheduler's logs are different from yours. In particular, they do not show anything when I try to schedule a pod with the network-aware-scheduler. I think this is where the problem is.

ubuntu@ip-172-31-11-54:~$ kubectl logs -f network-aware-scheduler-567b9b9b89-m4qf9 -n kube-system
I0428 21:13:46.885430       1 serving.go:348] Generated self-signed cert in-memory
I0428 21:13:46.886756       1 configfile.go:59] "KubeSchedulerConfiguration v1beta3 is deprecated in v1.26, will be removed in v1.29"
I0428 21:13:47.501657       1 server.go:152] "Starting Kubernetes Scheduler" version="v0.25.7"
I0428 21:13:47.501690       1 server.go:154] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0428 21:13:47.506596       1 secure_serving.go:210] Serving securely on [::]:10259
I0428 21:13:47.506694       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0428 21:13:47.507430       1 shared_informer.go:273] Waiting for caches to sync for RequestHeaderAuthRequestController
I0428 21:13:47.507031       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0428 21:13:47.507579       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0428 21:13:47.507758       1 shared_informer.go:273] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0428 21:13:47.507562       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0428 21:13:47.508169       1 shared_informer.go:273] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0428 21:13:47.607787       1 shared_informer.go:280] Caches are synced for RequestHeaderAuthRequestController
I0428 21:13:47.608219       1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0428 21:13:47.607902       1 shared_informer.go:280] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file

I want to confirm how you started the scheduler. I start it as follows:

kubectl apply -f scheduler-plugins/manifests/appgroup/scheduler-configmap-v1beta3.yaml/
kubectl apply -f scheduler-plugins/manifests/appgroup/deploy.yaml

In addition, my network topology controller does not log anything like the following:

I0428 13:16:22.869089       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:22.869173       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:16:22.877379       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:16:32.071381       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672740       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0428 13:22:10.672791       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0428 13:22:10.733714       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...

I think it could be because I did not use the netperf component. Instead, I used the example CRD with a hardcoded network topology. But in theory, that should not affect the network-topology-controller. Is my assumption correct?

kubectl get configmap --all-namespaces
NAMESPACE                   NAME                                 DATA   AGE
default                     kube-root-ca.crt                     1      10h
kube-node-lease             kube-root-ca.crt                     1      10h
kube-public                 cluster-info                         8      10h
kube-public                 kube-root-ca.crt                     1      10h
kube-system                 calico-config                        4      10h
kube-system                 coredns                              1      10h
kube-system                 extension-apiserver-authentication   6      10h
kube-system                 kube-proxy                           2      10h
kube-system                 kube-root-ca.crt                     1      10h
kube-system                 kubeadm-config                       1      10h
kube-system                 kubelet-config                       1      10h
kube-system                 network-aware-scheduler-config       1      10h
network-aware-controllers   kube-root-ca.crt                     1      10h
scheduler-plugins           kube-root-ca.crt                     1      10h
dyyfk commented 1 year ago

Hi @jpedro1992, I have now used the netperf component on an EKS cluster on AWS, and I am getting the same error as yours. However, my network topology's costs are all 0s.

ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ kubectl describe networktopologies
Name:         net-topology-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networktopology.diktyo.x-k8s.io/v1alpha1
Kind:         NetworkTopology
Metadata:
  Creation Timestamp:  2023-04-30T23:16:41Z
  Generation:          2
  Resource Version:    3509558
  UID:                 f1165d12-4b36-4954-a89f-2b8af7af889a
Spec:
  Configmap Name:  netperf-metrics
  Weights:
    Name:  UserDefined
    Topology List:
      Origin List:
        Origin:      us-west-2
      Topology Key:  topology.kubernetes.io/region
      Origin List:
        Origin:      us-west-2b-1
        Origin:      us-west-2b-2
        Origin:      us-west-2b-3
        Origin:      us-west-2b-4
      Topology Key:  topology.kubernetes.io/zone
    Name:            NetperfCosts
    Topology List:
      Origin List:
        Origin:      us-west-2
      Topology Key:  topology.kubernetes.io/region
      Origin List:
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-2
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-3
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-4
          Network Cost:         0
        Origin:                 us-west-2b-1
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-1
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-3
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-4
          Network Cost:         0
        Origin:                 us-west-2b-2
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-1
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-2
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-4
          Network Cost:         0
        Origin:                 us-west-2b-3
        Cost List:
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-1
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-2
          Network Cost:         0
          Bandwidth Allocated:  0
          Bandwidth Capacity:   1G
          Destination:          us-west-2b-3
          Network Cost:         0
        Origin:                 us-west-2b-4
      Topology Key:             topology.kubernetes.io/zone
Status:
  Node Count:               4
  Weight Calculation Time:  2023-04-30T23:16:41Z
Events:                     <none>

For the appgroup controller, I am using image v1.0.3-alpha. For the networktopology controller, I am using the debug image pulled from Docker Hub.

I0430 23:16:41.608153       1 networktopology.go:635] ConfigMap netperf-metrics retrieved...
I0430 23:16:41.608197       1 networktopology.go:765] "NetworkTopology SyncHandler: Update costs in the network graph... "
I0430 23:16:41.608222       1 networktopology.go:782] "N1: %v - N2: %v - Region1: %v - Region2: %v - Zone1: %v - Zone2: %v" ip-10-0-20-110.us-west-2.compute.internal="ip-10-0-20-229.us-west-2.compute.internal" us-west-2="us-west-2" us-west-2b-2="us-west-2b-3"
I0430 23:16:41.608258       1 networktopology.go:787] "Key: %v" netperf.p90.latency.milliseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal="(MISSING)"
I0430 23:16:41.608301       1 networktopology.go:788] "configmap.Data: %v" map[netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:116 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:81 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:260 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:78 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:77 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:106 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:75 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:92]="(MISSING)"
E0430 23:16:41.608322       1 networktopology.go:792] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"
I0430 23:16:41.608334       1 networktopology.go:795] "Cost: %v" %!s(int=0)="(MISSING)"
I0430 23:16:41.608352       1 networktopology.go:782] "N1: %v - N2: %v - Region1: %v - Region2: %v - Zone1: %v - Zone2: %v" ip-10-0-20-110.us-west-2.compute.internal="ip-10-0-26-106.us-west-2.compute.internal" us-west-2="us-west-2" us-west-2b-2="us-west-2b-4"
I0430 23:16:41.608362       1 networktopology.go:787] "Key: %v" netperf.p90.latency.milliseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal="(MISSING)"
I0430 23:16:41.608386       1 networktopology.go:788] "configmap.Data: %v" map[netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:116 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:108 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:81 netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:260 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:109 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:78 netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:77 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:106 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:75 netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:92]="(MISSING)"
E0430 23:16:41.608404       1 networktopology.go:792] "Error converting cost..." err="strconv.Atoi: parsing \"\": invalid syntax"

However, it looks like the network metrics have been recorded successfully.

ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ kubectl describe configmap netperf-metrics
Name:         netperf-metrics
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
109
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
116
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
109
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
77
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
106
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
92
netperf_p90_latency_microseconds.origin.ip-10-0-16-170.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
108
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-16-170.us-west-2.compute.internal:
----
108
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-20-229.us-west-2.compute.internal:
----
81
netperf_p90_latency_microseconds.origin.ip-10-0-20-110.us-west-2.compute.internal.destination.ip-10-0-26-106.us-west-2.compute.internal:
----
260
netperf_p90_latency_microseconds.origin.ip-10-0-20-229.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
78
netperf_p90_latency_microseconds.origin.ip-10-0-26-106.us-west-2.compute.internal.destination.ip-10-0-20-110.us-west-2.compute.internal:
----
75

BinaryData
====

Events:  <none>

Here is the CRD that I am using:

ubuntu@ip-172-31-6-60:~/networktopology-api/manifests$ cat example.yaml
# Example Network CRD
apiVersion: networktopology.diktyo.x-k8s.io/v1alpha1
kind: NetworkTopology
metadata:
  name: net-topology-test
  namespace: default
spec:
  configmapName: "netperf-metrics"
  weights:
    # Region label: "topology.kubernetes.io/region"
    # Zone Label:   "topology.kubernetes.io/zone"
    # 2 Regions:  us-west-1
    #             us-east-1
    # 4 Zones:    us-west-1: z1, z2
    #             us-east-1: z3, z4
    - name: "UserDefined"
      topologyList: # Define weights between regions or between zones
        - topologyKey: "topology.kubernetes.io/region" # region costs
          originList:
            - origin: "us-west-2"
              costList:
        - topologyKey: "topology.kubernetes.io/zone" # zone costs
          originList:
            - origin: "us-west-2b-1"
              costList:
            - origin: "us-west-2b-2"
              costList:
            - origin: "us-west-2b-3"
              costList:
            - origin: "us-west-2b-4"
              costList:
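
For comparison, a UserDefined origin with the costs filled in would look roughly like the sketch below. The field names follow the NetperfCosts structure shown in the describe output above; the cost values themselves are only illustrative.

  weights:
    - name: "UserDefined"
      topologyList:
        - topologyKey: "topology.kubernetes.io/zone"
          originList:
            - origin: "us-west-2b-1"
              costList:
                - destination: "us-west-2b-2"
                  networkCost: 10 # illustrative value
                  bandwidthCapacity: "1G"
                - destination: "us-west-2b-3"
                  networkCost: 20 # illustrative value
                  bandwidthCapacity: "1G"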

If I use the example CRD above, I get the following errors:

ubuntu@ip-172-31-6-60:~/test_deploy$ kubectl logs -f networktopology-controller-56948b9547-gxrwv -n network-aware-controllers
W0430 23:09:34.020217       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0430 23:09:34.020964       1 networktopology.go:147] "Starting Network Topology controller"
I0430 23:09:34.121500       1 networktopology.go:155] "Network Topology sync finished"
E0430 23:14:41.533989       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0430 23:14:41.534228       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0430 23:14:41.541700       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
jpedro1992 commented 1 year ago

Hi @dyyfk,

I am sharing my deployment files below. You can definitely use the NetworkTopology CR without the netperf component. I guess you might be missing at least a label in your pod deployment files or a specific RBAC rule, because the scheduler does not recognize the pods.
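
As a rough sketch (names and image are placeholders), each workload Deployment should at least request the second scheduler explicitly and carry the label that ties it to its AppGroup; the exact label key is the one defined by the AppGroup API:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: p1
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: p1
  template:
    metadata:
      labels:
        app: p1
        # plus the AppGroup label expected by the appgroup.diktyo.x-k8s.io API
        # (exact key per the AppGroup README), pointing to your AppGroup name
    spec:
      schedulerName: network-aware-scheduler # must match the profile in sched-cc.yaml
      containers:
        - name: p1
          image: nginx # placeholder image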

na.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: network-aware-controllers

appgroup-controller.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: appgroup-controller
  namespace: network-aware-controllers
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: appgroup-controller
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["appgroup.diktyo.x-k8s.io"]
  resources: ["appgroups"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: appgroup-controller
subjects:
- kind: ServiceAccount
  name:  appgroup-controller
  namespace: network-aware-controllers
roleRef:
  kind: ClusterRole
  name: appgroup-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: appgroup-controller
  namespace: network-aware-controllers
  labels:
    app: appgroup-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: appgroup-controller
  template:
    metadata:
      labels:
        app: appgroup-controller
    spec:
      serviceAccountName: appgroup-controller
      containers:
      - name: appgroup-controller
        image: jpedro1992/appgroup-controller:v1.0.3-alpha
        command:
        - /bin/controller
        imagePullPolicy: IfNotPresent

networktopology-controller.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: networktopology-controller
  namespace: network-aware-controllers
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: networktopology-controller
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch", "patch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: ["appgroup.diktyo.x-k8s.io"]
  resources: ["appgroups"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: ["networktopology.diktyo.x-k8s.io"]
  resources: ["networktopologies"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: networktopology-controller
subjects:
- kind: ServiceAccount
  name:  networktopology-controller
  namespace: network-aware-controllers
roleRef:
  kind: ClusterRole
  name: networktopology-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: networktopology-controller
  namespace: network-aware-controllers
  labels:
    app: networktopology-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: networktopology-controller
  template:
    metadata:
      labels:
        app: networktopology-controller
    spec:
      serviceAccountName: networktopology-controller
      containers:
      - name: networktopology-controller
        image: jpedro1992/networktopology-controller:v1.0.3-alpha
        command:
          - /bin/controller
        imagePullPolicy: Always #IfNotPresent

sched-cc.yaml:

apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: "/etc/kubernetes/scheduler.conf"
profiles:
- schedulerName: network-aware-scheduler
  plugins:
    queueSort:
      enabled:
      - name: TopologicalSort
      disabled:
      - name: "*"
    preFilter:
      enabled:
        - name: NetworkOverhead
    filter:
      enabled:
      - name: NetworkOverhead
    score:
      disabled: # Preferably avoid the combination of NodeResourcesFit with NetworkOverhead
        - name: NodeResourcesFit
      enabled: # A higher weight is given to NetworkOverhead to favor allocation schemes with lower latency.
        - name: NetworkOverhead
          weight: 5
        # - name: BalancedAllocation
         #  weight: 1
  pluginConfig:
  - name: TopologicalSort
    args:
      namespaces:
      - "default"
  - name: NetworkOverhead
    args:
      namespaces:
      - "default"
      weightsName: "NetperfCosts" # or Dijkstra
      networkTopologyName: "net-topology-test"

Deployment of the scheduler and controller (sig-scheduling):

deploy.yaml:

# First part
# Apply extra privileges to system:kube-scheduler.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-scheduler:plugins
rules:
- apiGroups: ["scheduling.sigs.x-k8s.io"]
  resources: ["podgroups", "elasticquotas", "podgroups/status", "elasticquotas/status"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: ["appgroup.diktyo.x-k8s.io"]
  resources: ["appgroups"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: ["networktopology.diktyo.x-k8s.io"]
  resources: ["networktopologies"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-scheduler:plugins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler:plugins
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:kube-scheduler
---
# Second part
# Install the controller image.
apiVersion: v1
kind: Namespace
metadata:
  name: scheduler-plugins
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduler-plugins-controller
  namespace: scheduler-plugins
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduler-plugins-controller
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["scheduling.x-k8s.io"]
  resources: ["podgroups", "elasticquotas", "podgroups/status", "elasticquotas/status"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduler-plugins-controller
subjects:
- kind: ServiceAccount
  name: scheduler-plugins-controller
  namespace: scheduler-plugins
roleRef:
  kind: ClusterRole
  name: scheduler-plugins-controller
  apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: scheduler-plugins-controller
  namespace: scheduler-plugins
  labels:
    app: scheduler-plugins-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scheduler-plugins-controller
  template:
    metadata:
      labels:
        app: scheduler-plugins-controller
    spec:
      serviceAccountName: scheduler-plugins-controller
      containers:
      - name: scheduler-plugins-controller
        image: registry.k8s.io/scheduler-plugins/controller:v0.25.7
        imagePullPolicy: IfNotPresent
---
# Install the scheduler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scheduler-plugins-scheduler
  namespace: scheduler-plugins
spec:
  replicas: 1
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
    spec:
      nodeSelector:
        kubernetes.io/hostname: "master.n1" # Modify to your master node name! Unless you populate all nodes with sched-cc.yaml
      containers:
        - image: registry.k8s.io/scheduler-plugins/kube-scheduler:v0.25.7 
          #imagePullPolicy: Never
          command:
          - /bin/kube-scheduler
          - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
          - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
          - --config=/etc/kubernetes/sched-cc.yaml
          - -v=9
          name: scheduler-plugins
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /etc/kubernetes
            name: etckubernetes
      hostNetwork: false
      hostPID: false
      volumes:
      - hostPath:
          path: /etc/kubernetes/
          type: Directory
        name: etckubernetes
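
To wire everything together, roughly (paths and order are assumptions, adjust to your setup; the AppGroup and NetworkTopology CRDs must already be installed):

# on the master node selected by the nodeSelector above,
# so the hostPath mount and the --config flag can find sched-cc.yaml
sudo cp sched-cc.yaml /etc/kubernetes/sched-cc.yaml

# then, from a machine with cluster access
kubectl apply -f na.yaml
kubectl apply -f appgroup-controller.yaml
kubectl apply -f networktopology-controller.yaml
kubectl apply -f deploy.yaml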
dyyfk commented 1 year ago

Hi @jpedro1992, thanks for sharing your yaml files. I was able to get the scheduler running successfully on my minikube cluster.

Although the scheduler is running and scheduling pods, it keeps reporting that the final cost of every node is 0. It also seems that pod dependencies are not respected, as every pod gets the line "Number of dependencies" satisfied=0 violated=0. That should not be the case, since the pods do have dependencies.
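
For reference, whether the scheduler can actually find the AppGroup and NetworkTopology CRs in that namespace can be checked with something like:

kubectl get appgroups -n default
kubectl describe appgroups <appgroup-name> -n default
kubectl describe networktopologies -n default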

Scheduler logs:

I0504 02:30:07.766408       1 round_trippers.go:466] curl -v -XGET  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/storage.k8s.io/v1/csidrivers?allowWatchBookmarks=true&resourceVersion=6810&timeout=5m32s&timeoutSeconds=332&watch=true'
I0504 02:30:07.774639       1 round_trippers.go:553] GET https://192.168.49.2:8443/apis/storage.k8s.io/v1/csidrivers?allowWatchBookmarks=true&resourceVersion=6810&timeout=5m32s&timeoutSeconds=332&watch=true 200 OK in 1 milliseconds
I0504 02:30:07.774668       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 0 ms Duration 1 ms
I0504 02:30:07.774678       1 round_trippers.go:577] Response Headers:
I0504 02:30:07.774691       1 round_trippers.go:580]     Audit-Id: 3a0d8bcd-d0bb-4ec2-8256-0a78703ed19c
I0504 02:30:07.774700       1 round_trippers.go:580]     Cache-Control: no-cache, private
I0504 02:30:07.774708       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf;stream=watch
I0504 02:30:07.774716       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:07.774724       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:07.774733       1 round_trippers.go:580]     Date: Thu, 04 May 2023 02:30:07 GMT
I0504 02:30:15.901697       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:15.901889       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:15.927373       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.138097       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.176018       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.186971       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.870253       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.878755       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.887701       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.901057       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.916165       1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p3-5f458cf755-wswjn"
I0504 02:30:16.916620       1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p2-76867498b7-ffg96"
I0504 02:30:16.940136       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.952460       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:16.957156       1 eventhandlers.go:229] "Delete event for scheduled pod" pod="default/p1-59bcb65f56-cr59l"
I0504 02:30:18.403144       1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403217       1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403229       1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.403311       1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.403339       1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.403348       1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.403359       1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.403413       1 networkoverhead.go:242] "Node info" name="minikube" region="us-west-1" zone="z1"
I0504 02:30:18.403613       1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z1 Destination:z2}:10]
I0504 02:30:18.403783       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.403832       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.403904       1 networkoverhead.go:242] "Node info" name="minikube-m02" region="us-west-1" zone="z2"
I0504 02:30:18.403954       1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z2 Destination:z1}:20]
I0504 02:30:18.403975       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404008       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404022       1 networkoverhead.go:242] "Node info" name="minikube-m03" region="us-east-1" zone="z3"
I0504 02:30:18.404037       1 networkoverhead.go:249] "Map" costMap=map[{Origin:z3 Destination:z4}:30]
I0504 02:30:18.404050       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404058       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404069       1 networkoverhead.go:242] "Node info" name="minikube-m04" region="us-east-1" zone="z4"
I0504 02:30:18.404081       1 networkoverhead.go:249] "Map" costMap=map[{Origin:z4 Destination:z3}:40]
I0504 02:30:18.404097       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.404107       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.404192       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404209       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404224       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404229       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.404399       1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube" finalScore=0
I0504 02:30:18.404527       1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m02" finalScore=0
I0504 02:30:18.404550       1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m03" finalScore=0
I0504 02:30:18.404597       1 networkoverhead.go:359] "Score:" pod="p1-59bcb65f56-4rrfc" node="minikube-m04" finalScore=0
I0504 02:30:18.404739       1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.404899       1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p1-59bcb65f56-4rrfc" node="minikube-m04"
I0504 02:30:18.404974       1 request.go:1170] Request Body:
00000000  6b 38 73 00 0a 0d 0a 02  76 31 12 07 42 69 6e 64  |k8s.....v1..Bind|
00000010  69 6e 67 12 70 0a 4e 0a  13 70 31 2d 35 39 62 63  |ing.p.N..p1-59bc|
00000020  62 36 35 66 35 36 2d 34  72 72 66 63 12 00 1a 07  |b65f56-4rrfc....|
00000030  64 65 66 61 75 6c 74 22  00 2a 24 63 65 36 33 35  |default".*$ce635|
00000040  62 32 31 2d 63 66 62 35  2d 34 38 36 39 2d 61 38  |b21-cfb5-4869-a8|
00000050  35 39 2d 38 35 30 34 62  37 30 65 31 30 64 65 32  |59-8504b70e10de2|
00000060  00 38 00 42 00 12 1e 0a  04 4e 6f 64 65 12 00 1a  |.8.B.....Node...|
00000070  0c 6d 69 6e 69 6b 75 62  65 2d 6d 30 34 22 00 2a  |.minikube-m04".*|
00000080  00 32 00 3a 00 1a 00 22  00                       |.2.:...".|
I0504 02:30:18.405059       1 round_trippers.go:466] curl -v -XPOST  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/api/v1/namespaces/default/pods/p1-59bcb65f56-4rrfc/binding'
I0504 02:30:18.412695       1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412787       1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412812       1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.412883       1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.412907       1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.412918       1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.412928       1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.412966       1 networkoverhead.go:242] "Node info" name="minikube" region="us-west-1" zone="z1"
I0504 02:30:18.413328       1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z1 Destination:z2}:10]
I0504 02:30:18.413452       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.413496       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.413583       1 networkoverhead.go:242] "Node info" name="minikube-m02" region="us-west-1" zone="z2"
I0504 02:30:18.413715       1 networkoverhead.go:249] "Map" costMap=map[{Origin:us-west-1 Destination:us-east-1}:200 {Origin:z2 Destination:z1}:20]
I0504 02:30:18.413810       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.413869       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.413949       1 networkoverhead.go:242] "Node info" name="minikube-m03" region="us-east-1" zone="z3"
I0504 02:30:18.414011       1 networkoverhead.go:249] "Map" costMap=map[{Origin:z3 Destination:z4}:30]
I0504 02:30:18.414093       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.414148       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.414866       1 networkoverhead.go:242] "Node info" name="minikube-m04" region="us-east-1" zone="z4"
I0504 02:30:18.415103       1 networkoverhead.go:249] "Map" costMap=map[{Origin:z4 Destination:z3}:40]
I0504 02:30:18.415328       1 networkoverhead.go:263] "Number of dependencies" satisfied=0 violated=0
I0504 02:30:18.415500       1 networkoverhead.go:270] "Node final cost" cost=0
I0504 02:30:18.416662       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.416843       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417015       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417182       1 networkoverhead.go:331] "Number of dependencies:" satisfied=0 violated=0
I0504 02:30:18.417541       1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube" finalScore=0
I0504 02:30:18.417843       1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m03" finalScore=0
I0504 02:30:18.418091       1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m04" finalScore=0
I0504 02:30:18.418139       1 networkoverhead.go:359] "Score:" pod="p2-76867498b7-6lh62" node="minikube-m02" finalScore=0
I0504 02:30:18.418678       1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.419126       1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p2-76867498b7-6lh62" node="minikube-m02"
I0504 02:30:18.419405       1 request.go:1170] Request Body:
00000000  6b 38 73 00 0a 0d 0a 02  76 31 12 07 42 69 6e 64  |k8s.....v1..Bind|
00000010  69 6e 67 12 70 0a 4e 0a  13 70 32 2d 37 36 38 36  |ing.p.N..p2-7686|
00000020  37 34 39 38 62 37 2d 36  6c 68 36 32 12 00 1a 07  |7498b7-6lh62....|
00000030  64 65 66 61 75 6c 74 22  00 2a 24 65 32 37 36 33  |default".*$e2763|
00000040  62 37 35 2d 64 37 39 35  2d 34 32 39 63 2d 61 34  |b75-d795-429c-a4|
00000050  33 32 2d 37 39 65 61 37  36 64 62 31 34 61 33 32  |32-79ea76db14a32|
00000060  00 38 00 42 00 12 1e 0a  04 4e 6f 64 65 12 00 1a  |.8.B.....Node...|
00000070  0c 6d 69 6e 69 6b 75 62  65 2d 6d 30 32 22 00 2a  |.minikube-m02".*|
00000080  00 32 00 3a 00 1a 00 22  00                       |.2.:...".|
I0504 02:30:18.419707       1 round_trippers.go:466] curl -v -XPOST  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/api/v1/namespaces/default/pods/p2-76867498b7-6lh62/binding'
I0504 02:30:18.427998       1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.428140       1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.428881       1 round_trippers.go:553] POST https://192.168.49.2:8443/api/v1/namespaces/default/pods/p1-59bcb65f56-4rrfc/binding 201 Created in 19 milliseconds
I0504 02:30:18.428916       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 19 ms Duration 19 ms
I0504 02:30:18.428926       1 round_trippers.go:577] Response Headers:
I0504 02:30:18.428935       1 round_trippers.go:580]     Audit-Id: b63b0746-f514-4ab7-92db-f580656d1bc2
I0504 02:30:18.428944       1 round_trippers.go:580]     Cache-Control: no-cache, private
I0504 02:30:18.428953       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.428961       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.428969       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.428978       1 round_trippers.go:580]     Content-Length: 48
I0504 02:30:18.428990       1 round_trippers.go:580]     Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.429033       1 request.go:1170] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02  76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 18 0a 06 0a 00  12 00 1a 00 12 07 53 75  |us............Su|
00000020  63 63 65 73 73 1a 00 22  00 30 c9 01 1a 00 22 00  |ccess..".0....".|
I0504 02:30:18.429553       1 round_trippers.go:553] POST https://192.168.49.2:8443/api/v1/namespaces/default/pods/p2-76867498b7-6lh62/binding 201 Created in 9 milliseconds
I0504 02:30:18.433101       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 9 ms Duration 9 ms
I0504 02:30:18.433514       1 round_trippers.go:577] Response Headers:
I0504 02:30:18.433758       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.433819       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.433898       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.433939       1 round_trippers.go:580]     Content-Length: 48
I0504 02:30:18.434000       1 round_trippers.go:580]     Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.434075       1 round_trippers.go:580]     Audit-Id: 726d26ba-6875-49c2-86b6-f46d3a10c5bd
I0504 02:30:18.434163       1 round_trippers.go:580]     Cache-Control: no-cache, private
I0504 02:30:18.434264       1 request.go:1170] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02  76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 18 0a 06 0a 00  12 00 1a 00 12 07 53 75  |us............Su|
00000020  63 63 65 73 73 1a 00 22  00 30 c9 01 1a 00 22 00  |ccess..".0....".|
I0504 02:30:18.434418       1 cache.go:402] "Finished binding for pod, can be expired" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.434576       1 schedule_one.go:266] "Successfully bound pod to node" pod="default/p2-76867498b7-6lh62" node="minikube-m02" evaluatedNodes=4 feasibleNodes=4
I0504 02:30:18.434831       1 request.go:1170] Request Body:
00000000  6b 38 73 00 0a 19 0a 10  65 76 65 6e 74 73 2e 6b  |k8s.....events.k|
00000010  38 73 2e 69 6f 2f 76 31  12 05 45 76 65 6e 74 12  |8s.io/v1..Event.|
00000020  ec 02 0a 3b 0a 24 70 32  2d 37 36 38 36 37 34 39  |...;.$p2-7686749|
00000030  38 62 37 2d 36 6c 68 36  32 2e 31 37 35 62 63 66  |8b7-6lh62.175bcf|
00000040  64 39 36 39 34 64 38 35  63 37 12 00 1a 07 64 65  |d9694d85c7....de|
00000050  66 61 75 6c 74 22 00 2a  00 32 00 38 00 42 00 12  |fault".*.2.8.B..|
00000060  0c 08 ba b1 cc a2 06 10  98 b3 9a cf 01 22 17 6e  |.............".n|
00000070  65 74 77 6f 72 6b 2d 61  77 61 72 65 2d 73 63 68  |etwork-aware-sch|
00000080  65 64 75 6c 65 72 2a 44  6e 65 74 77 6f 72 6b 2d  |eduler*Dnetwork-|
00000090  61 77 61 72 65 2d 73 63  68 65 64 75 6c 65 72 2d  |aware-scheduler-|
000000a0  73 63 68 65 64 75 6c 65  72 2d 70 6c 75 67 69 6e  |scheduler-plugin|
000000b0  73 2d 73 63 68 65 64 75  6c 65 72 2d 36 37 64 34  |s-scheduler-67d4|
000000c0  38 39 63 35 63 38 2d 70  72 32 77 66 32 07 42 69  |89c5c8-pr2wf2.Bi|
000000d0  6e 64 69 6e 67 3a 09 53  63 68 65 64 75 6c 65 64  |nding:.Scheduled|
000000e0  42 55 0a 03 50 6f 64 12  07 64 65 66 61 75 6c 74  |BU..Pod..default|
000000f0  1a 13 70 32 2d 37 36 38  36 37 34 39 38 62 37 2d  |..p2-76867498b7-|
00000100  36 6c 68 36 32 22 24 65  32 37 36 33 62 37 35 2d  |6lh62"$e2763b75-|
00000110  64 37 39 35 2d 34 32 39  63 2d 61 34 33 32 2d 37  |d795-429c-a432-7|
00000120  39 65 61 37 36 64 62 31  34 61 33 2a 02 76 31 32  |9ea76db14a3*.v12|
00000130  04 36 38 36 38 3a 00 52  41 53 75 63 63 65 73 73  |.6868:.RASuccess|
00000140  66 75 6c 6c 79 20 61 73  73 69 67 6e 65 64 20 64  |fully assigned d|
00000150  65 66 61 75 6c 74 2f 70  32 2d 37 36 38 36 37 34  |efault/p2-768674|
00000160  39 38 62 37 2d 36 6c 68  36 32 20 74 6f 20 6d 69  |98b7-6lh62 to mi|
00000170  6e 69 6b 75 62 65 2d 6d  30 32 5a 06 4e 6f 72 6d  |nikube-m02Z.Norm|
00000180  61 6c 62 04 0a 00 12 00  6a 00 72 00 78 00 1a 00  |alb.....j.r.x...|
00000190  22 00                                             |".|
I0504 02:30:18.435001       1 round_trippers.go:466] curl -v -XPOST  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "Content-Type: application/vnd.kubernetes.protobuf" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events'
I0504 02:30:18.429087       1 cache.go:402] "Finished binding for pod, can be expired" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.447976       1 schedule_one.go:266] "Successfully bound pod to node" pod="default/p1-59bcb65f56-4rrfc" node="minikube-m04" evaluatedNodes=4 feasibleNodes=4
I0504 02:30:18.448278       1 request.go:1170] Request Body:
00000000  6b 38 73 00 0a 19 0a 10  65 76 65 6e 74 73 2e 6b  |k8s.....events.k|
00000010  38 73 2e 69 6f 2f 76 31  12 05 45 76 65 6e 74 12  |8s.io/v1..Event.|
00000020  ec 02 0a 3b 0a 24 70 31  2d 35 39 62 63 62 36 35  |...;.$p1-59bcb65|
00000030  66 35 36 2d 34 72 72 66  63 2e 31 37 35 62 63 66  |f56-4rrfc.175bcf|
00000040  64 39 36 61 31 36 32 64  34 32 12 00 1a 07 64 65  |d96a162d42....de|
00000050  66 61 75 6c 74 22 00 2a  00 32 00 38 00 42 00 12  |fault".*.2.8.B..|
00000060  0c 08 ba b1 cc a2 06 10  a8 e2 bc d5 01 22 17 6e  |.............".n|
00000070  65 74 77 6f 72 6b 2d 61  77 61 72 65 2d 73 63 68  |etwork-aware-sch|
00000080  65 64 75 6c 65 72 2a 44  6e 65 74 77 6f 72 6b 2d  |eduler*Dnetwork-|
00000090  61 77 61 72 65 2d 73 63  68 65 64 75 6c 65 72 2d  |aware-scheduler-|
000000a0  73 63 68 65 64 75 6c 65  72 2d 70 6c 75 67 69 6e  |scheduler-plugin|
000000b0  73 2d 73 63 68 65 64 75  6c 65 72 2d 36 37 64 34  |s-scheduler-67d4|
000000c0  38 39 63 35 63 38 2d 70  72 32 77 66 32 07 42 69  |89c5c8-pr2wf2.Bi|
000000d0  6e 64 69 6e 67 3a 09 53  63 68 65 64 75 6c 65 64  |nding:.Scheduled|
000000e0  42 55 0a 03 50 6f 64 12  07 64 65 66 61 75 6c 74  |BU..Pod..default|
000000f0  1a 13 70 31 2d 35 39 62  63 62 36 35 66 35 36 2d  |..p1-59bcb65f56-|
00000100  34 72 72 66 63 22 24 63  65 36 33 35 62 32 31 2d  |4rrfc"$ce635b21-|
00000110  63 66 62 35 2d 34 38 36  39 2d 61 38 35 39 2d 38  |cfb5-4869-a859-8|
00000120  35 30 34 62 37 30 65 31  30 64 65 2a 02 76 31 32  |504b70e10de*.v12|
00000130  04 36 38 36 36 3a 00 52  41 53 75 63 63 65 73 73  |.6866:.RASuccess|
00000140  66 75 6c 6c 79 20 61 73  73 69 67 6e 65 64 20 64  |fully assigned d|
00000150  65 66 61 75 6c 74 2f 70  31 2d 35 39 62 63 62 36  |efault/p1-59bcb6|
00000160  35 66 35 36 2d 34 72 72  66 63 20 74 6f 20 6d 69  |5f56-4rrfc to mi|
00000170  6e 69 6b 75 62 65 2d 6d  30 34 5a 06 4e 6f 72 6d  |nikube-m04Z.Norm|
00000180  61 6c 62 04 0a 00 12 00  6a 00 72 00 78 00 1a 00  |alb.....j.r.x...|
00000190  22 00                                             |".|
I0504 02:30:18.448518       1 round_trippers.go:466] curl -v -XPOST  -H "Content-Type: application/vnd.kubernetes.protobuf" -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: kube-scheduler/v0.0.0 (linux/amd64) kubernetes/$Format/scheduler" 'https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events'
I0504 02:30:18.449015       1 eventhandlers.go:184] "Add event for scheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.449625       1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.472008       1 round_trippers.go:553] POST https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events 201 Created in 35 milliseconds
I0504 02:30:18.472051       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 32 ms Duration 35 ms
I0504 02:30:18.472061       1 round_trippers.go:577] Response Headers:
I0504 02:30:18.472071       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.472080       1 round_trippers.go:580]     Content-Length: 664
I0504 02:30:18.472649       1 round_trippers.go:580]     Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.473389       1 round_trippers.go:580]     Audit-Id: 9f3b48e8-4be0-489c-af79-a5f518cd8cbd
I0504 02:30:18.473555       1 round_trippers.go:580]     Cache-Control: no-cache, private
I0504 02:30:18.473670       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.473723       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.473852       1 request.go:1170] Response Body:
00000000  6b 38 73 00 0a 19 0a 10  65 76 65 6e 74 73 2e 6b  |k8s.....events.k|
00000010  38 73 2e 69 6f 2f 76 31  12 05 45 76 65 6e 74 12  |8s.io/v1..Event.|
00000020  f2 04 0a c0 02 0a 24 70  32 2d 37 36 38 36 37 34  |......$p2-768674|
00000030  39 38 62 37 2d 36 6c 68  36 32 2e 31 37 35 62 63  |98b7-6lh62.175bc|
00000040  66 64 39 36 39 34 64 38  35 63 37 12 00 1a 07 64  |fd9694d85c7....d|
00000050  65 66 61 75 6c 74 22 00  2a 24 64 31 33 36 39 31  |efault".*$d13691|
00000060  62 66 2d 61 66 62 31 2d  34 62 30 32 2d 62 35 65  |bf-afb1-4b02-b5e|
00000070  66 2d 32 62 64 37 38 35  34 31 30 62 64 61 32 04  |f-2bd785410bda2.|
00000080  36 38 38 38 38 00 42 08  08 ba b1 cc a2 06 10 00  |68888.B.........|
00000090  8a 01 d1 01 0a 0e 6b 75  62 65 2d 73 63 68 65 64  |......kube-sched|
000000a0  75 6c 65 72 12 06 55 70  64 61 74 65 1a 10 65 76  |uler..Update..ev|
000000b0  65 6e 74 73 2e 6b 38 73  2e 69 6f 2f 76 31 22 08  |ents.k8s.io/v1".|
000000c0  08 ba b1 cc a2 06 10 00  32 08 46 69 65 6c 64 73  |........2.Fields|
000000d0  56 31 3a 8e 01 0a 8b 01  7b 22 66 3a 61 63 74 69  |V1:.....{"f:acti|
000000e0  6f 6e 22 3a 7b 7d 2c 22  66 3a 65 76 65 6e 74 54  |on":{},"f:eventT|
000000f0  69 6d 65 22 3a 7b 7d 2c  22 66 3a 6e 6f 74 65 22  |ime":{},"f:note"|
00000100  3a 7b 7d 2c 22 66 3a 72  65 61 73 6f 6e 22 3a 7b  |:{},"f:reason":{|
00000110  7d 2c 22 66 3a 72 65 67  61 72 64 69 6e 67 22 3a  |},"f:regarding":|
00000120  7b 7d 2c 22 66 3a 72 65  70 6f 72 74 69 6e 67 43  |{},"f:reportingC|
00000130  6f 6e 74 72 6f 6c 6c 65  72 22 3a 7b 7d 2c 22 66  |ontroller":{},"f|
00000140  3a 72 65 70 6f 72 74 69  6e 67 49 6e 73 74 61 6e  |:reportingInstan|
00000150  63 65 22 3a 7b 7d 2c 22  66 3a 74 79 70 65 22 3a  |ce":{},"f:type":|
00000160  7b 7d 7d 42 00 12 0c 08  ba b1 cc a2 06 10 98 b3  |{}}B............|
00000170  9a cf 01 22 17 6e 65 74  77 6f 72 6b 2d 61 77 61  |...".network-awa|
00000180  72 65 2d 73 63 68 65 64  75 6c 65 72 2a 44 6e 65  |re-scheduler*Dne|
00000190  74 77 6f 72 6b 2d 61 77  61 72 65 2d 73 63 68 65  |twork-aware-sche|
000001a0  64 75 6c 65 72 2d 73 63  68 65 64 75 6c 65 72 2d  |duler-scheduler-|
000001b0  70 6c 75 67 69 6e 73 2d  73 63 68 65 64 75 6c 65  |plugins-schedule|
000001c0  72 2d 36 37 64 34 38 39  63 35 63 38 2d 70 72 32  |r-67d489c5c8-pr2|
000001d0  77 66 32 07 42 69 6e 64  69 6e 67 3a 09 53 63 68  |wf2.Binding:.Sch|
000001e0  65 64 75 6c 65 64 42 55  0a 03 50 6f 64 12 07 64  |eduledBU..Pod..d|
000001f0  65 66 61 75 6c 74 1a 13  70 32 2d 37 36 38 36 37  |efault..p2-76867|
00000200  34 39 38 62 37 2d 36 6c  68 36 32 22 24 65 32 37  |498b7-6lh62"$e27|
00000210  36 33 62 37 35 2d 64 37  39 35 2d 34 32 39 63 2d  |63b75-d795-429c-|
00000220  61 34 33 32 2d 37 39 65  61 37 36 64 62 31 34 61  |a432-79ea76db14a|
00000230  33 2a 02 76 31 32 04 36  38 36 38 3a 00 52 41 53  |3*.v12.6868:.RAS|
00000240  75 63 63 65 73 73 66 75  6c 6c 79 20 61 73 73 69  |uccessfully assi|
00000250  67 6e 65 64 20 64 65 66  61 75 6c 74 2f 70 32 2d  |gned default/p2-|
00000260  37 36 38 36 37 34 39 38  62 37 2d 36 6c 68 36 32  |76867498b7-6lh62|
00000270  20 74 6f 20 6d 69 6e 69  6b 75 62 65 2d 6d 30 32  | to minikube-m02|
00000280  5a 06 4e 6f 72 6d 61 6c  62 04 0a 00 12 00 6a 00  |Z.Normalb.....j.|
00000290  72 00 78 00 1a 00 22 00                           |r.x...".|
I0504 02:30:18.473973       1 round_trippers.go:553] POST https://192.168.49.2:8443/apis/events.k8s.io/v1/namespaces/default/events 201 Created in 25 milliseconds
I0504 02:30:18.474052       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 20 ms Duration 25 ms
I0504 02:30:18.474100       1 round_trippers.go:577] Response Headers:
I0504 02:30:18.474150       1 round_trippers.go:580]     Content-Type: application/vnd.kubernetes.protobuf
I0504 02:30:18.474211       1 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 46c850ec-633b-4590-974a-44dff367d480
I0504 02:30:18.474272       1 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 78634a56-cf2f-4491-ab29-d3a44816f613
I0504 02:30:18.474313       1 round_trippers.go:580]     Content-Length: 664
I0504 02:30:18.474383       1 round_trippers.go:580]     Date: Thu, 04 May 2023 02:30:18 GMT
I0504 02:30:18.474435       1 round_trippers.go:580]     Audit-Id: 8703dc2b-a143-41d9-9da7-f4285ea10ffc
I0504 02:30:18.474482       1 round_trippers.go:580]     Cache-Control: no-cache, private
I0504 02:30:18.474573       1 request.go:1170] Response Body:
00000000  6b 38 73 00 0a 19 0a 10  65 76 65 6e 74 73 2e 6b  |k8s.....events.k|
00000010  38 73 2e 69 6f 2f 76 31  12 05 45 76 65 6e 74 12  |8s.io/v1..Event.|
00000020  f2 04 0a c0 02 0a 24 70  31 2d 35 39 62 63 62 36  |......$p1-59bcb6|
00000030  35 66 35 36 2d 34 72 72  66 63 2e 31 37 35 62 63  |5f56-4rrfc.175bc|
00000040  66 64 39 36 61 31 36 32  64 34 32 12 00 1a 07 64  |fd96a162d42....d|
00000050  65 66 61 75 6c 74 22 00  2a 24 32 30 36 30 63 32  |efault".*$2060c2|
00000060  33 34 2d 32 65 65 31 2d  34 63 65 36 2d 61 38 61  |34-2ee1-4ce6-a8a|
00000070  35 2d 39 32 36 66 38 33  38 35 64 33 34 34 32 04  |5-926f8385d3442.|
00000080  36 38 38 39 38 00 42 08  08 ba b1 cc a2 06 10 00  |68898.B.........|
00000090  8a 01 d1 01 0a 0e 6b 75  62 65 2d 73 63 68 65 64  |......kube-sched|
000000a0  75 6c 65 72 12 06 55 70  64 61 74 65 1a 10 65 76  |uler..Update..ev|
000000b0  65 6e 74 73 2e 6b 38 73  2e 69 6f 2f 76 31 22 08  |ents.k8s.io/v1".|
000000c0  08 ba b1 cc a2 06 10 00  32 08 46 69 65 6c 64 73  |........2.Fields|
000000d0  56 31 3a 8e 01 0a 8b 01  7b 22 66 3a 61 63 74 69  |V1:.....{"f:acti|
000000e0  6f 6e 22 3a 7b 7d 2c 22  66 3a 65 76 65 6e 74 54  |on":{},"f:eventT|
000000f0  69 6d 65 22 3a 7b 7d 2c  22 66 3a 6e 6f 74 65 22  |ime":{},"f:note"|
00000100  3a 7b 7d 2c 22 66 3a 72  65 61 73 6f 6e 22 3a 7b  |:{},"f:reason":{|
00000110  7d 2c 22 66 3a 72 65 67  61 72 64 69 6e 67 22 3a  |},"f:regarding":|
00000120  7b 7d 2c 22 66 3a 72 65  70 6f 72 74 69 6e 67 43  |{},"f:reportingC|
00000130  6f 6e 74 72 6f 6c 6c 65  72 22 3a 7b 7d 2c 22 66  |ontroller":{},"f|
00000140  3a 72 65 70 6f 72 74 69  6e 67 49 6e 73 74 61 6e  |:reportingInstan|
00000150  63 65 22 3a 7b 7d 2c 22  66 3a 74 79 70 65 22 3a  |ce":{},"f:type":|
00000160  7b 7d 7d 42 00 12 0c 08  ba b1 cc a2 06 10 a8 e2  |{}}B............|
00000170  bc d5 01 22 17 6e 65 74  77 6f 72 6b 2d 61 77 61  |...".network-awa|
00000180  72 65 2d 73 63 68 65 64  75 6c 65 72 2a 44 6e 65  |re-scheduler*Dne|
00000190  74 77 6f 72 6b 2d 61 77  61 72 65 2d 73 63 68 65  |twork-aware-sche|
000001a0  64 75 6c 65 72 2d 73 63  68 65 64 75 6c 65 72 2d  |duler-scheduler-|
000001b0  70 6c 75 67 69 6e 73 2d  73 63 68 65 64 75 6c 65  |plugins-schedule|
000001c0  72 2d 36 37 64 34 38 39  63 35 63 38 2d 70 72 32  |r-67d489c5c8-pr2|
000001d0  77 66 32 07 42 69 6e 64  69 6e 67 3a 09 53 63 68  |wf2.Binding:.Sch|
000001e0  65 64 75 6c 65 64 42 55  0a 03 50 6f 64 12 07 64  |eduledBU..Pod..d|
000001f0  65 66 61 75 6c 74 1a 13  70 31 2d 35 39 62 63 62  |efault..p1-59bcb|
00000200  36 35 66 35 36 2d 34 72  72 66 63 22 24 63 65 36  |65f56-4rrfc"$ce6|
00000210  33 35 62 32 31 2d 63 66  62 35 2d 34 38 36 39 2d  |35b21-cfb5-4869-|
00000220  61 38 35 39 2d 38 35 30  34 62 37 30 65 31 30 64  |a859-8504b70e10d|
00000230  65 2a 02 76 31 32 04 36  38 36 36 3a 00 52 41 53  |e*.v12.6866:.RAS|
00000240  75 63 63 65 73 73 66 75  6c 6c 79 20 61 73 73 69  |uccessfully assi|
00000250  67 6e 65 64 20 64 65 66  61 75 6c 74 2f 70 31 2d  |gned default/p1-|
00000260  35 39 62 63 62 36 35 66  35 36 2d 34 72 72 66 63  |59bcb65f56-4rrfc|
00000270  20 74 6f 20 6d 69 6e 69  6b 75 62 65 2d 6d 30 34  | to minikube-m04|
00000280  5a 06 4e 6f 72 6d 61 6c  62 04 0a 00 12 00 6a 00  |Z.Normalb.....j.|
00000290  72 00 78 00 1a 00 22 00                           |r.x...".|
I0504 02:30:18.474868       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p1-59bcb65f56-4rrfc"
I0504 02:30:18.475092       1 eventhandlers.go:204] "Update event for scheduled pod" pod="default/p2-76867498b7-6lh62"
I0504 02:30:18.475204       1 eventhandlers.go:116] "Add event for unscheduled pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475316       1 scheduling_queue.go:957] "About to try and schedule pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475395       1 schedule_one.go:85] "Attempting to schedule pod" pod="default/p3-5f458cf755-r4sc7"
I0504 02:30:18.475518       1 networkoverhead.go:612] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.475582       1 networkoverhead.go:614] "appGroup CR" namespace="default" ag.lister=&{indexer:0xc000654cd8}
I0504 02:30:18.475605       1 networkoverhead.go:629] "namespaces: %s" [default]="(MISSING)"
I0504 02:30:18.475617       1 networkoverhead.go:631] "networkTopology CR:" namespace="default" nt.lister=&{indexer:0xc000655620}
I0504 02:30:18.475721       1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475746       1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475758       1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.475769       1 networkoverhead.go:324] "Score all nodes equally, return"
I0504 02:30:18.476018       1 networkoverhead.go:365] "before normalization: " scores=[{Name:minikube Score:0} {Name:minikube-m02 Score:0} {Name:minikube-m03 Score:0} {Name:minikube-m04 Score:0}]
I0504 02:30:18.476169       1 default_binder.go:52] "Attempting to bind pod to node" pod="default/p3-5f458cf755-r4sc7" node="minikube-m04"

I am deploying the AppGroup CRD without the netperf controller; the following is my YAML file.

# Example App Group CRD spec
apiVersion: appgroup.diktyo.x-k8s.io/v1alpha1
kind: AppGroup
metadata:
  name: a1
spec:
  numMembers: 3
  topologySortingAlgorithm: KahnSort
  workloads:
    - workload:
        kind: Deployment
        name: p1
        selector: p1
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p2
            selector: p2
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "100Mi"
          maxNetworkCost: 30
    - workload:
        kind: Deployment
        name: p2
        selector: p2
        apiVersion: apps/v1
        namespace: default
      dependencies:
        - workload:
            kind: Deployment
            name: p3
            selector: p3
            apiVersion: apps/v1
            namespace: default
          minBandwidth: "250Mi"
          maxNetworkCost: 20
    - workload:
        kind: Deployment
        name: p3
        selector: p3
        apiVersion: apps/v1
        namespace: default
ubuntu@ip-172-31-11-54:~/test_deploy$ kubectl describe appgroups a1
Name:         a1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  appgroup.diktyo.x-k8s.io/v1alpha1
Kind:         AppGroup
Metadata:
  Creation Timestamp:  2023-05-04T02:29:30Z
  Generation:          8
  Resource Version:    6937
  UID:                 4bb06824-1f6f-4b71-84f2-47f09cf52103
Spec:
  Num Members:                 3
  Topology Sorting Algorithm:  KahnSort
  Workloads:
    Dependencies:
      Max Network Cost:  30
      Min Bandwidth:     100Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p2
        Namespace:    default
        Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Dependencies:
      Max Network Cost:  20
      Min Bandwidth:     250Mi
      Workload:
        API Version:  apps/v1
        Kind:         Deployment
        Name:         p3
        Namespace:    default
        Selector:     p3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Status:
  Running Workloads:          3
  Topology Calculation Time:  2023-05-04T02:29:30Z
  Topology Order:
    Index:  1
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p1
      Namespace:    default
      Selector:     p1
    Index:          2
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p2
      Namespace:    default
      Selector:     p2
    Index:          3
    Workload:
      API Version:  apps/v1
      Kind:         Deployment
      Name:         p3
      Namespace:    default
      Selector:     p3
Events:             <none>
kubectl logs -f appgroup-controller-b9d5f9bb7-wrr98 -n network-aware-controllers
W0504 02:27:04.969172       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0504 02:27:04.970483       1 appgroup.go:104] "Starting App Group controller"
I0504 02:27:05.070875       1 appgroup.go:111] "App Group sync finished"
 kubectl logs -f networktopology-controller-65b7b4b464-tw2n6 -n network-aware-controllers
W0504 02:27:10.274079       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0504 02:27:10.275322       1 networktopology.go:147] "Starting Network Topology controller"
E0504 02:27:10.318650       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
E0504 02:27:10.318914       1 networktopology.go:300] "Error retrieving AppGroup..." err="appgroup.appgroup.diktyo.x-k8s.io \"a1\" not found"
jpedro1992 commented 1 year ago

Hi @dyyfk,

Can you scale all deployments to 3 replicas, for instance, and see if the same pattern occurs? Or try changing the sorting algorithm?

Your app follows a sequential pattern (p1 -> p2 -> p3) and you also chose KahnSort, so the pods are deployed in exactly that order. When the pods are created sequentially like this, none of a pod's dependencies exist in the cluster yet at scheduling time, so every node's cost is 0 and the Diktyo plugins do not affect the placement decision. However, if you scale all deployments, dependent pods are already running in the cluster, and different costs should be calculated for the different nodes.
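
For example, a minimal way to test this (assuming the Deployments are named p1, p2, and p3 in the default namespace, as in your AppGroup) would be:

# Scale the three workloads so that, when a new replica is scheduled,
# its dependent pods already exist in the cluster and the NetworkOverhead
# costs are no longer all 0.
kubectl scale deployment p1 p2 p3 --replicas=3 -n default

# Check which node each replica landed on.
kubectl get pods -n default -o wide

Afterwards, the scheduler log lines shown above ("Number of dependencies", "Node final cost", "Score:") should report non-zero values for nodes whose placement would violate the maxNetworkCost of a dependency.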

The error in the networkTopology controller is Docker-related, I am afraid. I had to completely reset my setup to get rid of it.

dyyfk commented 1 year ago

Thanks for your help. I was able to get it running with your suggestions!

I also updated the helm chart to install this scheduler in my repo. Thanks again for all your help and time.