henderiw-nephio / network-config-operator

Apache License 2.0

Questions Regarding External Switch Integration and CR Examples for `network-config-operator` #1

Open avl-dev opened 9 months ago

avl-dev commented 9 months ago

Hi @henderiw,

Thanks for your contributions to the Nephio project with technologies such as gNMI and YANG Schemas. It's great to explore and contribute to such initiatives.

I've got a couple of questions regarding the network-config-operator repository.

1) Is it possible to use this operator with switches that aren't part of the Kubernetes/Nephio-edge-cluster environment (think external endpoints)? If so, could you provide guidance on the necessary YAML configuration adjustments?

2) Am I right in understanding that the Network Config Custom Resource handles the switch endpoint designation? Could you post/share a sample of the CR YAML configuration to demonstrate how to configure this operator for an SR Linux switch?

Cheers!

henderiw commented 9 months ago

On 28 Jan 2024, at 10:01, Andrew Larin @.***> wrote:

Is it possible to use this operator with switches that aren't part of the Kubernetes/Nephio-edge-cluster environment (think external endpoints)? If so, could you provide guidance on the necessary YAML configuration adjustments?

WH> Yes, this is possible. We are about to release a new open source project that is more generic for this. In Nephio I added a capability to do this to make the networking work. The new operator is vendor-agnostic and can load any YANG schema, so you can use it for in-cluster NF(s) that use YANG or for out-of-cluster elements that use YANG.

Am I right in understanding that the Network Config Custom Resource handles the switch endpoint designation? Could you post/share a sample of the CR YAML configuration to demonstrate how to configure this operator for an SR Linux switch?

WH> The CR is flexible and adheres to the YANG model of the device, so the YANG schema is translated to YAML.

Here is a CR:

    @.***:~$ k get networks.config.nephio.org vpc-internet-srl -o yaml
    apiVersion: config.nephio.org/v1alpha1
    kind: Network
    metadata:
      creationTimestamp: "2023-06-29T07:18:24Z"
      finalizers:

avl-dev commented 9 months ago

@henderiw Continuing with my research for the project!

1. I deployed network-config-operator. Its current state is:

    ubuntu@k8s-n1:~/network-config$ kubectl get all -n network-config
    NAME                                             READY   STATUS    RESTARTS   AGE
    pod/network-config-controller-847b864585-x44hq   2/2     Running   0          17m

    NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/network-config-controller   1/1     1            1           66d

    NAME                                                   DESIRED   CURRENT   READY   AGE
    replicaset.apps/network-config-controller-54b956b8cd   0         0         0       36m
    replicaset.apps/network-config-controller-7849b48f66   0         0         0       66d
    replicaset.apps/network-config-controller-847b864585   1         1         1       17m


2. I also deployed SR Linux nodes in Kubernetes with Containerlab (Clabernetes). The exported IPs of srl1 and srl2 (Kubernetes services of type LoadBalancer, e.g. service/srl02-srl1) are `127.18.1.10` and `127.18.1.11`; gNMI and SSH work at these addresses:

    ubuntu@k8s-n1:~$ kubectl get all -n clabernetes
    NAME                                       READY   STATUS    RESTARTS   AGE
    pod/clabernetes-manager-84bbd45b78-b7nj5   1/1     Running   0          2d4h
    pod/clabernetes-manager-84bbd45b78-lxx4q   1/1     Running   0          2d4h
    pod/clabernetes-manager-84bbd45b78-wnsrc   1/1     Running   0          2d4h
    pod/srl02-srl1-86db474d86-26st8            1/1     Running   0          2d2h
    pod/srl02-srl2-6fc98b488c-5rhx4            1/1     Running   0          2d2h

    NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
    service/srl02-srl1      LoadBalancer   10.43.217.21   127.18.1.10   161:31478/UDP,21:31631/TCP,22:32725/TCP,23:32198/TCP,80:32559/TCP,443:32285/TCP,830:31538/TCP,5000:30910/TCP,5900:30320/TCP,6030:32422/TCP,9339:30208/TCP,9340:30762/TCP,9559:32698/TCP,57400:31588/TCP   2d2h
    service/srl02-srl1-vx   ClusterIP      10.43.5.253    <none>        14789/UDP   2d2h
    service/srl02-srl2      LoadBalancer   10.43.4.162    127.18.1.11   161:30289/UDP,21:30171/TCP,22:31054/TCP,23:31331/TCP,80:31699/TCP,443:32525/TCP,830:30422/TCP,5000:30543/TCP,5900:32082/TCP,6030:31714/TCP,9339:31259/TCP,9340:30315/TCP,9559:32479/TCP,57400:32653/TCP   2d2h
    service/srl02-srl2-vx   ClusterIP      10.43.188.17   <none>        14789/UDP   2d2h

    NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/clabernetes-manager   3/3     3            3           2d4h
    deployment.apps/srl02-srl1            1/1     1            1           2d2h
    deployment.apps/srl02-srl2            1/1     1            1           2d2h

    NAME                                             DESIRED   CURRENT   READY   AGE
    replicaset.apps/clabernetes-manager-84bbd45b78   3         3         3       2d4h
    replicaset.apps/srl02-srl1-86db474d86            1         1         1       2d2h
    replicaset.apps/srl02-srl2-6fc98b488c            1         1         1       2d2h


3. I created a `Network` CR for the IPs of my SR Linux switches in the namespace `network-config` (where network-config-controller is located):

    ubuntu@k8s-n1:~$ cat clab_default.yaml
    apiVersion: infra.nephio.org/v1alpha1
    kind: Network
    metadata:
      name: default
    spec:
      topology: clabernetes
      routingTables:

Logs from the controller container of pod/network-config-controller-847b864585-x44hq (namespace network-config):

ubuntu@k8s-n1:~$ kubectl logs pod/network-config-controller-847b864585-x44hq -c controller -n network-config
2024-02-04T14:18:16.393Z        INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-02-04T14:18:16.394Z        INFO    setup   setup controller
2024-02-04T14:18:16.396Z        INFO    setup   reconciler      {"name": "targets", "enabled": true}
2024-02-04T14:18:16.396Z        INFO    setup   reconciler      {"name": "networkconfigs", "enabled": true}
2024-02-04T14:18:16.396Z        INFO    setup   starting manager
2024-02-04T14:18:16.397Z        INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2024-02-04T14:18:16.397Z        INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0204 14:18:16.397358       1 leaderelection.go:245] attempting to acquire leader lease network-config/nephio-operators.nephio.org...
I0204 14:18:39.114092       1 leaderelection.go:255] successfully acquired lease network-config/nephio-operators.nephio.org
2024-02-04T14:18:39.114Z        DEBUG   events  network-config-controller-847b864585-x44hq_1ab6e0f2-6235-43c5-a294-e57fdb634254 became leader   {"type": "Normal", "object": {"kind":"Lease","namespace":"network-config","name":"nephio-operators.nephio.org","uid":"47013b20-5d34-4327-a1d3-ae45e2f1abec","apiVersion":"coordination.k8s.io/v1","resourceVersion":"301235160"}, "reason": "LeaderElection"}
2024-02-04T14:18:39.115Z        INFO    Starting EventSource    {"controller": "TargetController", "controllerGroup": "inv.nephio.org", "controllerKind": "Target", "source": "kind source: *v1alpha1.Target"}
2024-02-04T14:18:39.115Z        INFO    Starting Controller     {"controller": "TargetController", "controllerGroup": "inv.nephio.org", "controllerKind": "Target"}
2024-02-04T14:18:39.115Z        INFO    Starting EventSource    {"controller": "NetworkConfigController", "controllerGroup": "config.nephio.org", "controllerKind": "Network", "source": "kind source: *v1alpha1.Network"}
2024-02-04T14:18:39.115Z        INFO    Starting Controller     {"controller": "NetworkConfigController", "controllerGroup": "config.nephio.org", "controllerKind": "Network"}
2024-02-04T14:18:39.219Z        INFO    Starting workers        {"controller": "TargetController", "controllerGroup": "inv.nephio.org", "controllerKind": "Target", "worker count": 1}
2024-02-04T14:18:39.223Z        INFO    Starting workers        {"controller": "NetworkConfigController", "controllerGroup": "config.nephio.org", "controllerKind": "Network", "worker count": 1}
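One thing that can be read off these logs is which API groups and kinds the controller watches. The sketch below is only an illustration: it parses two of the "Starting Controller" lines shown above and checks whether a given `apiVersion`/`kind` pair falls under a watched group. The helper names `watched_kinds` and `is_watched` are made up for this example, and a mismatch is only a hint; the thread does not confirm it as the cause of anything.

```python
import json
import re

# Two lines copied from the controller log above.
LOG_LINES = [
    '2024-02-04T14:18:39.115Z        INFO    Starting Controller     {"controller": "TargetController", "controllerGroup": "inv.nephio.org", "controllerKind": "Target"}',
    '2024-02-04T14:18:39.115Z        INFO    Starting Controller     {"controller": "NetworkConfigController", "controllerGroup": "config.nephio.org", "controllerKind": "Network"}',
]


def watched_kinds(lines):
    """Collect (group, kind) pairs from 'Starting Controller' log lines."""
    watched = set()
    for line in lines:
        if "Starting Controller" not in line:
            continue
        # The structured fields are a JSON object at the end of the line.
        match = re.search(r"\{.*\}\s*$", line)
        if match is None:
            continue
        fields = json.loads(match.group(0))
        watched.add((fields["controllerGroup"], fields["controllerKind"]))
    return watched


def is_watched(api_version, kind, watched):
    """Check whether the group part of apiVersion plus kind is in the watched set."""
    # apiVersion is "<group>/<version>"; a bare "<version>" means the core group.
    group = api_version.split("/", 1)[0] if "/" in api_version else ""
    return (group, kind) in watched


if __name__ == "__main__":
    watched = watched_kinds(LOG_LINES)
    print(sorted(watched))
    # apiVersion and kind taken from clab_default.yaml in step 3.
    print(is_watched("infra.nephio.org/v1alpha1", "Network", watched))
```

With the log lines above, the watched groups are `inv.nephio.org` and `config.nephio.org`, while the CR in step 3 uses `infra.nephio.org`.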

My question: what steps should come next so that network-config-controller finds the IPs 127.18.1.10 and 127.18.1.11 and starts communicating with the routers via gNMI? Maybe some steps between or after 1-3 are missing, or the Network CR in step 3 is not properly defined?
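As a basic sanity check before looking at the operator itself, one can verify that the gNMI port is reachable at the TCP level. This is a minimal sketch, assuming port 57400 (SR Linux's default gNMI port, which also appears in the service port list in step 2); `tcp_reachable` is a hypothetical helper, and a successful TCP connect does not prove that gNMI itself answers or that the controller pod can reach the same address.

```python
import socket

# Addresses from step 2; 57400 is assumed to be the gNMI port.
TARGETS = [("127.18.1.10", 57400), ("127.18.1.11", 57400)]


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for host, port in TARGETS:
        print(host, port, tcp_reachable(host, port))
```

Running the same check from inside the controller pod's network namespace would be more telling than running it from the node, since the two may see different routes.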

Thank you for your help!

hakenlaken commented 8 months ago

@henderiw waiting for your response! Thanks!

henderiw commented 8 months ago

This will not work, as clabernetes is cluster-local and we need a multi-cluster solution. It will be worked on after we deliver nf2infra.