nephio-project / nephio

Nephio is a Kubernetes-based automation platform for deploying and managing highly distributed, interconnected workloads such as 5G Network Functions, and the underlying infrastructure on which those workloads depend.

[SIG2 dependency] Requirements for E2E Test Bed #48

Closed. aravind254 closed this issue 1 year ago.

aravind254 commented 1 year ago

SIG2 needs to clarify the requirements for the E2E test bed.

1. How many clusters are needed, and what type of cluster (for example, KinD) should be created?
2. What networking setup is expected?
3. What workloads need to be set up in these clusters?

johnbelamaric commented 1 year ago

So, it sounds like you are looking for a description like the one Eric put together here: https://github.com/nephio-project/one-summit-22-workshop/blob/main/nephio-workshop.svg

johnbelamaric commented 1 year ago

@henderiw I think for R1 we want the e2e environment to be just a single VM with:

We may want more workload clusters eventually, but this would be a good start. A few other requirements:

I think there is more to the networking setup than that (for example, VLANs). @henderiw, can you please refine this and make it more accurate?

/assign @henderiw

electrocucaracha commented 1 year ago

A few more questions about this:

henderiw commented 1 year ago

If you look at the Ansible environment, all of this is configurable. So I was thinking that for mgmt tests we only want the mgmt cluster, and for true E2E tests we create 4 clusters with the latest k8s version at this time.

So the Ansible env has the flexibility to handle these different cases with some config input.
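
As a loose illustration, the config input for the mgmt-only case might be as small as this (a guess at the shape; it reuses the `clusters` structure from the inventory shared later in this thread):

```yaml
# Hypothetical mgmt-only input; the full four-cluster variant appears
# further down in this thread.
clusters:
  mgmt: {mgmt_subnet: 172.88.0.0/16, pod_subnet: 10.196.0.0/16, svc_subnet: 10.96.0.0/16}
```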

electrocucaracha commented 1 year ago

> If you look at the Ansible environment, all of this is configurable. So I was thinking that for mgmt tests we only want the mgmt cluster, and for true E2E tests we create 4 clusters with the latest k8s version at this time.
>
> So the Ansible env has the flexibility to handle these different cases with some config input.

Hey @henderiw, I was thinking of provisioning them with a multi-cluster KinD tool that can use a configuration file like this one. I can cover that in the next meeting.
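
For illustration, a minimal per-cluster KinD config along those lines might look like the following. This is a sketch, not the linked file; the cluster name is a placeholder, and the subnets are taken from the inventory posted later in this thread.

```yaml
# Sketch of a single cluster definition; the multi-cluster tool's own
# format may differ from a plain KinD config.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: edge1
networking:
  podSubnet: 10.197.0.0/16      # matches the edge1 pod_subnet in the inventory below
  serviceSubnet: 10.97.0.0/16   # matches the edge1 svc_subnet in the inventory below
nodes:
  - role: control-plane
  - role: worker
```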

henderiw commented 1 year ago

My understanding is that R1 will do an E2E call. As such, we need to provide interconnectivity to all clusters on dedicated networks.

I have named the networks as follows.

The region cluster deploys 1 SMF and 1 AMF, and the 2 workload clusters deploy UPFs.

Here are the parameters that will deploy an E2E testbed:

```yaml
all:
  vars:
    cloud_user:
    gitea_username:
    gitea_password:
    dockerhub_username:
    dockerhub_token:
    proxy:
      http_proxy:
      https_proxy:
      no_proxy:
    host_os: "linux"    # use "darwin" for MacOS X, "windows" for Windows
    host_arch: "amd64"  # other possible values: "386","arm64","arm","ppc64le","s390x"
    tmp_directory: "/tmp"
    bin_directory: "/usr/local/bin"
    kubectl_version: "1.25.0"
    kubectl_checksum_binary: "sha512:fac91d79079672954b9ae9f80b9845fbf373e1c4d3663a84cc1538f89bf70cb85faee1bcd01b6263449f4a2995e7117e1c85ed8e5f137732650e8635b4ecee09"
    kind_version: "0.17.0"
    cni_version: "0.8.6"
    kpt_version: "1.0.0-beta.23"
    multus_cni_version: "3.9.2"
    nephio:
      install_dir: nephio-install
      packages_url: https://github.com/nephio-project/nephio-packages.git
    clusters:
      mgmt: {mgmt_subnet: 172.88.0.0/16, pod_subnet: 10.196.0.0/16, svc_subnet: 10.96.0.0/16}
      edge1: {mgmt_subnet: 172.89.0.0/16, pod_subnet: 10.197.0.0/16, svc_subnet: 10.97.0.0/16}
      edge2: {mgmt_subnet: 172.90.0.0/16, pod_subnet: 10.198.0.0/16, svc_subnet: 10.98.0.0/16}
      region1: {mgmt_subnet: 172.91.0.0/16, pod_subnet: 10.199.0.0/16, svc_subnet: 10.99.0.0/16}
    networkInstances:
      internal-vpc: {prefixes: [{prefix: 172.0.0.0/16, purpose: endpoint}]}
      external-vpc: {prefixes: [{prefix: 172.1.0.0/16, purpose: endpoint}]}
      sba-vpc: {prefixes: [{prefix: 172.2.0.0/16, purpose: endpoint}]}
      internet-vpc: {prefixes: [{prefix: 172.3.0.0/16, purpose: endpoint}, {prefix: 10.0.0.0/8, purpose: pool}]}
```
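
For context, a playbook consuming this inventory might iterate over the `clusters` dict to create one KinD cluster per entry. A minimal sketch, with the task and per-cluster config path being illustrative rather than the actual playbook:

```yaml
# Hypothetical task sketch; the real playbook's modules and config
# file layout may differ.
- hosts: all
  tasks:
    - name: Create one KinD cluster per inventory entry
      ansible.builtin.command: >
        kind create cluster --name {{ item.key }}
        --config {{ tmp_directory }}/{{ item.key }}-kind.yaml
      loop: "{{ clusters | dict2items }}"
```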

electrocucaracha commented 1 year ago

I have created the following diagram to visualize the test bed setup:

[Diagram: VM setup]

So the current HW requirements for the K8s clusters are:

| Resource    | Request | Limit |
| ----------- | ------- | ----- |
| CPU         | 6.6     | 13.7  |
| Memory (GB) | 0.94    | 2.37  |

Recommendation: 8 vCPUs, 6 GB

This is how the Nephio workload tentatively looks:

[Image: free5g nephio]

| Name       | Latest release |
| ---------- | -------------- |
| ContainerD | 1.7.0          |
| CNI        | 1.1.2          |
| Multus CNI | 3.9.3          |

According to the k8s-conformance program, most of the Kubernetes distributions have been certified with version 1.24.12.
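
If the testbed should track that certified version, the KinD node image can be pinned per cluster. A sketch, assuming a `kindest/node` tag for that patch release exists in the KinD release being used:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # Pin the node image to the certified Kubernetes version; the exact
    # tag depends on which KinD release published it.
    image: kindest/node:v1.24.12
```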

gvbalaji commented 1 year ago

In our automation call today we discussed evolving our test infrastructure iteratively so that it can satisfy the following scenarios, in this order:

  1. Management cluster creation and deployment of all Nephio components and dependencies.
  2. Creation of repos, creation of 2-3 edge clusters, and deployment of Nephio components and dependencies on the edge clusters. Configuration of the edge clusters (such as ConfigSync repo setup; see the sketch after this list), gRPC connection setup with the management cluster, and inter-edge-cluster communication setup.
  3. Deployment of the free5gc functions (AMF, SMF, UPF) using Nephio, making sure all pods come up.
  4. Testing the connection between the UPF and SMF by establishing a ping between them.
  5. Deployment of the other free5gc functions onto the edge cluster (where the other 5GC control-plane functions are running).
  6. Verification that all the NFs are running.
  7. Deployment of UERANSIM onto the edge cluster and establishment of an end-to-end call.

Please add/correct if I missed anything here. We will create these as individual issues and work on them iteratively.
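
For step 2, the ConfigSync repo setup on an edge cluster would boil down to a RootSync object. A minimal sketch, with a hypothetical Gitea repo URL and credentials secret:

```yaml
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://<gitea-host>/nephio/edge1.git  # hypothetical repo URL
    branch: main
    auth: token
    secretRef:
      name: git-creds                            # hypothetical secret name
```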

henderiw commented 1 year ago

> I have created the following diagram to visualize the test bed setup:
>
> [Diagram: VM setup]
>
> So the current HW requirements for the K8s clusters are:
>
> | Resource    | Request | Limit |
> | ----------- | ------- | ----- |
> | CPU         | 6.6     | 13.7  |
> | Memory (GB) | 0.94    | 2.37  |
>
> Recommendation: 8 vCPUs, 6 GB
>
> This is how the Nephio workload tentatively looks:
>
> [Image: free5g nephio]
>
> | Name       | Latest release |
> | ---------- | -------------- |
> | ContainerD | 1.7.0          |
> | CNI        | 1.1.2          |
> | Multus CNI | 3.9.3          |
>
> According to the k8s-conformance program, most of the Kubernetes distributions have been certified with version 1.24.12.

The CRDs in R1 will not allow for eth1 and eth2. ClusterContext has 1 master interface. We use VLANs to distinguish the networks in R1 rather than dedicated NICs. This is also more representative of the real world.
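
To illustrate the VLAN approach, a per-network Multus NetworkAttachmentDefinition over the single master interface might look like this. This is a sketch; the attachment name, master interface, and VLAN ID are hypothetical.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: internal-vlan          # hypothetical name, one per network
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "vlan",
      "master": "eth0",
      "vlanId": 100,
      "ipam": { "type": "static" }
    }
```

With static IPAM, the per-pod addresses would be supplied via the pod's network annotation rather than in this definition.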

johnbelamaric commented 1 year ago

Is this complete? Can we close it?

electrocucaracha commented 1 year ago

> Is this complete? Can we close it?

I was planning to update the diagram to reflect the latest comments from @henderiw. Regarding the SW and HW reqs, I think we're okay.

gvbalaji commented 1 year ago

The diagram and topology need to be enhanced to cover all free5gc components and the end-to-end call scenario with UERANSIM.

rravindran123 commented 1 year ago

The figure has to be updated: the UERANSIM functions will be running within the VM context (assuming R1 is contained within a single-VM setup), and these functions will generally be part of one of the edge clusters, as shown in this figure. But with the other UPF in the edge-2 cluster, the N3 interface would run between the clusters; alternatively, we could have another UERANSIM as part of edge-2 as well. The decision is whether we go with a 1- or 2-UE/gNodeB setup, or we could have one edge cluster host both UPFs, served by one or two gNodeBs with two DNNs.

[Diagram: free5gc on multiple clusters with UERANSIM]

This is the source of the diagram: https://github.com/Orange-OpenSource/towards5gs-helm/blob/main/docs/demo/Setup-free5gc-on-multiple-clusters-and-test-with-UERANSIM.md

johnbelamaric commented 1 year ago

Just following up here. Have we captured all of the above somewhere in a repo? Once that is done I think this can be closed.

electrocucaracha commented 1 year ago

> Just following up here. Have we captured all of the above somewhere in a repo? Once that is done I think this can be closed.

I was trying to capture the requirements in this doc. I'll update it with the latest comments.

johnbelamaric commented 1 year ago

Ok - we should move it to GH, I think.

gvbalaji commented 1 year ago

@electrocucaracha thanks. Looks like we need access to that doc.

electrocucaracha commented 1 year ago

> @electrocucaracha thanks. Looks like we need access to that doc.

@aravind254, can you give access to @gvbalaji and @johnbelamaric?

I have updated the diagram, but I'm not sure whether the UE has to run as a container outside any cluster.

[Screenshot: updated diagram]

BTW, free5GC has some kernel restrictions (5.0.0-23-generic or 5.4.x).

rravindran123 commented 1 year ago

Looking at the diagram again, it looks OK, though we might be complicating it with Xn. I was also checking whether UERANSIM supports Xn; it looks like it doesn't currently. Also, if we include Xn, it should be a different overlay, which technically may not be correct either. We should probably just center the "N2 (bridge net)" label in the diagram; since it was to the left, it looked like Xn was intended there.

https://github.com/aligungr/UERANSIM/discussions/492

johnbelamaric commented 1 year ago

As I understand it, the requirements are now understood and documented in a gdoc. Eventually that should move to GH as we implement the test bed, but that activity would fall under the implementation phase in SIG Release. So I am closing this ticket.