nephio-project / nephio

Nephio is a Kubernetes-based automation platform for deploying and managing highly distributed, interconnected workloads such as 5G Network Functions, and the underlying infrastructure on which those workloads depend.
Apache License 2.0

e2e: enhance UPF scaling test with some additional checks #311

Open johnbelamaric opened 1 year ago

johnbelamaric commented 1 year ago

In 008.sh we test that we can do the update, and that the deployment comes back up. But there are probably a few more tweaks needed.

We should verify that the operator generates increased memory and/or CPU requests/limits for the deployment. This will require:

Finally, once all that is done, we should try another call through that UPF.
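The requests check described above could be sketched in shell along the following lines. This is only an illustration, not what 008.sh does: the helper names `get_request` and `assert_requests_changed`, the namespace and deployment arguments, and the choice of the first container are all assumptions. It only checks that the value changed; comparing Kubernetes quantities such as `2Gi` and `4Gi` numerically would need extra parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one resource request (cpu or memory) of the
# first container in a deployment. Namespace and deployment name are
# supplied by the caller; container index 0 is an assumption.
get_request() {
  local ns="$1" deploy="$2" resource="$3"
  kubectl get deployment "$deploy" -n "$ns" \
    -o "jsonpath={.spec.template.spec.containers[0].resources.requests.$resource}"
}

# Hypothetical helper: fail if a request is identical before and after
# the capacity update. This checks for a change, not for an increase.
assert_requests_changed() {
  local before="$1" after="$2"
  if [ "$before" = "$after" ]; then
    echo "resource request unchanged: $after" >&2
    return 1
  fi
  echo "resource request changed: $before -> $after"
}
```

A test script might capture `before="$(get_request "$ns" "$deploy" memory)"` ahead of the capacity update, wait for the rollout to finish, capture `after` the same way, and then call `assert_requests_changed "$before" "$after"`.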

johnbelamaric commented 1 year ago

https://github.com/nephio-project/test-infra/pull/111 and https://github.com/nephio-project/test-infra/pull/129 fix several of these (just not the call).

johnbelamaric commented 1 year ago

So, I ran these (i.e., scaled the SMF and the UPF) and then tried to do a ping like we do in 007.sh, and it failed.

I restarted the UE pod and then it worked.

So, I think we're OK - this seems to be a limitation in UERANSIM?

@henderiw @tliron @s3wong @matysiaq @denysaleksandrov @rravindran123 WDYT?

rravindran123 commented 1 year ago

It may not be a limitation. I would expect that behavior, as the SMF is the control plane for the UPF and manages the GTP sessions between the UPF and the gNodeB for all the UEs it is serving. So the SMF going offline would mean all the corresponding PDU sessions are deactivated by the UPF/gNodeB (for the latter, it should be the AMF). The UPF can detect the SMF's liveness using the heartbeat exchanged between the two services; I guess this is on by default in free5gc. Unless we have some form of HA for these services...

johnbelamaric commented 1 year ago

I guess I thought the UE would detect it going down and then reconnect, but maybe not. It's ok, I don't think restarting the UE pod is a problem.

/assign @n2vo

rravindran123 commented 1 year ago

Thinking about it again, you are right: all the UE has to do is re-issue the PDU session request, since the UDR/UDM should have its registration state (one shouldn't reset their UE if there is a 5GC component failure). But there should also be some trigger on the RRC end for the UE to re-issue the request. I will have to dig through the 3GPP specs to see the expected behavior; if the specs say the UE is expected to re-issue the request after the SMF/UPF fails or restarts, this may be a limitation of the free5gc implementation.

denysaleksandrov commented 1 year ago

UERANSIM never re-establishes a session. It's simple test software, and not all 3GPP features are implemented. Just killing the UERANSIM pod after a UPF/SMF capacity change should be sufficient.

Best Regards, Denys Aleksandrov
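The workaround discussed in this thread (restart the UE pod after the capacity change, then retry the ping) might look roughly like the sketch below. It is not taken from 007.sh or 008.sh: the helper names, the label selector, the target address, and the retry counts are hypothetical. It also assumes the UE pod is recreated by its controller and that the UE tunnel interface is named `uesimtun0`.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS run out.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical helper: delete the UE pod, wait for its replacement to be
# Ready, then ping a target address through the UE tunnel interface.
restart_ue_and_ping() {
  local ns="$1" selector="$2" target="$3" pod
  kubectl delete pod -n "$ns" -l "$selector" || return 1
  # The replacement pod may not exist yet, so the wait itself is retried.
  retry 12 5 kubectl wait pod -n "$ns" -l "$selector" \
    --for=condition=Ready --timeout=60s || return 1
  pod="$(kubectl get pod -n "$ns" -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  # Interface name uesimtun0 is an assumption about the UE setup.
  retry 5 5 kubectl exec -n "$ns" "$pod" -- ping -c 3 -I uesimtun0 "$target"
}
```

The retry around `kubectl wait` is there because `kubectl delete` returns once the old pod is gone, which can be before the controller has created the new one.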
