travelping / upg-vpp

User Plane Gateway (UPG) based on VPP
Apache License 2.0

ICMP reply doesn't reach UE #387

Closed: Sabreu closed this issue 6 months ago

Sabreu commented 6 months ago

Hello,

I just tried to set up upg-vpp (v1.12.0) inside a Docker container to work with open5gs. I have a real UE and a real gNodeB. The ICMP request seems to work fine: it goes from the gNB to N3 and then via N6 to the PC I'm pinging (a local PC).

The ICMP reply starts from the same local PC and gets as far as the gNB, but it does not reach the UE. One odd thing I see in the gNB pcap is that the source port of the ICMP reply is 35727 (10.20.4.2/n3). Why not 2152? (With another UPF it is 2152.) The other thing is that there is no extension header (PDU session container) in the reply (with another UPF it is present), although I'm not sure whether this is mandatory.

All help appreciated, as I have been banging my head against this for a long time :)

my upg confs:

set interface ip table n3 0
set interface mtu 9000 n3
set interface ip address n3 10.20.4.2/16
set interface state n3 up

set interface ip table n6 0
set interface mtu 9000 n6
set interface ip address n6 10.30.4.4/24
set interface state n6 up

set interface ip table n4 0
set interface mtu 9000 n4
set interface ip address n4 10.10.4.3/24
set interface state n4 up

ip route add 0.0.0.0/0 table 0 via 10.30.4.1 n6
upf pfcp endpoint ip 10.10.4.3 vrf 0
upf node-id fqdn 10.10.4.3
upf nwi name internet vrf 0
upf specification release 16
upf gtpu endpoint ip 10.20.4.2 nwi internet teid 0x000004d2/2

heapsize 2G

unix {
  nodaemon
  log /etc/vpp/vpp.log
  full-coredump
  gid vpp
  interactive
  cli-listen /run/vpp/cli.sock
  exec /etc/vpp/init.conf
}

api-trace {
  on
}

cpu {
  main-core 0
  corelist-workers 1-7
}

api-segment {
  gid vpp
}

dpdk {
  dev 0000:01:00.1 {name n3}
  dev 0000:01:00.2 {name n4}
  dev 0000:01:00.3 {name n6}
}

plugins {
  path /usr/lib/x86_64-linux-gnu/vpp_plugins/
  plugin dpdk_plugin.so { enable }
  plugin upf_plugin.so { enable }
  plugin oddbuf_plugin.so { enable }
}

mgumz commented 6 months ago

For ICMP to and from the UE: state the framed-IP range and check the routes in both directions. "icmp reply starts from the same local pc and goes even to gnb but not for the UE." … so the ICMP reply is encapsulated into GTP-U. Check the .pcap to analyse the traffic (is the GTP-U tunnel the right one?). And since the ICMP reply already reaches the gNB, I would start there first.

You can also use https://s3-docs.fd.io/vpp/22.02/cli-reference/clis/clicmd_src_plugins_dispatch-trace.html to get more insight into how the packets traverse VPP. But, as said: if the packet already reaches the gNB … and it is in the right GTP-U tunnel, there is not much that we can help you with.

Regarding ports, extension headers, etc.: feel free to file a bug if you think the behaviour is against the specs.

Sabreu commented 6 months ago

The reference UPF has this in the reply and this one works:

Extension header (PDU Session container)
    Extension Header Length: 1
    PDU Session Container
        0000 .... = PDU Type: DL PDU SESSION INFORMATION (0)
        .... 0000 = Spare: 0x0
        0... .... = Paging Policy Presence (PPP): Not Present
        .0.. .... = Reflective QoS Indicator (RQI): Not Present
        ..00 0001 = QoS Flow Identifier (QFI): 1
    Next extension header type: No more extension headers (0x00)

When using UPG, I can see from the gNB log that the packet is discarded because there is no DL PDU session information. I don't know the spec, but it seems to me that this is mandatory. The port is fine, I think; the range is just too big for me, but I can probably hack that.
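For readers who want to see how the dissected fields above map to bytes, here is a minimal sketch in C that builds the same four-octet extension header. The function and constant names are hypothetical and are not taken from upg-vpp. The layout follows the dump shown in this comment (length, PDU type plus spare, PPP/RQI/QFI, next extension header type); any additional flag bits defined by later 3GPP releases are assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value carried in the GTP-U header's "next extension header type" field
 * to announce a PDU Session Container (the field itself is not written here). */
#define GTPU_EXT_PDU_SESSION_CONTAINER 0x85

/* PDU types as shown in the Wireshark dissection (hypothetical names). */
enum
{
  PDU_TYPE_DL = 0, /* DL PDU SESSION INFORMATION */
  PDU_TYPE_UL = 1  /* UL PDU SESSION INFORMATION */
};

/* Write a minimal PDU Session Container extension header into out[0..3]
 * and return the number of octets written. */
static size_t
build_pdu_session_container (uint8_t out[4], uint8_t pdu_type, uint8_t qfi)
{
  assert (pdu_type <= 1);
  assert (qfi <= 0x3f); /* QFI is a 6-bit field */

  out[0] = 1;                         /* length in units of 4 octets */
  out[1] = (uint8_t) (pdu_type << 4); /* PDU type in upper nibble, spare = 0 */
  out[2] = (uint8_t) (qfi & 0x3f);    /* PPP = 0, RQI = 0, then 6-bit QFI */
  out[3] = 0x00;                      /* no more extension headers */
  return 4;
}
```

For a downlink packet with QFI 1, this yields the octets 01 00 01 00, which corresponds to the header seen from the reference UPF.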

RoadRunnr commented 6 months ago

UPG does not support the PDU session container. It looks like you need that. AFAIK the gNB will not accept GTP traffic without one.

The OAI project has a patched version of UPG that does have some kind of session container support. Try that.

Sabreu commented 6 months ago

Thanks for pointing that out. Any plans to support it anytime soon? In the meantime I will try the OAI version...

RoadRunnr commented 6 months ago

> Any plan to support anytime soon?

No

For reference, this is their patch: https://gitlab.eurecom.fr/oai/cn5g/oai-cn5g-upf-vpp/-/commit/d9909c42e7c70029c3d6b39ebf5132d99032c231

They are adding a fixed QFI. That is not 100% standards-compliant, as the SMF should define which QFI is used in the PFCP session. But it does seem to work in most cases.

mitmitmitm commented 6 months ago

I also have a branch with somewhat complete PDU sess. container / QFI support at https://github.com/mitmitmitm/upg-vpp/tree/qfi. Will hopefully submit it as a PR soon.

Sabreu commented 6 months ago

@mitmitmitm awesome! I will try that branch soon when I have time

Sabreu commented 6 months ago

> I also have a branch with somewhat complete PDU sess. container / QFI support at https://github.com/mitmitmitm/upg-vpp/tree/qfi. Will hopefully submit it as a PR soon.

This one seems to crash when an ICMP request comes from the gNB to the UPF. The master branch does not crash.

How can I get some logs from where it crashes? I'm running it in Docker.

mitmitmitm commented 6 months ago

You are correct, I made some mistakes during a git rebase, sorry. Will hopefully fix it by tomorrow.

mitmitmitm commented 6 months ago

OK, I force-pushed a fix that should work now. Thanks for the feedback.

Sabreu commented 6 months ago

policer [error ]: Policer parameter validation failed -- 1R.
policer [error ]: Unable to compute hw param. Error: -1
policer [error ]: Policer parameter validation failed -- 1R.
policer [error ]: Unable to compute hw param. Error: -1

First time seeing these. Any idea whether the new code could trigger them?

mitmitmitm commented 6 months ago

Sabreu writes:

> policer [error ]: Policer parameter validation failed -- 1R.
> policer [error ]: Unable to compute hw param. Error: -1
> policer [error ]: Policer parameter validation failed -- 1R.
> policer [error ]: Unable to compute hw param. Error: -1
>
> First time seeing these, any idea if the new code could trigger these?

Yes, these are due to

diff --git a/upf/upf_pfcp_api.c b/upf/upf_pfcp_api.c
@@ -2627,6 +2642,9 @@ handle_session_establishment_request (pfcp_msg_t *msg,
   if ((r = handle_create_urr (sess, req->create_urr, now, resp)) != 0)
     goto out_send_resp;

+  if ((r = handle_create_qer (sess, req->create_qer, now, resp)) != 0)
+    goto out_send_resp;
+
   r = pfcp_update_apply (sess);
   upf_debug ("Apply: %d\n", r);

You can ignore these errors. Or alternatively fix them with this patch, which also fixes MBR enforcement.

From add1a95c8e8419339dc927746796c197c857c646 Mon Sep 17 00:00:00 2001
From: mitmitmitm ***@***.***>
Date: Tue, 17 Oct 2023 08:28:20 +0200
Subject: [PATCH] Implement MBR QoS enforcement and averaging window

---
 upf/upf.h          | 1 +
 upf/upf_pfcp.c     | 7 +++++++
 upf/upf_pfcp.h     | 1 +
 upf/upf_pfcp_api.c | 6 ++++++
 4 files changed, 15 insertions(+)

diff --git a/upf/upf.h b/upf/upf.h
index 3a02302..84a5e4a 100644
--- a/upf/upf.h
+++ b/upf/upf.h
@@ -678,6 +678,7 @@ typedef struct
   u8 gate_status[UPF_DIRECTION_MAX];

   pfcp_ie_mbr_t mbr;
+  u32 averaging_window_ms;
   u8 qfi;
   clib_bihash_kv_8_8_t policer;
 } upf_qer_t;
diff --git a/upf/upf_pfcp.c b/upf/upf_pfcp.c
index d810b55..4dd50d0 100644
--- a/upf/upf_pfcp.c
+++ b/upf/upf_pfcp.c
@@ -1110,14 +1110,21 @@ init_qer_policer (upf_qer_t *qer)
   };
   upf_main_t *gtm = &upf_main;
   upf_qer_policer_t *pol;
+  u32 averaging_window_ms;

   pool_get_aligned_zero (gtm->qer_policers, pol, CLIB_CACHE_LINE_BYTES);
   qer->policer.value = pol - gtm->qer_policers;

+  averaging_window_ms = qer->averaging_window_ms;
+  if (averaging_window_ms == 0)
+    averaging_window_ms = UPF_DEFAULT_AVERAGING_WINDOW_MS;
+
   cfg.rb.kbps.cir_kbps = qer->mbr.ul;
+  cfg.rb.kbps.cb_bytes = qer->mbr.ul * averaging_window_ms / 8;
   pol_logical_2_physical (&cfg, &pol->policer[UPF_UL]);

   cfg.rb.kbps.cir_kbps = qer->mbr.dl;
+  cfg.rb.kbps.cb_bytes = qer->mbr.dl * averaging_window_ms / 8;
   pol_logical_2_physical (&cfg, &pol->policer[UPF_DL]);

   clib_bihash_add_del_8_8 (&gtm->qer_by_id, &qer->policer, 1 /* is_add */);
diff --git a/upf/upf_pfcp.h b/upf/upf_pfcp.h
index 18118bc..7535b09 100644
--- a/upf/upf_pfcp.h
+++ b/upf/upf_pfcp.h
@@ -18,6 +18,7 @@
 #include "upf.h"

 #define MAX_LEN 128
+#define UPF_DEFAULT_AVERAGING_WINDOW_MS 1500

 #define upf_pfcp_associnfo(gtm, ...)                                          \
   vlib_log_info ((gtm)->log_class, __VA_ARGS__)
diff --git a/upf/upf_pfcp_api.c b/upf/upf_pfcp_api.c
index abf935e..b5b738b 100644
--- a/upf/upf_pfcp_api.c
+++ b/upf/upf_pfcp_api.c
@@ -2127,6 +2127,9 @@ handle_create_qer (upf_session_t *sx, pfcp_ie_create_qer_t *create_qer,
           create->mbr = qer->mbr;
         }

+      if (ISSET_BIT (qer->grp.fields, CREATE_QER_AVERAGING_WINDOW))
+        create->averaging_window_ms = qer->averaging_window;
+
       if (ISSET_BIT (qer->grp.fields, CREATE_QER_QOS_FLOW_IDENTIFIER))
         create->qfi = qer->qos_flow_identifier & GTPU_PDU_CONT_QFI_MASK;
       else
@@ -2185,6 +2188,9 @@ handle_update_qer (upf_session_t *sx, pfcp_ie_update_qer_t *update_qer,
           update->mbr = qer->mbr;
         }

+      if (ISSET_BIT (qer->grp.fields, UPDATE_QER_AVERAGING_WINDOW))
+        update->averaging_window_ms = qer->averaging_window;
+
       if (ISSET_BIT (qer->grp.fields, UPDATE_QER_QOS_FLOW_IDENTIFIER))
         update->qfi = qer->qos_flow_identifier & GTPU_PDU_CONT_QFI_MASK;

-- 
2.43.0
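
The burst-size arithmetic in the patch above can be checked in isolation. The following is a minimal sketch, not code from upg-vpp: the helper name is hypothetical, and the use of a 64-bit rate is an assumption made here to keep the multiplication from overflowing. It mirrors the expression `mbr * averaging_window_ms / 8` and the fallback to the 1500 ms default. Since 1 kbit/s sustained for 1 ms is exactly 1 bit, the product is a bit count, and dividing by 8 gives bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Same default as UPF_DEFAULT_AVERAGING_WINDOW_MS in the patch. */
#define DEFAULT_AVERAGING_WINDOW_MS 1500

/* Hypothetical helper: committed burst size in bytes for a policer that
 * enforces mbr_kbps averaged over averaging_window_ms. A window of 0 means
 * "not provided" and selects the default, as in the patch. */
static uint64_t
burst_bytes (uint64_t mbr_kbps, uint32_t averaging_window_ms)
{
  if (averaging_window_ms == 0)
    averaging_window_ms = DEFAULT_AVERAGING_WINDOW_MS;

  /* Guard against overflow of the 64-bit product. */
  assert (mbr_kbps <= UINT64_MAX / averaging_window_ms);

  /* kbit/s * ms = bits; bits / 8 = bytes. */
  return mbr_kbps * averaging_window_ms / 8;
}
```

For example, an MBR of 1000 kbit/s with the default window gives 1000 * 1500 / 8 = 187500 bytes of burst allowance.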

Sabreu commented 6 months ago

Extension header (PDU Session container)
    Extension Header Length: 1
    PDU Session Container
        0001 .... = PDU Type: UL PDU SESSION INFORMATION (1)
        .... 0000 = Spare: 0x0
        00.. .... = Spare: 0x0
        ..00 0001 = QoS Flow Identifier (QFI): 1
    Next extension header type: No more extension headers (0x00)

Should this be DL PDU, not UL? At least the gNB is saying "unknown extension header, discard".
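
To illustrate the DL/UL distinction being asked about, here is a minimal sketch in C of how a receiver might classify the container by its PDU type nibble. All names are hypothetical; this is not the gNB's actual logic, only an assumption about the kind of check that could lead to the discard. The input is the four-octet extension header, starting at the length octet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
  PDU_SESSION_INFO_DL = 0,      /* DL PDU SESSION INFORMATION */
  PDU_SESSION_INFO_UL = 1,      /* UL PDU SESSION INFORMATION */
  PDU_SESSION_INFO_UNKNOWN = 255
} pdu_session_info_t;

/* Read the PDU type from the upper nibble of the first content octet. */
static pdu_session_info_t
pdu_session_info_type (const uint8_t *ext)
{
  assert (ext != NULL);
  switch (ext[1] >> 4)
    {
    case 0:
      return PDU_SESSION_INFO_DL;
    case 1:
      return PDU_SESSION_INFO_UL;
    default:
      return PDU_SESSION_INFO_UNKNOWN;
    }
}

/* Hypothetical receiver-side check: traffic arriving from the UPF is
 * downlink, so only the DL variant is acceptable. */
static int
valid_for_downlink (const uint8_t *ext)
{
  return pdu_session_info_type (ext) == PDU_SESSION_INFO_DL;
}
```

With the octets from the dump above (01 10 01 00) the check fails, whereas the reference UPF's header (01 00 01 00) passes.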

In addition, any idea where I could limit the source ports used on N3? The gNB won't accept such a wide range of ports. I thought it would be UPF_NAT_MIN_PORT in upf.h, but it seems that's not it after all.

mitmitmitm commented 6 months ago

> Extension header (PDU Session container)
>     Extension Header Length: 1
>     PDU Session Container
>         0001 .... = PDU Type: UL PDU SESSION INFORMATION (1)
>         .... 0000 = Spare: 0x0
>         00.. .... = Spare: 0x0
>         ..00 0001 = QoS Flow Identifier (QFI): 1
>     Next extension header type: No more extension headers (0x00)
>
> Should this be DL PDU not UL? at least gnb is saying unknown extension header, discard.

Thanks, I force-pushed another fix for this.

> In addition any idea where I could limit the source ports used in N3 as gnb won't accept such high range of ports. I thought it would be the UPF_NAT_MIN_PORT in upf.h but seems like it's not after all.

In upf/upf_gtpu_encap.c, function upf_encap_inline, you can edit all occurrences of

udp0->src_port = flow_hash0;
udp1->src_port = flow_hash1;
...

to specify

udp0->src_port = clib_host_to_net_u16 (UDP_DST_PORT_GTPU);

Note, however, that TS 29.281 - General Packet Radio System (GPRS) Tunnelling Protocol User Plane (GTPv1-U), clause 4.4.2.0 "UDP header and port numbers", recommends otherwise:

For the GTP-U messages described below (other than the Echo Response message, see clause 4.4.2.2), the UDP Source Port or the Flow Label field (see IETF RFC 6437 [37]) should be set dynamically by the sending GTP-U entity to help balancing the load in the transport network.
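
If a peer really does need a restricted source-port range, a middle ground between a fixed port and the raw flow hash is to fold the hash into a configured range, which keeps some of the load-balancing entropy the specification asks for. The following is a minimal sketch in C under that assumption. The helper and its parameters are hypothetical and do not exist in upg-vpp.

```c
#include <assert.h>
#include <stdint.h>

/* Well-known GTP-U UDP port, for comparison with the fixed-port variant. */
#define GTPU_UDP_PORT 2152

/* Hypothetical helper: map a 32-bit flow hash onto [min_port, max_port].
 * The result is in host byte order and must be converted to network byte
 * order before it is stored in the UDP header. */
static uint16_t
gtpu_src_port (uint32_t flow_hash, uint16_t min_port, uint16_t max_port)
{
  assert (min_port <= max_port);

  /* Number of ports in the inclusive range, at most 65536. */
  uint32_t span = (uint32_t) max_port - min_port + 1;

  return (uint16_t) (min_port + flow_hash % span);
}
```

Calling `gtpu_src_port (hash, GTPU_UDP_PORT, GTPU_UDP_PORT)` degenerates to the fixed-port behaviour, while a range such as 49152 to 65535 keeps the port dynamic per flow.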

Sabreu commented 6 months ago

I did a git reset to your branch and reinstalled, but I'm still seeing UL PDU SESSION INFORMATION. I don't see any commits in your branch from today?

mitmitmitm commented 6 months ago

Whoops, forgot to actually git push, please try again.

Sabreu commented 6 months ago

Now it works. ICMP request and reply are OK. In fact, it seems that the gNB accepts the default source ports; the range I thought it would accept apparently applies to packets sent from the gNB to the UPF.

Sabreu commented 6 months ago

Seems like the UPF crashes after some time has passed. Does VPP dump logs somewhere so that this could be checked further? Edit: it might be because of multithreading. I changed to a single thread and so far it seems better, but time will tell.

mitmitmitm commented 6 months ago

This depends on your startup.conf. If it includes something like

unix {
  nodaemon
  nosyslog
  interactive
}

vpp will run as a foreground process and logs will go to stderr.

For crash debugging, you can also inspect or send a backtrace at the point of a crash. Make sure that vpp and upg_vpp are built in debug mode or with debugging symbols. After vpp crashes, try finding its coredump with coredumpctl(1) and produce a backtrace with something like

$ coredumpctl debug vpp
(gdb) thread apply all bt full

Alternatively, run vpp under gdb with

$ gdb --args vpp <VPP_ARGS>
(gdb) run
# Wait for crash
(gdb) thread apply all bt full

You can also send your startup.conf and other VPP configuration.

Sabreu commented 6 months ago

Thanks, I'm just not sure whether any of these apply to a Docker deployment, as the container dies when VPP crashes. Also, my configs are in the first post of this issue. So far it seems that a single worker core doesn't crash (still running).

s5uishida commented 3 months ago

Hi @mitmitmitm

Do you have a plan to submit your qfi branch as a pull request to UPG-VPP?

I think that if QFI can be used with UPG-VPP, it will work with the gNodeB of srsRAN_Project. For reference, the current status that I have confirmed to work is as follows.

https://github.com/s5uishida/simple_confirmed_info_for_mobile_network/tree/main#ping-and-iperf3

Best regards,

--Shigeru

mitmitmitm commented 2 months ago

Sorry, probably in about half a year.

abousselmi commented 2 months ago

Can someone from the community take a look at @mitmitmitm's patch and share some feedback? Thanks!