herlesupreeth / Kamailio_IMS_Config

Fixed version of Kamailio IMS configuration files for basic calling

2nd call drops #3

Closed djanderson closed 3 years ago

djanderson commented 3 years ago

Hi Supreeth,

Great job on these configs.

With your latest change, I am able to place multiple consecutive calls for the first time.

However, I'm noticing a pattern: my first call is very consistent (I can leave the call running for a minute or more), but subsequent calls seem to disconnect after a short time (10 to 20 seconds), with one phone apparently initiating the drop by sending a SIP BYE request.

I can do more digging, but I wanted to check whether you're seeing something similar.

I'm using one IPsec phone and one non-IPsec phone, if it matters.

herlesupreeth commented 3 years ago

Are you using an SDR for the eNB? I used to face a similar issue with srsLTE on a USRP B210. That was because the ACK for the call took a very long time to reach the UE after the call started. I am not totally sure where to start debugging this issue.

herlesupreeth commented 3 years ago

Send me a pcap and I will be able to say for sure whether it's the ACK issue I mentioned above.

djanderson commented 3 years ago

I'm using a USRP B205 with openairinterface5g as the eNB.

I would love to provide a PCAP but I don't think that will be possible... I know that makes it a lot harder, so sorry about that.

I'll take a look at the ACK timing.

One thing I'm noticing is a lot of the following lines in the rtpengine output. There seems to be more in the dropped calls:

More than 30 duplicate packets detected, dropping packet to avoid potential loop

Is it normal to see that in the rtpengine logs?

herlesupreeth commented 3 years ago

Is it normal to see that in the rtpengine logs?

I wouldn't call it normal. But sometimes the PCSCF doesn't tear down call sessions in rtpengine. In those cases you can use rtpengine-ctl terminate with the call ID to delete a particular call that you know has ended, or run rtpengine-ctl terminate all to clear all calls.

I believe you need to define the listen-cli port in the rtpengine config in order to use rtpengine-ctl.

Regarding the ACK: if it takes too long, i.e. around 7 to 10 seconds, the call will definitely get disconnected.

Are you using a GPSDO with your USRP? That might help a bit. I will try subsequent calls next week once I get my hands on a USRP B210, and I plan to use a GPSDO with it. I will let you know how it goes.

djanderson commented 3 years ago

I'm still analyzing the disconnect, but I found that adding the flag

loop-protect

to the several rtpproxy_offer_flags and rtpproxy_answer_flags definitions at the top of rtp.cfg helped eliminate the "duplicate packet" issue.

The loop-protect flag is not documented in the Kamailio 5.3.x rtpengine module docs, but it is explained in the RTPEngine README.
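As the RTPEngine README describes it, loop-protect has rtpengine mark the SDP it rewrites and leave an SDP untouched when its own mark is already present, so a signalling loop cannot make it relay media back to itself. The C sketch below is a simplified model of that check, not rtpengine's code; the marker format (`a=rtpengine:` plus a per-instance token) and the function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker prefix; the real attribute format is defined by rtpengine. */
#define LOOP_MARKER_PREFIX "a=rtpengine:"

/* True if the SDP already carries this instance's marker, i.e. this
 * offer/answer has already been processed by the same media relay. */
static bool sdp_has_loop_marker(const char *sdp, const char *instance_token)
{
    char marker[128];
    int n = snprintf(marker, sizeof(marker), "%s%s", LOOP_MARKER_PREFIX, instance_token);
    if (n < 0 || (size_t)n >= sizeof(marker))
        return false; /* token too long for this sketch */
    return strstr(sdp, marker) != NULL;
}

/* Simplified loop-protect decision. Returns false (leave the SDP untouched)
 * when the marker is already present. Otherwise copies the SDP plus a new
 * marker line into `out` and returns true, meaning the caller should go on
 * to rewrite the media addresses. Also returns false if `out` is too small. */
static bool loop_protect(const char *sdp, const char *instance_token,
                         char *out, size_t out_len)
{
    if (sdp_has_loop_marker(sdp, instance_token))
        return false;
    int n = snprintf(out, out_len, "%s%s%s\r\n", sdp, LOOP_MARKER_PREFIX, instance_token);
    return n >= 0 && (size_t)n < out_len;
}
```

Applied twice to the same SDP, the first call appends the marker and asks for processing; the second call sees the marker and leaves the SDP alone.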

herlesupreeth commented 3 years ago

Thanks for letting me know. I am facing a different issue where the PCSCF sometimes crashes after an unsuccessful call. I am debugging it and will let you know.

herlesupreeth commented 3 years ago

I had never tried back-to-back calls before, but I am also observing the same thing as you. RTPEngine is throwing lots of warnings about duplicate packets, and the call gets dropped after a few seconds (definitely not the ACK problem I mentioned before, nor an eNB issue).

herlesupreeth commented 3 years ago

@djanderson I pushed a change for rtp.cfg can you please try with that fix and let me know. I tried with 5 consecutive calls and it worked fine for me

djanderson commented 3 years ago

I gave this a try today and it didn't completely fix my issue. The first call works great - subsequent calls have a range of issues, from no SIP traffic at all after the successful Create Bearer Response, to more likely seeing only RTP traffic flowing from one phone. To be honest, I haven't ruled out OAI's dedicated bearer tear-down logic, so I'll verify that and get back.

I think I still have a lingering IPsec routing issue. I've noticed that NOTIFY and OPTIONS are not being properly routed over IPsec. I saw https://github.com/open5gs/open5gs/issues/483, but that user only said NOTIFY was already being routed while OPTIONS wasn't. I'm looking at route[NOTIFY] and don't see any ipsec_forward. Should there be something there?

djanderson commented 3 years ago

Again I apologize for not being able to post PCAPs or configs. I'm mostly interested to see if you're able to reproduce what I'm seeing on your devices!

herlesupreeth commented 3 years ago

That is quite strange. I am testing these changes using docker itself (https://github.com/herlesupreeth/docker_open5gs) and it works great for me.

Regarding IPSec issues, currently the only limitation is that whenever there is an SQN mismatch, lingering IPSec tunnels are left behind that cannot be deleted (I am still looking for a way to delete them). Also, there can be a maximum of 10 IPSec connections.
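A hard cap combined with undeletable tunnels means every SQN-mismatch leak permanently eats into the limit. A minimal model of that, assuming the SAs are tracked in a fixed table of 10 slots; `struct sa_pool` and its functions are hypothetical and are not the ims_ipsec_pcscf module's data structures.

```c
#include <stdbool.h>

#define MAX_IPSEC_CONNECTIONS 10 /* the limit mentioned above */

/* Hypothetical fixed-size table of IPsec SA slots. */
struct sa_pool {
    bool in_use[MAX_IPSEC_CONNECTIONS];
};

/* Claims the first free slot and returns its index, or -1 when all are taken. */
static int sa_pool_acquire(struct sa_pool *p)
{
    for (int i = 0; i < MAX_IPSEC_CONNECTIONS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Frees a slot. A tunnel that "cannot be deleted" never reaches this call. */
static void sa_pool_release(struct sa_pool *p, int slot)
{
    if (slot >= 0 && slot < MAX_IPSEC_CONNECTIONS)
        p->in_use[slot] = false;
}
```

After ten acquisitions with no matching release, the next registration gets -1 (no free slot) until a slot is released again.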

herlesupreeth commented 3 years ago

I believe NOTIFY is an in-dialog request, so it follows the already established REGISTER/SUBSCRIBE IPSec route. Let me check that as well and get back to you.
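Whether a request is in-dialog is signalled by a tag parameter on its To header (RFC 3261), which is what Kamailio configs usually test with has_totag(). A simplified C sketch of that rule for illustration only; `to_header_has_tag` is a hypothetical helper that ignores quoted display names, header folding and whitespace around the semicolon.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive prefix test (SIP parameter names are case-insensitive). */
static bool starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
    }
    return true;
}

/* True if the To header value carries a ;tag= header parameter.
 * Parameters inside <...> belong to the URI, so scanning starts after '>'
 * when angle brackets are present. */
static bool to_header_has_tag(const char *to_value)
{
    const char *gt = strchr(to_value, '>');
    const char *p = gt ? gt + 1 : to_value;
    for (; *p; p++) {
        if (starts_with_ci(p, ";tag="))
            return true;
    }
    return false;
}
```

An initial INVITE or SUBSCRIBE has no To tag; once the far end answers with one, later requests such as NOTIFY or PRACK carry it and are routed as in-dialog.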

herlesupreeth commented 3 years ago

Please find the pcap attached (docker_reg.zip). In packet 4168 you can see the SIP NOTIFY going through IPSec (ESP-encapsulated).

djanderson commented 3 years ago

Thanks for that. I was able to identify one difference that I think is causing at least the NOTIFY issue, and possibly others.

Your phone is using UDP over IPsec. The 3 IPsec phones I have tested all use TCP over IPsec. When Kamailio forwards the NOTIFY, it does so over TCP but as a client (since the NOTIFY is not a response to a message from the phone), so Kamailio tries to open a new TCP connection, and that fails.

In newer versions of Kamailio there seems to be a new flags option to ipsec_forward which may have been designed to deal with this: https://kamailio.org/docs/modules/devel/modules/ims_ipsec_pcscf.html#idm159, but it's not available in 5.3.x.

flags - bitwise flag: 0x01 - set force socket for request messages. Useful for ipsec and TCP.
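The documented 0x01 bit is a plain bitmask test that only applies to requests. A C sketch of that decision, assuming nothing about the module's internals: `IPSEC_FWD_FORCE_SOCKET` and `should_force_socket` are hypothetical names, and what "force socket" then does (pinning the send to a specific local IPsec socket rather than letting the TCP layer open a new outbound connection) is inferred from the description above.

```c
#include <stdbool.h>

/* Hypothetical name for the documented 0x01 bit
 * ("set force socket for request messages"). */
#define IPSEC_FWD_FORCE_SOCKET 0x01u

enum msg_kind { MSG_REQUEST, MSG_REPLY };

/* True when an outgoing message should be sent from a forced local socket.
 * Replies are unaffected; other bits in `flags` are ignored. */
static bool should_force_socket(enum msg_kind kind, unsigned int flags)
{
    return kind == MSG_REQUEST && (flags & IPSEC_FWD_FORCE_SOCKET) != 0;
}
```

In the NOTIFY case above, the request would take the forced-socket path instead of a fresh client-side TCP connect.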

I'm taking a look into that to see if it makes sense to backport.

herlesupreeth commented 3 years ago

Nice find. Here is a patch for the Kamailio source code; let me know if it fixes your issue.

diff --git a/src/modules/ims_ipsec_pcscf/cmd.c b/src/modules/ims_ipsec_pcscf/cmd.c
index d5cd589417..cd4a222de9 100644
--- a/src/modules/ims_ipsec_pcscf/cmd.c
+++ b/src/modules/ims_ipsec_pcscf/cmd.c
@@ -850,6 +850,11 @@ int ipsec_forward(struct sip_msg* m, udomain_t* d)
         // for Request sends to UE server port
         dst_port = s->port_us;
     }
+    // for Request sends from P-CSCF client port
+    src_port = s->port_pc;
+
+    // for Request sends to UE server port
+    dst_port = s->port_us;

     int buf_len = snprintf(buf, sizeof(buf) - 1, "sip:%.*s:%d", ci.via_host.len, ci.via_host.s, dst_port);
djanderson commented 3 years ago

I was a bit off-base with what I said about UDP/TCP. I think I had clicked on the SUBSCRIBE in your PCAP and made an assumption, but I see now that your phone is also using TCP for the majority of messages.

So, after more research, current leads are:

The above was not correct. I did find that the REGISTER being replied to was sent from the UE's s port, which was not what I expected.

herlesupreeth commented 3 years ago

@djanderson After a lot of digging and traces, I may have found the root cause of the IPSec issue, and it will take some effort to fix. It has to do with the fixed path used for in-dialog requests and replies, and how that path is used depending on the transport protocol (TCP/UDP).

djanderson commented 3 years ago

Can you explain a bit more about the IPsec issue you're still seeing?

I was finally able to complete multiple back-to-back calls, and in my case the issue had to do with using openairinterface5g for the eNB. When oai5g sends the RRC Reconfiguration Response for the release of E-RAB ID 7 (after the first call ends), it assumes the MME will request E-RAB ID 7 again next, but the open5gs MME increments the next request to 8 even though 7 is free. This was causing a mismatch between the DRB and the E-RAB in oai5g after the second call.
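The mismatch comes from two different ID allocation policies. A hedged C model of both, assuming EPS bearer IDs 5 to 15 and that IDs 5 and 6 are already taken by default bearers (the thread only says the first call's dedicated bearer was E-RAB ID 7); the table and functions are hypothetical and do not reproduce OAI or open5gs code.

```c
#include <stdbool.h>

#define EBI_MIN 5
#define EBI_MAX 15

/* Which E-RAB / EPS bearer IDs are in use, plus the last ID handed out. */
struct ebi_table {
    bool used[EBI_MAX + 1];
    int last;
};

/* Policy A: reuse the lowest free ID (what the eNB side assumed). */
static int alloc_lowest_free(struct ebi_table *t)
{
    for (int id = EBI_MIN; id <= EBI_MAX; id++) {
        if (!t->used[id]) {
            t->used[id] = true;
            t->last = id;
            return id;
        }
    }
    return -1;
}

/* Policy B: hand out the next ID after the last one, wrapping and skipping
 * IDs still in use (the behaviour described for the MME). */
static int alloc_next(struct ebi_table *t)
{
    int id = t->last;
    for (int n = 0; n <= EBI_MAX - EBI_MIN; n++) {
        id = (id < EBI_MIN || id >= EBI_MAX) ? EBI_MIN : id + 1;
        if (!t->used[id]) {
            t->used[id] = true;
            t->last = id;
            return id;
        }
    }
    return -1;
}

static void ebi_release(struct ebi_table *t, int id)
{
    if (id >= EBI_MIN && id <= EBI_MAX)
        t->used[id] = false;
}
```

Both policies give 7 for the first call, but once 7 is released the lowest-free policy returns 7 again while the incrementing policy returns 8, which is the divergence described above.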

If you can explain the remaining IPsec issue I can see if I can confirm/reproduce it.

herlesupreeth commented 3 years ago

Glad you were able to resolve the 2nd-call drop issue.

Regarding the IPSec issue, I may need to write it down as it's a bit complicated. I will send you a description of the issue I am facing soon, and maybe you can help.

herlesupreeth commented 3 years ago

I may not be able to explain this properly, so please feel free to ask questions if anything is unclear.

For reference, I am using the pcap found in this GitHub issue comment: https://github.com/herlesupreeth/docker_open5gs/issues/7#issuecomment-708605448. In that pcap, observe the call flow:

  1. The INVITE from UE1 to UE2 is proper (all inside the IPSec tunnel). There is a tag only in the From header.
  2. The 183 response from UE2 to UE1 is also proper (all inside the IPSec tunnel). At this point a dialog is created, i.e. there is a tag in the From header and in the To header as well.
  3. But during PRACK there is some weird behaviour, due to the mechanism I set in the PCSCF for handling messages larger than the 1300-byte MTU (without it, the PCSCF crashes on huge packets). The PRACK received from UE1 is TCP, but the PRACK sent to UE2 is UDP (packet size < 1300). For PRACK and other in-dialog requests and replies, routing is handled in the route[WITHINDLG] block in kamailio_pcscf.cfg, and the route is determined from the Request URI / Destination URI / Route URI (refer to the loose_route() and t_relay() functions).
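The TCP-to-UDP switch on the PRACK is what a purely size-based transport choice would produce. RFC 3261 section 18.1.1 requires a congestion-controlled transport such as TCP when a request is within 200 bytes of the path MTU, or larger than 1300 bytes when the path MTU is unknown; smaller requests may use UDP. The C sketch below models that rule plus a hypothetical IPsec-aware override (`pick_transport`) that keeps a TCP SA on TCP. It is not the PCSCF's actual MTU mechanism, which the thread does not show.

```c
#include <stdbool.h>
#include <stddef.h>

enum proto { PROTO_UDP, PROTO_TCP };

/* RFC 3261 section 18.1.1 size rule for requests. */
static enum proto rfc3261_size_rule(size_t msg_len, bool mtu_known, size_t path_mtu)
{
    if (mtu_known)
        return (msg_len + 200 >= path_mtu) ? PROTO_TCP : PROTO_UDP;
    return (msg_len > 1300) ? PROTO_TCP : PROTO_UDP;
}

/* Hypothetical IPsec-aware variant: a UE reached through a TCP SA stays on
 * TCP, so a small in-dialog request is never downgraded to UDP; otherwise
 * fall back to the size rule with an unknown path MTU. */
static enum proto pick_transport(size_t msg_len, bool over_ipsec, enum proto sa_proto)
{
    if (over_ipsec && sa_proto == PROTO_TCP)
        return PROTO_TCP;
    return rfc3261_size_rule(msg_len, false, 0);
}
```

With the size rule alone, a 900-byte PRACK that arrived over TCP leaves over UDP; the override keeps it on TCP.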

Now the complication starts: when using IPSec, the PCSCF and UE can't just send requests and replies to IPSec ports in arbitrary order (at least not with an iPhone, as it strictly adheres to the 3GPP spec). More details can be found in the links below:

ETSI TS 133 203 V11.2.0, section 7.1 (Security association parameters); RFC 3261, section 18 (Transport): https://tools.ietf.org/html/rfc3261#section-18

In short, the IPSec ports on the UE are port_us (UE server port) and port_uc (UE client port), and similarly on the PCSCF they are port_ps (PCSCF server port) and port_pc (PCSCF client port). I may be completely wrong in interpreting this; it would be great to have your comments after you read the above spec.

for Requests originating from UE

for replies to UE

for Requests originating from PCSCF

for replies to PCSCF
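One reading of TS 33.203 section 7.1 for these four cases, assuming TCP, where each reply retraces its request's port pair in reverse, is sketched below in C. The struct, enum and function names are hypothetical and this is not the ims_ipsec_pcscf implementation.

```c
/* Protected ports negotiated via the Security-Client / Security-Server headers. */
struct ipsec_ports {
    unsigned short port_uc; /* UE protected client port */
    unsigned short port_us; /* UE protected server port */
    unsigned short port_pc; /* PCSCF protected client port */
    unsigned short port_ps; /* PCSCF protected server port */
};

enum direction { REQ_FROM_UE, REPLY_TO_UE, REQ_FROM_PCSCF, REPLY_TO_PCSCF };

struct port_pair {
    unsigned short src;
    unsigned short dst;
};

/* Source/destination ports for each message direction (TCP reading). */
static struct port_pair select_ports(const struct ipsec_ports *p, enum direction d)
{
    struct port_pair r = {0, 0};
    switch (d) {
    case REQ_FROM_UE:    r.src = p->port_uc; r.dst = p->port_ps; break;
    case REPLY_TO_UE:    r.src = p->port_ps; r.dst = p->port_uc; break;
    case REQ_FROM_PCSCF: r.src = p->port_pc; r.dst = p->port_us; break;
    case REPLY_TO_PCSCF: r.src = p->port_us; r.dst = p->port_pc; break;
    }
    return r;
}
```

The request-from-PCSCF case (port_pc to port_us) agrees with the ipsec_forward patch earlier in this thread. Replies over UDP are where readings of the spec and implementations may differ, so treat this table as a starting point to check against the spec.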

  4. Strangely, the PRACK transport changing from TCP to UDP may have caused UE2 to reply with the 200 OK (packet 2176) over UDP (sent on the wrong ports and without ESP encapsulation).

Sending Requests and Replies over correct IPSec ports is what I am trying to solve. Your insights would be very helpful.