srsran / srsRAN_Project

Open source O-RAN 5G CU/DU solution from Software Radio Systems (SRS) https://docs.srsran.com/projects/project
https://www.srsran.com
GNU Affero General Public License v3.0

TCP Causes UE Drop with RM502Q-AE, UDP Stable (Latest Commit) #454

Closed: AzeezEbrahim closed this issue 5 months ago

AzeezEbrahim commented 9 months ago

Issue Description

I'm experiencing significant TCP instability and frequent disconnections when using commit 32dae89ea7a5041f3ca4947540dc85870567819b of srsRAN. TCP services such as YouTube are affected, while UDP traffic, tested via iperf, is not. Reverting to the latest stable release, 23.10, resolves the TCP issues.

Could this be related to specific changes in the commit? Any insights or suggestions for troubleshooting this issue would be greatly appreciated.

Setup Details

Expected Behavior

Works fine without disconnections, with stable traces and a stable RNTI.

Actual Behavior

Unstable connection. YouTube is unusable due to disconnections and lag; videos hardly open. A TCP iperf from the UE to the gNB starts fine for ~3 seconds, then disconnects.

Steps to reproduce the problem

  1. Build the latest commit, 32dae89ea7a5041f3ca4947540dc85870567819b.
  2. Connect the UE (Quectel RM502Q-AE) to a PC/laptop.
  3. Open any website, such as YouTube, or run a TCP iperf test.

Additional Information

logs at all_level:

2024-02-06T10:40:36.352860 [FAPI    ] [W] [   172.9] Real-time failure in FAPI: Received late UL_TTI.request from slot 172.8
2024-02-06T10:41:51.948191 [RLC     ] [W] du=1 ue=1 DRB1 DL: Reached maximum number of RETX. sn=103 retx_count=32
2024-02-06T10:43:11.744562 [RLC     ] [W] du=1 ue=1 DRB1 DL: Reached maximum number of RETX. sn=1556 retx_count=32

config file: gnb_b210_20MHz_oneplus_8t.yml.txt

pcaps:

gtpu_ngap_rlc_mac_PCAP.zip

pcap on ogstun: ogstun (copy).pcapng.gz

alvasMan commented 9 months ago

Can you please put the all_level logging to info and attach the logs?

AzeezEbrahim commented 9 months ago


@alvasMan, here is the log: gnb.log.txt

Metrics once I open YouTube:

 pci rnti  cqi  ri  mcs  brate   ok  nok  (%)  dl_bs | pusch  mcs  brate   ok  nok  (%)    bsr
   1 4602   15   1   24   151k   34   42  55%  4.47k | -21.2    1   120k   25  177  87%      0
   1 4602  n/a   1   20   166k   20    2   9%    761 | -24.4    0      0    0 1000 100%      0
   1 4602  n/a   1    0      0    0    0   0%  1.46k | -24.5    0      0    0 1000 100%      0
   1 4602    4   1    0    15k   19   30  61%      0 | -24.4    0      0    0 1000 100%      0
   1 4602    4   1    0   3.1k    8    3  27%      0 | -24.3    0      0    0 1000 100%      0

           -----------------DL-----------------------|------------------UL--------------------
 pci rnti  cqi  ri  mcs  brate   ok  nok  (%)  dl_bs | pusch  mcs  brate   ok  nok  (%)    bsr
   1 4603   15   1   23   4.8k    4    0   0%      0 |   4.5    9    13k    3    0   0%      0
   1 4603   15   1   28   5.2k    7    0   0%      0 |   4.6    9    17k    4    0   0%      0

alvasMan commented 9 months ago

OK, this is a bit of a head-scratcher. It looks like the UE silently drops off and then tries to re-attach.
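That reading can be checked against the metrics table: before the RNTI changes from 4602 to 4603, the UL columns show 0 ok / 1000 nok (100%), i.e. no PUSCH is being decoded. Below is a minimal Python sketch that flags both patterns, assuming the console column order shown in the table (DL fields before `|`, UL fields after) and plain integer ok counts. The function names are hypothetical and not part of srsRAN.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed column layout (taken from the metrics table in this thread):
# DL: pci rnti cqi ri mcs brate ok nok (%) dl_bs | UL: pusch mcs brate ok nok (%) bsr
UL_OK, UL_NOK_PCT = 3, 5  # indices into the UL fields


def is_data_row(line: str) -> bool:
    """True for metric rows; header and divider lines do not start with a number."""
    fields = line.split()
    return "|" in line and bool(fields) and fields[0].isdigit()


def find_drop_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Flag rows where no PUSCH is decoded and rows where the RNTI changes."""
    events: List[Tuple[str, str]] = []
    prev_rnti: Optional[str] = None
    for line in lines:
        if not is_data_row(line):
            continue
        dl_part, ul_part = line.split("|")
        rnti = dl_part.split()[1]  # kept as text, not converted to an integer
        ul = ul_part.split()
        if prev_rnti is not None and rnti != prev_rnti:
            events.append(("rnti_change", f"{prev_rnti}->{rnti}"))
        if int(ul[UL_OK]) == 0 and ul[UL_NOK_PCT] == "100%":
            events.append(("ul_silent", rnti))
        prev_rnti = rnti
    return events


if __name__ == "__main__":
    sample = """\
 pci rnti  cqi  ri  mcs  brate   ok  nok  (%)  dl_bs | pusch  mcs  brate   ok  nok  (%)    bsr
   1 4602   15   1   24   151k   34   42  55%  4.47k | -21.2    1   120k   25  177  87%      0
   1 4602  n/a   1   20   166k   20    2   9%    761 | -24.4    0      0    0 1000 100%      0
   1 4603   15   1   23   4.8k    4    0   0%      0 |   4.5    9    13k    3    0   0%      0
"""
    for kind, detail in find_drop_events(sample.splitlines()):
        print(kind, detail)  # prints "ul_silent 4602", then "rnti_change 4602->4603"
```

An UL-silent run followed by a new RNTI, with no release in between, matches the "UE silently jumps off and re-attaches" description.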

Could you do the following, please:

  1. Attach the log and the MAC pcaps from the same run.
  2. Set the MAC log level to debug. Ideally set PHY to debug too, if you can run that without lates.
  3. Set GTP-U and PDCP logging to warning; we do not need those layers. Keep RLC at info.

AzeezEbrahim commented 9 months ago

Hey @alvasMan,

Here is the config I used:

log:
  all_level: info
  gtpu_level: warning
  pdcp_level: warning
  mac_level: debug
  phy_level: debug
  rlc_level: info
pcap:
  mac_enable: true
  mac_filename: /tmp/gnb_mac.pcap

And here are the log and pcaps:

gnb_log.zip gnb_mac_pcap.zip

robertfalkenberg commented 9 months ago

Thank you for the logs @AzeezEbrahim.

So far we couldn't identify anything suspicious in the logs up to the point where the UE silently leaves the cell (i.e. no de-registration; it just stops transmitting). Unfortunately, we could not reproduce it on our side.

Since you confirmed it works with 23.10.1, could you please also check whether a few intermediate versions have the issue?

ismagom commented 7 months ago

Hi @AzeezEbrahim, we were able to reproduce the issue and provided a fix. Could you test whether this branch fixes it for you?

https://github.com/srsran/srsRAN_Project/tree/fix_disconnect

FYI, the problem seemed to be that we were using the extended size bit in the MAC header to indicate the size of a PDU that didn't need it.
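The fix comment is terse, so here is a minimal sketch of what it likely refers to. It assumes that "extended size bit" means the F field of the NR MAC subheader for a variable-size MAC SDU (R/F/LCID/L, 3GPP TS 38.321): F=0 selects an 8-bit L field, F=1 a 16-bit one. The helper names are hypothetical and this is not srsRAN's encoder. It only contrasts choosing the short form whenever the length fits in one octet with forcing the long form for a small SDU, which is what the comment says was happening.

```python
from typing import Tuple

F_BIT = 0x40      # bit 6 of the first subheader octet (R is bit 7, LCID is bits 5..0)
LCID_MASK = 0x3F


def encode_sdu_subheader(lcid: int, length: int, force_long: bool = False) -> bytes:
    """Hypothetical R/F/LCID/L encoder: 8-bit L when it fits, else 16-bit L."""
    if not 0 <= lcid <= LCID_MASK:
        raise ValueError("LCID must fit in 6 bits")
    if not 0 <= length <= 0xFFFF:
        raise ValueError("L must fit in 16 bits")
    use_long = force_long or length > 0xFF
    first = (F_BIT if use_long else 0) | lcid  # R bit left at 0
    if use_long:
        return bytes([first, length >> 8, length & 0xFF])
    return bytes([first, length])


def decode_sdu_subheader(buf: bytes) -> Tuple[int, int, int]:
    """Return (lcid, length, subheader_size), reading the L width from the F bit."""
    lcid = buf[0] & LCID_MASK
    if buf[0] & F_BIT:
        return lcid, (buf[1] << 8) | buf[2], 3
    return lcid, buf[1], 2


if __name__ == "__main__":
    # LCID 4 is only an example value here.
    print(encode_sdu_subheader(4, 100).hex())                   # 0464 (short form)
    print(encode_sdu_subheader(4, 100, force_long=True).hex())  # 440064 (long form, not needed)
    print(encode_sdu_subheader(4, 300).hex())                   # 44012c (long form, needed)
```

Both of the first two headers describe the same 100-byte SDU; the long one just spends an extra octet and sets F=1. The fix comment attributes the disconnects to using that long form where the short form would do.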