srsran / srsRAN_Project

Open source O-RAN 5G CU/DU solution from Software Radio Systems (SRS) https://docs.srsran.com/projects/project
https://www.srsran.com
GNU Affero General Public License v3.0
476 stars 163 forks source link

Not all UEs recoved their data after a reestablishment. #588

Closed houshangaziziaccelleran closed 4 months ago

houshangaziziaccelleran commented 4 months ago

Any idea on the reestablishment issue we are experiencing. Here is a short explanation of the scenario, the result and the snippet from the gnb.log.

Test:

Result:

From gnb.log:

2024-04-29T11:21:16.894575 [PHY     ] [I] [  665.13] PDCCH: rnti=0x4601 ss_id=2 format=1_1 cce=12 al=2 t=8.9us
2024-04-29T11:21:16.894670 [PHY     ] [I] [  665.13] PDSCH: rnti=0x4601 h_id=9 k1=5 prb=[0, 51) symb=[2, 14) mod=256QAM rv=0 tbs=10497 t=99.7us
2024-04-29T11:21:16.894991 [GTPU    ] [I] ue=0 DL teid=0x000001: RX SDU. sdu_len=1328 qos_flow=QFI=1 sn={na}
2024-04-29T11:21:16.894992 [PDCP    ] [I] ue=0 DRB1 DL: Dropping SDU to avoid going over the TX window size. tx_next_ack=131333 tx_trans=133380 tx_next=135427
2024-04-29T11:21:16.895000 [SCHED   ] [I] [  665.14] Slot decisions pci=1 t=17us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4601 rb=[0..51) h_id=12 ss_id=2 k1=4 newtx=true rv=0 tbs=10497
2024-04-29T11:21:16.895011 [RLC     ] [W] du=1 ue=0 DRB1 DL: Cannot build data PDU, tx_window is full. grant_len=10494
houshangaziziaccelleran commented 4 months ago

Thanks @ismagom @carlo-gal carlo-gal In case you need more info/log/test, feel free to ask.

ismagom commented 4 months ago

Can you upload full gnb.log and config file please?

houshangaziziaccelleran commented 4 months ago

@ismagom @carlo-gal FYI gnb.zip gnb_config.zip

carlo-gal commented 4 months ago

I went through the logs, but I couldn't spot the issue you mentioned. Are the logs complete? I only see ue=0 and ue=1. I guess you filtered the logs?

Anyway, the ue=0 c-rnti=0x4601 performs a reestablisment and it will get assigned a new id, which will be ue=1 c-rnti=0x4602. Note that the UE ue=0 c-rnti=0x4601 will eventually get deleted.

Just looking at the RRC, I can see the restablishment went through

➜  ~ grep -ni "\[RRC" ~/Workspace/issues/github_reest/gnb.log | less
772667:2024-04-29T11:21:18.273240 [RRC     ] [I] ue=1 c-rnti=0x4602: CCCH UL rrcReestablishmentRequest
772703:2024-04-29T11:21:18.273374 [RRC     ] [I] ue=1 c-rnti=0x4602: DCCH DL rrcReestablishment
773731:2024-04-29T11:21:18.303618 [RRC     ] [I] ue=1 c-rnti=0x4602: DCCH DL rrcReconfiguration

To see we only got the reestablishment complete, we need to check other logs.

412918:2024-04-29T11:21:18.326467 [PHY     ] [I] [   808.8] PUCCH: rnti=0x4602 format=2 prb=[3, 4) prb2=na symb=[4, 6) csi1=1001111 t=26.0us
412919:2024-04-29T11:21:18.326993 [PHY     ] [I] [   808.9] PUCCH: rnti=0x4602 format=1 prb1=50 prb2=na symb=[0, 14) cs=0 occ=1 sr=yes t=42.1us
...
412928:2024-04-29T11:21:18.336631 [DU-F1   ] [I] UL ue=1 c-rnti=0x4602 du_ue_id=1 cu_ue_id=1 SRB1 Tx PDU: UL RRC Message Transfer
412929:2024-04-29T11:21:18.336633 [CU-CP-F1] [I] Rx PDU du_id=1 ue=1 du_ue_id=1 cu_ue_id=1: ULRRCMessageTransfer
412931:2024-04-29T11:21:18.337024 [PHY     ] [I] [   809.9] PUCCH: rnti=0x4602 format=1 prb1=50 prb2=na symb=[0, 14) cs=0 occ=1 sr=no t=36.0us
412932:2024-04-29T11:21:18.337956 [SCHED   ] [I] [   810.0] Slot decisions pci=1 t=19us (1 PDSCH, 0 PUSCHs): DL: ue=1 c-rnti=0x4602 rb=[21..22) h_id=0 ss_id=2 k1=8 newtx=true rv=0 tbs=233

and the scheduler starts using ss_id=2, which means the reconfiguration went through.

From now on, I can only see the UE sends a SR from time to time, most of them are likely to be SR false alarm (as the UE answers with only padding and bs=0 to say no more data to tx), like this one, for example.

413959:2024-04-29T11:21:26.387038 [PHY     ] [I] [   590.9] PUCCH: rnti=0x4602 format=1 prb1=50 prb2=na symb=[0, 14) cs=0 occ=1 sr=yes t=39.9us
413960:2024-04-29T11:21:26.389997 [PHY     ] [I] [   591.4] PDCCH: rnti=0x4602 ss_id=2 format=0_1 cce=0 al=2 t=16.2us
413961:2024-04-29T11:21:26.391967 [SCHED   ] [I] [   591.8] Slot decisions pci=1 t=2us (0 PDSCHs, 1 PUSCH): UL: ue=1 rnti=0x4602 h_id=0 ss_id=2 rb=[4..10) newtx=true rv=0 tbs=544 k2=4
413962:2024-04-29T11:21:26.396566 [PHY     ] [I] [   591.8] PUSCH: rnti=0x4602 h_id=0 prb=[4, 10) symb=[0, 14) mod=64QAM rv=0 tbs=544 crc=OK iter=2.0 sinr=28.5dB t=123.9us uci_t=0.0us ret_t=0.0us
413963:2024-04-29T11:21:26.396605 [MAC     ] [I] [  591.17] UL rnti=0x4602 ue=1 subPDUs: [SBSR: lcg=0 bs=0, SE_PHR: ph=[22..24)dB p_cmax=[23..24)dBm, PAD: len=538]

Only one SR leads to a PUSCH with real data.

415057:2024-04-29T11:21:34.946970 [PHY     ] [I] [   422.9] PUCCH: rnti=0x4602 format=1 prb1=50 prb2=na symb=[0, 14) cs=0 occ=1 sr=yes t=34.2us
415058:2024-04-29T11:21:34.949989 [PHY     ] [I] [   423.4] PDCCH: rnti=0x4602 ss_id=2 format=0_1 cce=0 al=2 t=18.0us
415059:2024-04-29T11:21:34.951995 [SCHED   ] [I] [   423.8] Slot decisions pci=1 t=4us (0 PDSCHs, 1 PUSCH): UL: ue=1 rnti=0x4602 h_id=0 ss_id=2 rb=[4..10) newtx=true rv=0 tbs=544 k2=4
415060:2024-04-29T11:21:34.956573 [PHY     ] [I] [   423.8] PUSCH: rnti=0x4602 h_id=0 prb=[4, 10) symb=[0, 14) mod=64QAM rv=0 tbs=544 crc=OK iter=2.0 sinr=31.2dB t=145.8us uci_t=0.0us ret_t=0.0us
415061:2024-04-29T11:21:34.956714 [MAC     ] [I] [  423.17] UL rnti=0x4602 ue=1 subPDUs: [lcid=4: len=57, lcid=4: len=56, SBSR: lcg=2 bs=0, SE_PHR: ph=[22..24)dB p_cmax=[23..24)dBm, PAD: len=421]

Apart from this, I don't see any activity from the UE or from the upper layers in the GNB, but I don't see anything wrong either. Are the logs complete? Otherwise, from what I can see, the GNB seems to be working correctly.

houshangaziziaccelleran commented 4 months ago

Hi @carlo-gal Many thanks for coming back to me. Let me clarify. The test I explained was with 64 UEs. With that particular test it was not possible to collect logs because the system was crashing if you set the log level to info and with other log levels for that test, there is nothing in gnb.log. So I did another test as below and the logs I shared were for this test: I Attached one UE and I started Data. So far all good. The moment I caused reestablishment using my Amarisoft (the command rrc_reest), the Data stopped. If you want me to run a certain scenario (reestablishment related, I can do it).

carlo-gal commented 4 months ago

Ok, I got it now. Some questions for me to understand why this happens, before asking you to reproduce the issue.

I run several tests for the re-establishment, chances are that I didn't use the combinations of conditions that trigger the issue on your side.

houshangaziziaccelleran commented 4 months ago

Hi @carlo-gal My replies inline:

carlo-gal commented 4 months ago

Ok, thanks.

If you have the chance, could you run 2 tests with the following?

The first test should have:

The second test should have:

If you use Amarisoft UE, could you please send the UE logs as well?

With all this, I should be able to see what's wrong.

houshangaziziaccelleran commented 4 months ago

@carlo-gal The two tests you asked for are done. I am attaching the logs here. ul.zip dl.zip

carlo-gal commented 4 months ago

Hi,

getting back to just now.

I analyzed the logs for both case and these are my observations.

UL case:

The UE completes the re-establishment procedure

 689319 2024-04-30T17:48:16.301173 [RRC     ] [D] ue=1 c-rnti=0x4602: Containerized rrcReconfigurationComplete: [
 689320   {
 689321     "UL-DCCH-Message": {
 689322       "message": {
 689323         "c1": {
 689324           "rrcReconfigurationComplete": {
 689325             "rrc-TransactionIdentifier": 1,
 689326             "criticalExtensions": {
 689327               "rrcReconfigurationComplete": {
 689328               }
 689329             }
 689330           }
 689331         }
 689332       }
 689333     }
 689334   }
 689335 ]
 689336 2024-04-30T17:48:16.301176 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reconfiguration Procedure" finished successfully
 689337 2024-04-30T17:48:16.301176 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reconfiguration Procedure" finalized
 689338 2024-04-30T17:48:16.301192 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reestablishment Procedure" for old_ue=0 finalized

After that, I can see the UE transmitting

707808:2024-04-30T17:48:16.576188 [PHY     ] [D] [  247.18] PUSCH: rnti=0x4602 h_id=2 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=3.0 sinr=22.9dB t=216.6us uci_t=0.0us ret_t=139.4us
707953:2024-04-30T17:48:16.576626 [PHY     ] [D] [  247.19] PUSCH: rnti=0x4602 h_id=3 prb=[24, 48) symb=[0, 14) mod=64QAM rv=0 tbs=2178 crc=OK iter=2.3 sinr=24.8dB t=137.6us uci_t=0.0us ret_t=102.0us
708139:2024-04-30T17:48:16.581207 [PHY     ] [D] [   248.8] PUSCH: rnti=0x4602 h_id=0 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=2.5 ack=1 csi1=1111001 sinr=25.3dB t=231.2us uci_t=99.1us ret_t=174.5us
708287:2024-04-30T17:48:16.581683 [PHY     ] [D] [   248.9] PUSCH: rnti=0x4602 h_id=1 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=2.8 sinr=24.4dB t=191.4us uci_t=0.0us ret_t=138.9us
708500:2024-04-30T17:48:16.586228 [PHY     ] [D] [  248.18] PUSCH: rnti=0x4602 h_id=2 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=3.0 sinr=23.3dB t=252.0us uci_t=0.0us ret_t=182.4us
708645:2024-04-30T17:48:16.586620 [PHY     ] [D] [  248.19] PUSCH: rnti=0x4602 h_id=3 prb=[24, 48) symb=[0, 14) mod=64QAM rv=0 tbs=2178 crc=OK iter=2.7 sinr=25.3dB t=134.9us uci_t=0.0us ret_t=92.0us
708829:2024-04-30T17:48:16.591172 [PHY     ] [D] [   249.8] PUSCH: rnti=0x4602 h_id=0 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=2.8 sinr=22.3dB t=203.1us uci_t=0.0us ret_t=134.7us
708973:2024-04-30T17:48:16.591761 [PHY     ] [D] [   249.9] PUSCH: rnti=0x4602 h_id=1 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=3.5 sinr=21.9dB t=286.3us uci_t=0.0us ret_t=202.9us
709186:2024-04-30T17:48:16.596186 [PHY     ] [D] [  249.18] PUSCH: rnti=0x4602 h_id=2 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=3.0 sinr=24.2dB t=201.4us uci_t=0.0us ret_t=135.8us
709330:2024-04-30T17:48:16.596652 [PHY     ] [D] [  249.19] PUSCH: rnti=0x4602 h_id=3 prb=[24, 48) symb=[0, 14) mod=64QAM rv=0 tbs=2178 crc=OK iter=2.7 sinr=24.7dB t=154.1us uci_t=0.0us ret_t=104.6us
709487:2024-04-30T17:48:16.601286 [PHY     ] [D] [   250.8] PUSCH: rnti=0x4602 h_id=0 prb=[4, 48) symb=[0, 14) mod=64QAM rv=0 tbs=4033 crc=OK iter=3.0 sinr=22.9dB t=311.3us uci_t=0.0us ret_t=205.2us
709633:2024-04-30T17:48:16.601670 [PHY     ] [D] [   250.9] 

All this traffic is served for LCID=4, so I guess the iperf traffic is going through. As far as the re-establishment procedure is concerned, everything seems to work fine.

When did you notice the data traffic in UL stopped after the re-establishment? Doesn't it start at all, or does it start and then stop after a while? Because I can't spot any issue only just looking at the logs.

DL case:

Even in this case, the re-establishment procedure completes without issues.

1738826:2024-04-30T17:45:31.078549 [RRC     ] [D] ue=1 c-rnti=0x4602: RX SRB1 PDCP PDU
1738843:2024-04-30T17:45:31.078579 [RRC     ] [D] ue=1 c-rnti=0x4602: Rx SRB1 DCCH UL rrcReconfigurationComplete (2 B)
1738844:2024-04-30T17:45:31.078580 [RRC     ] [D] ue=1 c-rnti=0x4602: Containerized rrcReconfigurationComplete: [
1738861:2024-04-30T17:45:31.078585 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reconfiguration Procedure" finished successfully
1738862:2024-04-30T17:45:31.078585 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reconfiguration Procedure" finalized
1738863:2024-04-30T17:45:31.078600 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reestablishment Procedure" for old_ue=0 finalized

After this, I can see the RLC buffer state updates reaching the scheduler

1738133:- RLC Buffer State: ue=1 lcid=1 pending_bytes=315
1738205:- RLC Buffer State: ue=1 lcid=1 pending_bytes=198
1738386:- RLC Buffer State: ue=1 lcid=1 pending_bytes=81
1738525:- RLC Buffer State: ue=1 lcid=1 pending_bytes=0
1738882:- RLC Buffer State: ue=1 lcid=1 pending_bytes=3

 ====   RRC reconfiguration  completes here approx. ==== 

1738974:- RLC Buffer State: ue=1 lcid=1 pending_bytes=0
1753415:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1753507:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1755392:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1755483:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1757396:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1757488:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1761166:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1761258:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1768521:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1768613:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1783229:- RLC Buffer State: ue=1 lcid=4 pending_bytes=3
1783321:- RLC Buffer State: ue=1 lcid=4 pending_bytes=0
1810227:- RLC Buffer State: ue=1 lcid=4 pending_bytes=1332
1810359:- RLC Buffer State: ue=1 lcid=4 pending_bytes=5328
1810528:- RLC Buffer State: ue=1 lcid=4 pending_bytes=2677
1810665:- RLC Buffer State: ue=1 lcid=4 pending_bytes=2673
1810902:- RLC Buffer State: ue=1 lcid=4 pending_bytes=10665
1811070:- RLC Buffer State: ue=1 lcid=4 pending_bytes=14703
1811131:- RLC Buffer State: ue=1 lcid=4 pending_bytes=21363
1811295:- RLC Buffer State: ue=1 lcid=4 pending_bytes=36015
1811451:- RLC Buffer State: ue=1 lcid=4 pending_bytes=24267
1811610:- RLC Buffer State: ue=1 lcid=4 pending_bytes=12520
1811769:- RLC Buffer State: ue=1 lcid=4 pending_bytes=773
...
1823307:- RLC Buffer State: ue=1 lcid=4 pending_bytes=9336
1823458:- RLC Buffer State: ue=1 lcid=4 pending_bytes=5510
1823581:- RLC Buffer State: ue=1 lcid=4 pending_bytes=15

There this DL session continues for like 0.5 seconds.

1810226 2024-04-30T17:45:40.061277 [SCHED   ] [D] [   126.3] Processed slot events:
1810227 - RLC Buffer State: ue=1 lcid=4 pending_bytes=1332
...
1836433 2024-04-30T17:45:40.585384 [SCHED   ] [D] [  178.11] Processed slot events:
1836434 - RLC Buffer State: ue=1 lcid=4 pending_bytes=15

After that, there is only some sporadic DL traffic reaching the scheduler. Again, the scheduler seems to be working fine.

Similarly to the previous case, when did you notice the data traffic in DL stopped after the re-establishment? Did you stop it or does it stop by itself. Depending on that, I will probably need to involve someone from upper layers

houshangaziziaccelleran commented 4 months ago

Hi @carlo-gal For the two test you asked me to run, I did not see any Data stop with UL-Only test. I only experience data stop with DL-Only test you asked me to run. So the DL data stops right after causing a reestablishment. And with 1-UE this is always the case. No matter if I run Bi-Dir or DL-Only. With UL-Only no issue!

carlo-gal commented 4 months ago

Thanks. One thing still isn't completely clear to me.

In this comment

The moment I caused reestablishment using my Amarisoft (the command rrc_reest), the Data stopped.

and in the title of this issue, the problem seems to be that the GNB cannot resume the traffic after the re-establishment.

In the latest logs you sent and according this comment,

So the DL data stops right after causing a reestablishment

the issue seems to be the data traffic stopping before the re-establishment (actually, it becomes the cause of the re-establishment).

Did I understand correctly? If so, either we are dealing with 2 different problems, or the re-establishment might not the reason for what we observe. Sorry for the forth and back, but I just want to narrow down what the problem is, as I might need to involve someone else to solve this.

houshangaziziaccelleran commented 4 months ago

Thanks again @carlo-gal Let me explain more clearly and I am going to explain the one UE scenario as it is easier to analyze than multi-UE scenario:

1- I attach 1 UE --> All good 2- I Run DL-Only Data --> All good 3- I runrrc_reest 1 on AMarisoft UE to cause reestablishment --> It seems the reestablishment itself is partially working because I can see that a new RNTI is assigned but the running data I explained in step #2, stops. It does not go through anymore.

This issue is only happening with DL. If I do the same test with UL-Only, the UL data wouldn't stop after reestablishment.

Feel free to ask me for any test/log/trace for further investigation

Thanks Houshang

carlo-gal commented 4 months ago

Thank you for the explanation. We'll discuss this internally as, as I said earlier, the problem could lie in the upper layers.

In the meantime, two more questions.

houshangaziziaccelleran commented 4 months ago
  1. It was UDP; I can do it with TCP as well.
  2. The answer is Yes.
alvasMan commented 4 months ago

@houshangaziziaccelleran, I see that in the dl.zip logs that the re-establishment finishes here:

/// Re-establishment ended here:
2024-04-30T17:45:31.078600 [RRC     ] [D] ue=1 c-rnti=0x4602: "RRC Reestablishment Procedure" for old_ue=0 finalized

But the next GTP-U PDU I see is 9 seconds later:

/// First GTP-U PDU after reestablishment only arrives much later.
2024-04-30T17:45:40.061074 [GTPU    ] [D] Forwarding PDU. pdu_len=1344 teid=0x000001
2024-04-30T17:45:40.061080 [GTPU    ] [D] ue=0 DL teid=0x000001: RX PDU. sdu_len=1344 rx_deliv=0 rx_reord=0 rx_next=0 

Could you kindly try running the same experiment but using a UM bearer please? I'm suspecting some problem in the pdcp::reestablishment() function.

Also could you please use the cu_up --warn_on_drop=true option, please?

alvasMan commented 4 months ago

Also, can you please enable NGAP, F1AP and E1AP pcaps? And info logs at the UE, please?

houshangaziziaccelleran commented 4 months ago

Hi @alvasMan

Thanks for reaching out Two questions:

1- Is cu_up --warn_on_drop=true an option of gnb when I am running srsRAN?

2- Is following a good log/trace level for you?If not, feel free to edit it and give it back to me.

log:
  filename: /tmp/gnb.log
  all_level: error
pcap:
  ngap_filename: /tmp/gnb_ngap.pcap
  ngap_enable: true
  e1ap_filename: /tmp/gnb_e1ap.pcap
  e1ap_enable: true
  f1ap_filename: /tmp/gnb_f1ap.pcap
  f1ap_enable: true
houshangaziziaccelleran commented 4 months ago

Hi @alvasMan

Following log/trace level is set:

log:
  filename: /tmp/gnb.log
  all_level: error
pcap:
  ngap_filename: /tmp/gnb_ngap.pcap
  ngap_enable: true
  e1ap_filename: /tmp/gnb_e1ap.pcap
  e1ap_enable: true
  f1ap_filename: /tmp/gnb_f1ap.pcap
  f1ap_enable: true

UE is in UM mode.

UE log level and gNB log level are in info mode.

The cu_up --warn_on_drop=trueI did not get it.

Please find attached the logs for the failed test (DL data stop after reestablishment reestablishement-issue-20240502.zip )

houshangaziziaccelleran commented 4 months ago

@alvasMan cu_up --warn_on_drop=true thing is clear now. Let me know if you want me to redo the test with it? By the way, the issue happens with TCP-DL as well.

houshangaziziaccelleran commented 4 months ago

@alvasMan The test is done with _cu_up --warn_ondrop=true as well. Please find attached the requested logs. reestablishement-issue-20240502_2.zip

alvasMan commented 4 months ago

@houshangaziziaccelleran the pcaps you sent all have 0 bytes, both in "reestablishement-issue-20240502.zip" and "reestablishement-issue-20240502_2.zip":

╰─$ ls -l ~/Downloads/reestablishement-issue-20240502/srsran_logs_and_traces/
total 134964
-rw-r--r-- 1 pedro pedro         0 May  2 16:27 gnb_e1ap.pcap
-rw-r--r-- 1 pedro pedro         0 May  2 16:27 gnb_f1ap.pcap
-rw-r--r-- 1 pedro pedro 138203136 May  2 16:27 gnb.log
-rw-r--r-- 1 pedro pedro         0 May  2 16:27 gnb_ngap.pcap
...
╰─$ ls -l ~/Downloads/reestablishement-issue-20240502_2/srsran_logs_and_traces/
total 123224
-rw-r--r-- 1 pedro pedro         0 May  2 17:32 gnb_e1ap.pcap
-rw-r--r-- 1 pedro pedro         0 May  2 17:32 gnb_f1ap.pcap
-rw-r--r-- 1 pedro pedro 126181376 May  2 17:32 gnb.log
-rw-r--r-- 1 pedro pedro         0 May  2 17:32 gnb_ngap.pcap

Perhaps there is some permission issue that prevented them from being written? Can you delete all previous logs and try again? Or the gNB did not exit cleanly?

That said, previously you were running the experiment in debug, could you please return to to that? Ideally, do the following:

log --all_level=debug --f1u_level=debug cu_up --warn_on_drop=true pcap --f1ap_enable=true --mac_enable=true --ngap_enable=true --e1ap_enable=true

If the logger cannot keep up and we start to drop log messages we will try to target the logging a bit more, but to start let's try to get as much information as possible. Just keep an eye in the console to make sure no log entries are dropped.

Another thing, if you do the following experiment, does the traffic recover (without extra signalling to the core/gnb):

  1. Start DL traffic (without any UL),
  2. Reestablish from the UE, traffic shoud drop,
  3. Trigger ping from the UE to the core,
  4. Does the traffic recover after the ping?
houshangaziziaccelleran commented 4 months ago

Hi @alvasMan Here is my log level:

log:
  filename: /tmp/gnb.log
  all_level: error
pcap:
  ngap_filename: /tmp/gnb_ngap.pcap
  ngap_enable: true
  e1ap_filename: /tmp/gnb_e1ap.pcap
  e1ap_enable: true
  f1ap_filename: /tmp/gnb_f1ap.pcap
  f1ap_enable: true
#  rlc_filename: /tmp/gnb_rlc.pcap
#  rlc_rb_type: all
#  rlc_enable: true
#  mac_filename: /tmp/gnb_mac.pcap
#  mac_type: dlt
#  mac_enable: true
#  e2ap_filename: /tmp/gnb_e2ap.pcap
#  e2ap_enable: true
#  gtpu_filename: /tmp/gnb_gtpu.pcap
#  gtpu_enable: true

It seems to be correct and I had removed everything in /tmp before staring the test. Let;s fix this before I can proceed with a new test as you request. Any idea how to fix this issue please?

ismagom commented 4 months ago

Hi @alvasMan Here is my log level:

log:
  filename: /tmp/gnb.log
  all_level: error
pcap:
  ngap_filename: /tmp/gnb_ngap.pcap
  ngap_enable: true
  e1ap_filename: /tmp/gnb_e1ap.pcap
  e1ap_enable: true
  f1ap_filename: /tmp/gnb_f1ap.pcap
  f1ap_enable: true
#  rlc_filename: /tmp/gnb_rlc.pcap
#  rlc_rb_type: all
#  rlc_enable: true
#  mac_filename: /tmp/gnb_mac.pcap
#  mac_type: dlt
#  mac_enable: true
#  e2ap_filename: /tmp/gnb_e2ap.pcap
#  e2ap_enable: true
#  gtpu_filename: /tmp/gnb_gtpu.pcap
#  gtpu_enable: true

It seems to be correct and I had removed everything in /tmp before staring the test. Let;s fix this before I can proceed with a new test as you request. Any idea how to fix this issue please?

You need to stop the gnb application before sending us the pcaps

houshangaziziaccelleran commented 4 months ago

@alvasMan For some reason (maybe because of the segmentation error) pcap files are empty again, except one. Please see attached logs and I will redo the test making sure I am not missing anything (see segmentation fault error message attached). 20240305-srsran.zip

For the ping thing, No, it did not recover the DL traffic.

alvasMan commented 4 months ago

(maybe because of the segmentation error)

How often are you seeing this segfault, does it happen every time or only some times? Do you see it immediately after re-establishment?

Can you try to run with RelWithDebugInfo so we can get more info?

houshangaziziaccelleran commented 4 months ago

It was actually first time I saw it for this particular test. Can you explain what is RelWithDebugInfo please?

houshangaziziaccelleran commented 4 months ago

@alvasMan I have better logs for you: 20240503_19h10.zip

carlo-gal commented 4 months ago

Hi @houshangaziziaccelleran,

Today we release a new version of the code, in which we fixed several pending issues. Could you try with the new code and check whether the your problem persists?

ismagom commented 4 months ago

The commit you should see is 78238fd15e4cd82a6324d6dbbb612ac5584b13ea, the version number is ok. Please do git pull again

houshangaziziaccelleran commented 4 months ago

Thanks @ismagom Is this correct?

radisys@node1:~/srsRAN_Project$ gnb --version
srsRAN 5G gNB version 24.04.0 (78238fd15)
houshangaziziaccelleran commented 4 months ago
radisys@node1:~/srsRAN_Project$ git log -n1
commit 78238fd15e4cd82a6324d6dbbb612ac5584b13ea (HEAD -> main, origin/test, origin/main, origin/HEAD)
Merge: a15950301 7172eede4
Author: codebot <group_13873759_bot_94b0c73433278102@noreply.gitlab.com>
Date:   Mon May 13 08:22:09 2024 +0000

    Update main

    # Conflicts:
    #       CMakeLists.txt
    #       apps/units/cu_cp/logger_registrator.h
    #       include/srsran/f1u/f1u_cu_up_entity.h
    #       include/srsran/f1u/f1u_cu_up_rx.h
    #       include/srsran/f1u/f1u_cu_up_transport.h
    #       include/srsran/f1u/f1u_cu_up_tx.h
    #       include/srsran/gtpu/gtpu_tunnel_tx.h
    #       lib/cu_cp/routine_managers/du_processor_routine_manager.cpp
    #       lib/cu_cp/routine_managers/du_processor_routine_manager.h
    #       lib/cu_cp/routines/mobility/inter_ngran_node_n2_handover_routine.cpp
    #       lib/cu_cp/routines/mobility/inter_ngran_node_n2_handover_routine.h
    #       lib/phy/upper/equalization/equalize_zf_2x2.cpp
    #       lib/phy/upper/equalization/equalize_zf_2x2.h
    #       lib/support/network/epoll_helper.h
    #       tests/unittests/cu_cp/du_processor/inter_ngran_node_n2_handover_routine_test.cpp
    #       tests/unittests/cu_cp/du_processor/inter_ngran_node_n2_handover_routine_test_helpers.h
    #       tests/unittests/phy/upper/channel_processors/pusch/pusch_demodulator_test_data.tar.gz
    #       tests/unittests/phy/upper/equalization/channel_equalizer_test_data.tar.gz
    #       tests/unittests/phy/upper/signal_processors/dmrs_pusch_estimator_test_data.tar.gz
    #       tests/unittests/phy/upper/signal_processors/port_channel_estimator_test_data.tar.gz
radisys@node1:~/srsRAN_Project$ 
ismagom commented 4 months ago

yes, this is the correct version

houshangaziziaccelleran commented 4 months ago

Hi @ismagom I have an issue with this new version. The very first UE Attach fails! I am using the same gnb config as before. My UE (Amarisoft) config is also same as before. Can you please check the logs? I am double-checking things from my end. 78238fd15-srsran_logs_and_traces.zip

ismagom commented 4 months ago

Hi @houshangaziziaccelleran,

What UE are you using?

houshangaziziaccelleran commented 4 months ago

Hi @ismagom It is Amarisoft

ismagom commented 4 months ago

Can you send UE and gnb logs? All in debug please.

houshangaziziaccelleran commented 4 months ago

@ismagom Find them attached please. Thanks 20240514_1.zip

houshangaziziaccelleran commented 4 months ago

Hi @ismagom Do you have an update for me please? I installed my server from scrach (the server srsRAN is running on) and I checked everything but I still cannot attach UE! I am using the same gnb config, the same core network, same amarisoft config,... Thanks

houshangaziziaccelleran commented 4 months ago

Hi @ismagom I head there has been changes to the following part. I use:

  nof_antennas_dl: 2
  nof_antennas_ul: 2

in cell_cfg of the gNB.

And the following in Amarisoft:

        n_antenna_dl: 2,
        n_antenna_ul: 1,

This used to work. Not sure if it is relevant at all.

ismagom commented 4 months ago

Can you try only 1 UL? antenna in the gnb?

houshangaziziaccelleran commented 4 months ago

@ismagom Before I get this message I tried with:

  nof_antennas_dl: 1
  nof_antennas_ul: 1

And Attach works but the issue is that i get half the DL throughput.

ismagom commented 4 months ago

Can you try 2 DL and 1 UL?

houshangaziziaccelleran commented 4 months ago

@ismagom 2 DL and 1 UL doesn't work. but 2/2 works. Thanks

ismagom commented 4 months ago

I thought you said 2 UL wasn't working. Can you confirm your observation above?

houshangaziziaccelleran commented 4 months ago

With previous release, I had:

nof_antennas_dl: 2
nof_antennas_ul: 2

And amarisoft side:

n_antenna_dl: 2,
n_antenna_ul: 1,

This is not working with new release. With new release, it is 2/2 (gNB) and 2/2 (amarisoft) and it works only this way (2/2 on both sides)

OK @ismagom Now, back to the point, the reestablishment issue is there with the new release as well. I have collected new logs (attached)

reest-issue-20240514_1.zip

alvasMan commented 4 months ago

@houshangaziziaccelleran we detected some problems on the re-establishment at the PDCP when doing heavy downlink traffic. This seems to match the log entries I see your latest logs:

2024-05-14T15:41:03.536658 [GTPU    ] [I] ue=0 DL teid=0x000001: RX SDU. sdu_len=1328 qos_flow=QFI=1 sn={na}
2024-05-14T15:41:03.536658 [PDCP    ] [I] ue=0 DRB1 DL: Dropping SDU to avoid going over the TX window size. tx_next_ack=162021 tx_trans=162169 tx_next=164216

Could you kindly try out the following patch? Just do git am < name_of_the_patch.patch

patch.tar.gz

houshangaziziaccelleran commented 4 months ago

Hi @alvasMan I will try with the following and update you accordingly:

ad@bm-super11-intel:~/srsRAN_Project$ git am 0001-cu_cp-e1ap-fix-missing-reestablish-field-in-PDCP-con.patch
Applying: cu_cp,e1ap: fix missing reestablish field in PDCP config
ad@bm-super11-intel:~/srsRAN_Project$ git log -n1
commit b68fda1389aa02871d0411546e0d5106134c136b (HEAD -> main)
Author: Pedro Alvarez <pedro@srs.io>
Date:   Mon May 13 15:53:29 2024 +0100

    cu_cp,e1ap: fix missing reestablish field in PDCP config
ad@bm-super11-intel:~/srsRAN_Project$ 
houshangaziziaccelleran commented 4 months ago

@alvasMan @ismagom The reestablishment issue (DL Data stop) is there with the patch as well. Please find attache he new log.

reest-issue-20240514_2-with_pach.zip

alvasMan commented 4 months ago

Did you re-compile? The logs seem to show you are still using 78238fd15.

2024-05-14T16:25:44.502697 [GNB     ] [I] Built in Release mode using commit 78238fd15 on branch main

I would also expect to see the PDCP config with the re-establish field in the E1AP pcap and also the following entry in the logs after the call to reestablish():

+  logger.log_info("Reestablished PDCP. st={}", st);