srsran / srsRAN_Project

Open source O-RAN 5G CU/DU solution from Software Radio Systems (SRS) https://docs.srsran.com/projects/project
https://www.srsran.com
GNU Affero General Public License v3.0
486 stars 164 forks source link

High DL BLER when Running Ping Tests with 4X2 MIMO VS 2X2 MIMO #603

Closed Esawi closed 1 month ago

Esawi commented 5 months ago

Issue Description

DL BLER ~50% when Running Ping Tests with 4X2 MIMO but ~2% when running 2X2

Logs, configs and traces:

Setup Details

Steps to reproduce the problem

andrepuschmann commented 5 months ago

@Esawi For the tests could you please also share the RU config and run the gnb with PHY log level in info?

Esawi commented 5 months ago

Hi @andrepuschmann , Please find attached below test runs with PHY log level info and ru_config.cfg included.

2X2 Test Run: 2X2_TestRun_logs_configs_traces.zip 4X2 Test Run: 4X2_TestRun_logs_configs_traces.zip

ismagom commented 5 months ago

The high ping latency is caused by the high BLER and long SR period. Try reducing the SR period first.

For the high BLER, use 100 MHz and LF PRACH with the RAN650. You can use the configs I shared here:

https://github.com/srsran/srsRAN_Project/discussions/555#discussioncomment-9328127

ismagom commented 5 months ago

As an example of the latency seen in the stack when uplink ping is done, consider this extract from the log. From the reception of the SR signal, the scheduling of PUSCH grant, the decoding of the ping request and transmission of ping reply and PDSCH takes 5-6 ms:

(log taken from https://github.com/srsran/srsRAN_Project/files/15223342/2X2_TestRun_logs_configs_traces.zip)

2024-05-06T16:01:14.865140 [PHY     ] [I] [    70.9] PUCCH: rnti=0x4605 format=1 prb1=2 prb2=na symb=[0, 14) cs=0 occ=4 sr=yes t=23.6us
2024-05-06T16:01:14.865579 [PHY     ] [I] [   70.16] PDCCH: rnti=0x4605 ss_id=2 format=0_1 cce=10 al=2 t=12.7us
2024-05-06T16:01:14.866522 [SCHED   ] [I] [   70.18] Slot decisions pci=502 t=1us (0 PDSCHs, 1 PUSCH): UL: ue=0 rnti=0x4605 h_id=0 ss_id=2 rb=[7..14) newtx=true rv=0 tbs=576 k2=2
2024-05-06T16:01:14.869706 [PHY     ] [I] [   70.18] PUSCH: rnti=0x4605 h_id=0 prb=[7, 14) symb=[0, 14) mod=64QAM rv=0 tbs=576 crc=OK iter=2.0 sinr=30.6dB t=86.6us uci_t=0.0us ret_t=0.0us
2024-05-06T16:01:14.869718 [MAC     ] [I] [    71.4] UL rnti=0x4605 ue=0 subPDUs: [lcid=4: len=90, lcid=4: len=90, lcid=4: len=90, SBSR: lcg=0 bs=0, SE_PHR: ph=[27..29)dB p_cmax=[22..23)dBm, PAD: len=294]
2024-05-06T16:01:14.869725 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX PDU. pdu_len=90 dc=data p=0 si=full sn=0
2024-05-06T16:01:14.869730 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX SDU. sn=0 sdu_len=87
2024-05-06T16:01:14.869743 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX PDU. pdu_len=90 dc=data p=0 si=full sn=1
2024-05-06T16:01:14.869743 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX SDU. sn=1 sdu_len=87
2024-05-06T16:01:14.869750 [PDCP    ] [I] ue=0 DRB1 UL: RX PDU. type=data pdu_len=87 sn=0 count=0
2024-05-06T16:01:14.869752 [PDCP    ] [I] ue=0 DRB1 UL: RX SDU. count=0
2024-05-06T16:01:14.869763 [PDCP    ] [I] ue=0 DRB1 UL: RX PDU. type=data pdu_len=87 sn=1 count=1
2024-05-06T16:01:14.869763 [PDCP    ] [I] ue=0 DRB1 UL: RX SDU. count=1
2024-05-06T16:01:14.869793 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX PDU. pdu_len=90 dc=data p=1 si=full sn=2
2024-05-06T16:01:14.869793 [RLC     ] [I] du=1 ue=0 DRB1 UL: RX SDU. sn=2 sdu_len=87
2024-05-06T16:01:14.869797 [PDCP    ] [I] ue=0 DRB1 UL: RX PDU. type=data pdu_len=87 sn=2 count=2
2024-05-06T16:01:14.869798 [PDCP    ] [I] ue=0 DRB1 UL: RX SDU. count=2
2024-05-06T16:01:14.870041 [PDCP    ] [I] ue=0 DRB1 DL: TX PDU. type=data pdu_len=87 sn=0 count=0
2024-05-06T16:01:14.870044 [PDCP    ] [I] ue=0 DRB1 DL: TX PDU. type=data pdu_len=87 sn=1 count=1
2024-05-06T16:01:14.870045 [PDCP    ] [I] ue=0 DRB1 DL: TX PDU. type=data pdu_len=87 sn=2 count=2
2024-05-06T16:01:14.870047 [RLC     ] [I] du=1 ue=0 DRB1 DL: TX SDU. sdu_len=87 pdcp_sn=0 queued_sdus=1 queued_bytes=87
2024-05-06T16:01:14.870048 [RLC     ] [I] du=1 ue=0 DRB1 DL: TX SDU. sdu_len=87 pdcp_sn=1 queued_sdus=2 queued_bytes=174
2024-05-06T16:01:14.870048 [RLC     ] [I] du=1 ue=0 DRB1 DL: TX SDU. sdu_len=87 pdcp_sn=2 queued_sdus=3 queued_bytes=261
2024-05-06T16:01:14.870059 [SCHED   ] [I] [    71.5] Slot decisions pci=502 t=37us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..1) h_id=0 ss_id=2 k1=3 newtx=true rv=0 tbs=225
2024-05-06T16:01:14.870093 [RLC     ] [I] du=1 ue=0 DRB1 DL: TX status PDU. pdu_len=3 grant_len=3
2024-05-06T16:01:14.870095 [MAC     ] [I] [    71.5] DL PDU: ue=0 rnti=0x4605 size=225: SDU: lcid=4 nof_sdus=1 total_size=5
2024-05-06T16:01:14.870099 [PHY     ] [I] [    71.5] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=18 al=2 t=12.0us
2024-05-06T16:01:14.870126 [PHY     ] [I] [    71.5] PDSCH: rnti=0x4605 h_id=0 k1=3 prb=[0, 1) symb=[2, 14) mod=256QAM rv=0 tbs=225 t=24.3us

The problem is that multiple reTX are needed, thus increasing the latency to 20 ms:

2024-05-06T16:01:14.874639 [PHY     ] [I] [    71.8] PUCCH: rnti=0x4605 format=1 prb1=0 prb2=na symb=[0, 14) cs=0 occ=4 ack=00 t=24.8us
2024-05-06T16:01:14.875023 [SCHED   ] [I] [   71.15] Slot decisions pci=502 t=16us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..1) h_id=0 ss_id=2 k1=3 newtx=false rv=2 tbs=225
2024-05-06T16:01:14.875058 [PHY     ] [I] [   71.15] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=10 al=2 t=12.3us
2024-05-06T16:01:14.875078 [PHY     ] [I] [   71.15] PDSCH: rnti=0x4605 h_id=0 k1=3 prb=[0, 1) symb=[2, 14) mod=256QAM rv=2 tbs=225 t=29.0us
2024-05-06T16:01:14.875135 [PHY     ] [I] [    71.9] PUCCH: rnti=0x4605 format=1 prb1=0 prb2=na symb=[0, 14) cs=0 occ=4 ack=0 t=26.4us
2024-05-06T16:01:14.875524 [SCHED   ] [I] [   71.16] Slot decisions pci=502 t=14us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..2) h_id=1 ss_id=2 k1=2 newtx=false rv=2 tbs=453
2024-05-06T16:01:14.875555 [PHY     ] [I] [   71.16] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=10 al=2 t=9.3us
2024-05-06T16:01:14.875576 [PHY     ] [I] [   71.16] PDSCH: rnti=0x4605 h_id=1 k1=2 prb=[0, 2) symb=[2, 14) mod=256QAM rv=2 tbs=453 t=25.2us
2024-05-06T16:01:14.876046 [SCHED   ] [I] [   71.17] Slot decisions pci=502 t=18us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..2) h_id=2 ss_id=2 k1=2 newtx=false rv=2 tbs=153
2024-05-06T16:01:14.876070 [PHY     ] [I] [   71.17] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=0 al=2 t=7.4us
2024-05-06T16:01:14.876088 [PHY     ] [I] [   71.17] PDSCH: rnti=0x4605 h_id=2 k1=2 prb=[0, 2) symb=[2, 6) mod=256QAM rv=2 tbs=153 t=16.7us
2024-05-06T16:01:14.879635 [PHY     ] [I] [   71.18] PUCCH: rnti=0x4605 format=1 prb1=0 prb2=na symb=[0, 14) cs=0 occ=4 ack=00 t=22.4us
2024-05-06T16:01:14.880036 [SCHED   ] [I] [    72.5] Slot decisions pci=502 t=20us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..1) h_id=0 ss_id=2 k1=3 newtx=false rv=3 tbs=225
2024-05-06T16:01:14.880066 [PHY     ] [I] [    72.5] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=18 al=2 t=9.0us
2024-05-06T16:01:14.880083 [PHY     ] [I] [    72.5] PDSCH: rnti=0x4605 h_id=0 k1=3 prb=[0, 1) symb=[2, 14) mod=256QAM rv=3 tbs=225 t=20.4us
2024-05-06T16:01:14.880138 [PHY     ] [I] [   71.19] PUCCH: rnti=0x4605 format=1 prb1=0 prb2=na symb=[0, 14) cs=0 occ=4 ack=0 t=21.3us
2024-05-06T16:01:14.880540 [SCHED   ] [I] [    72.6] Slot decisions pci=502 t=17us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..2) h_id=1 ss_id=2 k1=2 newtx=false rv=3 tbs=453
2024-05-06T16:01:14.880571 [PHY     ] [I] [    72.6] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=28 al=2 t=9.5us
2024-05-06T16:01:14.880590 [PHY     ] [I] [    72.6] PDSCH: rnti=0x4605 h_id=1 k1=2 prb=[0, 2) symb=[2, 14) mod=256QAM rv=3 tbs=453 t=23.9us
2024-05-06T16:01:14.881025 [SCHED   ] [I] [    72.7] Slot decisions pci=502 t=14us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..2) h_id=2 ss_id=2 k1=2 newtx=false rv=3 tbs=153
2024-05-06T16:01:14.881060 [PHY     ] [I] [    72.7] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=20 al=2 t=12.9us
2024-05-06T16:01:14.881067 [PHY     ] [I] [    72.7] PDSCH: rnti=0x4605 h_id=2 k1=2 prb=[0, 2) symb=[2, 6) mod=256QAM rv=3 tbs=153 t=15.9us
2024-05-06T16:01:14.884643 [PHY     ] [I] [    72.8] PUCCH: rnti=0x4605 format=2 prb=[3, 4) prb2=na symb=[12, 14) ack=01 csi1=1011111 t=29.7us
2024-05-06T16:01:14.885031 [SCHED   ] [I] [   72.15] Slot decisions pci=502 t=14us (1 PDSCH, 0 PUSCHs): DL: ue=0 c-rnti=0x4605 rb=[0..1) h_id=0 ss_id=2 k1=3 newtx=false rv=1 tbs=225
2024-05-06T16:01:14.885060 [PHY     ] [I] [   72.15] PDCCH: rnti=0x4605 ss_id=2 format=1_1 cce=10 al=2 t=10.2us
2024-05-06T16:01:14.885078 [PHY     ] [I] [   72.15] PDSCH: rnti=0x4605 h_id=0 k1=3 prb=[0, 1) symb=[2, 14) mod=256QAM rv=1 tbs=225 t=22.3us
2024-05-06T16:01:14.885152 [PHY     ] [I] [    72.9] PUCCH: rnti=0x4605 format=1 prb1=2 prb2=na symb=[0, 14) cs=0 occ=4 ack=1 t=24.6us
herlesupreeth commented 5 months ago

@Esawi If I may ask, are the following power values set in gNB configuration file for 4x2 MIMO scenario based on some measurements done using some signal analyzer or something?

ssb_block_power_dbm: 4
p_max: 23
p0_nominal_with_grant: -78

And, I dont think we have tested PUSCH explicitly with dmrs_additional_position: 3, may I know why you have set that? Can you try with default value of pos2?

Esawi commented 5 months ago

Hi @andrepuschmann ,

I believe I narrowed down the issue a bit.

cell_cfg:
  pucch:
    sr_period_ms: 20   # This can be set either 20 or 40 ms.
    f1_nof_cell_res_sr: 15
    f2_nof_cell_res_csi: 15
  csi:
    csi_rs_period: 40  # This can be set either 20 or 40 ms.
  ul_common:
    max_ul_grants_per_slot: 16
    max_pucchs_per_slot: 15
  pusch:
    max_puschs_per_slot: 1
  tdd_ul_dl_cfg:
    nof_dl_symbols: 7
    nof_dl_slots: 7
    nof_ul_slots: 2

Test Run 01: 4X2 MIMO without above configuration

Test Run 02: 4X2 MIMO with above configruation

Test Run 03: 2X2 MIMO with above configuration

The question remain why the above changes are not impacting the DL BLER and RTT when running the cell with 2X2 only.

Esawi commented 5 months ago

Hi @herlesupreeth ,

From what I can tell from the testing done, the dmrs position and these power parameters are not impacting the RTT and DL BLER only the UL BLER and the UL throughput.

for ssb_block_power_dbm: I used the calculator in this link to find the ssb_block_power_dbm . I have verified other units before to confirm these calculated values with the measured power from the RU (I will be repeating this with this test bed RU separately)

for p_max The UE used in these tests have a max power of 23dBm.

for p0_nominal_with_grant The value was being changed to test the impact on the UL performance.

ismagom commented 4 months ago

Hi @Esawi ,

Could you run a test with the 4x2 scenario using AmariUE and save the PHY signal in the amari config, please? then upload the amari logs and signal binary file so we can look at the signal of those small packets.

Thanks

Esawi commented 4 months ago

Hi @ismagom ,

Unfortunately, I am running into issues running the cell with the Amari (I will feedback with these logs as soon as possible).

Please note, running multiple tests, I narrowed down the parameter impacting the DL BLER with 4X2 is " csi_rs_period: 40". Is it necessary to have this value changed to 40 to support 32UEs? As with this value 4X2 MIMO is showing high DL BLER but 2X2 MIMO is not.

Run 01: running 4X2 MIMO with all the multi UE recommended cell_cfg_max_32_ues.yml configuration below:

cell_cfg:
  pucch:
    sr_period_ms: 20
    f1_nof_cell_res_sr: 15
    f2_nof_cell_res_csi: 15
  csi:
    csi_rs_period: 40
  ul_common:
    max_ul_grants_per_slot: 16
    max_pucchs_per_slot: 15
  pusch:
    max_puschs_per_slot: 1
  tdd_ul_dl_cfg:
    nof_dl_symbols: 7
    nof_dl_slots: 7
    nof_ul_slots: 2

Run 02: running 4X2 MIMO with the multi UE recommended configuration but commenting out " csi_rs_period: 40".

Run 03: Reference 2X2 MIMO (with all multi UE recommened configuration and " csi_rs_period: 40".)

carlo-gal commented 4 months ago

Hi @Esawi ,

From the logs you sent above (Test_run01_4x2.zip), it seems you are only using 1 UE.

In that case, there is no problem if you remove

  csi:
    csi_rs_period: 40

from your configuration. If you do so, you will automatically use 20ms as the default CSI period. It could be that with MIMO configuration and COTS UE over the air, 40ms is too large. But I would like to double check to make sure that's actually the case.

Note that the PUCCH/CSI configurations suggested in the config folder are meaningful only if you use several UEs, and especially if you are constrained on the number of PUCCH/PUSCH grants you can schedule per slot. I don't think that's your case, so you can use a shorter CSI period.

Now, for me to be able to reproduce the issue, I have a question. What do you mean exactly when you say you use MIMO 4x2? If you set the 4 TX ports in the GNB and given that the UE has 4 RX ports (according to Quectel specs (https://www.quectel.com/wp-content/uploads/2021/03/Quectel_RM500Q-GL_5G_Specification_V1.3.pdf), it means you are basically using MIMO 4x4. Unless there is a way for Quectel to reduce the number of layers to 2. Can you confirm if that's the case?

Esawi commented 4 months ago

Hi @carlo-gal , Indeed these tests are with a single UE, but the configuration is being prepared to run multi UE cells (hence testing the impact of these parameters). From what I understand from your feedback the "csi_rs_period" would be necessary for the multi UE cells, which means the DL BLER (and it's impact on the RTT) would effect the cell performance.

Regarding your question , I am sorry If I wasn't clear before. In all these cases all 4 Antenna ports of the RU are connected to the 4 Ports of the Quectel (4X4 indeed), it is just on the gnb_config.yml I am using 4 TX and 2 RX with Run01 and Run02 while on Run03 it is 2 TX and 2 RX.

ismagom commented 4 months ago

Hi @Esawi , for the case of 4X2 and csi_rs_period: 40, we still would like to diagnose the issue (even if you don't need this configuration now). Could you enable the following logs:

And upload them here please? Thanks

carlo-gal commented 4 months ago

Hi @Esawi,

We went again through the logs and we didn't find anything wrong in the way the GNB works. Given that we can't get the UE logs, we can only make assumptions on what is actually happening.

We saw that the PMI in the CSI changes quite frequently, which seems to justify why the CSI report period of 40 ms leads to high BLER (basically, the channel changes quickly and the CSI is not reported frequently enough to keep track if the changes).

We also noticed that the PDSCH grants are allocated over 1 PRB only, due to the small size of ping packets, and this PRB is located at the edge of the band. In this case, the MIMO precoder (which is indicated by the PMI in the CSI report), which is wideband, could be not representing well the channel for that specific PRB, located at the edge of the band. And the CSI report of 40 ms could makes this worse. But at the moment, we have no way to prove our hypothesis correct.

For us to better understand the problem, could you explain a bit more the setup you use for your experiment?

If you have any details on the way to run the experiments, could you share them? Anything would help us figure out the reason of the BLER.

Thanks!

Esawi commented 4 months ago

Sorry for the late response

@ismagom, I am sorry I am still running into issues with the Amarisoft in our lab. I will be providing the requested logs as soon as I can.

@carlo-gal . I am sorry, I will try to get amarisoft logs as soon as I can. But regarding the information requested:

As I mentioned I will be re-doing these tests again with the Amarisoft as soon as I can.

ismagom commented 1 month ago

Hi @Esawi , is this issue still happening?

Esawi commented 1 month ago

Hi @ismagom ,

I think we can close the issue, as we are not observing it anymore with the latest versions and config files we use. thank you all for the support.

Regards,

Mahmoud Esawi