srsran / srsRAN_Project

Open source O-RAN 5G CU/DU solution from Software Radio Systems (SRS) https://docs.srsran.com/projects/project
https://www.srsran.com
GNU Affero General Public License v3.0

gnb: "Error: exceeded maximum number of timed out transmissions" #772

Open tiger762 opened 4 weeks ago

tiger762 commented 4 weeks ago

Issue Description

Hello. I am trying to stand up a gNB, but neither an iPhone 12 nor a OnePlus N200 can discover the 5G network. I am using band 71 (617-652 MHz downlink, 663-698 MHz uplink).

Setup Details

Ubuntu 22.04.4 LTS
srsRAN 5G gNB version 24.04.0
Ettus B200 (UHD_4.1.0.5-3)
Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
iPhone 12 and OnePlus N200
GPSDO providing 10MHz sinewave to port of B200

Expected Behavior

That the gnb runs stably, maybe an overflow or underflow every once in a while, and the 5G handsets discover the network

Actual Behaviour

Always this error:
Error: exceeded maximum number of timed out transmissions.

Less often, one of these:
Error: unhandled error in Rx metadata ERROR_CODE_BAD_PACKET
[ERROR] [STREAMER] The receive packet handler caught a value exception.
[ERROR] [STREAMER] recv packet demuxer unexpected sid 0xabfea009

Steps to reproduce the problem

cat /root/gnb_b71_arfcn127900.yml

amf:
  addr: 192.168.16.254                                          # The address or hostname of the AMF.
  bind_addr: 192.168.16.1                                       # A local IP that the gNB binds to for traffic from the AMF.

ru_sdr:
  device_driver: uhd                                            # The RF driver name.
  device_args: type=b200,num_recv_frames=64,num_send_frames=64  # Optionally pass arguments to the selected RF driver.
  sync: external
  srate: 11.52                                                  # RF sample rate might need to be adjusted according to selected bandwidth.
  otw_format: sc12
  tx_gain: 70                                                   # Transmit gain of the RF might need to be adjusted to the given situation.
  rx_gain: 70                                                   # Receive gain of the RF might need to be adjusted to the given situation.

cell_cfg:
  dl_arfcn: 127980                                              # ARFCN of the downlink carrier (center frequency).
  band: 71                                                      # The NR band.
  channel_bandwidth_MHz: 5                                      # Bandwidth in MHz. Number of PRBs will be automatically derived.
  common_scs: 15                                                # Subcarrier spacing in kHz used for data.
  plmn: "310690"                                                # PLMN broadcasted by the gNB.
  tac: 7                                                        # Tracking area code (needs to match the core configuration).
  pci: 1                                                        # Physical cell ID.

log:
  filename: /tmp/gnb.log                                        # Path of the log file.
  all_level: info                                               # Logging level applied to all layers.

pcap:
  mac_enable: true                                              # Set to true to enable MAC-layer PCAPs.
  mac_filename: /tmp/gnb_mac.pcap                               # Path where the MAC PCAP is stored.
  ngap_enable: true                                             # Set to true to enable NGAP PCAPs.
  ngap_filename: /tmp/gnb_ngap.pcap                             # Path where the NGAP PCAP is stored.
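
As a quick sanity check on the `dl_arfcn` above: for carriers below 3 GHz, 3GPP TS 38.104 maps an NR-ARFCN N to a frequency of N × 5 kHz. The sketch below applies that mapping and compares the result with the band 71 edges quoted in the issue description. The helper names are illustrative and are not part of srsRAN, and the 46 MHz uplink offset is an assumption inferred from the quoted band edges (663 MHz minus 617 MHz).

```python
# Illustrative helpers, not part of srsRAN.
# Assumption: NR-ARFCN below 600000 (carriers under 3 GHz), where the
# global raster step is 5 kHz and the frequency offset is zero.

N71_DL_KHZ = (617_000, 652_000)   # band 71 downlink edges from the issue text
N71_UL_OFFSET_KHZ = 46_000        # 663 MHz - 617 MHz; uplink sits above downlink


def arfcn_to_khz(arfcn: int) -> int:
    """Return the carrier frequency in kHz for an NR-ARFCN below 3 GHz."""
    if not 0 <= arfcn < 600_000:
        raise ValueError("only the 0-3 GHz raster is handled here")
    return arfcn * 5


def fits_n71_downlink(arfcn: int, bandwidth_khz: int) -> bool:
    """Check that the whole channel lies inside the band 71 downlink range."""
    centre = arfcn_to_khz(arfcn)
    low, high = N71_DL_KHZ
    half = bandwidth_khz // 2
    return low <= centre - half and centre + half <= high


dl_khz = arfcn_to_khz(127980)
ul_khz = dl_khz + N71_UL_OFFSET_KHZ
print(f"DL {dl_khz / 1000} MHz, UL {ul_khz / 1000} MHz, "
      f"in band: {fits_n71_downlink(127980, 5000)}")
```

For ARFCN 127980 this gives 639.9 MHz downlink and 685.9 MHz uplink, which agrees with the `dl_freq` and `ul_freq` values the gNB prints at startup, so the carrier placement itself looks consistent.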

Additional Information

Just because someone is bound to ask, have tried all these suggested remedies:

taskset -c 0-7 ./gnb -c /root/gnb_b71_arfcn127900.yml
echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
./srsran_performance
./benchmark_rate --args type=b200 --rx_rate 23.04e6 --tx_rate 23.04e6
./latency_test
./test_timed_commands
./tx_waveforms --freq=900e6 --rate=23.04e6
./pdsch_ue -f 751000000 -a clock=external ----> CFO is within 100 Hz of local Verizon LTE eNodeB, so GPSDO is working

Spectrum analyzer showing the 5 MHz bandwidth: [image: b71_gnb_arfcn127900]

tiger762 commented 4 weeks ago

Something else that might be significant: after the "exceeded maximum number of timed out transmissions" error, I cannot restart the gNB. It segfaults. I have to unplug the USB cable and plug it back in; only then can I restart the gNB, at which point it reloads the FPGA firmware. I also tried changing the sample rate from 11.52 to 23.04 MHz, with no effect. This combination of hardware and software is broken, and there is no obvious reason why.

root@powerlab:~/srsRAN_Project/build/apps/gnb# ./gnb -c /root/gnb_b71_arfcn127900.yml 

--== srsRAN gNB (commit 4cf7513e9) ==--

The PRACH detector will not meet the performance requirements with the configuration {Format 0, ZCZ 0, SCS 1.25kHz, Rx ports 1}.
Lower PHY in quad executor mode.
N2: Connection to AMF on 192.168.16.254:38412 completed
Cell pci=1, bw=5 MHz, 1T1R, dl_arfcn=127980 (n71), dl_freq=639.9 MHz, dl_ssb_arfcn=127950, ul_freq=685.9 MHz

Available radio types: uhd.
[INFO] [UHD] linux; GNU C++ version 11.2.0; Boost_107400; UHD_4.1.0.5-3
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Making USRP object with args 'type=b200,num_recv_frames=64,num_send_frames=64'
[INFO] [B200] Detected Device: B200
[INFO] [B200] Operating over USB 3.
[INFO] [B200] Initialize CODEC control...
[INFO] [B200] Initialize Radio control...
[INFO] [B200] Performing register loopback test... 
[INFO] [B200] Register loopback test passed
[INFO] [B200] Setting master clock rate selection to 'automatic'.
[INFO] [B200] Asking for clock rate 16.000000 MHz... 
[INFO] [B200] Actually got clock rate 16.000000 MHz.
[INFO] [MULTI_USRP] Setting master clock rate selection to 'manual'.
[INFO] [B200] Asking for clock rate 23.040000 MHz... 
[INFO] [B200] Actually got clock rate 23.040000 MHz.
**Segmentation fault (core dumped)**

dominikheinz commented 4 weeks ago

I experienced the same problem with the B210. See my issue about it: https://github.com/srsran/srsRAN_Project/issues/727

I never fully resolved it, but I assume it has to do with the USB controller being overloaded, though I can't say for sure. Oftentimes, power cycling the device and removing other USB peripherals helped.
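
To put a rough number on the USB load for the configuration posted above, here is a back-of-the-envelope sketch of the sample payload rate. It assumes each complex sample carries an I and a Q component of 12 bits each for `sc12` (16 bits each for `sc16`) and ignores packet headers; the function name is illustrative and is not a UHD or srsRAN API.

```python
# Illustrative arithmetic only; the names below are not UHD or srsRAN APIs.
# Assumption: one complex sample carries an I and a Q component, each
# 12 bits wide for sc12 and 16 bits wide for sc16; packet headers are ignored.

BITS_PER_COMPONENT = {"sc12": 12, "sc16": 16}


def stream_mbytes_per_s(srate_mhz: float, otw_format: str) -> float:
    """Payload rate of one direction (Rx or Tx) in megabytes per second."""
    bits_per_sample = 2 * BITS_PER_COMPONENT[otw_format]
    return srate_mhz * bits_per_sample / 8


for srate in (11.52, 23.04):
    for fmt in ("sc12", "sc16"):
        one_way = stream_mbytes_per_s(srate, fmt)
        print(f"{srate} MHz {fmt}: {one_way:.2f} MB/s per direction, "
              f"{2 * one_way:.2f} MB/s Rx+Tx")
```

With `srate: 11.52` and `otw_format: sc12` this works out to roughly 34.56 MB/s in each direction. That is small next to the nominal 5 Gbit/s signalling rate of USB 3.0, so raw bandwidth alone may not explain the timeouts, although host controller behaviour and latency could still play a part.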

pgawlowicz commented 4 weeks ago

Could you try with UHD 4.3?

tiger762 commented 4 weeks ago

Hello, I downloaded UHD 4.3.0.0 and built it in its own separate space, then modified the CMakeCache.txt file to point to the new LIBS and INCLUDE directories. The rest of my system (GNU Radio, etc.) relies on the installed UHD3.15.0.

root@powerlab:~/srsRAN_Project/build# grep UHD CMakeCache.txt | grep PATH
//Path to a file.UHD_INCLUDE_DIRS:PATH=/usr/include
UHD_INCLUDE_DIRS:PATH=/root/uhd-4.3.0.0/host/include
//Path to a library.UHD_LIBRARIES:FILEPATH=/usr/lib/x86_64-linux-gnu/libuhd.so
UHD_LIBRARIES:FILEPATH=/root/uhd-4.3.0.0/host/build/lib/libuhd.so

So with it pointing to the new version of the UHD host library, I did a 'make clean' and 'make -j4' to rebuild srsRAN_Project. At long last, I tried to run it with the new version, but got the same result :(

root@powerlab:~/srsRAN_Project/build/apps/gnb# ./gnb -c /root/gnb_b71_arfcn127900.yml

--== srsRAN gNB (commit 4cf7513e9) ==--

The PRACH detector will not meet the performance requirements with the configuration {Format 0, ZCZ 0, SCS 1.25kHz, Rx ports 1}.
Lower PHY in quad executor mode.
N2: Connection to AMF on 192.168.16.254:38412 completed
Cell pci=1, bw=5 MHz, 1T1R, dl_arfcn=128000 (n71), dl_freq=640.0 MHz, dl_ssb_arfcn=127970, ul_freq=686.0 MHz

Available radio types: uhd.
[INFO] [UHD] linux; GNU C++ version 11.4.0; Boost_107400; **UHD_4.3.0.0-0-unknown**
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Making USRP object with args 'type=b200,num_recv_frames=64,num_send_frames=64'
[INFO] [B200] Detected Device: B200
[INFO] [B200] **Operating over USB 3.**
[INFO] [B200] Initialize CODEC control...
[INFO] [B200] Initialize Radio control...
[INFO] [B200] Performing register loopback test...
[INFO] [B200] Register loopback test passed
[INFO] [B200] Setting master clock rate selection to 'automatic'.
[INFO] [B200] Asking for clock rate 16.000000 MHz...
[INFO] [B200] Actually got clock rate 16.000000 MHz.
[INFO] [MULTI_USRP] Setting master clock rate selection to 'manual'.
[INFO] [B200] Asking for clock rate 23.040000 MHz...
[INFO] [B200] Actually got clock rate 23.040000 MHz.
==== gNB started ===
Type <h> to view help
[ERROR] [STREAMER] The receive packet handler caught a value exception.
ValueError: bad vrt header or packet fragment
[ERROR] [STREAMER] recv packet demuxer unexpected sid 0x1efcdf8e
Error: unhandled error in Rx metadata ERROR_CODE_BAD_PACKET.Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
^CStopping ..
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Could not stop application after 5 seconds. Forcing exit.
Killed

I am at my wit's end :(

tiger762 commented 4 weeks ago

> I experienced the same problem with the B210. See my issue about it: #727
>
> I never fully resolved it, but I assume it has to do with the USB controller being overloaded - tho I can't say for sure. Often times power cycling the device and removing other USB peripherals helped.

Yes, I saw that, and I believe there was another thread on this same issue. Someone else mentioned it had something to do with the GPSDO, so I reverted to using the internal clock source. I have tried all six of my machine's USB 3.0 ports, including unplugging unused USB peripherals. Nothing makes a difference.

tiger762 commented 4 weeks ago

> I experienced the same problem with the B210. See my issue about it: #727
>
> I never fully resolved it, but I assume it has to do with the USB controller being overloaded - tho I can't say for sure. Often times power cycling the device and removing other USB peripherals helped.

@dominikheinz I think I've stumbled upon something here. I dropped my TX gain from 70 down to 40 and saw that it runs longer before the error shows up. With anything less than 40, it can run for as much as 7 minutes. There might still be one overflow or underflow, which is fine. When I set the TX gain to 70 or above, the error shows up within 2 minutes on average. I went back and forth several times this evening until I was convinced that there was a pattern here.

So, even though the Bxx series is "bus powered", realistically the SDR board needs dedicated power. I also note that a purely Rx application like uhd_fft can run indefinitely on bus power. Now to find a 6VDC power supply with the appropriate plug diameter. It looks like a typical 5.5mm.
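
For anyone who wants to repeat the gain comparison less manually, here is a minimal sketch that measures how long a run lasts before the first timeout message appears. The function names are hypothetical, the command line is the one used earlier in this thread, and `tx_gain` would still be edited in the YAML between runs. Output that passes through a pipe may be buffered, so the measured times are approximate.

```python
import subprocess
import time

# Message printed by the gNB in the logs above.
NEEDLE = "exceeded maximum number of timed out transmissions"


def seconds_until(lines, needle=NEEDLE, clock=time.monotonic):
    """Return seconds elapsed until a line contains `needle`, else None."""
    start = clock()
    for line in lines:
        if needle in line:
            return clock() - start
    return None


def time_gnb_run(cmd):
    """Launch `cmd`, watch its combined output, and stop it at the first error."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        return seconds_until(proc.stdout)
    finally:
        proc.terminate()
        proc.wait()


# Hypothetical usage (requires the gNB binary and an attached radio):
# elapsed = time_gnb_run(["./gnb", "-c", "/root/gnb_b71_arfcn127900.yml"])
# print("no error seen" if elapsed is None else f"first error after {elapsed:.0f} s")
```

Recording the result for several runs at each gain setting would show whether the pattern described above holds up beyond a single evening of testing.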

Interestingly, lsusb doesn't report a realistic MaxPower for my B200. It always shows 8mA:

Every 1.0s: lsusb -v | egrep "^Bus|MaxPower"                        powerlab: Fri Aug 16 22:11:03 2024

Bus 002 Device 007: ID 2500:0020 Ettus Research LLC USRP B210
    MaxPower                8mA
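
A note on reading that figure: MaxPower is the value a device declares in its USB configuration descriptor, not a measurement of the current it actually draws, so a low number here says little about the board's real consumption. The sketch below pulls the declared value per device out of `lsusb -v` text; the function name is illustrative, and the sample input is the output shown above.

```python
import re

# Matches e.g. "Bus 002 Device 007: ID 2500:0020 Ettus Research LLC USRP B210"
DEVICE_RE = re.compile(r"^Bus (\d+) Device (\d+): ID ([0-9a-f]{4}:[0-9a-f]{4})\s*(.*)$")
# Matches e.g. "    MaxPower                8mA"
POWER_RE = re.compile(r"^\s*MaxPower\s+(\d+)\s*mA")


def declared_max_power(lsusb_text: str) -> dict:
    """Map 'vendor:product' IDs to the MaxPower (mA) each device declares."""
    result = {}
    current = None
    for line in lsusb_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(3)
            continue
        power = POWER_RE.match(line)
        if power and current is not None:
            result[current] = int(power.group(1))
    return result


sample = """Bus 002 Device 007: ID 2500:0020 Ettus Research LLC USRP B210
    MaxPower                8mA
"""
print(declared_max_power(sample))
```

Measuring the actual draw would need an inline USB power meter or a bench supply with a current readout.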

Hope this helps!

dominikheinz commented 3 weeks ago

@tiger762 I had the issue regardless of whether the B210 had external power or not.