srsran / srsRAN_Project

Open source O-RAN 5G CU/DU solution from Software Radio Systems (SRS) https://docs.srsran.com/projects/project
https://www.srsran.com
GNU Affero General Public License v3.0

Starting gnb gives Error: exceeded maximum number of timed out transmissions. #727

Closed dominikheinz closed 2 months ago

dominikheinz commented 3 months ago

Issue Description

A few seconds after the gNB starts, I see "Error: exceeded maximum number of timed out transmissions." It also appears that the UE can no longer connect.

Setup Details

Everything is default: an Ubuntu 22.04 install on a very powerful system (Intel 14900K), so this shouldn't be a performance issue. The RF frontend is a USRP B210. The exact same setup worked perfectly fine yesterday; today, for some reason, I get this error even though no changes were made.

Expected Behavior

The GNB starts without this error and UEs can connect.

Actual Behaviour

I see this error:

fiveglab@fiveglab-Precision-3680 $ sudo ./gnb -c gnb_b210_20MHz_oneplus_8t.yml 

--== srsRAN gNB (commit a804c5b9c) ==--

The PRACH detector will not meet the performance requirements with the configuration {Format B4, ZCZ 0, SCS 30kHz, Rx ports 1}.
Lower PHY in quad executor mode.
N2: Connection to AMF on 127.0.0.5:38412 completed
Cell pci=1, bw=20 MHz, 1T1R, dl_arfcn=627340 (n78), dl_freq=3410.1 MHz, dl_ssb_arfcn=626976, ul_freq=3410.1 MHz

Available radio types: uhd.
[INFO] [UHD] linux; GNU C++ version 11.4.0; Boost_107400; UHD_4.7.0.0-0ubuntu1~jammy1
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Making USRP object with args 'type=b200,num_recv_frames=64,num_send_frames=64'
[DEBUG] [B200] the firmware image: /usr/share/uhd/images/usrp_b200_fw.hex
[INFO] [B200] Detected Device: B210
[INFO] [B200] Operating over USB 3.
[INFO] [B200] Initialize CODEC control...
[INFO] [B200] Initialize Radio control...
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[DEBUG] [AD936X] baseband bandwidth too large for current sample rate. Setting bandwidth to: 5e+07
[INFO] [B200] Performing register loopback test... 
[INFO] [B200] Register loopback test passed
[INFO] [B200] Performing register loopback test... 
[INFO] [B200] Register loopback test passed
[DEBUG] [AD936X] Performing CODEC loopback test... 
[DEBUG] [AD936X] CODEC loopback test passed.
[DEBUG] [AD936X] Performing CODEC loopback test... 
[DEBUG] [AD936X] CODEC loopback test passed.
[INFO] [B200] Setting master clock rate selection to 'automatic'.
[INFO] [B200] Asking for clock rate 16.000000 MHz... 
[INFO] [B200] Actually got clock rate 16.000000 MHz.
[DEBUG] [CORES] Performing timer loopback test... 
[DEBUG] [CORES] Timer loopback test passed.
[DEBUG] [CORES] Performing timer loopback test... 
[DEBUG] [CORES] Timer loopback test passed.
[INFO] [MULTI_USRP] Setting master clock rate selection to 'manual'.
[INFO] [B200] Asking for clock rate 23.040000 MHz... 
[INFO] [B200] Actually got clock rate 23.040000 MHz.
[DEBUG] [CORES] Performing timer loopback test... 
[DEBUG] [CORES] Timer loopback test passed.
[DEBUG] [CORES] Performing timer loopback test... 
[DEBUG] [CORES] Timer loopback test passed.
[DEBUG] [CONVERT] get_converter: For converter ID: conversion ID
  Input format:  fc32
  Num inputs:    1
  Output format: sc12_item32_le
  Num outputs:   1
 Using best available prio: 0
[DEBUG] [CONVERT] get_converter: For converter ID: conversion ID
  Input format:  sc12_item32_le
  Num inputs:    1
  Output format: fc32
  Num outputs:   1
 Using best available prio: 0
==== gNB started ===
Type <h> to view help
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.
Error: exceeded maximum number of timed out transmissions.

Steps to reproduce the problem

I don't know how to reproduce it. The error appeared seemingly at random.

Additional Information

This is the config I am using:

# This configuration file example shows how to configure the srsRAN Project gNB to connect to a COTS UE. As with the 
# associated tutorial, this config has been tested with a OnePlus 8T. A B210 USRP is used as the RF-frontend.   
# This config creates a TDD cell with 20 MHz bandwidth in band 78. 
# To run the srsRAN Project gNB with this config, use the following command: 
#   sudo ./gnb -c gnb_b210_20MHz_oneplus_8t.yaml

amf:
  addr: 127.0.0.5 #127.0.1.100                                             # The address or hostname of the AMF.
  bind_addr: 127.0.0.1                                          # A local IP that the gNB binds to for traffic from the AMF.

ru_sdr:
  device_driver: uhd                                            # The RF driver name.
  device_args: type=b200,num_recv_frames=64,num_send_frames=64  # Optionally pass arguments to the selected RF driver.
  sync: external                                                # Specify the sync source used by the RF. NOTE: Set to internal if NOT using an external 10 MHz reference clock. 
  srate: 23.04                                                  # RF sample rate might need to be adjusted according to selected bandwidth.
  otw_format: sc12
  tx_gain: 80                                                   # Transmit gain of the RF might need to be adjusted to the given situation.
  rx_gain: 40                                                   # Receive gain of the RF might need to be adjusted to the given situation.

cell_cfg:
  dl_arfcn: 627340                                              # ARFCN of the downlink carrier (center frequency).
  band: 78                                                      # The NR band.
  channel_bandwidth_MHz: 20                                     # Bandwidth in MHz. Number of PRBs will be automatically derived.
  common_scs: 30                                                # Subcarrier spacing in kHz used for data.
  plmn: "99970" #"90170"                                                 # PLMN broadcasted by the gNB.
  tac: 7                                                        # Tracking area code (needs to match the core configuration).
  pci: 1                                                        # Physical cell ID.

log:
  filename: /tmp/gnb-debug.log                                         
  all_level: debug 

pcap:
  mac_enable: true                                              # Set to true to enable MAC-layer PCAPs.
  mac_filename: /tmp/gnb_mac.pcap                               # Path where the MAC PCAP is stored.
  ngap_enable: true                                             # Set to true to enable NGAP PCAPs.
  ngap_filename: /tmp/gnb_ngap.pcap                             # Path where the NGAP PCAP is stored.

dominikheinz commented 3 months ago

A full system reboot did not resolve the issue either.

dominikheinz commented 3 months ago

Logs: gnb-debug.log

fllay commented 3 months ago

sync: external? do you use an external clock?

dominikheinz commented 3 months ago

> sync: external? do you use an external clock?

I do not; that's a good point. I will change it to sync: internal and see if that helps. Also, did you have a chance to check the logs? Maybe something else is causing this.
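
For reference, a minimal sketch of that change, assuming no external 10 MHz reference is connected to the B210. Only the sync value differs from the ru_sdr section posted above; every other key is copied unchanged from that config.

```yaml
ru_sdr:
  device_driver: uhd
  device_args: type=b200,num_recv_frames=64,num_send_frames=64
  sync: internal      # was "external"; the config comment says to use internal when no external 10 MHz reference clock is used
  srate: 23.04
  otw_format: sc12
  tx_gain: 80
  rx_gain: 40
```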

pgawlowicz commented 2 months ago

@dominikheinz any update on this issue?

dominikheinz commented 2 months ago

> @dominikheinz any update on this issue?

Closed for now. For future reference: I believe the issue was caused by the USB controller being overloaded. After unplugging as many unnecessary devices as I could, especially from other USB3 ports, the issue seemingly went away.
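
One way to see what else sits on the same USB bus as the B210 is to summarize the output of lsusb -t. The sketch below is only an illustration: summarize_usb_tree is a hypothetical helper name, it assumes the tree layout printed by usbutils on Ubuntu 22.04 (root hub lines starting with "/:" and device lines containing "|__"), and a bus number is only a rough proxy for a physical controller, since one controller can expose more than one bus.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsusb -t` output on stdin and prints, for each
# bus, how many distinct non-hub devices are attached to it.
summarize_usb_tree() {
  awk '
    /^\/:/ {
      # A root hub line opens a new bus, e.g. "/:  Bus 04.Port 1: Dev 1, ..."
      match($0, /Bus [0-9]+/)
      bus = substr($0, RSTART + 4, RLENGTH - 4) + 0
      buses[bus] = 1
      next
    }
    /\|__/ && !/Class=Hub/ {
      # Device lines repeat once per interface, so count each Dev number once.
      if (match($0, /Dev [0-9]+/)) {
        dev = substr($0, RSTART + 4, RLENGTH - 4) + 0
        key = bus ":" dev
        if (!(key in seen)) { seen[key] = 1; count[bus]++ }
      }
    }
    END {
      for (b in buses) printf "bus %d: %d device(s)\n", b, count[b] + 0
    }
  ' | sort -k2n
}

# Usage (on the gNB host): lsusb -t | summarize_usb_tree
```

If the bus that carries the B210 (the line showing Driver=usbfs while the gNB is running is a likely candidate, though that is an assumption) lists several other devices, moving or unplugging them is a cheap first experiment before looking elsewhere.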