srsran / srsRAN_4G

Open source SDR 4G software suite from Software Radio Systems (SRS) https://docs.srsran.com/projects/4g
https://www.srsran.com
GNU Affero General Public License v3.0

UE lockup with MBMS enabled in lossy environment #936

Open: jgiovatto opened this issue 2 years ago

jgiovatto commented 2 years ago

Issue Description

The UE eventually locks up when MBMS is enabled in a lossy RF environment.

Setup Details

Set up the EPC, eNB, and UE with the example config files and MBMS enabled, optionally using the ZMQ RF driver if convenient. Set the DL gain so that for 30 seconds the UE sees a good signal from the eNB, attaches, and enters the camping state; then, for the next 30 seconds, reduce the gain so that the UE drops back to cell search. Repeat this pattern for a few minutes (the lockup typically occurs in less than 5). Neither unicast nor multicast traffic is required.

Log tracing patch: ue_mbms_trace.txt

Expected Behavior

The UE should cycle between cell search and camping indefinitely.

Actual Behaviour

The UE main processing thread eventually stops (blocks) in bool phy_common::is_mch_subframe(), at this wait on mtch_cvar:

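      // Block until the MAC layer sets have_mtch_stop and signals mtch_cvar;
      // if that signal never arrives, the UE main processing thread hangs here.
      while (!have_mtch_stop) {
        fprintf(stderr, "\t\t\tphy_common:%s XXX waiting on mtch_cvar\n", __func__);
        mtch_cvar.wait(lock);  // re-checks have_mtch_stop after every wakeup
      }
      fprintf(stderr, "\t\t\tphy_common:%s XXX done with mtch_cvar\n", __func__);
      lock.unlock();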

      sf_worker:work_imp tti 3197, ENTER
      phy_common:is_mch_subframe XXX done with mtch_cvar
      cc_worker:work_dl_mbsfn [1] ENTER
      cc_worker:work_dl_mbsfn [5] EXIT
      sf_worker:work_imp tti 3197, EXIT
      worker_pool:wait_worker done wait_worker tti 3198
      worker_pool:wait_worker call wait_worker tti 3199
      sf_worker:work_imp tti 3198, ENTER
      phy_common:is_mch_subframe XXX done with mtch_cvar
      cc_worker:work_dl_mbsfn [1] ENTER
      cc_worker:work_dl_mbsfn [5] EXIT
      sf_worker:work_imp tti 3198, EXIT
      worker_pool:wait_worker done wait_worker tti 3199
      worker_pool:wait_worker call wait_worker tti 3200
      sf_worker:work_imp tti 3199, ENTER
      sf_worker:work_imp tti 3199, EXIT
      worker_pool:wait_worker done wait_worker tti 3200
      worker_pool:wait_worker call wait_worker tti 3201
      sf_worker:work_imp tti 3200, ENTER
      sf_worker:work_imp tti 3200, EXIT
      worker_pool:wait_worker done wait_worker tti 3201
      worker_pool:wait_worker call wait_worker tti 3202
      sf_worker:work_imp tti 3201, ENTER
      cc_worker:work_dl_mbsfn [1] ENTER
      cc_worker:work_dl_mbsfn [2]
      cc_worker:work_dl_mbsfn [5] EXIT
      sf_worker:work_imp tti 3201, EXIT
      worker_pool:wait_worker done wait_worker tti 3202
      worker_pool:wait_worker call wait_worker tti 3203
      sf_worker:work_imp tti 3202, ENTER
      phy_common:is_mch_subframe XXX waiting on mtch_cvar   <<<<< last log

jgiovatto commented 2 years ago

Hi folks, I dug in a little deeper, and here is what I found. This is the same scenario as above, but with this patch instead: ue_mbms_trace.txt

      mac:mch_decoded BEGIN len 277, crc 1
      mac:mch_decoded serviceId 0, ce_type 30
      mac:mch_decoded got sched info setting stop 0,   <<< good
      mac:mch_decoded serviceId 0, ce_type 0
      mac:mch_decoded serviceId 0, ce_type 31
      mac:mch_decoded END
      mac:mch_decoded BEGIN len 277, crc 1
      mac:mch_decoded serviceId 0, ce_type 30
      mac:mch_decoded got sched info setting stop 0,   <<< good
      mac:mch_decoded serviceId 0, ce_type 0
      mac:mch_decoded serviceId 0, ce_type 31
      mac:mch_decoded END
      mac:mch_decoded BEGIN len 277, crc 1
      mac:mch_decoded serviceId 0, ce_type 30
      mac:mch_decoded got sched info setting stop 0,   <<< good
      mac:mch_decoded serviceId 0, ce_type 0
      mac:mch_decoded serviceId 0, ce_type 31
      mac:mch_decoded END
      mac:mch_decoded BEGIN len 277, crc 1
      mac:mch_decoded serviceId 0, ce_type 0
      mac:mch_decoded force stop   <<<< hack to unlock the condition variable
      mac:mch_decoded END

andrepuschmann commented 1 year ago

Hi @jgiovatto, thanks for reporting the issue. Compiling srsRAN with TSAN (ThreadSanitizer) enabled and running the UE and eNB with eMBMS reveals a few race conditions in the eMBMS-related code. We only noticed this shortly before the 22.10 release and couldn't fix it in time, but it could be related to your issue.