zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
9.98k stars 6.15k forks source link

ESP32-S3 WiFi Interrupt Registration breaks if Watchdog Enabled #74368

Closed EricNRS closed 1 day ago

EricNRS commented 1 week ago

Describe the bug Once 6 other peripheral interrupts are installed, the WiFi "timer_alarm_isr" interrupt allocation fails resulting in loss of WiFi functionality.

Without the watchdog enabled, the WiFi timer interrupt "timer_alarm_isr" is correctly allocated.

(gdb) print int_log_data
$1 = {
  {source = 16, intr = 2, handler = 0x40378920 <gpio_esp32_isr>},
  {source = 27, intr = 3, handler = 0x4207b96c <uart_esp32_isr>}, 
  {source = 96, intr = 9, handler = 0x4207c2ec <serial_esp32_usb_isr>},
  {source = 28, intr = 12, handler = 0x4207b96c <uart_esp32_isr>},
  {source = 29, intr = 13, handler = 0x4207b96c <uart_esp32_isr>},
  {source = 42, intr = 17, handler = 0x40379564 <i2c_esp32_isr>},
  {source = 59, intr = 18, handler = 0x40377a30 <timer_alarm_isr>},  <-- WiFi interrupt
  {source = 0, intr = 0, handler = 0x0} <repeats 25 times>
}

With watchdog enabled, WiFi timer ISR fails and is not in the allocation table:

(gdb) print int_log_data
$8 = {
  {source = 16, intr = 2, handler = 0x40378920 <gpio_esp32_isr>},
  {source = 27, intr = 3, handler = 0x4207bb24 <uart_esp32_isr>},
  {source = 96, intr = 9, handler = 0x4207c4a4 <serial_esp32_usb_isr>},
  {source = 52, intr = 12, handler = 0x4207e0f4 <wdt_esp32_isr>},
  {source = 28, intr = 13, handler = 0x4207bb24 <uart_esp32_isr>},
  {source = 29, intr = 17, handler = 0x4207bb24 <uart_esp32_isr>},
  {source = 42, intr = 18, handler = 0x40379564 <i2c_esp32_isr>},
  # <timer_alarm_isr> missing
  {source = 0, intr = 0, handler = 0x0} <repeats 25 times>
}

Here is the GDB screenshot showing that the failure is caused by the interrupt controller failing to get an interrupt from the call to get_available_int().

┌─/home/work//zephyrproject-3.6/zephyr/drivers/interrupt_controller/intc_esp32.c──────────────────────────────────────────────────┐
│      592           */                                                                                                                         │
│      593          switch (source) {                                                                                                           │
│      594          case ETS_INTERNAL_TIMER0_INTR_SOURCE:                                                                                       │
│      595                  force = ETS_INTERNAL_TIMER0_INTR_NO;                                                                                │
│      596                  break;                                                                                                              │
│      597          case ETS_INTERNAL_TIMER1_INTR_SOURCE:                                                                                       │
│      598                  force = ETS_INTERNAL_TIMER1_INTR_NO;                                                                                │
│      599                  break;                                                                                                              │
│      600          case ETS_INTERNAL_TIMER2_INTR_SOURCE:                                                                                       │
│      601                  force = ETS_INTERNAL_TIMER2_INTR_NO;                                                                                │
│      602                  break;                                                                                                              │
│      603          case ETS_INTERNAL_SW0_INTR_SOURCE:                                                                                          │
│      604                  force = ETS_INTERNAL_SW0_INTR_NO;                                                                                   │
│      605                  break;                                                                                                              │
│      606          case ETS_INTERNAL_SW1_INTR_SOURCE:                                                                                          │
│      607                  force = ETS_INTERNAL_SW1_INTR_NO;                                                                                   │
│      608                  break;                                                                                                              │
│      609          case ETS_INTERNAL_PROFILING_INTR_SOURCE:                                                                                    │
│      610                  force = ETS_INTERNAL_PROFILING_INTR_NO;                                                                             │
│      611                  break;                                                                                                              │
│      612          default:                                                                                                                    │
│      613                  break;                                                                                                              │
│      614          }                                                                                                                           │
│      615                                                                                                                                      │
│      616          /* Allocate a return handle. If we end up not needing it, we'll free it later on. */                                        │
│      617          ret = k_malloc(sizeof(struct intr_handle_data_t));                                                                          │
│      618          if (ret == NULL) {                                                                                                          │
│      619                  return -ENOMEM;                                                                                                     │
│      620          }                                                                                                                           │
│      621                                                                                                                                      │
│      622          esp_intr_lock();                                                                                                            │
│      623          int cpu = esp_cpu_get_core_id();                                                                                            │
│      624          /* See if we can find an interrupt that matches the flags. */                                                               │
│      625          int intr = get_available_int(flags, cpu, force, source);                                                                    │
│      626                                                                                                                                      │
│      627          if (intr == -1) {                                                                                                           │
│      628                  /* None found. Bail out. */                                                                                         │
│  >   629                  esp_intr_unlock();                                                                                                  │
│      630                  k_free(ret);                                                                                                        │
│      631                  return -ENODEV;                                                                                                     │
│      632          }                                                                                                                           │
│      633          /* Get an int vector desc for int. */                                                                                       │
│      634          struct vector_desc_t *vd = get_desc_for_int(intr, cpu);                                                                     │
│      635                                                                                                                                      │
│      636          if (vd == NULL) {                                                                                                           │
│      637                  esp_intr_unlock();                                                                                                  │
│      638                  k_free(ret);                                                                                                        │
│      639                  return -ENOMEM;                                                                                                     │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
extended-r Thread 1070354784 In: esp_intr_alloc_intrstatus                                                                  L629  PC: 0x40389f15 
}
(gdb) print cpu
$3 = 0
(gdb) print force
$4 = -1
(gdb) print source
$5 = 59
(gdb) bt    
#0  esp_intr_alloc_intrstatus (source=59, flags=1026, intrstatusreg=0, intrstatusmask=0, handler=0x40377a30 <timer_alarm_isr>, arg=0x0,
    ret_handle=ret_handle@entry=0x0) at /home/work//zephyrproject-3.6/zephyr/drivers/interrupt_controller/intc_esp32.c:629
#1  0x4038a10c in esp_intr_alloc (source=59, flags=1026, handler=0x40377a30 <timer_alarm_isr>, arg=0x0, ret_handle=0x0)
    at /home/work//zephyrproject-3.6/zephyr/drivers/interrupt_controller/intc_esp32.c:739
#2  0x4202e347 in esp_timer_impl_init (alarm_handler=0x4037787c <timer_alarm_handler>)
    at /home/work//zephyrproject-3.6/modules/hal/espressif/components/esp_timer/src/esp_timer_impl_systimer.c:163
#3  0x4202e211 in esp_timer_init ()
    at /home/work//zephyrproject-3.6/modules/hal/espressif/components/esp_timer/src/esp_timer.c:556
#4  0x4207fa7b in esp32_wifi_dev_init (dev=<optimized out>)
    at /home/work//zephyrproject-3.6/zephyr/drivers/wifi/esp32/src/esp_wifi_drv.c:714
#5  0x4037d89e in do_device_init (entry=<optimized out>) at /home/work//zephyrproject-3.6/zephyr/kernel/init.c:312
#6  0x4037d908 in z_sys_init_run_level (level=INIT_LEVEL_POST_KERNEL) at /home/work//zephyrproject-3.6/zephyr/kernel/init.c:368
#7  0x4037dae8 in bg_thread_main (unused1=<optimized out>, unused2=0x0, unused3=0x0)
    at /home/work//zephyrproject-3.6/zephyr/kernel/init.c:427
#8  0x4201ac5d in z_thread_entry (entry=0x4037dac8 <bg_thread_main>, p1=0x0, p2=0x0, p3=0x0)
    at /home/work//zephyrproject-3.6/zephyr/lib/os/thread_entry.c:48
(gdb) 

To Reproduce On ESP32-S3 WiFi sample, enable additional peripherals. I have enabled:

I have shared an updated version of the interrupt registration patch that was originally shared in the discussion at https://discord.com/channels/720317445772017664/1211822589081550878/1219766247051235449

After booting the system use GDB and enter the following commands to dump the registered interrupts:

set print array on
print int_log_data

Expected behavior Interrupt registration should be successful since there are more interrupts slots available per Table 9-2 in the ESP32-S3 TRM version 1.4.

Impact Functionality loss of either the watchdog or WiFi in this case. [0001-DEBUG-log-interrupt.patch](https://github.com/user-attachments/files/15858065/0001-DEBUG-log-interrupt.patch) Additional context Zephyr v3.6.99 at commit d78a9429450f89de7829c8af7efe41fdb692fd50.

EricNRS commented 1 week ago

@sylvioalves

EricNRS commented 1 week ago

As a temporary work-around to continue development, I have enabled level 2 interrupts for the WiFI using this patch:

commit 19b1cdd240e5f9453a8589ad8bf2609384cd9796 (HEAD)
Author: Eric Holmberg <eric.holmberg@northriversystems.co.nz>
Date:   Mon Jun 17 01:08:22 2024 +1200

    esp-hal-components: allow WiFi interrupt to use either level 1 or level 2

    Level 1 interrupts are failing to register, so enable level 2 interrupts
    while original failure is investigated.

    Fixes #74368

    Signed-off-by: Eric Holmberg <eric.holmberg@northriversystems.co.nz>

diff --git a/components/esp_timer/src/esp_timer_impl_systimer.c b/components/esp_timer/src/esp_timer_impl_systimer.c
index df379a198f..f2581a20ba 100644
--- a/components/esp_timer/src/esp_timer_impl_systimer.c
+++ b/components/esp_timer/src/esp_timer_impl_systimer.c
@@ -154,7 +154,7 @@ esp_err_t esp_timer_impl_early_init(void)

 esp_err_t esp_timer_impl_init(intr_handler_t alarm_handler)
 {
-    int isr_flags = ((1 << 1) & ESP_INTR_FLAG_LEVELMASK)
+    int isr_flags = ((1 << 1 | 1 << 2) & ESP_INTR_FLAG_LEVELMASK)
 #if !SOC_SYSTIMER_INT_LEVEL
                     | ESP_INTR_FLAG_EDGE
 #endif
@@ -163,6 +163,7 @@ esp_err_t esp_timer_impl_init(intr_handler_t alarm_handler)
        esp_err_t err = esp_intr_alloc(ETS_SYSTIMER_TARGET2_EDGE_INTR_SOURCE, isr_flags,
                                                                   (ISR_HANDLER)timer_alarm_isr, NULL, NULL);
        if (err != ESP_OK) {
+        __ASSERT(0, "esp_intr_alloc failed (0x%x)", err);
         ESP_EARLY_LOGE(TAG, "esp_intr_alloc failed (0x%x)", err);
         return err;
     }
sylvioalves commented 1 week ago

@EricNRS, let me check something in wifi driver that could be related to this..

epc-ake commented 1 week ago

I've encountered the same issue also within the sample/net/wifi example by only enabeling the DMA. add CONFIG_DMA=y to the prj.conf and enable the DMA in the dt overlay.

The patch from @EricNRS also fixes it.

epc-ake commented 1 week ago

Not sure if it is the same issue but even with the patch applied I still could not connect to wifi in my custom application. There I allocated an additional interrupt and forgot to specify the interrupt priority. e.g.

esp_intr_alloc(cfg->irq_source,
               0,  < ----------
               (intr_handler_t)video_esp32_irq_handler,
               (void *)dev,
               NULL);

Specifying any Interrupt level there (e.g. ESP_INTR_FLAG_LEVEL4) seems to fix it again. I more or less copied that from the uart_esp32 driver where it also allocates an interrupt without specifying a priority level. (at least when not in async mode)

epc-ake commented 1 week ago

I keep stumbling over this problem... By digging in to it I came across the interrupt_descriptor_table. Looks like I'm running out of interrupt slots. However, can someone explain me why almost half of the Pheripheral interrupts are marked as reserved? (e.g. INTDESC_RESVD) Reserved by what?

from hal/esp32s3/interrupt_descriptor_table.c

const static int_desc_t interrupt_descriptor_table [32]={
    { 1, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //0
    { 1, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //1
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //2
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //3
    { 1, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_NORMAL} }, //4
    { 1, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //5
    { 1, INTTP_NA,    {INT6RES,        INT6RES       } }, //6
    { 1, INTTP_NA,    {INTDESC_SPECIAL,INTDESC_SPECIAL}}, //7
    { 1, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //8
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //9
    { 1, INTTP_EDGE , {INTDESC_NORMAL, INTDESC_NORMAL} }, //10
    { 3, INTTP_NA,    {INTDESC_SPECIAL,INTDESC_SPECIAL}}, //11
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //12
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //13
    { 7, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //14, NMI
    { 3, INTTP_NA,    {INT15RES,       INT15RES      } }, //15
    { 5, INTTP_NA,    {INTDESC_SPECIAL,INTDESC_SPECIAL} }, //16
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //17
    { 1, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //18
    { 2, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //19
    { 2, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //20
    { 2, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //21
    { 3, INTTP_EDGE,  {INTDESC_RESVD,  INTDESC_NORMAL} }, //22
    { 3, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_NORMAL} }, //23
    { 4, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_NORMAL} }, //24
    { 4, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //25
    { 5, INTTP_LEVEL, {INTDESC_NORMAL, INTDESC_RESVD } }, //26
    { 3, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //27
    { 4, INTTP_EDGE,  {INTDESC_NORMAL, INTDESC_NORMAL} }, //28
    { 3, INTTP_NA,    {INTDESC_SPECIAL,INTDESC_SPECIAL}}, //29
    { 4, INTTP_EDGE,  {INTDESC_RESVD,  INTDESC_RESVD } }, //30
    { 5, INTTP_LEVEL, {INTDESC_RESVD,  INTDESC_RESVD } }, //31
};
epc-ake commented 1 week ago

When DMA is enabled, it takes at least 6 (up to 10) interrupt slots for all of its rx/tx channels. In my application I only need one rx channel though. What ended up working for me is to configure all DMA Channel interrupts to share the same interrupt slot by setting the ESP_INTR_FLAG_SHARED flag when allocating the interrupts in the dma_esp32_gdma driver.

EricNRS commented 1 week ago

Are you on an older version of Zephyr? I show interrupt_descriptor_table.c and interrupt_descriptor_table as only being in IDF 4.x and not in the latest v5.1 version.

epc-ake commented 1 week ago

You're right. I was on zephyr 3.6.0. In the latest hal shipping with 3.7.0-rc1 the table moved to components/esp_hw_support/cpu.c. https://github.com/zephyrproject-rtos/hal_espressif/blob/snapshot/1/components/esp_hw_support/cpu.c#L181 However, the issue persists and there are still almost half of all slots marked as "reserved".

EricNRS commented 4 days ago

@epc-ake - FYI, you may not want to use LEVEL4 interrupts. I ran across this issue which if still relevant, implies that anything higher than XCHAL_EXCM_LEVEL (which is 3) may cause corruption of the stack during window exceptions.

https://github.com/zephyrproject-rtos/zephyr/issues/41039

sylvioalves commented 3 days ago

@epc-ake DMA issue is being handled in another PR.