zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.66k stars 6.53k forks source link

BUS FAULT: Hard fault arising from gpio_add_callback() and potentially slist pointers? #57612

Closed savvn001 closed 1 year ago

savvn001 commented 1 year ago

Zephyr OS Build: v3.2.99-ncs2 (nRF connect SDK fork BUT this is Zephyr specific issue I think) Platform: NRF52

Hi, got an issue that I'm stuck on currently, when trying to initialise some callbacks for GPIO interrupts. The issue arises when I try to call gpio_add_callback for the 2nd time in the application.

So let's say in sensor driver 1 .c file I have the textbook setup for a GPIO interrupt, and assigning a cb fn.

static const struct gpio_dt_spec _max30101_int_pin = GPIO_DT_SPEC_GET(DT_NODELABEL(ppgint), gpios);
static struct gpio_callback _max30101_int_cb_data;

static void _max30101_int_cb(const struct device *dev, struct gpio_callback *cb, uint32_t pins)
{
    LOG_DBG("int cb");
    // Submit a work item to the sensors work queue to read out the sensor's FIFO
    k_work_submit_to_queue(&sensors_ctrl_work_q, &_read_fifo_work_item.work);
}

// Config interrupt on PPG int pin - for MAX30101 it is active low when int is triggered
ret = gpio_pin_configure_dt(&_max30101_int_pin, GPIO_INPUT);
ret = gpio_pin_interrupt_configure_dt(&_max30101_int_pin, GPIO_INT_EDGE_FALLING);

if (ret < 0)
{
    LOG_ERR("Err initing GPIO for int pin");
    return;
}

// Config callback on int pin
gpio_init_callback(&_max30101_int_cb_data, _max30101_int_cb, BIT(_max30101_int_pin.pin));
gpio_add_callback(_max30101_int_pin.port, &_max30101_int_cb_data);

That works ok. It builds and runs. Now in sensor driver 2.c file I have the exact same thing. The exact same snippet except, for the pin being different. The variable names are different and it is in a separate .c file. The callback function is a different statically defined function inside this sensor driver 2.c.

When this snippet from sensor driver 2.c get's called after sensor driver 1 .c I get the following error:

[00:00:00.014,709] <err> os: bus_fault: ***** BUS FAULT *****
[00:00:00.021,484] <err> os: bus_fault:   Precise data bus error
[00:00:00.028,533] <err> os: bus_fault:   BFAR Address: 0x30b44908
[00:00:00.035,797] <err> os: esf_dump: r0/a1:  0x20003fd8  r1/a2:  0x30b44908  r2/a3:  0x200032c8
[00:00:00.045,928] <err> os: esf_dump: r3/a4:  0x30b44908 r12/ip:  0x00000006 r14/lr:  0x00035107
[00:00:00.055,999] <err> os: esf_dump:  xpsr:  0x21000000
[00:00:00.062,469] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x000350e2
[00:00:00.071,716] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:00.081,298] <err> os: z_fatal_error: Current thread: 0x200030a8 (main)

The offending line is gpio_add_callback() from sensor driver 2.c. So I dug into it, and the issue starts at this snippet from gpio_utils.h

/**
 * @brief Generic function to insert or remove a callback from a callback list
 *
 * @param callbacks A pointer to the original list of callbacks (can be NULL)
 * @param callback A pointer of the callback to insert or remove from the list
 * @param set A boolean indicating insertion or removal of the callback
 *
 * @return 0 on success, negative errno otherwise.
 */
static inline int gpio_manage_callback(sys_slist_t *callbacks,
                    struct gpio_callback *callback,
                    bool set)
{
    __ASSERT(callback, "No callback!");
    __ASSERT(callback->handler, "No callback handler!");

    if (!sys_slist_is_empty(callbacks)) {
        if (!sys_slist_find_and_remove(callbacks, &callback->node)) {
            if (!set) {
                return -EINVAL;
            }
        }
    } else if (!set) {
        return -EINVAL;
    }

    if (set) {
        sys_slist_prepend(callbacks, &callback->node);
    }

    return 0;
}

It goes into sys_slist_find_and_remove()and somewhere in there it ends up at slist.h

static inline sys_snode_t *z_snode_next_peek(sys_snode_t *node)
{
    return node->next;
}

node->next is just 0 and I guess maybe returning this null pointer causes a hard fault. I'm not sure why it's 0 as the slist implementation is beyond what I know, but I just dug into this using the debugger and saw this was the offending line.

github-actions[bot] commented 1 year ago

Hi @savvn001! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. πŸ€–πŸ’™

anangl commented 1 year ago

@savvn001 Please provide a minimal application with which the problem could be observed. I was not able to reproduce such behavior using the information you provided. There is no problem with adding a second GPIO callback in an application. I tried this with the latest Zephyr and with NCS 2.3.0 and everything worked fine, so the problem you are encountering is probably caused by something else. But the exact code that you are using is required for any further investigation.

savvn001 commented 1 year ago

Hi @anangl.

Hmm ok so since you can't reproduce it, there is likely an issue being caused by something else in the app. I did some more experimenting whilst trying to narrow down a minimal example to reproduce the issue. Can you reproduce the issue with the following example? For me just this is needed to reproduce the issue.

MCU: nrf52840 NCS v2.3.0 Build type: DEBUG (-Og)

prj.conf:

# Increased stack due to settings API usage
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=2048

# Bluetooth
CONFIG_BT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_SMP=y
CONFIG_BT_SIGNING=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_ATT_PREPARE_COUNT=5
CONFIG_BT_PRIVACY=y
CONFIG_BT_DEVICE_NAME="ZEPHYR BLE EXAMPLE"
CONFIG_BT_DEVICE_APPEARANCE=833
CONFIG_BT_DEVICE_NAME_DYNAMIC=y
CONFIG_BT_DEVICE_NAME_MAX=65
CONFIG_BT_KEYS_OVERWRITE_OLDEST=y
CONFIG_BT_SETTINGS=y

# Flash 
CONFIG_FLASH=y
CONFIG_FLASH_PAGE_LAYOUT=y
CONFIG_FLASH_MAP=y
CONFIG_NVS=y
CONFIG_SETTINGS=y

# Logging
CONFIG_LOG=y
CONFIG_LOG_MODE_DEFERRED=n
CONFIG_LOG_MODE_IMMEDIATE=y
CONFIG_LOG_RUNTIME_FILTERING=y
CONFIG_LOG_BACKEND_SHOW_COLOR=y
CONFIG_LOG_INFO_COLOR_GREEN=y
CONFIG_LOG_BACKEND_UART=y
CONFIG_USE_SEGGER_RTT=y
CONFIG_LOG_BACKEND_RTT=y

# Shell
CONFIG_PRINTK=y
CONFIG_SHELL=y
CONFIG_INIT_STACKS=y
CONFIG_THREAD_STACK_INFO=y
CONFIG_KERNEL_SHELL=y
CONFIG_THREAD_MONITOR=y
CONFIG_BOOT_BANNER=y
CONFIG_THREAD_NAME=y
CONFIG_DEVICE_SHELL=y
CONFIG_POSIX_CLOCK=y
CONFIG_DATE_SHELL=y
CONFIG_THREAD_RUNTIME_STATS=y
CONFIG_THREAD_RUNTIME_STATS_USE_TIMING_FUNCTIONS=y
CONFIG_STATS=y
CONFIG_STATS_SHELL=y
CONFIG_SHELL_VT100_COLORS=y
CONFIG_SHELL_LOG_BACKEND=n

# HAL Config
CONFIG_I2C=y
CONFIG_GPIO=y
CONFIG_PWM=y
CONFIG_LED=y
CONFIG_SERIAL=y

# Misc.
CONFIG_POSIX_API=y
CONFIG_GETOPT_LONG=y

CONFIG_MAIN_STACK_SIZE=8192
CONFIG_IDLE_STACK_SIZE=4096

main.c:

void main(void)
{
   max30101_ctrl_init();
   sensor_2_ctrl_init();

}

max30101.c:


static const struct i2c_dt_spec _max30101_i2c = I2C_DT_SPEC_GET(DT_NODELABEL(max30101));
static const struct gpio_dt_spec _max30101_int_pin = GPIO_DT_SPEC_GET(DT_NODELABEL(ppgint), gpios);
static struct gpio_callback _max30101_int_cb_data;
extern struct k_work_q sensors_ctrl_work_q; // Sensors control work queue thread to submit work to

static struct maxworkitem // Work item to submit from ISR
{
    struct k_work work;
    char name[16];
} _max30101_work;

static void _max30101_int_cb(const struct device *dev, struct gpio_callback *cb, uint32_t pins)
{
    LOG_DBG("int cb");
    // Submit a work item to the sensors work queue to read out the sensor's FIFO
    k_work_submit_to_queue(&sensors_ctrl_work_q, &_max30101_work.work);
}

void max30101_ctrl_init()
{
    int ret;

    if (!device_is_ready(_max30101_i2c.bus))
    {
        LOG_ERR("I2C bus %s is not ready!", _max30101_i2c.bus->name);
        return;
    }

    if (!device_is_ready(_max30101_int_pin.port))
    {
        LOG_ERR("PPG Int pin is not ready!");
        return;
    }

    // Config interrrupt on PPG int pin - for MAX30101 it is active low when int is triggered
    ret = gpio_pin_configure_dt(&_max30101_int_pin, GPIO_INPUT);
    ret = gpio_pin_interrupt_configure_dt(&_max30101_int_pin, GPIO_INT_EDGE_FALLING);

    if (ret < 0)
    {
        LOG_ERR("Err initing GPIO for int pin");
        return;
    }

    // Config callback on int pin
    gpio_init_callback(&_max30101_int_cb_data, _max30101_int_cb, BIT(_max30101_int_pin.pin));
    gpio_add_callback(_max30101_int_pin.port, &_max30101_int_cb_data);

    if (ret < 0)
    {
        LOG_ERR("Err initing callback for int pin");
        return;
    }

    // Init work item that will be submitted on int cb
    strcpy(_max30101_work.name, "max30101_read_fifo");
    k_work_init(&_max30101_work.work, _read_fifo);
}

sensor_2_ctrl_init(); is the exact same as above but different pin and i2c device. You should see that it faults inside sensor_2_ctrl_init();, on the gpio_add_callback() fn.

I then found, for some reason, commenting out these lines:

    strcpy(_max30101_work.name, "max30101_read_fifo");
    k_work_init(&_max30101_work.work, _read_fifo);

Fixes the issue - no hard faults. I'm struggling to see the connection here between initialising that work item and the callback init function.

anangl commented 1 year ago

I then found, for some reason, commenting out these lines:

    strcpy(_max30101_work.name, "max30101_read_fifo");
    k_work_init(&_max30101_work.work, _read_fifo);

Fixes the issue - no hard faults. I'm struggling to see the connection here between initialising that work item and the callback init function.

I guess that commenting out just the line with strcpy should help, as it tries to copy 19 characters to a 16-character array.

savvn001 commented 1 year ago

It does, good spot.... I totally missed that. Simple overflow bug, was making it seem as if the error was elsewhere This issue can be closed.