zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.88k stars 6.63k forks source link

NRF temperature sensor driver race condition #26518

Closed Counterfeiter closed 4 years ago

Counterfeiter commented 4 years ago

Describe the bug The NRF temperature sensor driver is not thread save. It has a race condition within the ISR handling. The bug appears if the temperature sensor is used to calibrate the low freq. clock source and also from user thread.

To Reproduce Steps to reproduce the behavior:

  1. use a sample board with no nrf low freq. crystal (like mircobit) or don't use the crystal with option: CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y CONFIG_CLOCK_CONTROL_NRF_K32SRC_250PPM=y
  2. Poll in the main loop the sensor values fast:
        int ret = sensor_sample_fetch(temp_dev);
    if(ret)
    {
        printk("Fetch temperature value failed\n");
        return 0.0f;
    }
    ret = sensor_channel_get(temp_dev, SENSOR_CHAN_DIE_TEMP, &temp_value);
  3. internal calibration happens every 4 seconds by default... wait about 20 seconds or less.

Expected behavior Temperature sensor is thread save.

Impact showstopper, because the main loop blocks at random time, depending on the user interval polling period. In my normal program case it was a 100 ms polling and I trigger it sometimes after 15-30 minutes.

Environment (please complete the following information):

Fixed by - (sorry no pull request from my side) Add mutex (device_mutex) for this resource:

static int temp_nrf5_sample_fetch(struct device *dev, enum sensor_channel chan)
{
    struct temp_nrf5_data *data = dev->driver_data;
    struct clock_control_async_data clk_data = {
        .cb = hfclk_on_callback
    };

    int r;

    /* Error if before sensor initialized */
    if (data->clk_dev == NULL) {
        return -EAGAIN;
    }

    if (chan != SENSOR_CHAN_ALL && chan != SENSOR_CHAN_DIE_TEMP) {
        return -ENOTSUP;
    }

    //wait for temp sensor resource
    k_mutex_lock(&data->device_mutex, K_FOREVER);

    r = clock_control_async_on(data->clk_dev, CLOCK_CONTROL_NRF_SUBSYS_HF,
                    &clk_data);
    __ASSERT_NO_MSG(!r);

    k_sem_take(&data->device_sync_sem, K_FOREVER);

    r = clock_control_off(data->clk_dev, CLOCK_CONTROL_NRF_SUBSYS_HF);
    __ASSERT_NO_MSG(!r);

    data->sample = nrf_temp_result_get(NRF_TEMP);
    LOG_DBG("sample: %d", data->sample);
    nrf_temp_task_trigger(NRF_TEMP, NRF_TEMP_TASK_STOP);

    k_mutex_unlock(&data->device_mutex);

    return 0;
}

Maybe other handling like deleting the code line (nrf_temp_task_trigger(NRF_TEMP, NRF_TEMP_TASK_STOP);) could also workaround the race condition but the mutex seems the cleanest thing for me.

pabigot commented 4 years ago

I think this might be fixed by in-progress work on the clock infrastructure; in any case this sensor is tightly coupled with that driver.