NRF temperature sensor driver race condition

Describe the bug The NRF temperature sensor driver is not thread save. It has a race condition within the ISR handling. The bug appears if the temperature sensor is used to calibrate the low freq. clock source and also from user thread.

To Reproduce Steps to reproduce the behavior:

use a sample board with no nrf low freq. crystal (like mircobit) or don't use the crystal with option: CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y CONFIG_CLOCK_CONTROL_NRF_K32SRC_250PPM=y

Poll in the main loop the sensor values fast:

    int ret = sensor_sample_fetch(temp_dev);
if(ret)
{
    printk("Fetch temperature value failed\n");
    return 0.0f;
}
ret = sensor_channel_get(temp_dev, SENSOR_CHAN_DIE_TEMP, &temp_value);

internal calibration happens every 4 seconds by default... wait about 20 seconds or less.

Expected behavior Temperature sensor is thread save.

Impact showstopper, because the main loop blocks at random time, depending on the user interval polling period. In my normal program case it was a 100 ms polling and I trigger it sometimes after 15-30 minutes.

Environment (please complete the following information):

OS: Ubuntu
Toolchain (e.g Zephyr SDK, ...)
Commit SHA or Version used: Booting Zephyr OS build zephyr-v2.3.0-505-ga4957be25b7e

Fixed by - (sorry no pull request from my side) Add mutex (device_mutex) for this resource:

static int temp_nrf5_sample_fetch(struct device *dev, enum sensor_channel chan)
{
    struct temp_nrf5_data *data = dev->driver_data;
    struct clock_control_async_data clk_data = {
        .cb = hfclk_on_callback
    };

    int r;

    /* Error if before sensor initialized */
    if (data->clk_dev == NULL) {
        return -EAGAIN;
    }

    if (chan != SENSOR_CHAN_ALL && chan != SENSOR_CHAN_DIE_TEMP) {
        return -ENOTSUP;
    }

    //wait for temp sensor resource
    k_mutex_lock(&data->device_mutex, K_FOREVER);

    r = clock_control_async_on(data->clk_dev, CLOCK_CONTROL_NRF_SUBSYS_HF,
                    &clk_data);
    __ASSERT_NO_MSG(!r);

    k_sem_take(&data->device_sync_sem, K_FOREVER);

    r = clock_control_off(data->clk_dev, CLOCK_CONTROL_NRF_SUBSYS_HF);
    __ASSERT_NO_MSG(!r);

    data->sample = nrf_temp_result_get(NRF_TEMP);
    LOG_DBG("sample: %d", data->sample);
    nrf_temp_task_trigger(NRF_TEMP, NRF_TEMP_TASK_STOP);

    k_mutex_unlock(&data->device_mutex);

    return 0;
}

Maybe other handling like deleting the code line (nrf_temp_task_trigger(NRF_TEMP, NRF_TEMP_TASK_STOP);) could also workaround the race condition but the mutex seems the cleanest thing for me.

zephyrproject-rtos / zephyr

NRF temperature sensor driver race condition #26518