melexis / mlx90632-library

MLX90632 library for the Melexis 90632 Infra Red temperature sensor.
Apache License 2.0
42 stars 15 forks source link

mlx90632_set_meas_type fail after addressed reset command #31

Closed jjungo closed 3 years ago

jjungo commented 3 years ago

Hi,

Setup: nrf52840, i2c clock is running at 400 kHz.

I'm having an issue with mlx90632_set_meas_type which is failing every time. After mlx90632_addressed_reset has been called, the next mlx90632_i2c_read will fail. I can see a NACK is received after the i2c address has been sent.

In this case, a repeated mlx90632_i2c_read fix the problem.

Thanks

Letme commented 3 years ago

Can you confirm that it is 150-200 us between the reset command and the next i2c read? Maybe your usleep implementation is a bit faster than expected?

jjungo commented 3 years ago

I don't have currently my logical analyzer but I would say yes, because it's the same when I step over with a debugger.

Letme commented 3 years ago

OK, that makes no sense. With debugger you should have well over a second between reset and next read, so sensor should be rebooted and ready. Can you share a bit more information about your setup that could help me spot the difference?

jjungo commented 3 years ago

It's a custom PCB and there is one other i2c slave on the same line but at different address.

I can provide the mlx wrapper that we have to implement and the underlying i2c_read/write, but there is no fancy stuff...

There is the mlx wrapper:

int32_t mlx90632_i2c_read(int16_t register_address, uint16_t *value) {
    uint8_t payload[2] = {
            [0] = (register_address >> 8) & (0xFF),
            [1] = (register_address) & (0xFF),
    };
    int ret = i2c_write(mlx_device.i2c_client_handle, mlx_device.i2c_addr, payload, sizeof(payload));
    if (ret < 0) {
        return ret;
    }

    ret = i2c_read(mlx_device.i2c_client_handle, mlx_device.i2c_addr, (uint8_t *) value, 2);
    if (ret < 0) {
        return -1;
    }

    *value = __bswap16(*value);

    return ret;
}

int32_t mlx90632_i2c_write(int16_t register_address, uint16_t value) {

    uint8_t payload[4] = {
            (register_address >> 8) & (0xFF),
            (register_address) & (0xFF),
            (value >> 8) & (0xFF),
            (value) & (0xFF),
    };

    ret = i2c_write(mlx_device.i2c_client_handle, mlx_device.i2c_addr, payload, sizeof(payload));
    return ret
}

void usleep(int min_range, int max_range) {
    nrf_delay_us(max_range);
}

void msleep(int msecs) {
    nrf_delay_ms(msecs);
}

And the i2c write and register_address

int i2c_write(i2c_client *client, uint8_t i2c_addr, const uint8_t *data, size_t len) {
    ret_code_t err = nrf_drv_twi_tx(&client->device->twi_instance, i2c_addr, data, len,
                                    client->config->stop_type); // stop_type is set in this case.
    if (err != NRF_SUCCESS) {
        // skipped err handling
    }

    return 0;
}

int i2c_read(i2c_client *client, uint8_t i2c_addr, uint8_t *data, size_t len) {

    ret_code_t err = nrf_drv_twi_rx(&client->device->twi_instance, i2c_addr, data, len);
    if (err != NRF_SUCCESS) {
        // skipped err handling
    }

    return 0;
}

The 5th parameter to nrf_drv_twi_tx concerns the stop condition after a i2c write operation. The nrf documation says:

If set, the stop condition is not generated on the bus after the transfer has completed successfully (allowing for a repeated start in the next transfer).

The initialization of the temperature sensor is the following:

int temperature_init() {    

    mlx90632_wrapper_init();
    int32_t r = mlx90632_init();
    if (r < 0) {
        return r;
    }

    r = mlx90632_set_meas_type(MLX90632_MTYP_MEDICAL_BURST);
    if (r < 0) {
        return r;
    }

    r = mlx90632_read_eeprom(&PR, &PG, &PO, &PT, &Ea, &Eb, &Fa, &Fb, &Ga, &Gb, &Ha, &Hb, &Ka);
    if (r < 0) {
        return r;
    }

    return 0;
}
Letme commented 3 years ago

Indeed nothing fancy, except that stop condition for the reset command which should not be set as we do not want to have repeated start because in i2c write we will need a stop condition to be generated. So I would think it would be safe that in i2c_write we can have a stop condition on the end.

jjungo commented 3 years ago

Thanks for your suggestion @Letme, but it didn't solve the issue. I tried to unset the stop condition in order to remove the repeated start during the reset command but I still have the same issue.

Letme commented 3 years ago

I went through your code and you have eeprom_unlock array there which you write before every write command. With library in place you do not need that. Maybe that is the problem and you just have non-even unlock commands sent...

jjungo commented 3 years ago

Oops very sorry, I forgot to remove this part in the pasted code The unlock command is not use in this case and the problem is present without it. I've update the showed wrapper.

Letme commented 3 years ago

I have no idea why I put registers as signed 16 bit integers, but let's assume that bad practice is the problem here (although it makes no sense, that it would reset after the second read if the command was wrong and you would notice that most likely on the scope).

(register_address >> 8) & (0xFF), -> (register_address & 0xFF00) >> 8,

I assumed that the device is reset and therefore a NACK, but do you use any other writes (like changing the refresh rate) in your code? Because it could be also a NACK from the bad register address access and the device never resets, thus the meas_type is never changed...

jjungo commented 3 years ago

Indeed u16 would obviously be better. But I don't think it's the problem.

No I don't have any other write command. The failing code is the one I showed before:

int32_t r = mlx90632_init();
r = mlx90632_set_meas_type(MLX90632_MTYP_MEDICAL_BURST);

Where mlx90632_set_meas_type fail on the next i2c read after the addressed reset.

jjungo commented 3 years ago

mlx90632_init call is pointless to raise the bug. A loop over mlx90632_set_meas_type shows the bug:

while (1) {
    ret = mlx90632_set_meas_type(MLX90632_MTYP_MEDICAL_BURST);
    if (ret < 0) {
        LOG_ERR("mlx90632_set_meas_type failed: %d", ret);
    }
    LOG_INFO("retry...");
    nrf_delay_ms(1000);
}
jjungo commented 3 years ago

With a logical analyzer I confirm the NACK is coming from the write address during the next read after reset. The following screenshots illustrate the problem while calling mlx90632_set_meas_type image image The address's nack: image

Letme commented 3 years ago

I see that messages are separated by 210us. How about we try with some bisection and lets start with 500us? If the chip is not awake in the first case, it should be after 500us for sure, so nack should not be there...

Letme commented 3 years ago

Init also contains write, so if write operation would be the problem then it would fail already there. So this is just a problem that chip is not yet back from reset - I tried on my boards and they are usually back before 100us, so we need to see what your custom PCB might add to the slowness...

jjungo commented 3 years ago

I see that messages are separated by 210us. How about we try with some bisection and lets start with 500us? If the chip is not awake in the first case, it should be after 500us for sure, so nack should not be there...

So I tried 500 us, 5ms and 150 ms and I still have the nack at the same place (I had the same issue with the debugger while stepping). I tried less than 150 us (minimum waiting time specified in the datasheet) like 80 us and the behavior didn't changed.

jjungo commented 3 years ago

Init also contains write, so if write operation would be the problem then it would fail already there.

Sorry but maybe I was not clear. I've reduced the scope around mlx90632_set_meas_type. I can see the bug by only calling mlx90632_set_meas_type in a loop so without any other write, mlx90632_init is not called in this trial.

So this is just a problem that chip is not yet back from reset - I tried on my boards and they are usually back before 100us,

Interesting, It looks like the specification given in the datasheet.

so we need to see what your custom PCB might add to the slowness...

What do you mean ?

Thank you very much for your time.

jjungo commented 3 years ago

I found something interesting !

I still loop over mlx90632_set_meas_type juste like I said in my previous post. But I changed the implementation of mlx90632_set_meas_type in the way I skip the modification of the control register.

int32_t mlx90632_set_meas_type(uint8_t type)
{
    int32_t ret;
    uint16_t reg_ctrl;

    if ((type != MLX90632_MTYP_MEDICAL) & (type != MLX90632_MTYP_EXTENDED) & (type != MLX90632_MTYP_MEDICAL_BURST) & (type != MLX90632_MTYP_EXTENDED_BURST))
        return -EINVAL;

    ret = mlx90632_addressed_reset();
    if (ret < 0)
        return ret;

    ret = mlx90632_i2c_read(MLX90632_REG_CTRL, &reg_ctrl);
    return ret;
}

Now the next i2c read after reset doesn't fail anymore. It looks like the result of the addressed reset is different depending of the state of some register.

jjungo commented 3 years ago

Another interesting thing. The nack is present on the following modes: medical burst and extended burst but not present on medical and extended modes. So the "reset behavior" is different depending on the previous selected mode[1:0] in the control register.

Letme commented 3 years ago

What do you mean?

I was thinking of some real-world interruptions may be on the power lines of the sensor, but since you tested with 500us and it is still not back then this is not the problem.

What is your return value of the init function? I think you found a problem where you have a non-extended mode chip, yet you are maybe trying to change it to the incorrect mode?

So if we move the reset to the setting of the control register with the type value (instead of reset), then this is the most probable reason. That means reset works (and its not a write problem), so what doesn't work is the type setting of your chip. This will be explained by init return value which is based on the MLX90632_EE_VERSION EEPROM value.

jjungo commented 3 years ago

What is your return value of the init function? I think you found a problem where you have a non-extended mode chip, yet you are maybe trying to change it to the incorrect mode?

The init function return ERANGE. So I would be able to use extended mode am I right ? But the MLX90632_MTYP_MEDICAL_BURST mode is only available in the extended mode ? I would say no..

The eeprom version value is 0x8505.

Letme commented 3 years ago

@slavysis can you look into this a bit more in detail as well?

slavysis commented 3 years ago

Do you see the NAK on every iteration or is it just a random thing?

jjungo commented 3 years ago

Hi @slavysis,

With this snippet I see the nack on 1/2 iterations. I have the feeling that the next i2c access after the reset will fail a repeated i2c access solve the issue. Like I said in my first post:

In this case, a repeated mlx90632_i2c_read fix the problem.

And the following workaround fix/hide the problem.

int32_t mlx90632_set_meas_type(uint8_t type)
{
    int32_t ret;
    uint16_t reg_ctrl;

    if ((type != MLX90632_MTYP_MEDICAL) & (type != MLX90632_MTYP_EXTENDED) & (type != MLX90632_MTYP_MEDICAL_BURST) & (type != MLX90632_MTYP_EXTENDED_BURST))
        return -EINVAL;

    ret = mlx90632_addressed_reset();
    if (ret < 0)
        return ret;

    /*
     * After mlx90632_addressed_reset has been called, the next mlx90632_i2c_read will fail.
     * I can see a NACK is received after the i2c address has been sent.
     * In this case, a repeated mlx90632_i2c_read fix the problem.
     * https://github.com/melexis/mlx90632-library/issues/31
     */
    ret = mlx90632_i2c_read(MLX90632_REG_CTRL, &reg_ctrl);
    if (ret < 0) {
        LOG_WARN("mlx90632_i2c_read has failed, retrying...");
        ret = mlx90632_i2c_read(MLX90632_REG_CTRL, &reg_ctrl);
        if (ret < 0) {
            LOG_ERR("mlx90632_i2c_read retry has failed. Setting is aborted.");
            return ret;
        }
        LOG_WARN("mlx90632_i2c_read retry has passed (last i2c error can be ignored).");
    }
......
.....
slavysis commented 3 years ago

I did some experiments and was able to re-produce what you are observing. Indeed, in burst mode, the addressed reset command is not executing as expected. There are 4 possible ways to deal with it:

  1. Use slower I2C clock frequency - 100KHz recommended
  2. Make sure that the time between the last clock of the addressed reset command and the STOP condition generation is less than 2us.
  3. Repeat the first command after the reset (kind of what you are currently doing)
  4. Skip the addressed reset command - the modes will still be properly set with only one drawback - the first measurement after the mode switch might be done using the old mode settings.

Thank you very much for finding this issue, digging in it and reporting it!

jjungo commented 3 years ago

Thanks for your experiments! Looks interesting.

Use slower I2C clock frequency - 100KHz recommended

Didn't work on my side, I still having the problem. Datasheet specifies fast mode and fast mode plus.

Make sure that the time between the last clock of the addressed reset command and the STOP condition generation is less than 2us.

I can measure a delay about 8us. I'm currently depending on the NRF (Nordic SDK) i2c driver stack and I don't want to patch it neither implement such workaround in my code base.

My chosen workaround repeats the mlx90632_set_meas_type API call if necessary. In that case I don't have to re-implement set_meas_type without addressed reset or hack the mlx or nrf libraries.

Repeat the first command after the reset (kind of what you are currently doing)

Maybe the mlx library should implement it, @Letme what's your point ?

jjungo commented 3 years ago

Hi @slavysis !

34 fixes it! Many thanks.

In addition, like @Letme has pointed out, i2c stop must be set for the addressed reset command.

Best,

Letme commented 3 years ago

@jjungo thanks for reporting the issue and helping us debug it. We really appreciate it. Sorry it took so long to fix. Will also tag a new release once master passes.