ruuvi / ruuvi.firmware.c

Ruuvi Firmware version 3. Built on top of Nordic SDK 15, uses both Ruuvi and external repositories as submodules. In Beta, no breaking changes are intended but may be done if absolutely necessary
BSD 3-Clause "New" or "Revised" License
54 stars 37 forks source link

NFC read may cause tag to freeze (reopen) #359

Open mhaberler opened 5 months ago

mhaberler commented 5 months ago

I frequently encounter tag freezes during NFC read with my NFC reading code

in particular this happens during an unsucessful read (UID retrieved, but not the NDEF messages - typically weak RF coupling)

disconnecting/reconnecting the battery makes the tag work again - I think this is the same as https://github.com/ruuvi/ruuvitag_fw/issues/139

this is next to a showstopper for us as diagnosis and reset of the tag in the field is cumbersome and the tag seems not to recover without help

prepared to help debugging given advice - I do have a Dev Shield at hand

mhaberler commented 5 months ago

it seems the tag eventually reboots all by itself

I just tried a previously locked-up tag and it could be NFC-read cleanly

what is the trigger for a reboot? cyclic? if so, how long?

mhaberler commented 5 months ago

it seems the tag recovers with a few minutes, which is good enough to live with

laurijamsa commented 5 months ago

I reopened this as this may be something that we would like to try to reproduce and fix.

markoaamunkajo commented 5 months ago

@mhaberler could you report the RuuviTag firmware version you are running in case it was not mentioned yet? Thanks!

mhaberler commented 5 months ago

will do but I will have access to the debug setup only wednesday onwards

this was a custom-built version with a modified I2C address for the external TMP117, will dig up the source - should have it later today @amonvinc @alueger

ojousima commented 5 months ago

Thank you for the report.

We used to have problems with NFC, the problem was supposedly fixed in v3.29.0 / https://github.com/ruuvi/ruuvi.drivers.c/commit/95d621648eaa9864da3a9f773b5b3abfaa0d9018 by updating the NFC driver to use latest binary from Nordic.

What exactly "freezes" means? Data stops updating, but old data is still broadcasted or tag does not broadcast at all? In any case, if tag freezes, e.g. by looping in NFC interrupt handler the watchdog timer will reboot the tag in 6 minutes because the heartbeat() does not run https://github.com/ruuvi/ruuvi.firmware.c/blob/master/src/app_heartbeat.c. You can adjust the watchdog trigger conditions there and configure the watchdog timer interval at https://github.com/ruuvi/ruuvi.firmware.c/blob/2778a3f0fcb08f0c5f26ba6ac13b7ad691f3cbeb/src/application_config/app_config.h#L43

Reboot clears the stored data in flash because timestamps lose their accuracy on reboot.

mhaberler commented 5 months ago

6 minutes is just fine.

with "freeze" I meant "no reaction to NFC reader" - not even UUID readable

I'll retry and see if sensor updates still happen in this state, will report

edit: need to replace the battery (now at 5%) and see if I can reproduce with new battery, might take till tomorrow

DG12 commented 5 months ago

Here is another app I sucessfuklly used to read nfc: NFC Tap https://www.st.com/en/embedded-software/stsw-st25ios001.html
or go to App Store or google play store I don't like kit as well as NRC tools as it is more tailored to ST25 tags by STMicroelectronics

DG12 commented 5 months ago

To test out NFC Tap I scanned 2 different tags (ECOF and 450A) with SW:Ruuvi FW v3.31.1+default about 20 times each with no "freezing" (no partial reads since I know exactly where to place the iPhone to get a good signal.

mhaberler commented 5 months ago

it happens rarely, maybe one in 20-50 reads

out of 2 tags I managed to get one frozen since yesterday, put it away now for > 6 minutes and it still does not respond

a good test for a frozen tag is to put it onto the ACR122U NFC reader - a live tag will make it beep and turn the reader's led green; frozen tag: no reaction

the branch we were using at the time is a bit behind master, will rebase and see what happens

I also noticed that at least with the RFID2 reader, after a successful I need to move the reader at least 5cm, better like 10cm away to safely "NFC disconnect" (for a lack of a more exact term) the tag

I'll have a look if the MFRC522 driver can detect the "Tag leaves RF field" condition

there is also two specific things I will try in my reading code:

one of the example codes for the MFRC522v2 driver explicitly halts the tag after a successful read with reader.PICC_HaltA() operation

the other thing I'm suspicious about is this line - it might mean the PICC is left in an incommunicado state

in theory a PICC (or the tag for that matter) should be able to recover though from this state eventually

so my plan is:

mhaberler commented 5 months ago

@ojousima as for your question "does the tag still advertise values if NFC-frozen": no, it does not, updates stop

--> correction: I was wrong on that, the tag still advertises values!

edit: the tags battery was at 53% when the freeze happened, the code I use for that is:

uint8_t volt2percent(const float v) {
    // convert voltage and scale for CR2032
    float percent = (v - 2.2f) / 0.65f * 100.0f;
    if (percent < 0.0f) {
        return 0;
    }
    if (percent > 100.0f) {
        return 100;
    }
    return (uint8_t) percent;
}
mhaberler commented 5 months ago

I now have two tags NFC-frozen

mhaberler commented 5 months ago

@DG12 I have not yet encountered a case where I could NFC-freeze a tag with something else other than my own code

however, once the tag enters this frozen state no other utility I tried can get an NFC response

DG12 commented 5 months ago

Have you connected Dev Shield ? Run with j-link & RTT ? Is firmware compiled with RUUVI_NRF5_SDK15_LOG_ENABLED TRUE ?

Does the firmware have calls to ri_log , NRF_LOG...standard

Can you run with a +debug version? DO you have the symbol/address map from the loader?

DG12 commented 5 months ago

Off topic : battery percent: Have you read https://github.com/ruuvi/ruuvitag_fw/wiki/FAQ:-battery

Why did you choose 2.2 as a minimum? .65 ? Do you consider once a battery appears as "low" it must always report low until reset? What kind on environment are the tags in respect to temperature changes? How important is availability of data (i.e. What is impact of tag not transmitting) How easy is its to get a tag and replace the battery?

ojousima commented 5 months ago

--> correction: I was wrong on that, the tag still advertises values!

Thank you for the update. This might actually be enough to not trigger the watchdog timer, in this case the reboots would be caused by battery collapsing under load.

one is running 3.31.0-default not sending updates any more

Does this tag reboot in 6 minutes?

@DG12 I have not yet encountered a case where I could NFC-freeze a tag with something else other than my own code

NFC code or tag firmware?

We used to handle the NFC freezes by rebooting the tag after NFC read in 2.X firmwares, that approach was dropped to keep the history in 3.X. Does your application use logged history data? If not, we could program in a reboot after NFC read to avoid freezes.

DG12 commented 5 months ago

Have you tried to NFC write to a frozen tag?

mhaberler commented 5 months ago

Have you tried to NFC write to a frozen tag?

no, because thats not possible - as I described above, the NFC reader would not even detect the presence of a tag

mhaberler commented 5 months ago

@ojousima after a few hours, the situation is still the same:

both tags do not react to an RFID reader - neither my RFID2 contraption nor the ACR122U reader

the Ruuvi running 3.31.0-default still sends updates the Ruuvi running our custom code with the external TMP 117 does not

I now have the DevShield at hand and will try with a logging build and the debugger

mhaberler commented 5 months ago

I built 3.31.0 with -DDEBUG

Console log:

00> Movement 
00> Movement             

---- here I scanned with the RFID2 reader and my code, tag still NFC-sponsive:

0> app_heartbeat.c:101 WARNING: INVALID_STATE       
00> app_heartbeat.c:101 WARNING: INVALID_STATE
00> <info> app: Setting vector table to bootloader: 0x00075000
00> <info> app: Setting vector table to main app: 0x00026000
00> app_heartbeat.c:101 WARNING: INVALID_STATE
00> app_heartbeat.c:101 WARNING: INVALID_STATE
00> app_heartbeat.c:101 WARNING: INVALID_STATE
00> <info> app: Setting vector table to bootloader: 0x00075000
00> <info> app: Setting vector table to main app: 0x00026000

any compilation flag/defines suggestions?

for debugging: are you using Segger SES?

DG12 commented 5 months ago

I usually use Segger j-link. SES is much better and easier to work with IF you have the source files!

I am surprised to see two sets of messages "Setting Vector..." which occur when NFC receives a read with the tag being FROZEN.

Heartbeat...INVALID_STATE are unusual (for me)

ojousima commented 5 months ago

00> app: Setting vector table to bootloader: 0x00075000 00> app: Setting vector table to main app: 0x00026000

I think this is related to enabling the buttonless DFU service on NFC scan.

0> app_heartbeat.c:101 WARNING: INVALID_STATE

Does this line come up reliably on frozen tags? If yes, can you try changing the line

    // Advertising should always be successful
    RD_ERROR_CHECK (err_code, ~RD_ERROR_FATAL);

to

    // Advertising should always be successful
    RD_ERROR_CHECK (err_code, RD_SUCCESS);

?

This should reboot the tag instantly after the error occurs, history is still lost but it's better than a stuck tag.

mhaberler commented 5 months ago

@ojousima this particular log is from a new tag in a DevShield, separate from the (still NFC-frozen) other two tags

I built an image from 3.31.0 and flashed that then tried a couple of NFC reads I have NOT yet managed to NFC-freeze this dev setup - I will try to and keep a console log

the log you see is just from my - so far successful - NFC-reading attempts

mhaberler commented 5 months ago

should I still try the change you suggested above?

edit: any suggestions on how to debug this setup? I have SES installed

mhaberler commented 5 months ago

100+ tries with a build with CFLAGS+= -DDEBUG - works perfectly every time

20 or so tries with a build without -DDEBUG: NFC-frozen and stuck for > 6min

updates still sent - same symptom as the other two tags

movements are still reported with RTT Viewer so the basic functionality seems alive:

00> Movement 00> Movement 00> LOG: Storing block 00> store_block:creating new record #61441

I hope this is not a timing-dependent bug..

@ojousima do you see a way I can break into that tag post-freeze with a debugger?

mhaberler commented 5 months ago

btw

00> app_heartbeat.c:101 WARNING: INVALID_STATE

does not come up with the non-debug build which got NFC-frozen

this looks to me as if after a NFC interaction considered finished, NFC "listening" mode is not properly reinitialized

I do not exclude this is caused by my code somehow

ojousima commented 5 months ago

btw 00> app_heartbeat.c:101 WARNING: INVALID_STATE does not come up with the non-debug build which got NFC-frozen should I still try the change you suggested above?

The warning about state is then probably about trying to write NFC contents while the NFC field is detected, and after the freeze the new connection does not register. In this case changing the RD_ERROR_CHECK would probably not be helpful, tag would end up resetting after every successful NFC read and possibly not after a failed read. It might make more sense to start a reboot timer instead.

100+ tries with a build with CFLAGS+= -DDEBUG - works perfectly every time 20 or so tries with a build without -DDEBUG: NFC-frozen and stuck for > 6min updates still sent - same symptom as the other two tags movements are still reported with RTT Viewer so the basic functionality seems alive: 00> Movement 00> Movement 00> LOG: Storing block 00> store_block:creating new record #61441 I hope this is not a timing-dependent bug..

The only change between the builds is -DDEBUG flag? most importantly, the debug build is not compiled with `OPT="-Og" like the official test binary?

Can you verify the power consumption of a frozen tag somehow? I'd like to understand if this issue is same as we had before, the NFC driver did not release high-frequency clock which lead to excess power consumption.

I think it would bet possible to get more information by logging the NFC events here: https://github.com/ruuvi/ruuvi.drivers.c/blob/2555fc5c9725747084b1ab7c52b29e4a4b9dedab/src/nrf5_sdk15_platform/communication/ruuvi_nrf5_sdk15_communication_nfc.c#L47

Have you made any changes to NFC interrupt handlers at https://github.com/ruuvi/ruuvi.firmware.c/blob/2778a3f0fcb08f0c5f26ba6ac13b7ad691f3cbeb/src/app_comms.c#L572?

Is this issue repeatable without your NFC hardware?

mhaberler commented 5 months ago

yes, -DDEBUG added in this file for the debug build, zero changes from 2c544d with the build which NFC-freezes: https://github.com/ruuvi/ruuvi.firmware.c/blob/v3.31.0/src/targets/ruuvitag_b/armgcc/Makefile

consequently also no changes in the NFC interrupt handlers

yes, I can check the power consumption, I have that Nordic power measurement board - will do

also will add logging as suggested

yes, the issue is repeatable - I have 3 NFC-frozen tags on my desk

if you have an ESP32 (I use a S3 but really any should do) and shell out 5€ for the RFID2 reader you should be able to reproduce easily

just build https://github.com/mhaberler/MFRC522v2_t4tag.git with Platformio, adapting the ini for your board and possibly the I2C pins

later I'll do a torture-test branch in this repo which tries a read every few seconds and turns off the RF field inbetween, so the test can run automatically

mhaberler commented 5 months ago

here's the torture-test: https://github.com/mhaberler/MFRC522v2_t4tag/tree/torture-test torturing the tag now..

mhaberler commented 5 months ago

seems the torture test does not do its job.. 3600 successful reads in sequence will wiggle the timing and see if this helps to trigger the freeze

ojousima commented 5 months ago

The NFC bug was previously triggered by corrupted data, if the stress test was with a tag being always in good signal conditions it might be that the corrupted data condition did not occur.

When I was testing the fix, I tried to read the tag from as far as possible, that seemed to be the fastest way to trigger the error.

I'd try to lift the tag off the reader as far as possible while reads are going on and see if the tag gets stuck.

I'll be travelling soon, so setting up the hardware on my end might have to wait until the end of February but I'll see what I can do on that part. I'm working normally while on the road.

mhaberler commented 5 months ago

ok, will vary distance and see what happens

edit: did not try that yet, but at close distance: 23.000 reads in a row without hang pretty sure weak field, transmission errors or interference might have a role

need to read up on NFC jammers ;)

mhaberler commented 5 months ago

@ojousima what about the following idea: every N movements reinit the NFC reading code? sensors still work, so that should work, and it would allow reinit without debug build or other intrusive changes tap board a few times, see if it NFC-answers again

ojousima commented 5 months ago

That could work, as long as there is not some kind of resource leak in NFC code itself.

I think that would look like something like adding task to deinit and reinit to the app_sensor:on_accelerometer_isr(), it's important to use scheduler there to not run the NFC tasks in interrupt context.

I think it would look like:

// Timers initialized somewhere
ri_timer_id_t m_reinit_nfc_timer;    
ri_timer_create (&m_reinit_nfc_timer, RI_TIMER_MODE_SINGLE_SHOT,
                                 &reinit_nfc_isr);

// Stop running timer if any and trigger a timer to reinitialize NFC
app_sensor:on_accelerometer_isr(){
    err_code |= ri_timer_stop (m_reinit_nfc_timer);
    RD_ERROR_CHECK (err_code, ~RD_ERROR_FATAL);
    err_code |= ri_timer_start (m_reinit_nfc_timer, APP_NFC_REINIT_TIME, null);
    RD_ERROR_CHECK (err_code, ~RD_ERROR_FATAL);
}

// Actual NFC reinit in main context
ret_type_t reinit_nfc(params) {
  // Container for NFC data
  ri_comm_dis_init_t dis = {0};
  // Initialize NFC data
  err_code |= dis_init (&dis, false);
  // Deinitialize NFC - this might fail if NFC field is on?
  err_code |=  rt_nfc_uninit();
  // Reinitialize NFC
  err_code |= nfc_init (&dis);
  RD_ERROR_CHECK (err_code, ~RD_ERROR_FATAL);
}

// Timers run in interrupt context, schedule actual reinit in main context.
ret_type_t reinit_nfc_isr(params) {
  ri_scheduler_event_put (NULL, 0, &reinit_nfc);
}

The flow would so that movement starts a timer to reset the NFC, but the timer will get reset as long as there is movement. This of course depends on environment and users, if the user holds the tag inn their hand all the time the timer will get reset over and over again.

Another option would be to connect the reinit logic to movement counter, if(!(movement_counter %10)){ schedule_reinit_nfc()}

mhaberler commented 5 months ago

got your suggestion worked in like so, and the reinit logic is triggered by tapping the board:

https://github.com/mhaberler/ruuvi.firmware.c/commit/6aeea3b2cac3dd82c137bff0a7facc601aedb72d

now onto triggering an NFC-freeze..

mhaberler commented 5 months ago

I think I trapped a freeze, and tapping the board made it wake up again as planned!

edit: hm, not quite sure yet

I have to remove the reader at least 5-7 cm to reliably cause a "no RF field" condition - staying too close may cause a false alarm

mhaberler commented 5 months ago

I believe -DDEBUG interferes in the sense it makes the bug appear less frequently

is there a way to write to the RTT log without -DDEBUG? so I just selectively can log the reinit code

mhaberler commented 5 months ago

you mentioned high battery drain when in NFC-freeze

I think I observed that.. I had the two tags in frozen state for a few days I tried to reanimate them by removing the battery both batteries empty!

DG12 commented 5 months ago

Are you interested in learning how to use j-link to do debugging. It can be used without -DDEBUG. I have some notes If you are interested.

ojousima commented 5 months ago

is there a way to write to the RTT log without -DDEBUG? so I just selectively can log the reinit code

These lines disable the debug log without debug defined, https://github.com/ruuvi/ruuvi.firmware.c/blob/2778a3f0fcb08f0c5f26ba6ac13b7ad691f3cbeb/src/application_config/app_config.h#L523

I think you could just define

#define RI_LOG_ENABLED 1
#define APP_LOG_LEVEL RI_LOG_LEVEL_INFO

instead and have the debug logs printed.

you mentioned high battery drain when in NFC-freeze I think I observed that.. I had the two tags in frozen state for a few days I tried to reanimate them by removing the battery both batteries empty!

This points to a problem in the NFC binary, it used to be that a corrupted data left high-frequency clock running. code-level reinitialisation of NFC might not fix the problem, but a full reboot would be required instead. I never found a way to detect the locked high-frequency clock in code itself.

It might be caused by this errata, https://infocenter.nordicsemi.com/index.jsp?topic=%2Ferrata_nRF52832_Rev3%2FERR%2FnRF52832%2FRev3%2Flatest%2Fanomaly_832_141.html&anchor=anomaly_832_141.

It might be worthwhile to try the fix

*(volatile uint32_t *)0x40005FFC = 0; 
*(volatile uint32_t *)0x40005FFC = 1; 

after NFC field disconnected event is detected, or if disconnect event is not detected in a while after a connect event