pycom / pycom-micropython-sigfox

A fork of MicroPython with the ESP32 port customized to run on Pycom's IoT multi-network modules.
MIT License

mperror.c: Move noticing of the next heartbeat transition #525

Closed robert-hh closed 3 years ago

robert-hh commented 3 years ago

The point at which the time of the next transition is recorded moves from the start of the respective block to its end, after the RGB LED has switched. The effect on heartbeat timing is minor: without load, the heartbeat 'on' and 'off' times are identical for the old and new versions.

Timing with load, new version. The first pulse at -200 ms is the 'on' command for the RGB LED; the scattered second pulses are the 'off' commands accumulated over 12 hours.

heartbeat_timing_new_w_load

Timing with load, old version.

heartbeat_timing_orig_w_load

peter-pycom commented 3 years ago

Some context:

https://forum.pycom.io/topic/6734/external-flash-lose-files/87?_=1615194014360

b) During that wear levelling test, I had core dumps at about every 50_000 writes. They were related to the heartbeat flash. I made a PR which changed that heartbeat timing a little bit. After that, the crashes disappeared, at least in that test. I am not 100% confident that the change cured the initial problem, which could also be a Core-0/Core-1 collision. But the change is also not intrusive and saves a few clock cycles. PR here: https://github.com/pycom/pycom-micropython-sigfox/pull/525

geza-pycom commented 3 years ago

Hello! Could you please explain in more detail why this fix is needed? Which issue does it fix? Thanks! Is it related to https://github.com/pycom/pycom-micropython-sigfox/issues/518 ?

robert-hh commented 3 years ago

Is it related to #518 ?

Yes. Analyzing the backtraces of these failures, it looked like a race condition involving the code doing the heartbeat and the code closing a file. The previous code recorded the new time before the actual color change was done. The change moves recording the new time to after the color has changed. Before the change, the code would crash about every few hours on average. After the change, the code ran fine for a week, after which it was stopped manually. That was the wear-leveling test: the device with the changed code did 4 million cycles, and another one did 10 million file create/write/close operations. As a slight performance improvement, the new code also performs the assignment and subtraction only when needed. I am definitely not sure about the mechanism, so the change may only hide the real cause.

P.S.: I had another crash case caused by the heartbeat LED, which disappeared after I disabled it. But I have not looked into it yet.
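
For readers without the diff at hand, the reordering described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual mperror.c code: the struct `heartbeat_t`, the functions `heartbeat_poll_old` and `heartbeat_poll_new`, the simulated clock `fake_now_ms`, and the on/off durations and color value are all assumptions made for illustration. It only shows the ordering difference: the old variant stores the timestamp on every poll, before the LED is touched, while the new variant stores it once, after the LED has switched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed durations and color, for illustration only. */
#define HEARTBEAT_ON_MS   80u
#define HEARTBEAT_OFF_MS  3920u
#define HEARTBEAT_COLOR   0x0000FFu

/* Hypothetical heartbeat state, loosely modelled on the discussion. */
typedef struct {
    bool     beating;    /* true while the LED is on */
    uint32_t on_time;    /* tick count stored for the 'on' transition */
    uint32_t off_time;   /* tick count stored for the 'off' transition */
    uint32_t led_color;  /* stand-in for the real RGB LED */
} heartbeat_t;

/* Simulated millisecond clock so the sketch runs on a host machine. */
uint32_t fake_now_ms = 0;

static uint32_t ticks_ms(void) { return fake_now_ms; }

static void set_led(heartbeat_t *hb, uint32_t color) { hb->led_color = color; }

/* Old ordering: the timestamp is written on every poll, before the
 * comparison and before the LED has actually changed. */
void heartbeat_poll_old(heartbeat_t *hb) {
    if (!hb->beating) {
        hb->on_time = ticks_ms();
        if (hb->on_time - hb->off_time > HEARTBEAT_OFF_MS) {
            set_led(hb, HEARTBEAT_COLOR);
            hb->beating = true;
        }
    } else {
        hb->off_time = ticks_ms();
        if (hb->off_time - hb->on_time > HEARTBEAT_ON_MS) {
            set_led(hb, 0);
            hb->beating = false;
        }
    }
}

/* New ordering: the timestamp is written only when a transition happens,
 * and only after the LED has switched. Unsigned subtraction keeps the
 * comparison correct across tick counter wrap-around. */
void heartbeat_poll_new(heartbeat_t *hb) {
    if (!hb->beating) {
        if (ticks_ms() - hb->off_time > HEARTBEAT_OFF_MS) {
            set_led(hb, HEARTBEAT_COLOR);
            hb->beating = true;
            hb->on_time = ticks_ms();
        }
    } else {
        if (ticks_ms() - hb->on_time > HEARTBEAT_ON_MS) {
            set_led(hb, 0);
            hb->beating = false;
            hb->off_time = ticks_ms();
        }
    }
}
```

In this model, any other context that reads `on_time` or `off_time` while a poll is in progress would, with the old ordering, see a timestamp that does not yet correspond to an LED change. Whether that is the mechanism behind the crashes is not established in this thread.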

geza-pycom commented 3 years ago

Thanks for the contribution, this change will be part of an upcoming release.