Closed martinwork closed 3 years ago
I'd say this is an issue with the CODAL because all MicroPython does is call neopixel_send_buffer()
.
@finneyj any idea why calling neopixel_send_buffer()
twice in a row would cause a hang?
That makes sense. There might be delays in my MakeCode script due to the on button pressed handlers. However, the following MakeCode script doesn't hang.
basic.showString("S")
let strip = neopixel.create(DigitalPin.P0, 5, NeoPixelMode.RGB)
strip.setBrightness(20)
strip.setPixelColor(0, neopixel.colors(NeoPixelColors.Red))
strip.show()
strip.clear()
strip.setPixelColor(1, neopixel.colors(NeoPixelColors.Green))
strip.show()
while (true) {
basic.showString("W")
basic.pause(1000)
}
I just did some testing CODAL side, and yes, could repro this bug in C++.
Slipped through my tests as it's caused by a really small glitch when you reconnect the same pin twice to the NRF52PWM... This gets masked if you have a small delay between transmissions....
I've fixed this and a few other neopixel corner cases in the master branch of the codal-nrf52 repo in the rather poetic v0.2.22 tag of codal-microbit-v2
@finneyj I updated to v0.2.22 of codal-microbit-v2 but it still seems to lock up, running that original code in MicroPython.
Hmmm.... reading this again, I think this is a different issue than the one I fixed yesterday. I read "hang" as the neopixel strip becoming unresponsive, but it sounds more like the CPU that's locking up in this issue?
I'm afraid I didn't manage to repro any CPU lock ups at all... I was running back to back transmissions of a similar scale to the case here and never experienced any CPU lock ups, just a few misbehaving LEDs...
Here's the code I was using for testing in case it helps:
int
main()
{
uBit.init();
int numberOfLeds = 5;
ManagedBuffer data(numberOfLeds*3);
int r = 0;
int g = 0;
int b = 255;
int delta = 1;
while(1)
{
r = r + delta;
b = b - delta;
if (r > 255)
{
r = 255;
b = 0;
delta = -delta;
}
if (b > 255)
{
b = 255;
r = 0;
delta = -delta;
}
for (int i = 0; i< data.length(); i+=3)
{
data[i] = g;
data[i+1] = r;
data[i+2] = b;
}
neopixel_send_buffer(uBit.io.P0, data);
}
}
The program appears to hang because it doesn't get to display "W" in the while loop.
With the variation below, only the top left corner pixel gets lit. If I include the show("A") all 3 pixels are lit.
#without the sleeps this hangs on V2
#no need to connect neopixels
from microbit import *
import neopixel
display.show("S")
np = neopixel.NeoPixel(pin0, 5)
np[0] = (32, 0, 0)
np.show()
#display.show("A")
display.set_pixel(0,0,9)
#sleep(100)
np.clear()
#sleep(100)
display.set_pixel(4,0,9)
np[1] = (0, 32, 0)
display.set_pixel(4,4,9)
np.show()
while True:
display.show("W")
sleep(1000)
Right, so with the script as you show it only runs the first display.set_pixel(0,0,9)
.
However, if we uncomment display.show("A")
above that set_pixel
, then it runs until the np.show()
line?
Does it actually update the Neopixels?
It shows red briefly, clears it, and hangs in np.show(). If the sleep after np.clear() is uncommented, it goes through. It shows red very briefly, then green and W.
Could you find the smallest way to trigger this and then try to replicate in an 1-to-1 translation to CODAL C++ code? Not sure how this is interacting with the NeoPixel PWM, but most of the code here is MicroPython calling the uBit methods.
Two programs below.
After realising that np.clear() calls write(), I tried these two variations: np[0] = (32, 0, 0) np.show() np.show() np[0] = (32, 0, 0) np.show() np.clear() Both leave the red neopixel on. The first one doesn't hang.
from microbit import *
import neopixel
np = neopixel.NeoPixel(pin0, 5)
np.show()
np.clear()
display.set_pixel(0,0,9)
while True:
sleep(1000)
#include "MicroBit.h"
#include "neopixel.h"
MicroBit uBit;
int main()
{
uBit.init();
ManagedBuffer data(5*3);
neopixel_send_buffer(uBit.io.P0, data);
for (int i = 0; i < data.length(); i++)
data[i] = 0;
neopixel_send_buffer(uBit.io.P0, data);
uBit.display.image.setPixelValue( 0, 0, 255);
while(1)
{
uBit.sleep(1000);
}
}
Both leave the red neopixel on. The first one doesn't hang.
So the MicroPython script doesn't hang, but the CODAL C++ programme hangs?
Or do you mean np[0] = (32, 0, 0) np.show() np.show()
doesn't hang, but np[0] = (32, 0, 0) np.show() np.clear()
hangs?
In what case does the C++ programme hang?
Both of the two micropython variations leave the red neopixel on. show/show doesn't hang. show/clear hangs. So my minimal sample does show/clear. In V2 micropython, it looks like clear() calls write(), so the C++ sample does a second send.
micropython hangs, C++ doesn't. Note also that a MakeCode version of the original micropython sample did not hang.
Someone else should try both samples to be sure I'm not doing something really silly :)
the two micropython variations leave the red neopixel on ... micropython hangs, C++ doesn't.
So the C++ doesn't leave the red LED and doesn't hang in any configuration, right?
@dpgeorge we've been unable to replicate this in CODAL, could you have a quick look to determine if the issue is in MicroPython?
Neither of the minimal samples set the contents of the buffer. With the C++ sample, I don't see anything on the neopixels, and it shows the pixel on LEDs, indicating that it hasn't hung inside send. With the python sample, I don't see anything on the neopixels or the LEDs, indicating that it has hung before reaching the set_pixel() call.
The point of mentioning the python variations I tried that started by setting a neopixel to red, was to flag up that show/show doesn't hang, but show/clear does.
I can reproduce the issue with C++ using the following code:
#include "MicroBit.h"
#include "neopixel.h"
MicroBit uBit;
int main()
{
uBit.init();
ManagedBuffer data(5*3);
neopixel_send_buffer(uBit.io.P0, data);
// delay for 300 microseconds
uint32_t start = system_timer_current_time_us();
while (system_timer_current_time_us() - start < 300) {
}
neopixel_send_buffer(uBit.io.P0, data);
while(1)
uBit.display.scroll("HELLO WORLD!");
}
A delay between about 200 and 425 microseconds between the two calls to neopixel_send_buffer()
will hang the second call.
Thanks for clearing up the mystery @dpgeorge . This hangs for me too.
@finneyj Can you help?
Excellent work - thanks for the repro @martinwork @dpgeorge. That's really helpful.
I guess this is related to the timings of when a DMA buffer gets shifted out. Not sure how this could cause a hang, but yes, I'll happily look into it.
Hi @martinwork @dpgeorge
I think I've found the cause of this. The PWM output is double buffered for smooth playback, so there's a one buffer long lag between scheduling the PWM to stop when an underflow occurs (e.g. the end of a neopixel databurst) and the PWM actually shutting down. This was causing a short race condition, where if new data became available in that window (a new data burst), it failed to fill the double buffers back up again properly and stalled.
It actually wasn't hanging the device - it was happily spinning around the idle task in the scheduler...
Anyhow, fixed in this patch: https://github.com/lancaster-university/codal-nrf52/commit/72191e078fd405401fcb6e5f04b3ec964ebef732
I also caught myself out a bit by using a codal::Lock primitive in IRQ context (which I never designed it for), so did also manage to observe a very rare condition with a delay of 500us too, the symptom of which was the main fiber becoming blocked forever in the neopixel write function. I've also patched that to make codal::Lock IRQ safe, but this is a deeper change, so I want to do more testing before pushing that patch to codal-core...
Do you folks need a tag to be able to test, or can you pull through from master?
@finneyj Seems fixed for reasonable cases. This triggers the 500us case, I guess?
#include "MicroBit.h"
#include "neopixel.h"
MicroBit uBit;
int main()
{
uBit.init();
ManagedBuffer data(5*3);
neopixel_send_buffer(uBit.io.P0, data);
for ( uint32_t delay = 500; delay < 502; delay++)
{
// delay for delay microseconds
uint32_t start = system_timer_current_time_us();
while (system_timer_current_time_us() - start < delay) {
}
neopixel_send_buffer(uBit.io.P0, data);
}
while(1)
uBit.display.scroll("HELLO WORLD!");
}
Yes, that seems to be the sweet spot for a race-condition in this test.
I just pushed an update to FiberLock in master of codal-core that seems to address this. Would be good if you can test?
oops - forgot to push a mutex a moment ago. Make sure you have latest HEAD of codal-core for your tests @martinwork.
FYI it now seems to perform ok in this stress test.
void
neopixel_hang_test2()
{
ManagedBuffer data(5*3);
neopixel_send_buffer(uBit.io.P0, data);
for (int i=0; i<1000; i++)
{
for ( uint32_t delay = 500; delay < 502; delay++)
{
// delay for delay microseconds
uint32_t start = system_timer_current_time_us();
while (system_timer_current_time_us() - start < delay) {
}
neopixel_send_buffer(uBit.io.P0, data);
}
}
while(1)
uBit.display.scroll("HELLO WORLD!");
}
Thanks @finneyj and @dpgeorge, that works for me too and so does...
int main()
{
uBit.init();
ManagedBuffer data(5*3);
neopixel_send_buffer(uBit.io.P0, data);
for ( uint32_t delay = 0; delay < 1000; delay++)
{
// delay for delay microseconds
uint32_t start = system_timer_current_time_us();
while (system_timer_current_time_us() - start < delay) {
}
neopixel_send_buffer(uBit.io.P0, data);
}
while(1)
uBit.display.scroll("HELLO WORLD!");
}
I've updated to use codal-microbit-v2 to v0.2.23 in commit faf83c16f36c0ad73b17670e5c9a9433ea1575fc and can confirm that the issue is fixed for me.
The following code seems OK on V1, but hangs V2. Inserting sleeps makes it work