microbit-foundation / micropython-microbit-v2

Temporary home for MicroPython for micro:bit v2 as we stablise it before pushing upstream
MIT License
43 stars 24 forks source link

Calling neopixels show() clear() show() too quickly hangs V2 #27

Closed martinwork closed 3 years ago

martinwork commented 3 years ago

The following code seems OK on V1, but hangs V2. Inserting sleeps makes it work

#without the sleeps this hangs on V2
#no need to connect neopixels

from microbit import *
import neopixel

display.show("S")

np = neopixel.NeoPixel(pin0, 5)
np[0] = (32, 0, 0)
np.show()
#sleep(100)
np.clear()
#sleep(100)
np[1] = (0, 32, 0)
np.show()

while True:
    display.show("W")
    sleep(1000)
dpgeorge commented 3 years ago

I'd say this is an issue with the CODAL because all MicroPython does is call neopixel_send_buffer().

@finneyj any idea why calling neopixel_send_buffer() twice in a row would cause a hang?

martinwork commented 3 years ago

That makes sense. There might be delays in my MakeCode script due to the on button pressed handlers. However, the following MakeCode script doesn't hang.

basic.showString("S")
let strip = neopixel.create(DigitalPin.P0, 5, NeoPixelMode.RGB)
strip.setBrightness(20)
strip.setPixelColor(0, neopixel.colors(NeoPixelColors.Red))
strip.show()
strip.clear()
strip.setPixelColor(1, neopixel.colors(NeoPixelColors.Green))
strip.show()
while (true) {
    basic.showString("W")
    basic.pause(1000)
}
finneyj commented 3 years ago

I just did some testing CODAL side, and yes, could repro this bug in C++.

Slipped through my tests as it's caused by a really small glitch when you reconnect the same pin twice to the NRF52PWM... This gets masked if you have a small delay between transmissions....

I've fixed this and a few other neopixel corner cases in the master branch of the codal-nrf52 repo in the rather poetic v0.2.22 tag of codal-microbit-v2

dpgeorge commented 3 years ago

@finneyj I updated to v0.2.22 of codal-microbit-v2 but it still seems to lock up, running that original code in MicroPython.

finneyj commented 3 years ago

Hmmm.... reading this again, I think this is a different issue than the one I fixed yesterday. I read "hang" as the neopixel strip becoming unresponsive, but it sounds more like the CPU that's locking up in this issue?

I'm afraid I didn't manage to repro any CPU lock ups at all... I was running back to back transmissions of a similar scale to the case here and never experienced any CPU lock ups, just a few misbehaving LEDs...

Here's the code I was using for testing in case it helps:

int 
main()
{
    uBit.init();

    int numberOfLeds = 5;
    ManagedBuffer data(numberOfLeds*3);

    int r = 0;
    int g = 0;
    int b = 255;

    int delta = 1;

    while(1)
    {
        r = r + delta;
        b = b - delta;

        if (r > 255)
        {
            r = 255;
            b = 0;
            delta = -delta;
        }

        if (b > 255)
        {
            b = 255;
            r = 0;
            delta = -delta;
        }

        for (int i = 0; i< data.length(); i+=3)
        {
            data[i] = g;
            data[i+1] = r;
            data[i+2] = b;
        }

        neopixel_send_buffer(uBit.io.P0, data);
    }
}
martinwork commented 3 years ago

The program appears to hang because it doesn't get to display "W" in the while loop.

With the variation below, only the top left corner pixel gets lit. If I include the show("A") all 3 pixels are lit.

#without the sleeps this hangs on V2
#no need to connect neopixels

from microbit import *
import neopixel

display.show("S")

np = neopixel.NeoPixel(pin0, 5)
np[0] = (32, 0, 0)
np.show()
#display.show("A")
display.set_pixel(0,0,9)
#sleep(100)
np.clear()
#sleep(100)
display.set_pixel(4,0,9)
np[1] = (0, 32, 0)
display.set_pixel(4,4,9)
np.show()

while True:
    display.show("W")
    sleep(1000)
microbit-carlos commented 3 years ago

Right, so with the script as you show it only runs the first display.set_pixel(0,0,9). However, if we uncomment display.show("A") above that set_pixel, then it runs until the np.show() line? Does it actually update the Neopixels?

martinwork commented 3 years ago

It shows red briefly, clears it, and hangs in np.show(). If the sleep after np.clear() is uncommented, it goes through. It shows red very briefly, then green and W.

microbit-carlos commented 3 years ago

Could you find the smallest way to trigger this and then try to replicate in an 1-to-1 translation to CODAL C++ code? Not sure how this is interacting with the NeoPixel PWM, but most of the code here is MicroPython calling the uBit methods.

martinwork commented 3 years ago

Two programs below.

After realising that np.clear() calls write(), I tried these two variations: np[0] = (32, 0, 0) np.show() np.show() np[0] = (32, 0, 0) np.show() np.clear() Both leave the red neopixel on. The first one doesn't hang.

from microbit import *
import neopixel

np = neopixel.NeoPixel(pin0, 5)
np.show()
np.clear()
display.set_pixel(0,0,9)
while True:
    sleep(1000)
#include "MicroBit.h"
#include "neopixel.h"

MicroBit uBit;

int main()
{
    uBit.init();
    ManagedBuffer data(5*3);
    neopixel_send_buffer(uBit.io.P0, data);
    for (int i = 0; i < data.length(); i++)
        data[i] = 0;
    neopixel_send_buffer(uBit.io.P0, data);
    uBit.display.image.setPixelValue( 0, 0, 255);
    while(1)
    {
      uBit.sleep(1000);
    }
}
microbit-carlos commented 3 years ago

Both leave the red neopixel on. The first one doesn't hang.

So the MicroPython script doesn't hang, but the CODAL C++ programme hangs?

Or do you mean np[0] = (32, 0, 0) np.show() np.show() doesn't hang, but np[0] = (32, 0, 0) np.show() np.clear() hangs? In what case does the C++ programme hang?

martinwork commented 3 years ago

Both of the two micropython variations leave the red neopixel on. show/show doesn't hang. show/clear hangs. So my minimal sample does show/clear. In V2 micropython, it looks like clear() calls write(), so the C++ sample does a second send.

micropython hangs, C++ doesn't. Note also that a MakeCode version of the original micropython sample did not hang.

Someone else should try both samples to be sure I'm not doing something really silly :)

microbit-carlos commented 3 years ago

the two micropython variations leave the red neopixel on ... micropython hangs, C++ doesn't.

So the C++ doesn't leave the red LED and doesn't hang in any configuration, right?

@dpgeorge we've been unable to replicate this in CODAL, could you have a quick look to determine if the issue is in MicroPython?

martinwork commented 3 years ago

Neither of the minimal samples set the contents of the buffer. With the C++ sample, I don't see anything on the neopixels, and it shows the pixel on LEDs, indicating that it hasn't hung inside send. With the python sample, I don't see anything on the neopixels or the LEDs, indicating that it has hung before reaching the set_pixel() call.

The point of mentioning the python variations I tried that started by setting a neopixel to red, was to flag up that show/show doesn't hang, but show/clear does.

dpgeorge commented 3 years ago

I can reproduce the issue with C++ using the following code:

#include "MicroBit.h"
#include "neopixel.h"

MicroBit uBit;

int main()
{
    uBit.init();
    ManagedBuffer data(5*3);
    neopixel_send_buffer(uBit.io.P0, data);

    // delay for 300 microseconds
    uint32_t start = system_timer_current_time_us();
    while (system_timer_current_time_us() - start < 300) {
    } 

    neopixel_send_buffer(uBit.io.P0, data);

    while(1)
        uBit.display.scroll("HELLO WORLD!");
}

A delay between about 200 and 425 microseconds between the two calls to neopixel_send_buffer() will hang the second call.

martinwork commented 3 years ago

Thanks for clearing up the mystery @dpgeorge . This hangs for me too.

@finneyj Can you help?

finneyj commented 3 years ago

Excellent work - thanks for the repro @martinwork @dpgeorge. That's really helpful.

I guess this is related to the timings of when a DMA buffer gets shifted out. Not sure how this could cause a hang, but yes, I'll happily look into it.

finneyj commented 3 years ago

Hi @martinwork @dpgeorge

I think I've found the cause of this. The PWM output is double buffered for smooth playback, so there's a one buffer long lag between scheduling the PWM to stop when an underflow occurs (e.g. the end of a neopixel databurst) and the PWM actually shutting down. This was causing a short race condition, where if new data became available in that window (a new data burst), it failed to fill the double buffers back up again properly and stalled.

It actually wasn't hanging the device - it was happily spinning around the idle task in the scheduler...

Anyhow, fixed in this patch: https://github.com/lancaster-university/codal-nrf52/commit/72191e078fd405401fcb6e5f04b3ec964ebef732

I also caught myself out a bit by using a codal::Lock primitive in IRQ context (which I never designed it for), so did also manage to observe a very rare condition with a delay of 500us too, the symptom of which was the main fiber becoming blocked forever in the neopixel write function. I've also patched that to make codal::Lock IRQ safe, but this is a deeper change, so I want to do more testing before pushing that patch to codal-core...

Do you folks need a tag to be able to test, or can you pull through from master?

martinwork commented 3 years ago

@finneyj Seems fixed for reasonable cases. This triggers the 500us case, I guess?

#include "MicroBit.h"
#include "neopixel.h"

MicroBit uBit;

int main()
{
    uBit.init();
    ManagedBuffer data(5*3);
    neopixel_send_buffer(uBit.io.P0, data);

    for ( uint32_t delay = 500; delay < 502; delay++)
    {
        // delay for delay microseconds
        uint32_t start = system_timer_current_time_us();
        while (system_timer_current_time_us() - start < delay) {
        } 

        neopixel_send_buffer(uBit.io.P0, data);
    }

    while(1)
        uBit.display.scroll("HELLO WORLD!");
}
finneyj commented 3 years ago

Yes, that seems to be the sweet spot for a race-condition in this test.

I just pushed an update to FiberLock in master of codal-core that seems to address this. Would be good if you can test?

finneyj commented 3 years ago

oops - forgot to push a mutex a moment ago. Make sure you have latest HEAD of codal-core for your tests @martinwork.

finneyj commented 3 years ago

FYI it now seems to perform ok in this stress test.

void
neopixel_hang_test2()
{
    ManagedBuffer data(5*3);
    neopixel_send_buffer(uBit.io.P0, data);

    for (int i=0; i<1000; i++)
    {
        for ( uint32_t delay = 500; delay < 502; delay++)
        {
            // delay for delay microseconds
            uint32_t start = system_timer_current_time_us();
            while (system_timer_current_time_us() - start < delay) {
            } 

            neopixel_send_buffer(uBit.io.P0, data);
        }
    }

    while(1)
        uBit.display.scroll("HELLO WORLD!");
}
martinwork commented 3 years ago

Thanks @finneyj and @dpgeorge, that works for me too and so does...

int main()
{
    uBit.init();
    ManagedBuffer data(5*3);
    neopixel_send_buffer(uBit.io.P0, data);

    for ( uint32_t delay = 0; delay < 1000; delay++)
    {
        // delay for delay microseconds
        uint32_t start = system_timer_current_time_us();
        while (system_timer_current_time_us() - start < delay) {
        } 

        neopixel_send_buffer(uBit.io.P0, data);
    }

    while(1)
        uBit.display.scroll("HELLO WORLD!");
}
dpgeorge commented 3 years ago

I've updated to use codal-microbit-v2 to v0.2.23 in commit faf83c16f36c0ad73b17670e5c9a9433ea1575fc and can confirm that the issue is fixed for me.