pybricks / support

Pybricks support and general discussion

[Bug] Hub can get stuck while broadcasting #1419

Open JJHackimoto opened 5 months ago

JJHackimoto commented 5 months ago

Describe the bug

Block Coding: When using Broadcast in two or more tasks running simultaneously, an error will be thrown and the program will terminate. Error below:

"OSError: This resource cannot be used in two tasks at once."

Expected behavior

I expected the block coding to give me an error or prevent me from building the program this way before running the program. This is especially true since block coders may not be that experienced in coding, and may not understand the error thrown in the console. The error won't be seen if running the program off the hub without a Bluetooth connection either.

I could also expect this to just work since I personally can't see the reason for this not working. It currently feels unintuitive that the only way to use Broadcasting is to have a loop that constantly sends out the chosen values instead of just sending an update when the values have actually changed (for example when just sending out Booleans once in a while).

laurensvalk commented 5 months ago

Thanks for raising this. This is the expected behavior (as in, not a bug), but I can see this being not ideal. We could perhaps improve this with documentation and a clear example.

I could also expect this to just work since I personally can't see the reason for this not working.

A fair analogy is screaming two different things from the same rooftop at the same time :smile: So even if we made the error go away, this probably isn't going to work as you'd like. The receiver wouldn't be guaranteed to receive both.

To send two values, it's better to send them together in a list. So if your code wants to send variables A and B from two different tasks, you could make another task that just broadcasts a new list of A & B whenever either of them changes.
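A minimal sketch of that pattern, written with plain `asyncio` so it runs on a desktop Python rather than on a hub. `FakeBle` is a hypothetical stand-in for `hub.ble` that mimics the "one task at a time" rule, and `set_later` and `broadcaster` are illustrative names. On a real hub you would use `wait` and `multitask` from `pybricks.tools` instead.

```python
import asyncio


class FakeBle:
    """Hypothetical stand-in for hub.ble: rejects overlapping broadcasts."""

    def __init__(self):
        self.busy = False
        self.sent = []

    async def broadcast(self, data):
        if self.busy:
            raise OSError("This resource cannot be used in two tasks at once.")
        self.busy = True
        await asyncio.sleep(0.01)  # pretend the radio needs some time
        self.sent.append(list(data))
        self.busy = False


async def set_later(state, name, value, delay):
    # Any task may change a variable, but none of them touch the radio.
    await asyncio.sleep(delay)
    state[name] = value


async def broadcaster(ble, state, run_time):
    # The only task that broadcasts: send [A, B] whenever either one changes.
    last = None
    loop = asyncio.get_running_loop()
    end = loop.time() + run_time
    while loop.time() < end:
        current = [state["A"], state["B"]]
        if current != last:
            await ble.broadcast(current)
            last = current
        await asyncio.sleep(0.005)


async def demo():
    ble = FakeBle()
    state = {"A": False, "B": 0}
    await asyncio.gather(
        set_later(state, "A", True, 0.05),
        set_later(state, "B", 3, 0.10),
        broadcaster(ble, state, 0.3),
    )
    return ble.sent


print(asyncio.run(demo()))
```

Because only `broadcaster` ever calls `broadcast`, the error cannot occur, and the receiver always gets A and B together in one message.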

Generally in Pybricks, we try to raise the error when there is one, instead of not telling you and leaving you confused why something isn't working.

I expected the block coding to give me an error or prevent me from building the program this way before running the program.

We couldn't know in advance how the program might run. It's perfectly fine if two tasks both use broadcasting, just not at the same time.

JJHackimoto commented 5 months ago

I see, thanks for the clarification. Now, I'm not sure how this could work, but wouldn't it be possible to make broadcasting always happen asynchronously when using block coding? This way, it would never occur at the same time no matter where in the program the call is used.

Feel free to close this issue if you decide that no changes are needed for this :)

laurensvalk commented 5 months ago

That way, you'd still 'drown' one message in the other, so overall responsiveness is likely not as good compared to intentionally combining the messages as needed depending on your application.

laurensvalk commented 5 months ago

Let's keep the issue open since it's definitely a good question. We'll want to document this clearly and explain why =)

JJHackimoto commented 5 months ago

In addition to this, is it true that broadcasting and unpacking cannot be done at the same time as well? I've had a few issues with hubs freezing where the blue light continues to fade as normal but the hub is stuck, and the only thing to do is to power it down with a long press. A short press won't work. After moving all Bluetooth communication blocks to a single task that runs repeatedly, the issue seems to have gone away.

laurensvalk commented 5 months ago

That should be allowed. If you find a reproducible small program we can test, that would be very useful!

laurensvalk commented 5 months ago

I did find https://github.com/pybricks/support/issues/1454. Maybe you were seeing this too?

JJHackimoto commented 5 months ago

That one is interesting. Did your hub continue to fade blue during and after the "crash"?

I've yet to have this happen while connected to a computer, and it's really inconsistent in when it happens, sometimes after a minute, and sometimes after 10. Other times it doesn't happen at all during the time I'm testing with my program. I'm still using three hubs communicating with each other and this far, two of them have randomly "crashed", one more often than the other. The third hub has been fine all the time. The difference between how they handle communication is that the good hub has been doing it in a loop in a separate task. The other two have only been unpacking in a loop in a separate task, while broadcasting from the main program whenever needed. I've since moved it all to that separate loop and it seems to have been fine since then on all three hubs.

I'll see if I can reproduce it with a small program.

laurensvalk commented 5 months ago

That one is interesting. Did your hub continue to fade blue during and after the "crash"?

No, so maybe you're seeing something different.

I'll see if I can reproduce it with a small program.

Thank you!

JJHackimoto commented 5 months ago

I wasn't able to reproduce it with a small program sadly. However, I now know that it's actually not due to broadcasting and unpacking at the same time since one of my hubs did this again today, even though the program has all broadcasting and unpacking in a single task. I now have no clue what can be causing this.

JJHackimoto commented 5 months ago

Just to add to this, same happened today, but long pressing the power button made the light on the hub flash rapidly without stopping. The hub stopped responding to long button presses altogether and the only way to revive the hub was to pull the batteries.

laurensvalk commented 5 months ago

Which firmware are you using?

from pybricks import version

print(version)

The beta firmware from https://beta.pybricks.com/ should already fix some of this, so it would be good to know which version you used.

JJHackimoto commented 5 months ago

I'm running: ('technichub', '3.4.0b2', 'v1.20.0-23-g6c633a8dd on 2024-02-14')

One of my hubs is probably on an older firmware. I read something about bad data in the thread you linked. I've been holding off updating the firmware since it fails 95% of the time ("The hub took too long to respond. Restart the hub and try again."), taking up to 30 minutes until I can get it running. I'll update the third hub now and we will see if the issue comes back. Thanks for letting me know there are fixes for this in the update :)

JJHackimoto commented 5 months ago

All hubs are now updated but the issue still occurred. This time I also had to pull the batteries to get the hub turned off.

Here's the program I'm running on the most problematic hub (Code is generated through Block Coding).


from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor
from pybricks.tools import multitask, run_task, wait

Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)

SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
LateTrigger = ColorDistanceSensor(Port.C)
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)

DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False

async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
        Triggered = True
        await wait(1000)
        Triggered = False

async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        await SensorHub.ble.broadcast([Tipped, Triggered])
        await wait(200)

async def main3():
    global Tipped
    while True:
        await wait(0)
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)

async def main():
    await multitask(main1(), main2(), main3())

run_task(main())

laurensvalk commented 4 months ago

The documentation issue here has been addressed via https://github.com/pybricks/pybricks-api/commit/b2b183ef8f2e027f61525b77146c03c12b24d3fa.

What remains here then is the issue of the hub getting stuck. I haven't been able to reproduce this yet.

JJHackimoto commented 4 months ago

Great!

I attended a lego-event this weekend and had my machine running for a day straight. The stuck hub issue happened about once every 30 minutes on one of the three hubs at random. Sometimes having to pull the batteries and sometimes not. I still don't know what could be causing this.

JJHackimoto commented 4 months ago

Just to update here. Using v3.5.0b1 (Pybricks Beta v2.5.0-beta.2) with the latest firmware still causes these crashes. There is a difference though, when it happens, I always have to pull the batteries in comparison to before where that was the odd case.

laurensvalk commented 4 months ago

Is it still this program that causes it for you?

I'd like to make some time to properly investigate this one. As a first step, I'd like to try to reproduce it.

Do you think we can make something with a hub and just a few motors, without replicating your whole build?

Are you only transmitting boolean values? What does your other program look like? Or can the crash be reproduced by just running this one?

Thanks!

BertLindeman commented 4 months ago

Is it still this program that causes it for you?

I'd like to make some time to properly investigate this one. As a first step, I'd like to try to reproduce it.

Do you think we can make something with a hub and just a few motors, without replicating your whole build?

Are you only transmitting boolean values? What does your other program look like? Or can the crash be reproduced by just running this one?

Tried the program you referred to on a Technic hub with one ColorDistanceSensor (I have only one ColorDistanceSensor) and a ColorSensor.

The program runs over two hours without a problem. Firmware: v3.5.0b1

Two hubs run a transmitter:

one on a primehub:


from pybricks.hubs import PrimeHub
from pybricks.tools import wait
from urandom import choice

transmitter = PrimeHub(broadcast_channel=2, observe_channels=[1, 3])

while True:
    transmitter.ble.broadcast([choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)

And one on a Technichub:


from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from urandom import choice

transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])

while True:
    transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    wait(100)

But as stated, no problem seen yet. Maybe @JJHackimoto has other experience or test program.

Bert

BertLindeman commented 4 months ago

Maybe a strange hit here..... The transmitting techhub with the small test program above, running disconnected from the PC, started to complain about low-battery. So long pressed the button, to get a rapid flashing hub. And need to take the batteries out. (😁 needed that anyway to re-load them)

Would battery-low have caused this?

[EDIT] The situation occurred after over four and a half hour.

New batteries IN and press button, normal blinking, OK. Press again and I get NOT a running program. Could it be that the loaded program is erased in this situation? Needed to re-load the small test program. And now it runs nicely again.

laurensvalk commented 4 months ago

@Bert - thanks for testing!

So long pressed the button, to get a rapid flashing hub. And need to take the batteries out. (😁 needed that anyway to re-load them)

I thought this was fixed but apparently not. Can you add your findings to https://github.com/pybricks/support/issues/1497 ?

The situation occurred after over four and a half hour.

Which situation? The stuck program? With your smaller test program? If yes, that's great news - will help debugging quite a lot. :slightly_smiling_face:

Could it be that the loaded program is erased in this situation?

Programs are saved during normal shutdown. So if you pull the batteries after loading a new program, it won't be saved.

BertLindeman commented 4 months ago

The situation occurred after over four and a half hour.

Which situation? The stuck program? With your smaller test program? If yes, that's great news - will help debugging quite a lot. 🙂

The test for 'this' item ran for 4.5 hours and so also the crashing small program. The small program seemed to run "normally" and started flashing orange: battery-low. Pressed the button to stop it. Then fast-flashing.

Could it be that the loaded program is erased in this situation?

Programs are saved during normal shutdown. So if you pull the batteries after loading a new program, it won't be saved.

Ah, I should have thought about that. I probably disconnected the TechnicHub and did not stop the program, so it was not saved.

Will add the findings to #1497

JJHackimoto commented 4 months ago

Hi and thanks Bert for testing! I'll do my best to help out on this since it's a huge issue on my side. It happens on any of my three hubs running together, and there seems to be no reason as to when or which hub crashes. One of the hubs runs off batteries, the other two are plugged in permanently using a battery eliminator made for these hubs from PV-Productions. How do you know when it complains on low-battery? Can you see this when the hub runs disconnected from any computer? I always run my hubs that way since having three of them connected wouldn't make much sense in my case.

I am mostly broadcasting boolean values, but I have started broadcasting single integers on a few occasions. Nothing massive :)

I have changed the programs a bit since I last posted them here. Mostly to reduce the amount of broadcasting being done. I'm happy to take any feedback you might have if you see something obvious that I'm doing wrong. Here's the three programs I'm currently using.

pybricks-backup.zip

The essential stuff for testing this would probably be to have at least two Technic hubs, one with a color/distance sensor reading for either color or distance to approach a certain value, which would in turn change a variable, and that variable is broadcast every 200 ms or similar. I'd love to have a block that could broadcast "OnChange" of a variable, but this is how I currently do it. The other Technic hub can then receive that, maybe wait for 10 seconds and then broadcast another variable. I believe that would be the essentials.

Let me know if I can do something more to help :)

laurensvalk commented 4 months ago

I am mostly broadcasting boolean values, but I have started broadcasting single integers on a few occasions. Nothing massive :)

The Technic Hub has some issues when changing between large and small values. See https://github.com/pybricks/support/issues/1454.

To rule this out, keep your values between -127 and 127. Booleans are fine too. And try to always send the same kind of list. For example always (small number, bool, small number).
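As a rough illustration of that advice, the check below could be run on a list before broadcasting it. `check_payload` and `LAYOUT` are hypothetical names, not part of Pybricks. The sketch only encodes the two rules from this comment: small integers or booleans, and always the same kind of list.

```python
def check_payload(values, layout):
    """Return a list of problems; an empty list means the payload looks safe.

    layout is the fixed shape you decided on, for example (int, bool, int).
    """
    problems = []
    if len(values) != len(layout):
        problems.append("expected {} values, got {}".format(len(layout), len(values)))
        return problems
    for index, (value, kind) in enumerate(zip(values, layout)):
        # bool is a subclass of int in Python, so compare exact types.
        if type(value) is not kind:
            problems.append("item {} should be {}".format(index, kind.__name__))
        elif kind is int and not -127 <= value <= 127:
            problems.append("item {} is outside -127..127".format(index))
    return problems


LAYOUT = (int, bool, int)  # always (small number, bool, small number)
print(check_payload((3, True, -5), LAYOUT))
print(check_payload((300, True, 1), LAYOUT))
```

The helper is plain Python, so the same function could be called right before `broadcast` in a hub program while debugging.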

Mostly to reduce the amount of broadcasting being done.

Since recent versions, you can now also broadcast None to disable broadcasting.

Let me know if I can do something more to help :)

Thank you. I'll try to look at your programs later. The smaller we can make them, the better :)

BertLindeman commented 4 months ago

How do you know when it complains on low-battery

The hub led will blink orange with the original color between the blinks.

JJHackimoto commented 4 months ago

To rule this out, keep your values between -127 and 127. Booleans are fine too. And try to always send the same kind of list. For example always (small number, bool, small number).

I see. Yeah if I use integers, it's only for sending either a "1", "2" or "3". So no more than a single integer at a time. :)

Since recent versions, you can now also broadcast None to disable broadcasting.

Alright, I'm not sure where this could be useful for me, but it's good to know! Actually, that makes me wonder, if I broadcast a value once, how long will other hubs be able to pick it up for?

Thank you. I'll try to look at your programs later. The smaller we can make them, the better :)

Yeah they are really big and complicated. The new comment blocks will help massively once I start adding them :)

The hub led will blink orange with the original color between the blinks.

Oh alright. I've never seen that happen yet so I probably don't have that issue at least :)

laurensvalk commented 4 months ago

I think I was able to reproduce once, after a long time.

In your experience, is this reproducible if only one hub is broadcasting? Or is it more likely to happen when another hub is also broadcasting?

BertLindeman commented 4 months ago

I wanted to change the program in the technichub and noticed it got stuck in half an hour. At that time the large program also ran.

Now: At the moment I have running the large program only, so no other broadcasters. Added code to color the led hoping the problem still occurs and that I can see the light goes steady at the moment the problem occurs. Fingers crossed...

BertLindeman commented 4 months ago

Gotcha. Running only the large program (adapted to SEE that it got stuck), it got stuck in almost an hour. There were no other transmitters.

The steady color was red, see video below. The color.red command is after an observe and before a broadcast.

The changed program, to see that it goes wrong and at what command:


from pybricks import version
from pybricks.hubs import TechnicHub
from pybricks.parameters import Axis, Color, Direction, Port, Stop
from pybricks.pupdevices import ColorDistanceSensor, Motor, ColorSensor
from pybricks.tools import multitask, run_task, wait
from urandom import choice

Color.WHITE = Color(0, 0, 100)
Color.BLACK = Color(0, 0, 0)

SensorHub = TechnicHub(top_side=Axis.Z, front_side=Axis.X, broadcast_channel=3, observe_channels=[1, 2])
DirectTrigger = ColorDistanceSensor(Port.A)
DirectTrigger.detectable_colors((Color.RED, Color.NONE))
# LateTrigger = ColorDistanceSensor(Port.C)
# LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
LateTrigger = ColorSensor(Port.C)  # I have only ONE colorDistanceSensor so a ColorSensor
LateTrigger.detectable_colors((Color.WHITE, Color.BLACK))
Tipping = Motor(Port.B, Direction.COUNTERCLOCKWISE)

DistributorTipp = False
TableReadyForTipp = False
Triggered = False
Tipped = False
TriggeredDistributor = False

async def main1():
    global Triggered
    while True:
        await wait(0)
        while not (await DirectTrigger.color() == Color.RED or await LateTrigger.color() == Color.WHITE):
            await wait(1)
        # print(end="1")
        Triggered = True
        await wait(1000)
        Triggered = False

async def main2():
    global DistributorTipp, TriggeredDistributor, TableReadyForTipp
    while True:
        await wait(0)
        DistributorTipp, = SensorHub.ble.observe(2) or [0] * 1
        SensorHub.light.on(Color(180, 100, 50))  # Cyan

        TriggeredDistributor, TableReadyForTipp = SensorHub.ble.observe(1) or [0] * 2
        SensorHub.light.on(Color(0, 100, 50))  # red

        await SensorHub.ble.broadcast([Tipped, Triggered])
        SensorHub.light.on(Color(60, 100, 50))  # Yellow

        # SensorHub.light.on(Color.NONE)
        await wait(200)
        SensorHub.light.on(Color(120, 100, 50))  # green

async def main3():
    global Tipped
    while True:
        await wait(0)
        # print(DistributorTipp, "d", TableReadyForTipp, "t", end="")
        if DistributorTipp == True and TableReadyForTipp == True:
            await Tipping.run_angle(100, 100, Stop.BRAKE)
            await wait(1000)
            await Tipping.run_angle(100, -100, Stop.BRAKE)
            Tipped = True
            while not (DistributorTipp == False and TableReadyForTipp == False):
                await wait(1)
            Tipped = False
        else:
            pass
        await wait(500)

async def main():
    await multitask(main1(), main2(), main3())

print(version)

run_task(main())

https://github.com/pybricks/support/assets/8142081/d96f9acc-c3ea-4c7c-901c-ba984b15ccdf

JJHackimoto commented 4 months ago

Good testing!

I've not tested with only one active hub, but your test shows that the issue can still occur. The video you posted shows the exact same sequence I experience.

Note that if the program doesn't alter the light on the hub, it continues to fade as normal even though it has gotten stuck. So it's not visible then if the program is stuck or not, until you press the button and realize nothing happens unless long-pressing. And that's where it rapidly blinks and you'll have to take the batteries out.

BertLindeman commented 4 months ago

Also the small program resulted in "stuck" Started at 17:45 and noticed at 18:30. Could be sooner.

The color (rats I forgot): see video:

https://github.com/pybricks/support/assets/8142081/755da0f9-0aa0-4f46-a37b-e5142a6fb0ce

[EDIT] Did not only forget the color, but also the test program:


from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from pybricks.parameters import Color
from urandom import choice

transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])

while True:
    transmitter.light.on(Color(120, 100, 50))  # green
    transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    transmitter.light.on(Color(60, 100, 50))  # yellow (hue 60)
    TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    transmitter.light.on(Color(0, 70, 50))  # red
    wait(100)

BertLindeman commented 3 months ago

Took more commands out of the small program and named it simple.

  1. fixed data to broadcast
  2. no more observing

simple.py:


from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from pybricks.parameters import Color
# from urandom import choice

transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])

while True:
    transmitter.light.on(Color(120, 100, 50))  # green
    # transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    transmitter.ble.broadcast([False, True])
    transmitter.light.on(Color(0, 0, 50))  # white
    # TriggeredDistributor, TableReadyForTipp = transmitter.ble.observe(3) or [0] * 2
    # transmitter.light.on(Color(0, 70, 50))  # red
    wait(100)

Started the program at 10:15 and found it as I came back at 17:00 steady "green" So somewhere between 13:00 and 17:00 it stopped. And rapid blinking, of course. So happy with the old screwless cover.

BertLindeman commented 3 months ago

Hope you do not mind me going on 😄

The technichub with the simple program got "stuck" at the moment. And I did not yet stop it.

To test if the broadcasting is really stopped, I programmed a cityhub to show that.


from pybricks.hubs import CityHub
from pybricks.tools import wait
from pybricks.parameters import Color

receiver = CityHub(observe_channels=[1])

while True:
    receiver.light.off()
    data = receiver.ble.observe(1)
    if data:
        TriggeredDistributor, TableReadyForTipp = data
        if TriggeredDistributor == False and TableReadyForTipp == True:
            receiver.light.on(Color.GREEN)
    else:
        receiver.light.on(Color.ORANGE)
        wait(500)
    wait(100)

A few milliseconds after the start the cityhub is steady green. I assume that means the TechnicHub is still broadcasting.

[EDIT] Pressing the Technichub causes rapid blink. The cityhub stays green until I removed the batteries from the Technichub. After that the cityhub led goes to orange.

laurensvalk commented 3 months ago

Thanks for following up! Yes, it does still seem to broadcast. I think the hub is just waiting for a reply from the bluetooth chip that it does not always get.

Then the hub keeps waiting forever, and won't complete shutdown, which explains the rapid blinking.

I think I have been able to reproduce this once. I could try adding a timeout to the broadcast operation to see if that works around it.
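The idea of such a timeout can be sketched in plain `asyncio`. This is only an illustration of the concept, not the firmware code, which is written in C. `wait_for_reply` is a hypothetical helper, and the `asyncio.Event` objects stand in for the Bluetooth chip's reply.

```python
import asyncio


async def wait_for_reply(reply_event, timeout_s):
    """Wait for the chip's reply, but give up after timeout_s seconds."""
    try:
        await asyncio.wait_for(reply_event.wait(), timeout_s)
        return True
    except asyncio.TimeoutError:
        return False


async def demo():
    answered = asyncio.Event()
    answered.set()            # a chip that replies
    silent = asyncio.Event()  # a chip that never replies
    return (
        await wait_for_reply(answered, 0.05),
        await wait_for_reply(silent, 0.05),
    )


print(asyncio.run(demo()))
```

Without the timeout, the second call would wait forever, which matches the "keeps waiting" behavior described above.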

But it can take up to an hour for this to happen, so it's pretty slow going :smile:

BertLindeman commented 3 months ago

Happy to test!

BertLindeman commented 3 months ago

You probably already know.. I changed the simple program to use 3 strings with a total length of 26 bytes. Program with a one byte string takes a lot more time to get stuck than the 26 byte transmitter.
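The size arithmetic can be sketched as below. It assumes each string costs its length plus one byte of overhead, which is how the later test program in this thread counts it, and it takes the 26 byte limit from the same program. `string_payload_size` is a hypothetical helper, not a Pybricks function.

```python
MAX_PAYLOAD = 26  # bytes, the limit quoted in this thread's test program


def string_payload_size(strings):
    # Assumption: every string takes its encoded length plus one extra byte.
    return sum(len(s.encode()) + 1 for s in strings)


big = ["123456789", "blob", "asdfghjklm"]
small = ["a"]
print(string_payload_size(big), string_payload_size(big) <= MAX_PAYLOAD)
print(string_payload_size(small))
```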

Timing of the "26"-program

Not mentioned before: this TechnicHub Bluetooth chip hub.ble.version() has version V3.02.00

laurensvalk commented 3 months ago

Does it make a difference whether you use wait statements in your loop?

I was hoping that the issue would be quicker to reproduce without waiting, but that does not appear to be the case.

laurensvalk commented 3 months ago

@BertLindeman @JJHackimoto , would you like to reproduce the problem with this firmware?

https://github.com/pybricks/pybricks-micropython/actions/runs/8481460082/artifacts/1369689875

During the broadcast command, this firmware works the same but it will change the light color during all stages of broadcasting. Depending on which color we see when it "freezes", we can see which stage fails to complete. And hopefully fix it.
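The debugging technique itself can be sketched in plain Python: set a marker before each stage, and whichever marker is showing when things freeze points at the stage that did not finish. The stage colors and function names here are made up for illustration and do not correspond to the colors used by the test firmware.

```python
import asyncio

STAGE_COLORS = ("red", "orange", "white")  # hypothetical, one color per stage


async def completes():
    await asyncio.sleep(0)


async def hangs():
    await asyncio.Event().wait()  # nobody ever sets this event


async def run_stages(stages, light):
    for color, stage in zip(STAGE_COLORS, stages):
        light.append(color)  # on a hub: switch the light before the stage starts
        await stage()


async def demo():
    light = []
    try:
        # The outer timeout only exists so that this demo terminates.
        await asyncio.wait_for(run_stages([completes, hangs, completes], light), 0.05)
    except asyncio.TimeoutError:
        pass
    return light[-1]  # the color still showing when everything froze


print(asyncio.run(demo()))
```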

So run your own favorite programs, but please remove any hub.light commands so we know for sure that the user light is not what we're seeing.

Thanks so much!


To use this firmware, begin the update process as usual. But instead of selecting a hub in the menu, choose advanced at the bottom and choose the ZIP file from the link above. The rest proceeds as usual.

laurensvalk commented 3 months ago

For what it's worth, this program does not appear to reproduce it. I'll give it a few more hours.

[image: program screenshot]

laurensvalk commented 3 months ago

Made some progress! I think you'll find that the light gets stuck on orange.

    debug_light(PBIO_COLOR_RED);
    PT_WAIT_WHILE(pt, write_xfer_size);
    HCI_LE_setAdvertisingData(value->size, value->data);
    debug_light(PBIO_COLOR_ORANGE);

    PROCESS_CONTEXT_BEGIN(&pbdrv_bluetooth_spi_process);
    etimer_set(&broadcast_timeout, 500);
    PROCESS_CONTEXT_END(&pbdrv_bluetooth_spi_process);

    PT_WAIT_UNTIL(pt, hci_command_complete || etimer_expired(&broadcast_timeout));
    if (etimer_expired(&broadcast_timeout)) {

         // and we can catch it with this timeout!

    }

But I still have to figure out how to get unstuck. I think we can still print data to Pybricks Code so Bluetooth isn't totally frozen.

But it currently freezes on de-init, perhaps the stop broadcast task. But this gives some pointers to further progress.

Waiting a few hours for the next trial....

BertLindeman commented 3 months ago

@BertLindeman @JJHackimoto , would you like to reproduce the problem with this firmware?

https://github.com/pybricks/pybricks-micropython/actions/runs/8481460082/artifacts/1369689875

During the broadcast command, this firmware works the same but it will change the light color during all stages of broadcasting. Depending on which color we see when it "freezes", we can see which stage fails to complete. And hopefully fix it.

So run your own favorite programs, but please remove any hub.light commands so we know for sure that the user light is not what we're seeing.

Thanks so much!

To use this firmware, begin the update process as usual. But instead of selecting a hub in the menu, choose advanced at the bottom and choose the ZIP file from the link above. The rest proceeds as usual.

Will do..

Would there be interference if I ran two Technic hubs that only broadcast? Until now I kept the tests at one hub only.

BertLindeman commented 3 months ago

Does it make a difference whether you use wait statements in your loop?

I was hoping that the issue would be quicker to reproduce without waiting, but that does not appear to be the case.

If I take out the wait and test with build 3342 firmware I see changing colors, like some faded red, orange with some pink in between... Most of the time the faded red, though. With the wait(100) I see a steady white.

laurensvalk commented 3 months ago

Yes, that is expected. If you put in a wait, you'll mostly see the last light in the sequence, which is white. Otherwise it spends most time cycling, so you see the average color, or a faded white.

In any case, I might have something reproducible, albeit slowly, so don't let me waste your weekend time :smile: I'll report back if I can recover from this state and have something for you all to test.

Would love to fix this for the upcoming stable release, as that would nicely wrap up Bluetooth as "just works". Finally!

BertLindeman commented 3 months ago

Hope I do not take too much time of your weekend.. Preliminary test result: The hub running the test with the wait(100) stops (after ~ half an hour) before the test without the wait (still running). The color seemed to me a faded orange. The hub stopped on button press with rapid blinking. Or was it still running and got stuck anyway? Have a good weekend all...

Bert

JJHackimoto commented 3 months ago

Thanks again for all the testing and the new firmware! I’ll test it out on Tuesday since I’m away this weekend. Can’t wait to see it all work when the issue has been pinpointed!

laurensvalk commented 3 months ago

Small update and some notes: If I:

... then the program can be restarted and still works, i.e. broadcasts again.

Forcing a reset like that during a running program would be very difficult but at least we're not fully stuck on the main CPU.

As far as I can tell, if we do stay connected when this happens, printing to stdout still seems to work, so the BT chip isn't completely frozen. It just doesn't seem to complete the stop observing and/or stop broadcasting calls....

laurensvalk commented 3 months ago

make the program skip the de-init in pb_type_ble.c (which currently never completes)

So if we exit the broadcast task when it fails and raise to stop the program, the part that blocks de-init is stop_observe_task with PT_WAIT_UNTIL(pt, device_discovery_done).

Now added a timeout there also, to see if the BT chip can recover after that without resetting the chip. Back in an hour...

laurensvalk commented 3 months ago

And if stop_observe_task is one aspect here, that could explain why we're seeing this only if we combine broadcasting and observing: if the restart task does not complete, the queue will get stuck.
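To illustrate why one unfinished task can block everything behind it, here is a small `asyncio` sketch of a strictly ordered queue. The task names are only labels chosen for this example, and the code is not taken from the firmware.

```python
import asyncio


async def quick():
    await asyncio.sleep(0)


async def never_completes():
    await asyncio.Event().wait()  # nobody ever sets this event


async def run_in_order(queue, started):
    # Each entry must finish before the next one may start.
    for name, task in queue:
        started.append(name)
        await task()


async def demo():
    started = []
    queue = [("stop observing", never_completes), ("broadcast", quick)]
    try:
        # The outer timeout only exists so that this demo terminates.
        await asyncio.wait_for(run_in_order(queue, started), 0.05)
    except asyncio.TimeoutError:
        pass
    return started


print(asyncio.run(demo()))
```

The second entry is never started, even though it would have finished immediately on its own.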

laurensvalk commented 3 months ago

@BertLindeman - a useful test would be: does it still get in the stuck state if you don't have observe_channels, so only broadcast_channel? You can just use the normal firmware for this, so you could make your own light logic to see if it gets stuck.

BertLindeman commented 3 months ago

@BertLindeman - a useful test would be: does it still get in the stuck state if you don't have observe_channels, so only broadcast_channel? You can just use the normal firmware for this, so you could make your own light logic to see if it gets stuck.

Will do, Laurens. Assume beta firmware is best. Later...

[EDIT] Running without wait as that got stuck faster.. Without the wait the color needs to be watched a bit better...


from pybricks.hubs import TechnicHub
from pybricks.tools import wait
from pybricks.parameters import Color

# transmitter = TechnicHub(broadcast_channel=1, observe_channels=[2, 3])
transmitter = TechnicHub(broadcast_channel=1)  # req of Laurens

# 3 strings, so len = 3 + len(strings)
# BLE max payload is 26 bytes.
string_1 = "123456789"
string_2 = "blob"
string_3 = "asdfghjklm"
print(len(string_1) + len(string_2) + len(string_3) + 3)

while True:
    transmitter.light.on(Color(120, 100, 50))  # green
    # transmitter.ble.broadcast([choice([True, False]), choice([True, False])])
    transmitter.ble.broadcast([string_1, string_2, string_3])
    transmitter.light.on(Color(0, 0, 0))  # None
    # wait(100)