miketeachman / micropython-rotary

MicroPython module to read a rotary encoder.
MIT License

Locks up if a thread is run on core1: Pico Pi W #27

Open ayjaym opened 1 year ago

ayjaym commented 1 year ago

On MicroPython 1.20, if you start a thread on core 1, even if that thread does nothing, then when an IRQ is triggered from the rotary encoder by code running on core 0 the whole machine locks up, e.g.

on core 0:

t = start_new_thread(EventManager.Start, (onKeyDown, onKeyUp))

and then some code after that to interact with the encoder using this library, which works perfectly if the thread isn't spawned first.

The thread on core 1 is intended to be a class which handles a touchscreen - I have commented out that code because the failure occurs without it. The touchscreen code itself does work fine on core 1, but running anything on core 1, even the dummy code above, seems to cause the rotary encoder code to fail. (Note: Python indentation seems to get lost when I save the issue.)

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1

The problem occurs even if the while loop just calls time.sleep_ms(1000); it seems that if anything is running on core 1 then IRQs fail.

miketeachman commented 1 year ago

I can reproduce the problem with v1.20 firmware, using Pin object interrupts (which are also used in the Rotary library). I used your example to create a minimal test to reproduce the failure. The processor sometimes locks up when pin 16 is pulled low, thereby creating an interrupt. Note that lockups don't happen every time - typically only 1 in 3 times when the pin is pulled low. There are many GitHub issues and discussions about IRQs and how they run in an RP2 multi-threaded program, for example https://github.com/micropython/micropython/issues/10690. I haven't looked deeply enough into these issues to know whether you have discovered a new bug or one of the existing issues already describes the failure you have found.

In any case, this bug needs to be fixed in the RP2 firmware before the Rotary library will work in a multi-threaded application.

import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1,2))

p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)
ayjaym commented 1 year ago

Thanks so much - I realised of course this was a general problem and not specific to your library. Frankly, I'm very discouraged by the immaturity of multi-core support in MicroPython.

I have an open-source (and open-hardware) project which is working brilliantly hardware-wise: a fully user-programmable MIDI controller with an OLED display and a matrix of 16x8 RGB buttons, with everything 3D printed. The basic hardware is working nicely; the Pico's multiple ADC channels made interfacing to the resistive touchscreen, which acts as an underlay to the buttons, very easy. But I really need both cores working, as one will handle MIDI and the other non-time-critical tasks like updating the OLED display and interfacing with the touchscreen and the backing WS2812 LED matrices. Until then I just can't really release it, so effectively I'm now stalled.

Not only do I have this issue, but the WLAN stuff doesn't work on core 1 - it hangs on configuration. Someone suggested adding the country code, which stopped the hangs, but it still doesn't receive UDP packets, so my RTP MIDI implementation (which works perfectly on core 0) is also stalled.

I do appreciate that the whole endeavour is run by volunteers, but it feels like the RPi Foundation really ought to contribute some support here as well; after all, they're putting these things forward as MicroPython-programmable microcontrollers and they have dual cores, so that's kinda important. In any case, the ESP32 and other controllers are now also multi-core. I get that I could probably make all this work in C++, but I want the users to be able to program custom functionality, and MicroPython is really essential for the whole concept. What do we do to get these threading issues prioritised?


peterhinch commented 1 year ago

@miketeachman This is a rather extreme repro because core 0 has no running code. With this remedied, and some action on pin 16 (link pins 16 and 17 so the toggling output generates interrupts), the sample runs:

import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1,2))

p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)
p17 = Pin(17, Pin.OUT)
while True:
    time.sleep(1)
    p17(not p17())

Outcome:

<irq>
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
...
ayjaym commented 1 year ago

Hmm, well yes, although I'm pretty sure I did see the problem with code actually running on core 0. Let me try a while loop with a sleep as a placeholder and see what happens.

ayjaym commented 1 year ago

OK, well, this may be a stupid error on my part, but the following code, while it does not hang, doesn't work if the encoder library is launched from core 1 - I don't get value changes when it is rotated, though I do of course get switch events because those are polled.

i.e. the test harness is:

from _thread import *
from encoder import Encoder
import time

def onEncoderSwitchDown():
    print ("On Encoder Switch Down")

def onEncoderChange(v):
    print ("On Encoder Change" , v)

t = start_new_thread(Encoder.Start, (onEncoderSwitchDown, onEncoderChange))

Encoder.Start(onEncoderSwitchDown, onEncoderChange)
while True:
    Encoder.CheckEncoder()
    time.sleep_ms(100)

and the class is below. This of course is uploaded to the /lib folder on the Pico and I'm then running the test harness from Thonny directly.

Obviously, when running on core 1, uncomment the while loop in the Encoder class and remove it from the harness, so that the thread performs the poll to look for an encoder value change.

This is just kinda ripped out of the larger project; in reality I would have polled the encoder in the WLAN module while it polls for UDP datagrams, as the overhead is negligible and the value change will occur via an IRQ. But - and I could have made a stupid error here - this code doesn't seem to process the IRQs on core 1 at all. I get that this is different from the original problem, of course.

from rotary_irq_rp2 import RotaryIRQ
from machine import Pin
import time

class Encoder:

    old_val = None
    encsw_val = None
    OnEncoderSwitchDown = None
    OnEncoderChange = None
    encoder = None
    encsw = None

    @staticmethod
    def CheckEncoder():
        v = Encoder.encoder.value()
        print(v)
        sw = Encoder.encsw.value()
        if sw != Encoder.encsw_val:
            Encoder.encsw_val = sw
            if sw == 0:
                Encoder.OnEncoderSwitchDown()

        if v != Encoder.old_val:
            Encoder.old_val = v
            Encoder.OnEncoderChange(v)

    @staticmethod
    def Start(onEncoderSwitchDown, onEncoderChange):
        Encoder.OnEncoderSwitchDown = onEncoderSwitchDown
        Encoder.OnEncoderChange = onEncoderChange
        Encoder.encoder = RotaryIRQ(pin_num_clk=17,
              pin_num_dt=19,
              min_val=0,
              max_val=6,
              pull_up=True,
              reverse=False,
              range_mode=RotaryIRQ.RANGE_BOUNDED)

        Encoder.old_val = Encoder.encoder.value()
        Encoder.encsw = Pin(16, mode=Pin.IN, pull=Pin.PULL_UP)
        Encoder.encsw_val = Encoder.encsw.value()
        #while True:
        #    Encoder.CheckEncoder()
        #    time.sleep_ms(100)
peterhinch commented 1 year ago

IRQs are processed on core 0.

In my experience core 1 should be reserved for computationally intensive tasks with great care taken when communicating between cores. I wouldn't expect classes such as sockets to work when shared between cores because the internal state of the class is not designed for a GIL-free environment. Core 1 is ideally suited for running blocking functions in a way which enables the core 0 code to continue running.

I believe there was discussion among the maintainers as to whether core 1 should run GIL-free. My view is to prefer the high performance solution that we have, but it does mean that an appreciation of multiprocessor coding has to be employed. I favour using asyncio for concurrency on core 0 with core 1 being reserved for code that has an absolute need for true concurrency.

The following allows an encoder to run on core 0 with its state being tracked on core 1:

import _thread
from machine import Pin
import time
import uasyncio as asyncio
from primitives import Encoder

position = 0  # Note constraints on shared globals see THREADING.md
change = 0

def core_1():
    while True:
        time.sleep(1)
        print(f"Position = {position} change={change}")

def cb(pos, delta):
    global position, change
    position = pos
    change = delta

async def main():
    t = _thread.start_new_thread(core_1, ())
    px = Pin(16, Pin.IN, Pin.PULL_UP)
    py = Pin(17, Pin.IN, Pin.PULL_UP)
    enc = Encoder(px, py, div=4, callback=cb)  # div matches mechanical detents
    while True:
        await asyncio.sleep(1)

try:
    asyncio.run(main())
finally:
    asyncio.new_event_loop()

Docs on encoder class and on THREADING.md.

ayjaym commented 1 year ago

Thank you very much for the valuable insights. To be clear, I had no intention of sharing resources like sockets across cores. My intention was for the wireless RTP MIDI code to run on core 1 along with the encoder, because the cost of polling the encoder for value changes is minimal. The core 1 code calls back to deliver available MIDI data or to accept MIDI data to transmit; this seems safe as we are just marshalling a 3-byte command and not accessing internal state across core boundaries. Also, as you see, we have callbacks for encoder value change and switch down.

Core 0 then runs code which the user will put together from building blocks. There is an OLED display, an SD card reader and a touchscreen which provides 137 buttons backed by a set of WS2812 addressable LEDs. Code to communicate with these components will all run on core 0, which allows the time-critical MIDI operations to run on core 1. At some point an in-memory data structure will be supplied to the core 1 code to define a sequence of MIDI events to be transmitted with timestamps. At present, however, I cannot achieve this. I would much prefer this architecture because the device user only adds custom code to core 0, so it's much simpler for them to understand.

I had not anticipated the issues I encountered. I am experienced in multithreaded coding on actual operating systems, appreciate that this is a bare-metal implementation, and am aware of the need for locking, but I found that adding locks did not mitigate my initial issues. If I could resolve them I certainly recognise the need to put locks around things like array allocations, but at present the code doesn't do anything like that.

To summarise: ideally I need the WLAN subsystem to actually work on core 1 and the rotary encoder to work on core 1. This would make things much simpler for my intended audience. I hope this helps clarify where I was going architecturally. I appreciate your input very much.

ayjaym commented 1 year ago

And your comment on IRQs and core 0 led me to Google and find this interesting thread, because I thought "how did he know that, I don't recall reading that in the docs": https://github.com/orgs/micropython/discussions/10638

It looks like the whole multi-threading area is evolving and that right now what I want to do isn't really feasible. The problem is that I don't want the user having to write a loop to capture encoder events on core 0. And I assume the WLAN failures on core 1 may also be because IRQs aren't being handled in that scenario. Now that the Pi Zero is becoming available again, perhaps I need to redesign around that (probably the Zero 2 W ideally, though I'm not sure about availability). That's a nuisance because I need two ADC channels for the touchscreen, so it will add significant cost. Or I can hope that multicore support on the RP2040 might be enhanced and just park this project for a few months. Or get my hands dirty on the code myself, of course!

ayjaym commented 1 year ago

I went back to the project today and confirmed that, with code actually running on core 0 (doing the RTP MIDI stuff with the WLAN plus handling the rotary encoder), if I then have the touchscreen code running on core 1 (and no shared state except the callbacks when a touch event occurs), an IRQ asserted on core 0 when the encoder is turned DOES lock up the processor. So it wasn't just a scenario where I'm dropping back to the REPL on core 0. I am not entirely sure why this doesn't repro directly, but I'm using the Pico W, not sure if this might be material.

I don't think there's anything specific to the core 1 touchscreen code, which does not use any IRQs; it just uses the ADC channels to read the resistive touchscreen and then sends events for touch and release. I guess I'll need to read the encoder by polling and debouncing manually, which is a shame, as your library is such an elegant technical solution. But at least then I can work around all the other threading issues by just running the MIDI stuff on core 0; the touchscreen, OLED and SD card reader stuff ought to be able to run on core 1 (the touchscreen code certainly does), and I can then implement a standard request-block protocol queue which core 1 will poll to pick up e.g. a request to display a message on the OLED screen, so that the MIDI code on core 0 won't jitter. It's not what I would like, but needs must; I have to work within the current limitations of multicore support and just hope at some point Micropython will be a bit more robust in this area. Really appreciate your comments, I learned a lot about internals from reading them!
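
For illustration, here is a minimal sketch of that request-block idea using a lock-protected list (the names, request format and sleep intervals are placeholders, not code from the project):

import _thread
import time

request_lock = _thread.allocate_lock()
requests = []                      # shared list; always access under the lock

def post_request(req):             # called from the MIDI code on core 0
    request_lock.acquire()
    try:
        requests.append(req)
    finally:
        request_lock.release()

def core1_worker():                # polls and services requests on core 1
    while True:
        req = None
        request_lock.acquire()
        try:
            if requests:
                req = requests.pop(0)
        finally:
            request_lock.release()
        if req is None:
            time.sleep_ms(10)      # nothing queued, back off briefly
        else:
            kind, payload = req    # dispatch to OLED / SD / LED handlers here
            print("servicing", kind, payload)

_thread.start_new_thread(core1_worker, ())
post_request(("oled", "Hello"))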

peterhinch commented 1 year ago

I'm using the Pico W, not sure if this might be material.

It's not. I used a Pico W to check the above script. The Pico W behaves identically to the Pico because WiFi is handled by a separate chip: a very nice design.

If you're going to write your own encoder code, you might like to read this. Encoders are surprisingly subtle, and there is a lot of nonsense written about them with absurdly complex algorithms. Hint: with the right algorithm you don't need to debounce them.
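
As one illustration of the kind of algorithm that needs no explicit debouncing (not necessarily the one described in the linked document), a common transition-table decode simply ignores invalid state changes; pin numbers here are placeholders:

from machine import Pin

# Transition table indexed by (previous_state << 2) | new_state, where a
# state is (A << 1) | B. Invalid transitions (e.g. bounce) contribute 0,
# valid ones +1 or -1; which direction counts up depends on the wiring.
_TABLE = (0, 1, -1, 0,
          -1, 0, 0, 1,
          1, 0, 0, -1,
          0, -1, 1, 0)

pin_a = Pin(17, Pin.IN, Pin.PULL_UP)
pin_b = Pin(19, Pin.IN, Pin.PULL_UP)
state = (pin_a() << 1) | pin_b()
position = 0

def poll_encoder():                    # call from an ISR, timer or tight loop
    global state, position
    new = (pin_a() << 1) | pin_b()
    position += _TABLE[(state << 2) | new]
    state = new
    return position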

Here is another approach to design, which keeps all the hardware interfaces on core 0. Use asyncio and my encoder class on core 0. Run an asyncio task which reads the ADC. Whenever there is a significant change, the task puts a pair of readings onto a threadsafe queue. This queue is read by your code on core 1.
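
A rough sketch of that architecture, assuming the ThreadSafeQueue class from the micropython-async repo (constructor and method names are from memory and should be checked against its THREADING.md; pins, ADC channels and the change threshold are placeholders):

import _thread
import time
import uasyncio as asyncio
from machine import ADC
from threadsafe import ThreadSafeQueue   # from the micropython-async repo

queue = ThreadSafeQueue(20)              # or a pre-allocated list; see its docs

async def read_touch():                  # asyncio task on core 0
    adc_x = ADC(26)
    adc_y = ADC(27)
    last = (0, 0)
    while True:
        reading = (adc_x.read_u16(), adc_y.read_u16())
        if abs(reading[0] - last[0]) > 500 or abs(reading[1] - last[1]) > 500:
            last = reading
            await queue.put(reading)     # pauses this task if the queue is full
        await asyncio.sleep_ms(20)

def core_1():                            # user code running on core 1
    while True:
        if queue.qsize():
            x, y = queue.get_sync()      # non-blocking get of a reading pair
            print("touch", x, y)
        else:
            time.sleep_ms(10)

_thread.start_new_thread(core_1, ())
asyncio.run(read_touch())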

hope at some point Micropython will be a bit more robust in this area

I'm not sure that there is a fault in MP. Writing multiprocessor code is notoriously difficult. Subtle bear-traps abound. It's easy to encounter a situation where there is shared state whose existence is hidden. My strong advice is to use asyncio for concurrency on core 0. Keep all the "difficult" interfaces on core 0. Use core 1 only for processes which would block core 0 and follow the guidelines in THREADING.md for inter-core communication.

ayjaym commented 1 year ago

Thanks for your very valuable suggestions. You are quite right that multiprocessor code is tricky. I have spent years working with spinlocks, mutexes, critical sections, memory barriers, atomic operations etc in C# and C++ primarily and I certainly have managed to create some very subtle bugs in the process.

Just for clarity: for the code below, IRQs asserted on core 0 with code running (i.e. not dropping back to the REPL), and with a trivial piece of thread code running which - I assume - can't really expose any kind of multi-threading "beartrap", DO cause the whole processor to lock up. This is 100% reproducible, every time. One movement of the encoder and it locks up.

Why this isn't entirely consistent with some of the other attempts to reproduce the issue I'm not sure; however, it may relate to the number and timing of IRQs that occur with the rotary encoder I'm using.

In any event, the test harness code works perfectly without the thread created, but locks up as soon as the encoder is turned if the thread is created as part of the invocation process.


from _thread import *
from encoder import Encoder
import time

def onEncoderSwitchDown():
    print ("On Encoder Switch Down")

def onEncoderChange(v):
    print ("On Encoder Change" , v)

def breakit():
    print ("thread start")
    while True:
        time.sleep_ms(100)

t = start_new_thread(breakit,()) # this will cause any IRQ later on to lock up the processor
Encoder.Start(onEncoderSwitchDown, onEncoderChange)
while True:
    Encoder.CheckEncoder()
    time.sleep_ms(100)

and the Encoder module is just a wrapper around the standard encoder library

from rotary_irq_rp2 import RotaryIRQ
from machine import Pin
import time

class Encoder:

    old_val = None
    encsw_val = None
    OnEncoderSwitchDown = None
    OnEncoderChange = None
    encoder = None
    encsw = None

    @staticmethod
    def CheckEncoder():

        v = Encoder.encoder.value()
        sw = Encoder.encsw.value()
        if sw != Encoder.encsw_val:
            Encoder.encsw_val = sw
            if sw == 0:
                Encoder.OnEncoderSwitchDown()

        if v != Encoder.old_val:
            Encoder.old_val = v
            Encoder.OnEncoderChange(v)

    @staticmethod         
    def Start(onEncoderSwitchDown, onEncoderChange):
        Encoder.OnEncoderSwitchDown = onEncoderSwitchDown
        Encoder.OnEncoderChange = onEncoderChange
        Encoder.encoder = RotaryIRQ(pin_num_clk=17, 
              pin_num_dt=19, 
              min_val=0, 
              max_val=6,
              pull_up=True,
              reverse=False,            
              range_mode=RotaryIRQ.RANGE_BOUNDED)

        Encoder.old_val = Encoder.encoder.value()
        Encoder.encsw = Pin(16, mode=Pin.IN, pull=Pin.PULL_UP)
        Encoder.encsw_val = Encoder.encsw.value()
        #while True:
        #    Encoder.CheckEncoder()
        #    time.sleep_ms(100)
peterhinch commented 1 year ago

Please could you edit this post so that all the code is enclosed in triple backticks. Then the indentation is preserved, making it easy to read.

ayjaym commented 1 year ago

Sorry, of course, I didn't know about that; I have added them. So hopefully this code example shows that it seems to be impossible to use core 1 at all if you want to assert IRQs on Core 0 - at least with a real rotary encoder connected and using the RotaryIRQ module. Unless of course there's something wrong with my hardware. I have plenty more of these, so I can re-test on a different Pico W if necessary, or indeed a standard Pico. If I could get this working I could probably just have the WLAN code and rotary check on core 0 and put the other stuff - which doesn't use IRQs and shouldn't expose any threading issues - on core 1. If not, I could try to get everything working with async, and I appreciate your suggestion, but it is a huge shame to have a whole processor core unused - it would really help enormously to be able to move some of the processing to core 1.

peterhinch commented 1 year ago

So hopefully this code example shows that it seems to be impossible to use core 1 at all if you want to assert IRQs on Core 0

It doesn't really, because my sample here shows the opposite: my library uses IRQs and the sample ran with a real encoder. Rather than trawl through the RotaryIRQ code, why not give my library a try?

ayjaym commented 1 year ago

I definitely did not drink enough coffee! Should have looked closer at your code to realise you weren't calling the same module. I will try that and see.

peterhinch commented 1 year ago

I've had a look at Mike Teachman's driver. It uses a different approach from mine. My driver attempts to minimise the runtime of the ISRs: the ISR code is as small as I could make it. RotaryIRQ has a potentially quite long runtime, especially if the Listener mechanism is used. The reason this matters (in my opinion) is as follows.

Using IRQs to handle switch contact closures is undesirable because of contact bounce. This can cause arbitrarily short pulses and - worse - invalid logic levels. The risk of the latter can be mitigated with low-value pullup resistors, which I recommend. Short pulses can cause re-entrancy if the ISR runtime is long. Alas, IRQs are necessary for an encoder interface, hence my approach of minimising ISR runtime.

Another design point is that my callbacks are run from an asyncio context. This entirely decouples the callback from the ISR.

My ISRs run in 36μs on a Pyboard 1.1 and 50μs on ESP32. Note that soft ISRs still run fast, but are subject to long latency. The ISR design aims to allow for potential latency. Hard ISRs are used if available (e.g. on RP2) because latency sets a limit on allowable rotational speed.
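
By way of illustration only (this is not the library's actual ISR), the principle is to keep the hard ISR tiny: latch the new state, then defer any Python-level work with micropython.schedule so the interrupt context exits quickly. The pin number is a placeholder:

import micropython
from machine import Pin

micropython.alloc_emergency_exception_buf(100)  # report errors raised in ISRs

edge_count = 0

def process(_):                  # runs later, outside the interrupt context
    print("edges so far:", edge_count)

def isr(pin):                    # hard ISR: minimal work, no allocation
    global edge_count
    edge_count += 1
    try:
        micropython.schedule(process, None)
    except RuntimeError:
        pass                     # schedule queue full; drop this notification

pin_a = Pin(17, Pin.IN, Pin.PULL_UP)
pin_a.irq(handler=isr, trigger=Pin.IRQ_FALLING | Pin.IRQ_RISING, hard=True)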

To be fair I can't identify a specific reason why RotaryIRQ is failing in a dual-core environment.

Please note that my Switch class does not use IRQs: normal switch and pushbutton operations are much better handled by asyncio polling.
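
A minimal sketch of that polling approach (not the actual Switch class): sample the pin at a fixed interval, which also debounces it, and fire a callback on the falling edge. Pin number and interval are placeholders:

import uasyncio as asyncio
from machine import Pin

async def poll_button(pin, on_press):
    last = pin()
    while True:
        await asyncio.sleep_ms(50)      # polling interval doubles as debounce
        state = pin()
        if state == 0 and last == 1:    # active-low press detected
            on_press()
        last = state

async def main():
    button = Pin(16, Pin.IN, Pin.PULL_UP)
    asyncio.create_task(poll_button(button, lambda: print("pressed")))
    while True:
        await asyncio.sleep(1)

asyncio.run(main())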

ayjaym commented 1 year ago

Yes, I have to agree with your analysis and can confirm that your code does indeed run correctly. I have written time-critical interrupt code previously, and indeed you want to be in and out as quickly as possible. I apologise for not looking more closely at what you were originally proposing - that core 1 could track rotary encoder changes made on core 0 - and stupidly thinking 'yeah, but that's not what I want to do' without actually examining your code in detail. Now I need to read and digest the whole asyncio library and have a second read through THREADING.md. I have a real sense of déjà vu, in that way back in time I used to write TSR (terminate and stay resident) code for MS-DOS, which definitely hits many of these issues.

I need to think about how to implement this model, given that the project is intended to be relatively easy for the user to customise. I provide services for writing to the OLED display, reading/writing the SD card, reading the button matrix (which, as I said, is implemented via buttons on top of a resistive touchscreen) and writing to the underlying WS2812 LED array. Some of these functions will take time, although they could be carefully cut up into pieces - but cooperative multitasking isn't much fun and I don't really miss Windows 3.x :) I would ideally like the user to just write code and poll the encoder in it to get state-change callbacks (as in my original code - it's not time critical, and if they don't poll, the IRQs ensure the state stays accurate anyway), although the alternative is to pass in a 'YourCode' function which is executed regularly, which obviously your current async encoder architecture can do. [I can probably set something up around timers, of course; I need to think on this.] And I have to think carefully about the wireless MIDI stuff, which I had wanted to run on core 1, out of the way; that doesn't look like it will work, for as yet undetermined reasons (possibly IRQs, but who knows?). Still, this is just good architectural design; at least I can see it IS theoretically possible, although the caveats in THREADING.md are going to be significant. Alas, I'm spoiled by C# and its intrinsically threadsafe collections.

I'm not clear whether 'globals', as opposed to, say, static class-scoped variables, hit the same problems with initialisation. Certainly my reasonably complex touchscreen code did seem to run OK on core 1 with the MIDI stuff running on core 0, and I hadn't done anything specific to lock anything because there's no explicit shared state - but whether that was just luck I don't know. I wrap most of the code in static classes because there's no need to instantiate anything other than a singleton - there is only one of each of the I/O peripherals and so on - and classes at least give me separation of concerns. Anyway, many thanks. I had not realised the depth of information in this project, which is probably unavailable elsewhere, so it behooves me to go through, read everything and experiment.

peterhinch commented 1 year ago

cooperative multitasking isn't much fun

I think it is. But that's beside the point: it's damn near essential in firmware.

I suspect that your experience is in PC programming rather than firmware. I first got involved with MP nine years ago when it was in its infancy. Its asyncio was useless, with a time resolution of 1s, so I wrote my own scheduler with ms resolution because, in my experience, cooperative multitasking is essential in almost all firmware projects. Since then MP has acquired an absolutely excellent asyncio, so I abandoned mine long ago.

I urge you to learn asyncio - I wrote the tutorial in an attempt to promote its use.

Tools such as interrupts, timers and threads have vital uses but they bring significant complexity compared to concurrent cooperative tasks. Unfortunately people new to MP look at these tools and think they are the answer to concurrency, using them in quite inappropriate ways. My approach is to use these tools only when the requirement is unachievable with asyncio. I suggest you consider a design based on this principle.

Incidentally, I too wrote TSRs, but not professionally. I first encountered cooperative scheduling when, in about 198, I took on a professional firmware project from a firm that had gone bust. It was written in Intel 8080 Assembler and was a revelation. I realised I'd spent the previous five years writing firmware the wrong way...

ayjaym commented 1 year ago

Well, yes, I do have quite a bit of firmware experience but that was a long time back working with the Microchip PIC devices with a massive 1K of program storage and 128 bytes of RAM. These modern microcontrollers are orders of magnitude more complex and often not terribly well documented. I certainly agree though 100% with what you say. Clearly this is a sophisticated and well engineered toolset and not learning it would be just damn stupid on my part, so I need to spend some time getting my head around how things work.

For example, in C# async/await works under the hood by actually using a second thread, so that the async routine really returns twice: once to the caller immediately, and then a second time, on a different thread, back to the runtime when the operation actually completes. But in MicroPython this must operate a bit differently, since you're clearly not going to spawn an actual thread, there being no OS to do this. So the underlying scheduling subsystem is presenting the abstraction of a small cooperative multitasking OS to the caller, I assume, in order to actually implement this. Which is pretty amazing, because really this is a small near-RTOS you're effectively building, at least at the task-management level - you're obviously deferring other things out to the bare metal, like the filesystem, but fortunately there are of course other modules to take care of that.

One good thing is that it seems possible to set the Pico clock rate to 250MHz without anything obviously failing - I think this is about as far as you can go before flash clock-speed limitations become an issue - and this substantially increases the processing power at a fairly negligible increase in current drain as far as I can see. My whole project comes in at 420mA on the USB bus, which for everything including 137 LEDs is pretty good. As I said, the hardware works perfectly, I'm delighted with it, the whole thing is 3D printed and it's cheap to build; I had just underestimated the software challenge, but that's life. I have learned such a lot from interacting with you here and I do appreciate your time, effort and patience. I'll try to get this damn thing done and released as soon as I can, now.
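
For reference, the clock change mentioned above is a one-liner; 250MHz is well beyond the RP2040's rated speed, so treat it as an at-your-own-risk overclock:

import machine

machine.freq(250000000)   # set the system clock to 250 MHz
print(machine.freq())     # confirm the frequency actually in effect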

peterhinch commented 1 year ago

The fundamental principle behind asyncio is simple (although the implementation is extremely clever). If you understand Python generators you have the gist of how coroutines may be implemented: a coroutine is a special case of a generator. A coroutine can yield at chosen points and resume execution at the instruction after the yield. The scheduler maintains a list of suspended coroutines and transfers execution to them in a way that ensures that each gets a share of the action. While this is highly simplified, there is no "magic" going on. It is pure Python, no threading is involved and there is no true concurrency.*
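
A toy illustration of the principle in ordinary Python (this is not the asyncio source): each task is a generator that yields at its suspension points, and a trivial scheduler resumes the suspended tasks round-robin:

def task(name, count):
    for i in range(count):
        print(name, i)
        yield                      # suspension point, analogous to "await"

def run(tasks):
    tasks = list(tasks)
    while tasks:
        for t in tasks[:]:         # iterate over a copy so we can remove
            try:
                next(t)            # resume the task until its next yield
            except StopIteration:
                tasks.remove(t)    # the task has finished

run([task("a", 3), task("b", 2)])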

The code may be viewed here.

*The actual implementation has the task.py module duplicated in C for performance.