micropython / micropython-lib

Core Python libraries ported to MicroPython
Other
2.44k stars 1k forks source link

Writing to a second serial connection in a thread has errors #874

Closed mischif closed 2 months ago

mischif commented 5 months ago

Running on a Pi Pico with MicroPython 1.23.0 and whatever the latest version of usb-device-cdc is in mip; code to reproduce:

from utime import sleep_ms, time
from _thread import start_new_thread

import usb.device
from usb.device.cdc import CDCInterface

SERIAL_TWO = None

def update():
    while True:
        print("Update: {}".format(time()), file=SERIAL_TWO, end="\r\n")
        sleep_ms(1000)

def main():
    global SERIAL_TWO
    SERIAL_TWO = CDCInterface()
    SERIAL_TWO.init(timeout=0)
    usb.device.get().init(SERIAL_TWO, builtin_driver=True)

    while not SERIAL_TWO.is_open():
        sleep_ms(100)

    start_new_thread(update, ())

    while True:
        sleep_ms(1000)

if __name__ == "__main__": main()

If you connect to both ports with mpremote a0 and mpremote a1, you'll see this error sporadically:

Traceback (most recent call last):
  File "usb/device/core.py", line 316, in _xfer_cb
  File "usb/device/cdc.py", line 327, in _wr_cb
  File "usb/device/core.py", line 833, in finish_read
AssertionError: 

But I did see this error once:

Unhandled exception in thread started by <function update at 0x2000aef0>
Traceback (most recent call last):
  File "<stdin>", line 12, in update
  File "usb/device/cdc.py", line 356, in write
  File "usb/device/cdc.py", line 322, in _wr_xfer
  File "usb/device/core.py", line 570, in submit_xfer
  File "usb/device/core.py", line 302, in _submit_xfer
RuntimeError: xfer_pending
projectgus commented 5 months ago

Hi @mischif,

Thanks for the report, I can reproduce this here. The CDC class currently uses machine.disable_irq() and machine.enable_irq() to control concurrent access to the serial buffer, but this doesn't prevent a thread on CPU1 from accessing it at the same time a USB callback is running. EDIT: This is wrong, machine.disable_irq() locks a mutex.

Will look into an alternative locking mechanism for this.

projectgus commented 5 months ago

Root cause is not quite that actually, it's that the current USB transfer callback code marks the endpoint as free (no pending transfer) before it calls the transfer callback, so the callback code can call submit_xfer() if needed to submit a new transfer immediately.

However when the thread on the other CPU is submitting transfers, there's a race where it may see the endpoint has no pending transfer and submit a new transfer before the callback is done executing. Need to redesign the callbacks to handle this sequence.

mischif commented 4 months ago

Is there an update for this bug? I see one PR related to this was merged, but it's apparently only a partial fix.

projectgus commented 4 months ago

There is, I was going to return to these fixes this week but I got sick instead. I should have something soon.