tinygo-org / tinygo

Go compiler for small places. Microcontrollers, WebAssembly (WASM/WASI), and command-line tools. Based on LLVM.
https://tinygo.org
Other
15.29k stars 900 forks source link

nRF 52840 freezes when USBCDC receives too much data #1820

Open dmlambea opened 3 years ago

dmlambea commented 3 years ago

I am experiencing system freezes while using the USBCDC device. The freezes always occur on data overloads. If the amount of data sent to the nRF's USBCDC adapter is chunked in small blocks, it seems to go fine. But if the data block is large, the board freezes in the usb.Read(...) call.

Definitions of "small" and "large" blocks are not precise. Assuming that the board's USBCDC is connected to the /dev/ttyACM1 device, I've been able to run the following without any problems, even several times:

$ echo "Hi there!" > /dev/ttyACM1

But if I use a large data block, like for example TinyGo's README file (https://raw.githubusercontent.com/tinygo-org/tinygo/release/README.md), which is currently 8990 bytes long, the board freezes:

$ curl https://raw.githubusercontent.com/tinygo-org/tinygo/release/README.md -o /dev/ttyACM1

The following code is the example I'm using to test the communications:

package main

import (
    "machine"
)

var (
    respOK    = []byte("ok\n\r")
    respError = []byte("err\n\r")
    respEmpty = []byte("??\n\r")
)

func main() {
    usb := machine.USB
    usb.Configure(machine.UARTConfig{})

    var buf [128]byte
    for {
        readFromUSB(&usb, buf[:])
    }
}

func readFromUSB(usb *machine.USBCDC, buf []byte) {
    if usb.Buffered() == 0 {
        return
    }

    n, err := usb.Read(buf)
    var response []byte
    switch {
    case err != nil:
        response = respError
    case n > 0:
        response = respOK
    default:
        response = respEmpty
    }
    usb.Write(response)
}
dmlambea commented 3 years ago

The method I'm using to see where the hang occurs is by turning on and off a LED before and after the usb.Read(...) call. The system always freezes with the LED on.

dmlambea commented 3 years ago

The problem is reproduceable under TinyGo v0.16 and v0.17

tinygo version 0.17.0 linux/amd64 (using go version go1.15.10 and LLVM version 11.0.0)

sago35 commented 3 years ago

I reproduced it using the instructions you provided.

The following method also reproduced the problem.

sago35@M75q2  ~/tinygo/tinygo (dev)
$ tinygo version
tinygo version 0.18.0-dev-6152a661 linux/amd64 (using go version go1.16.2 and LLVM version 11.0.0)

sago35@M75q2  ~/tinygo/tinygo (dev)
$ tinygo flash --target feather-nrf52840 --size short --port /dev/ttyACM0 ./src/examples/serial/
   code    data     bss |   flash     ram
   7408      32    4060 |    7440    4092

sago35@M75q2  ~/tinygo/tinygo (dev)
$ while true; do cat README.md > /dev/ttyACM0; done
aykevl commented 3 years ago

@dmlambea is it still reproducible with #1823? Does that PR improve anything?

dmlambea commented 3 years ago

Yes, @aykevl , it is still reproducible. I've noticed no improvements.

aykevl commented 3 years ago

Okay, thank you for testing!

dmlambea commented 3 years ago

Thank you for your support! If you have any other tests I can help with, just tell me.

aykevl commented 3 years ago

I haven't done much work on USB, @deadprogram might know more. @ardnew is working on a potential replacement of the USB stack, which might be better able to deal with a higher load.

dmlambea commented 3 years ago

@aykevl (cc @deadprogram @ardnew ) I've tracked the problem down to the function:

func waitForEasyDMA() {
    for easyDMABusy.HasBits(1) {
        arm.Asm("wfi")
    }
}

I've changed it so it doesn't wait forever, just to see if the lock is happening there. My patched func is now:

func waitForEasyDMA() {
        var counter int
        for counter = 0xff; easyDMABusy.HasBits(1) && counter > 0; counter-- {
                // arm.Asm("wfi")
        }
        if counter == 0 {
                LED3.High()
        }
}

Please, bear in mind that LED3 is a known led in my setup that I'm using to distinguish when the function exits normally and when it exits due to "timeout"; and that the initialization value of counter is just an arbitrary number for testing purposes. I wanted to know if the infinite loop is happening there or not.

My findings are that if the loop exits due to counter reaching zero, then the board doesn't freeze and the program can continue "normally" (well, not so normally: the loop exited while easyDMA was still marked as busy).

If I remove the "counter reaching zero" as exit condition, then the loop is infinite again due to easyDMABusy's bit never getting reset. The only calls to exitCriticalSection() (the function clearing the easyDMABusy bit) occur in the handleInterrupt code.

The arm.Asm("wfi") is commented out because it makes the loop forever, even with a counter initialized to 1: it seems that no interrupts are fired after entering the function.

ardnew commented 3 years ago

Those critical section locks of course need to be as close as possible to the events that enable/disable the DMA controller.

It might be tricky figuring out what is keeping it locked with just LEDs. If it is a previous read keeping it locked (bulk OUT endpoint), then its likely you'll only reproduce the issue when transmitting data larger than a single Rx packet. On the nRF52840, it looks like max packet size is 128. Minus the 64 byte header, that leaves 64 bytes free before it needs to buffer a second transfer descriptor.

Can you verify the issue is only reproducible at one of these boundaries? >64 bytes, >128 bytes, etc.?

dmlambea commented 3 years ago

@ardnew I've run the following tests by uploading the README.md file in loop, using different chunk sizes each test:

I will leave it running for the whole night.

ardnew commented 3 years ago

That suggests to me it is indeed a race condition with the DMA Rx buffers (as opposed to a race between a read and a control response, for example), which really helps narrow down the cause.

I can give it a closer look this evening.

BTW what board are you using?

dmlambea commented 3 years ago

I'm using nRF 52480. Thank you for your support, @ardnew

Latest news: the 63-byte transfers finally crashed. It took minutes for it to crash, but it failed at the end. So I will try with smaller packet sizes (32 bytes, for example) and will leave it running the whole night.

dmlambea commented 3 years ago

There is something strange here. I've been able to submit 3950 packets of 32 bytes each before getting a freeze. But if I lower the packet size to 16 bytes, I'm able to upload up to 11260 packets before the crash. In all cases, the board freezes anyway.

aykevl commented 3 years ago

~Note that there is still a known issue with heap allocations. It's probably easy to fix, but I haven't gotten to it yet. See #1823 for details. This might be the problem although I suspect it's something else.~

EDIT: nevermind, just realized this is only on SAM D21 and SAM D51 boards, not on the nrf52840. The other heap allocations are fixed in #1823 so in theory the nrf52840 USB code should be free of heap allocations.

aykevl commented 3 years ago

If you have a debug probe, you could try attaching it. It should give you a nice backtrace of where it is stuck. (However, I realize that most USBCDC using boards don't have an easily accessible debug interface).