tinygo-org / tinygo

Go compiler for small places. Microcontrollers, WebAssembly (WASM/WASI), and command-line tools. Based on LLVM.
https://tinygo.org

samd51 clocks not being initialized properly? #3317

Closed ajanata closed 1 year ago

ajanata commented 1 year ago

I'm making a project with the Adafruit Matrix Portal board (which has an ATSAMD51J19 processor) to control a HUB75 LED panel. My animation was running at about 45 fps. I decided it was time to start using the GD25Q16 QSPI flash chip on the board, so I hooked up the flash driver for it. After I did so, the animation was running at about 90 fps. I have confirmed that it is not the optimizer behaving differently (the speed difference remains with -opt=2 and -opt=z, though the raw values are of course different).

I was able to make a minimal reproduction of the issue. It is quite clearly faster after the QSPI is initialized; it is not the mere presence of the code causing compiler behavior to change. My only guess is that something isn't being initialized properly and initializing the QSPI fixes it.

Output of the test program:

$ tinygo flash -target matrixportal-m4 -opt z ./ && sleep .5 && tinygo monitor
Connected to /dev/ttyACM0. Press Ctrl-C to exit.
Before initializing QSPI
Running test
Looped 10000000 times in 2.153564454s
Running test
Looped 10000000 times in 2.154418945s
Initializing QSPI
Running test
Looped 10000000 times in 1.157958985s
Running test
Looped 10000000 times in 1.157714844s
$ tinygo flash -target matrixportal-m4 -opt 2 ./ && sleep .5 && tinygo monitor
Connected to /dev/ttyACM0. Press Ctrl-C to exit.
Before initializing QSPI
Running test
Looped 10000000 times in 835.083008ms
Running test
Looped 10000000 times in 835.327148ms
Initializing QSPI
Running test
Looped 10000000 times in 611.450195ms
Running test
Looped 10000000 times in 611.450196ms
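
The test program itself is not included in this report. A minimal sketch of a timing loop that prints lines in the same shape, assuming a plain integer loop is enough to show the effect, might look like this. The names `busyLoop`, `runTest`, and `initPeripheral` are hypothetical, and the real QSPI setup is left as a no-op placeholder because it needs the board and the flash driver.

```go
package main

import (
	"fmt"
	"time"
)

// sink keeps the loop result observable so the compiler cannot drop the loop.
var sink uint32

// busyLoop runs n iterations of cheap integer work and returns the final value.
func busyLoop(n int) uint32 {
	var x uint32 = 1
	for i := 0; i < n; i++ {
		x = x*1664525 + 1013904223
	}
	return x
}

// runTest times one call to busyLoop and reports how long it took.
func runTest(n int) time.Duration {
	fmt.Println("Running test")
	start := time.Now()
	sink = busyLoop(n)
	elapsed := time.Since(start)
	fmt.Printf("Looped %d times in %s\n", n, elapsed)
	return elapsed
}

func main() {
	const iterations = 10000000

	// Hypothetical placeholder: on the board this would configure the
	// QSPI flash driver. It does nothing here.
	initPeripheral := func() {}

	fmt.Println("Before initializing QSPI")
	runTest(iterations)
	runTest(iterations)

	fmt.Println("Initializing QSPI")
	initPeripheral()
	runTest(iterations)
	runTest(iterations)
}
```

Running each phase twice, as the output above does, makes it easier to tell a stable change in speed from run-to-run noise.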

Trying to chase this down is well outside my wheelhouse. I don't have any other samd51 devices to test with in case this is something specific to the matrixportal-m4.

deadprogram commented 1 year ago

Perhaps @sago35 can help with this?

sago35 commented 1 year ago

@ajanata @deadprogram

I checked. I think CMCC (Cortex M Cache Controller) is enabled by the following code, which affects it.

    sam.CMCC.CTRL.SetBits(sam.CMCC_CTRL_CEN)

For example, enabling it also speeds up ili9341/pyportal_boing in tinygo-org/drivers considerably. I had thought TinyGo on samd51 was somehow slower than Arduino and others, and this may be the cause.

original

$ tinygo flash --target wioterminal --size short --monitor ./examples/ili9341/pyportal_boing/
   code    data     bss |   flash     ram
  28684     228   37832 |   28912   38060
Connected to COM38. Press Ctrl-C to exit.
width, height == 320 240
51  fps
46  fps
45  fps

with CMCC enabled

$ tinygo flash --target wioterminal --size short --monitor ./examples/ili9341/pyportal_boing/
   code    data     bss |   flash     ram
  28700     228   37832 |   28928   38060
Connected to COM38. Press Ctrl-C to exit.
width, height == 320 240
85  fps
85  fps
85  fps
78  fps
80  fps
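
To put a rough number on the difference, the fps samples from the two runs above can be averaged and compared. This is only a sketch over the handful of samples shown, so the ratio is approximate (roughly 1.7x); the helper name `mean` is hypothetical.

```go
package main

import "fmt"

// mean returns the arithmetic mean of the samples, or 0 for an empty slice.
func mean(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

func main() {
	// fps samples copied from the two monitor outputs above.
	before := []float64{51, 46, 45}
	after := []float64{85, 85, 85, 78, 80}

	b, a := mean(before), mean(after)
	fmt.Printf("mean before: %.1f fps\n", b)
	fmt.Printf("mean after:  %.1f fps\n", a)
	fmt.Printf("speedup:     %.2fx\n", a/b)
}
```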

sago35 commented 1 year ago

In Seeed SAMD Boards 1.8.3, the default for wioterminal (Arduino) is cache enabled.

If enabling CMCC has no disadvantages, I would like to make it enabled by default.

deadprogram commented 1 year ago

https://ww1.microchip.com/downloads/en/DeviceDoc/How-to-Achieve-Deterministic-Code-Performance-using-CortexM-Cache-Controller-DS90003186A.pdf

ajanata commented 1 year ago

> @ajanata @deadprogram
>
> I checked. I think CMCC (Cortex M Cache Controller) is enabled by the following code, which affects it.
>
>     sam.CMCC.CTRL.SetBits(sam.CMCC_CTRL_CEN)

Adding this to my scratch program is sufficient to change the execution speed.

My primary program has effectively had this enabled for a few months, with tens of hours of runtime since then with no unexplained problems. It's probably a good change to consider making in general, or at least calling it out in the machine documentation.

aykevl commented 1 year ago

This will be part of the next release.

deadprogram commented 1 year ago

This is part of the v0.28 release, so closing now. Thank you!