raspberrypi / picotool

BSD 3-Clause "New" or "Revised" License
525 stars 86 forks source link

Improve loading speed by 50% #66

Open DavidEGrayson opened 1 year ago

DavidEGrayson commented 1 year ago

The W25Q16JVUXIQ flash chip on the Pico has multiple erase commands that erase different sizes of memory, as documented in the datasheet:

image

The current implementation of picotool only uses the 4 KB erase command, which is the least efficient one in terms of bytes per second.

This pull request changes picotool to pick erase sizes based on how much flash is left to load. The test results below show that loading a 596 KiB UF2 currently takes 4.8 seconds, and with my changes it will only take 3.2 seconds, which means the speed has increased by 50%. This test was on Windows but I also saw similar results on Linux.

Status quo (commit 03f2812):

$ time ./picotool load --verify rp2-pico-20220618-v1.19.1.uf2
Loading into Flash: [==============================]  100%
Verifying Flash:    [==============================]  100%
  OK

real    0m4.807s
user    0m0.015s
sys     0m0.000s

With this PR (commit cdf6228):

$ time ./picotool load --verify rp2-pico-20220618-v1.19.1.uf2
Loading:   [==============================]  100%
Verifying: [==============================]  100%

real    0m3.212s
user    0m0.000s
sys     0m0.000s

There is room for further improvement if anyone is interested, because the way the code picks erase sizes is very simple right now. When it needs to erase 12 KB, it will use three 4 KB commands instead of using a single 32 KB erase command which would be faster. Also, it would do the erasing using three separate USB commands when one would work.

I haven't looked into how standardized the larger erase commands are and I don't know what set of flash chips picotool is aiming to support, but I thought I'd make the pull request anyway to start the conversation.

The first commit of this pull request contains the speed improvement, and could be merged in or cherry-picked on its own.

The pull request also contains a second commit that further improves things but doesn't have much effect on speed. The second commit changes the order that things are done so that all verifications are done after all writes. This is important because the erasing/writing commands might conceivably malfunction and write to parts of the chip they were not supposed to touch, invalidating an earlier verification: the later we can delay verification, the more meaningful it is. Secondly, the current code fills the screen with numerous progress bars and "OK" messages when you give it a sparse UF2 file that writes to many different regions of flash: it prints "Loading", "Verifying", and "OK" for every region. Moving all the verifications to the end fixes that.

If anyone is interested, I have a branch with all of my improvements here: https://github.com/DavidEGrayson/picotool/tree/dev/david/master

lurch commented 1 year ago

When it needs to erase 12 KB, it will use three 4 KB commands instead of using a single 32 KB erase command which would be faster.

But surely you can't guarantee that there's not data in that other 20KB that you don't want to be erased? :thinking:

DavidEGrayson commented 1 year ago

This PR only erases what it needs to, but for future work maybe it could be argued that erasing a 32 KB or 64 KB chunk isn't all that different from erasing 4 KB. I'm actually accustomed to programming smaller chips like AVRs where we just erase the whole chip by default.

kilograham commented 1 year ago

thanks; yes we will need to think about how to tell picotool what erase commands it can use based on the flash chip (or whether there is a standard progression)

lurch commented 1 year ago

for future work maybe it could be argued that erasing a 32 KB or 64 KB chunk isn't all that different from erasing 4 KB.

Picotool allows you to write arbitrary chunks of data at arbitrary addresses, which might be useful if you have a multi-stage bootloader or something, or if you have a "data blob" that you want to update independently of the executable code. :man_shrugging:

DavidEGrayson commented 12 months ago

Just a reminder: I think think PR represents a very good improvement to the performance of picotool and should be merged in, even if there are some code style issues that someone wants to change later.

lurch commented 12 months ago

There are many 3rd-party boards all using the RP2040 https://github.com/raspberrypi/pico-sdk/tree/develop/src/boards/include/boards

Do you know if this PR works correctly with all the various Flash chips that they use?

kilograham commented 12 months ago

Just a reminder: I think think PR represents a very good improvement to the performance of picotool and should be merged in, even if there are some code style issues that someone wants to change later.

some version of it will be at some point ;-) just not super imminently