Jaqster closed this issue 1 year ago
5-10 seconds is really fast. I'd expect first time download on a slow computer to be around 30 seconds. Some of the machines we tried at BETT were up to 1 minute.
It appears to be CPU-related - if you try throttling your CPU in dev tools it slows down.
We thought about this in the Python editor and decided on the following UX, which has gone down quite well:
Subsequent flash (~2 seconds)
I think the progress bar really helps users, especially because they don't necessarily know what 'first time' means in this case. It could be a MakeCode update, or a CODAL update, or that the micro:bit was previously used for Scratch, Python or something else like MicroBlocks.
Yikes! I think a 1-minute download is not acceptable. Is there anything we can do to improve performance for the first-time download?
I think that's something to ask @mmoskal. I guess it's CPU bound on the host, hopefully. I'm not sure if anyone has profiled the WebUSB code to see where the cycles are spent?
Some benchmarking here. Using 4x CPU throttling on my M1 MacBook Pro I see ~35 seconds for a WebUSB flash from MakeCode (replacing MicroPython) and the same for replacing MakeCode when I've just added the datalogging extension. (FWIW, this is about half the time the Python Editor takes with the same 4x slowdown, so I think MakeCode's approach is pretty good here.)
There are two ways to 'full flash' a micro:bit. One is to send the whole hex (either the universal hex, or the Intel Hex for the specific version of the hardware) and let the micro:bit do a full erase and flash. The other is to use the partial flashing mechanism over the whole of flash.
If using the former method it will be quicker to send only the Intel Hex, not a universal hex. From the way the micro:bit is behaving, I think MakeCode is using the latter method anyway.
Finally, there definitely seem to be a few things that take some time before the flashing even starts (measured for me by when the orange LED starts to blink), especially if you've just added an extension. This is much more visible with throttling on.
One thing we can try to speed up "full flashing" could be to send the raw data as a bin file instead of Intel Hex to DAPLink. That would save DAPLink from parsing the Intel Hex format, improving flashing times a little bit.
On my MacBook, with a test file that's around 1.2 MB in the Intel Hex format, the timings for flashing via drag and drop with DAPLink 0258 are:
However, there is one snag: the UICR data is at a memory address far away from the rest of the flash data. Flash goes from address 0x0 to 0x8_0000, and UICR starts at around 0x1000_0100. So generating a bin file results in a file larger than 256 MB, mostly filled with 0xFF, and sending all that data via WebUSB is not an option. But we can ignore the UICR when generating the bin file data, and once that has been flashed we can use partial flashing to program the few remaining bytes that go into the UICR. This would add very little extra time, and should still result in an overall reduction in flashing time.
Profiling WebUSB flashing might still yield better results (I have no idea where the bottleneck might be), but this is also an option that will likely shave a few extra seconds.
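To make the bin idea above a bit more concrete, here is a minimal sketch of flattening plain Intel Hex text into a flash-only binary while setting aside anything outside flash (such as the UICR records) for a later partial-flash step. This is not MakeCode or DAPLink code: `hexToFlashBin`, `FlatImage`, `Chunk` and `FLASH_END` are hypothetical names, the flash end is taken from the 0x8_0000 figure above, checksums are not verified, and universal hex custom record types are simply ignored.

```typescript
// Hypothetical sketch: flatten plain Intel Hex into a flash-only binary image.
// Records outside flash (e.g. UICR) are returned separately, not padded in.
interface Chunk { address: number; data: Uint8Array; }
interface FlatImage { flash: Uint8Array; skipped: Chunk[]; }

// Assumed end of flash, taken from the 0x0 to 0x8_0000 range quoted above.
const FLASH_END = 0x80000;

function hexToFlashBin(hexText: string, flashEnd: number = FLASH_END): FlatImage {
    const inFlash: Chunk[] = [];
    const skipped: Chunk[] = [];
    let base = 0; // upper 16 address bits, set by extended linear address records
    let size = 0; // one past the highest flash address written so far

    for (const raw of hexText.split(/\r?\n/)) {
        const line = raw.trim();
        if (line.charAt(0) !== ":") continue;

        // Decode the hex digits after ':' into bytes (checksum is not verified here).
        const bytes = new Uint8Array((line.length - 1) >> 1);
        for (let i = 0; i < bytes.length; i++) {
            bytes[i] = parseInt(line.slice(1 + 2 * i, 3 + 2 * i), 16);
        }
        const len = bytes[0];
        const offset = (bytes[1] << 8) | bytes[2];
        const type = bytes[3];
        const data = bytes.slice(4, 4 + len);

        if (type === 0x00) {
            // Data record: keep it if it fits inside flash, otherwise set it aside.
            const address = base + offset;
            if (address + len <= flashEnd) {
                inFlash.push({ address, data });
                size = Math.max(size, address + len);
            } else {
                skipped.push({ address, data });
            }
        } else if (type === 0x04) {
            // Extended linear address record: multiply to avoid 32-bit sign issues.
            base = ((data[0] << 8) | data[1]) * 0x10000;
        } else if (type === 0x01) {
            break; // end-of-file record
        }
        // Any other record type is ignored in this sketch.
    }

    // Unwritten gaps inside flash are filled with 0xFF (erased flash value).
    const flash = new Uint8Array(size).fill(0xff);
    for (const c of inFlash) flash.set(c.data, c.address);
    return { flash, skipped };
}
```

The `skipped` list is what would then be written with partial flashing after the bin transfer completes. Whether this actually saves time would still need measuring on real hardware.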
@jwunderl there is progress indicator in skillmap->tutorial loading screen. Can we reuse that?
I've been trying to repro the performance issues we saw in the classroom, but no luck. I think we need to test on some low-performance machines. Here are some of the times I've been seeing on my Surface Book:
38 seconds - new v2, in private Edge
9 seconds - used v2, in private Edge
6 seconds - used v1, in private Edge
33 seconds - used v2, Chrome
10 seconds - used v2, in private Chrome
8 seconds - used v2, logged in Edge
7 seconds - used v2, logged in Edge
I did manage to repro a download failure once - I was logged in, using a tutorial, Edge, went through WebUSB download flow for the first time, and clicking Download didn't actually transfer code.
What kind of timings do you get on the same machine when setting the Chrome/Edge developer tools CPU throttling to x6? It might not be 6 times slower, but I would assume it will still be significant, as there is some CPU bottleneck with WebUSB flashing.
Closing as I added a progress indicator: https://github.com/microsoft/pxt-microbit/pull/5200
I also poked around the rest of the WebUSB stack quite a bit to see if there was anything we could do to trim off any time, and I didn't find anything in particular. I put up a PR swapping the explicit promise chain for async/await loops in case there was any time we could gain from it (e.g. browser optimizations could potentially treat it better than a long tail-recursive chain), but I didn't see a noticeable improvement. https://github.com/microsoft/pxt-microbit/pull/5206 has a before/after build to try. Under 4x CPU throttling it seemed to be ~0.5-1 second faster for a full flash, but that was too inconsistent to confirm and within the margin of error (e.g. the range for a full flash was around 38-44 seconds in those circumstances, mostly just awaiting WebUSB calls). If anyone does have a device where flashing runs particularly slowly, and sees a difference between the two builds, it'd be very useful to know!
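For anyone unfamiliar with the two styles being compared, here is a rough illustration of a recursive promise chain versus an async/await loop. It is a generic sketch, not the code from the PR: `WritePage`, `flashPagesChained` and `flashPagesLoop` are hypothetical names, and `write` stands in for whatever WebUSB transfer each step performs.

```typescript
// Hypothetical stand-in for one asynchronous WebUSB transfer per page.
type WritePage = (page: number) => Promise<void>;

// Style 1: explicit promise chain built by recursion.
// Each step schedules the next one from inside a .then() callback.
function flashPagesChained(pages: number[], write: WritePage): Promise<void> {
    const step = (i: number): Promise<void> =>
        i >= pages.length
            ? Promise.resolve()
            : write(pages[i]).then(() => step(i + 1));
    return step(0);
}

// Style 2: async/await loop.
// Same sequential behaviour, expressed as a plain for loop.
async function flashPagesLoop(pages: number[], write: WritePage): Promise<void> {
    for (const page of pages) {
        await write(page);
    }
}
```

Both versions write the pages strictly one after another, so if most of the time is spent waiting on the transfers themselves, little difference between them would be expected.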
Not sure if there's anything that can be done to improve performance here, but the first time you do a WebUSB program download, it's taking 5-10 seconds to download :-(