Closed KarlK90 closed 4 years ago
Is it possible to benchmark the performance of this change? The default size()
impl is very close to the code in this PR, but leverages more features of embedded-graphics. I'd be interested to see if this PR speeds up the clear method at all.
Sure I can try to benchmark it, but it already felt faster. My reasoning was that the draw_rectangle
method of this crate does check if the pixel lay inside the display bounds, if these checks are not elided in release builds there should be a noticeable speedup.
Ah, I see. I was looking at the embedded_graphics code but this optimises away some driver code instead :)
That was certainly interesting! System is a Raspberry Pi 3B with a 3.2" ili9341 Display, spi speed 62.5 MHz using linux-embedded-hal spidev abstraction.
"Unoptimized" version:
Optlevel 3 + LTO + codegen-units = 1: 678ms
Optlevel 3: 693ms
"Optimized" version:
Optlevel 3 + LTO + codegen-units = 1: 675ms
Optlevel 3: 693ms
So basically marginal difference, which is really interesting and shows how subjective things can be! But I saw huge (12x!) differences from using the display-interface enabled version by @therealprof implemented in https://github.com/yuri91/ili9341-rs/pull/16
"Unoptimized" version display-interface:
Optlevel 3 + LTO + codegen-units = 1: 56ms
"Optimized" version display-interface:
Optlevel 3 + LTO + codegen-units = 1: 52ms
Maybe the 4ms improvement is still relevant enough to justify the merge?
But I saw huge (12x!) differences from using the display-interface enabled version
Wow! That's a huge improvement. Well worth the benchmark.
Maybe the 4ms improvement is still relevant enough to justify the merge?
Certainly can't hurt :) but I'll leave that decision up to the maintainers ;)
Well, the trivial implementation takes each value from the iterator, turns it into an array of 2 u8s and sends it to the SPI peripheral via embedded-hal adapter using individual sends. The display-interface SPI driver implements a (currently fixed sized) buffer. Depending on how efficient the SPI embeded-hal adapter works there might be a lot of overhead per send.
It should be possible to write a Linux spidev specific display-interface driver which could go even faster. The embedded-hal traits are universal but the slice based approach is not ideal for this kind of application. The nice thing about the display-interface is that one can easily chose (or write) a platform specific driver to get the most bang for the buck and simply swap it via initialisation in the application.
Thanks for the PR, seems good to me and I am going to merge it!
The optimized clear method draws directly to the display without bounds checking.
Thank you for the driver :slightly_smiling_face: