joukos / PaperTTY

PaperTTY - Python module to render a TTY or VNC on e-ink

Test IT8951 displays #32

Closed joukos closed 4 years ago

joukos commented 4 years ago

Comment on this issue for your experiences with the IT8951 support.

Tested:

math85360 commented 4 years ago

Tested with PaperTTY & VNC on Waveshare 9.7" with IT8951 :

[Image: 20191103_145711]

Video Waveshare 9.7" eink displaying Midori browser with PaperTTY

joukos commented 4 years ago

Wow, that's beautiful, thanks!

joukos commented 4 years ago

@math85360, is it okay to add the image (and link to the video) to the README?

math85360 commented 4 years ago

Yes, of course !

markbirss commented 4 years ago

Very nice, hope the 7.8 inch works well also

jmi2k commented 4 years ago

I'm getting a 7.8 inch screen soon, I'll report back if everything works fine. It should arrive in about a month.

joukos commented 4 years ago

@jmi2k cool, thanks :) Good luck!

jeLee6gi commented 4 years ago

Here is a terminal on the 7.8 using a raspberry pi 4 running archlinuxarm.

The CPU usage goes up to 100% periodically and even partial refreshes only happen every two seconds or so. Is that to be expected? The only thing I changed was the VCOM to match my display (-1.38V).

[Images: MVIMG_20200107_223435, IMG_20200107_222219]

jmi2k commented 4 years ago

Right now the only SD card I have is 2GB so I'll have to wait until next week.

joukos commented 4 years ago

Thanks for letting us know @jeLee6gi !

> The CPU usage goes up to 100% periodically and even partial refreshes only happen every two seconds or so. Is that to be expected?

Well, I haven't actually tried PaperTTY on a Pi4 and it's very likely there's room for optimization to at least make better use of the multiple cores. Do you use it in TTY or VNC mode? I don't think it should take very long to process an updated frame, but with a display that big, simply sending the data to it might take a while. With partial refresh there shouldn't be too much data to send though...

chi-lambda commented 4 years ago

I've done some profiling, because I use a Pi Zero (with a 6" display) and the problem is even more pronounced there. I only use TTY mode. It seems pack_image in driver_it8951.py is rather slow. I've done some optimization, but still need to test. I'll share it when I'm confident that it still works.

joukos commented 4 years ago

@chi-lambda thanks, performance improvement is always nice, hope it works. I wish we had some unit tests though so it would be easier to avoid anything breaking... ;)

jeLee6gi commented 4 years ago

> Well, I haven't actually tried PaperTTY on a Pi4 and it's very likely there's room for optimization to at least make better use of the multiple cores. Do you use it in TTY or VNC mode? I don't think it should take very long to process an updated frame, but with a display that big, simply sending the data to it might take a while. With partial refresh there shouldn't be too much data to send though...

I only used TTY mode so far

> I've done some profiling, because I use a Pi Zero (with a 6" display) and the problem is even more pronounced there. I only use TTY mode. It seems pack_image in driver_it8951.py is rather slow.

I also tried to profile:

import time

for i in range(1_000_000):
    print(i)
    time.sleep(1)

I let this program run until it started scrolling the terminal and then profiled with py-spy for two minutes. Whenever spi_write was running, py-spy had trouble getting samples, so realistically it spent even more time in spi_write than it recorded. Anyways, this is how it spends its time, and when it's scrolling like this I get a new image roughly every 8 to 15 seconds.

[Image: py-spy profile, 2m]

chi-lambda commented 4 years ago

I used cProfile. Sadly, testing will have to be postponed as my Raspberry Pi seems to have turned into a Raspberry Fry. I'll push the code to my repository later though.

Ultimately, I think it would be ideal to reimplement parts of PIL/Pillow to directly write the output format.
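For anyone who wants to repeat this kind of measurement, here is a minimal cProfile sketch. `slow_pack` is only a made-up stand-in for the function under test (it is not the code from driver_it8951.py), and `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def slow_pack(pixels):
    # Stand-in workload (an assumption for illustration): combine two
    # 8-bit gray values into one byte holding two 4-bit values.
    return bytes(
        (pixels[i] & 0xF0) | (pixels[i + 1] >> 4)
        for i in range(0, len(pixels), 2)
    )


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, out.getvalue()


result, report = profile_call(slow_pack, [0] * 48_000)
print(report)
```

Sorting by cumulative time puts the outermost expensive call at the top, which is usually the quickest way to see whether packing or the SPI write dominates.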

chi-lambda commented 4 years ago

Check out pull request #38.

Alright, I got out my spare Pi 1B and the effect is even more dramatic.

Configuration: 6" display, 800x600, configured for 72 columns and 27 rows, using Terminus 11x22 as a pil font.

Ran "lorem -p5" twice, essentially triggering a full display refresh every time.

Average run time for pack_image with old code: 14.7 seconds. Average run time for pack_image with new code: 1.4 seconds(!!)

Still not really enough for fluent typing, but it's a start.

joukos commented 4 years ago

That's a great speedup! But will this produce the same output as the old code and will it work with the other displays too?

chi-lambda commented 4 years ago

Purely visually speaking, the output is fine. The three changes I made are:

  1. Turn the image into a list quicker using getdata().
  2. Use logic instead of arithmetic to fill the packed buffer. This is the part most likely to contain an error, but should be pretty easy to verify.
  3. Write the bytes in the proper order straight away instead of swapping in a separate step.

The code makes the bold assumption that the image has a number of pixels divisible by 4, but I seem to recall that it's always padded to a multiple of 8 anyway.

There's still a lot of potential for further speedup by breaking up updates into smaller pieces, but this would have to be done in papertty.py. I think finding the optimal subdivision is NP-hard though.

The changes only affect IT8951 devices.
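To make the three points above concrete, here is a rough sketch of the idea rather than the code from the pull request. It assumes two-level input (0 for black, nonzero for white, as `list(image.getdata())` would give for a Pillow mode "1" image), 4 bits per pixel in the output, and a particular swapped byte order within each 16-bit word; the function name and the byte order are assumptions for illustration.

```python
def pack_bilevel_words(pixels):
    """Pack two-level pixels at 4 bits per pixel, four pixels per word.

    The two bytes of each 16-bit word are written in swapped order
    directly (assumed order), so no separate swap pass is needed.
    """
    if len(pixels) % 4:
        raise ValueError("pixel count must be divisible by 4")
    packed = bytearray(len(pixels) // 2)
    out = 0
    for i in range(0, len(pixels), 4):
        p0, p1, p2, p3 = pixels[i:i + 4]
        # Logic instead of arithmetic: choose a constant nibble per pixel.
        first = (0xF0 if p0 else 0x00) | (0x0F if p1 else 0x00)
        second = (0xF0 if p2 else 0x00) | (0x0F if p3 else 0x00)
        # Emit the bytes of the word already swapped.
        packed[out] = second
        packed[out + 1] = first
        out += 2
    return bytes(packed)


pixels = [255, 0, 0, 255, 0, 0, 0, 0]
print(pack_bilevel_words(pixels).hex())  # prints 0ff00000
```

Checking a handful of small inputs like this against the old implementation would be one way to verify the "logic instead of arithmetic" step.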

joukos commented 4 years ago

Okay, let's hope there's no quirks and I'll merge it since it's such a significant boost. I added a tag for the old code (v0.03_unoptimized) in case it doesn't work for someone.

Thanks a lot for this contribution!

markbirss commented 4 years ago

I will soon test also on 9.7 inch, nice to see these efforts to enhance the performance, thank you

chi-lambda commented 4 years ago

I found a little wrinkle in my new code: It assumes that the input is 1bpp. That's true for TTY (which could be reviewed, incidentally), but not for VNC. Maybe we should create two different draw methods or a branch in the existing one.
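A branch could look roughly like the sketch below. The function names are hypothetical, the mode strings follow Pillow's convention ("1" for two-level, "L" for 8-bit grayscale), and byte swapping is left out to keep the example short.

```python
def pack_two_level(pixels):
    """Fast path: every pixel is treated as pure black or pure white."""
    return bytes(
        (0xF0 if pixels[i] else 0x00) | (0x0F if pixels[i + 1] else 0x00)
        for i in range(0, len(pixels), 2)
    )


def pack_gray(pixels):
    """Generic path: keep the top 4 bits of each 8-bit gray value."""
    return bytes(
        (pixels[i] & 0xF0) | (pixels[i + 1] >> 4)
        for i in range(0, len(pixels), 2)
    )


def pack_for_mode(mode, pixels):
    """Choose the packing routine from the image mode."""
    if len(pixels) % 2:
        raise ValueError("pixel count must be even")
    if mode == "1":
        return pack_two_level(pixels)
    if mode == "L":
        return pack_gray(pixels)
    raise ValueError(f"unsupported mode: {mode!r}")


gray = [0x80, 0x40]
print(pack_for_mode("L", gray).hex())  # prints 84
print(pack_two_level(gray).hex())      # prints ff: gray levels are lost
```

The last line shows the wrinkle: feeding grayscale data to the two-level path turns every non-black pixel white.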

joukos commented 4 years ago

Hmm, well that's a bit bad for the VNC feature... I can't really test it myself or have time to spend on guesswork right now, so if you or someone else is willing to implement some quick fix so that it uses the optimized one for TTY only, it would be very appreciated :)

Best would of course be that the draw method is optimized for both cases, but that can be done gradually. Again, thanks for your effort on this.

stdlogicvector commented 4 years ago

Slightly unrelated question, but do you guys know if the IT8951 Driver Hat from Waveshare has special firmware/settings in the SPI flash for the different panel sizes/resolutions?

I've only seen a function that reads the panel size from the controller but none to set it. Maybe it's stored in the flash?

Perhaps some of you have different panels on hand and can try if they work with the same IT8951 board? Or even dump the SPI flash from different boards and compare?

chi-lambda commented 4 years ago

@joukos The packing algorithm can be sped up a lot (about 100x) by implementing it in C. Would you consider that or do you want to keep it pure Python?

The effect is less noticeable on a Pi4 (.2* vs .002 seconds) than on older models and Zeros, where rendering times can be in the range of seconds. Right now, my code spends the most time (.3 seconds) on loading the image into an array, which could potentially be optimized more; and sending the image to the device (usually 1, but sometimes up to 1.5 seconds for 800x600; much shorter for smaller updates), which is pretty much out of our hands.

* .2 seconds on an already optimized version of the code. The one in the linked commit runs much slower.

gdkrmr commented 4 years ago

Can numba do this? Maybe this is easier than having to build C code.

joukos commented 4 years ago

I'd prefer to keep things Python and as simple as possible, unless there's a big enough gain to justify any extra dependencies or complexity. I think the fact that the program is in Python makes it easier for others to quickly get to know the code and improve it in its current early stages. In the end the most limiting factor is simply the speed of transmitting image data to the display via SPI, and yeah, there's not much that can be done about that with the current supported displays.

That said, the processing part can and should be improved where applicable, but I'd say that the gains need to be fairly significant to justify spending too much time on them at this time. If a whole second out of a 3 second refresh time can be shaved off by simply better code, that's great, but for example needing to add Fortran and NumPy as dependencies to shave 100 ms off the same 3 seconds is not worth it in my opinion. Lean and mean is preferred.

I'm not saying it's bad to optimize the bit twiddling if one is willing to put effort into it, but I think the usability issues are more of a priority, and if we want to go very low level with this, might as well turn it into a kernel module and at that point, we'd be working for Waveshare maybe ;)

Thank you both for the code you've provided, I'll try to get them merged as soon as possible, personal life is just getting seriously in the way right now (though in a good way).

markbirss commented 4 years ago

https://blog.the-ebook-reader.com/2020/01/14/new-10-3-waveshare-e-ink-monitor-released-for-539-video/

gdkrmr commented 4 years ago

@jeLee6gi How did you install py-spy? I tried and it doesn't work:

(papertty) pi@raspberrypi:~ $ pip install py-spy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
ERROR: Could not find a version that satisfies the requirement py-spy (from versions: none)
ERROR: No matching distribution found for py-spy

jeLee6gi commented 4 years ago

@gdkrmr Good question! I was going to compile it first (py-spy is mostly written in rust) but then I got tired of satisfying compilation dependencies and just downloaded the armv7 binary from their github releases page. :grin:

https://github.com/benfred/py-spy/releases

joukos commented 4 years ago

That py-spy looks pretty neat, I should try it out too. Side note about the compilation issues: shouldn't cargo install py-spy handle all the (Rust) dependencies automatically or are there some problems building it on arm? For what it's worth, I got it installed on Ubuntu 18.04 which has a Rust environment installed with Rustup by doing:

sudo apt install libunwind-dev # at the very end, the linker complained about not finding this
cargo install py-spy

C-Rothnie commented 4 years ago

I can confirm the 10.3 inch screen works fine. Still setting it up, but I want to use it as a daylight-visible chart plotter on a sail boat.

[Image: waveshare_10-3_e-paper]

joukos commented 4 years ago

@C-Rothnie thanks for letting us know (and for the nice image)!

chi-lambda commented 4 years ago

@C-Rothnie

What's the update speed like on that huge display?

C-Rothnie commented 4 years ago

I want it slow in fact - I am happy with a 5 second update for my sailing application. Things don't change much faster than that on the water. The testing I have done is with a 1 second sleep and even so, it does a partial refresh on the changed area after a second or two. I am using a Raspberry Pi 4. I don't intend to use the e-ink screen in an interactive mode very much - just display marine charts, boat position, other nearby boats etc in the chart plotting application OpenCPN. I will give further feedback after I have finished setting it up.

joukos commented 4 years ago

Cool! I seem to remember taking a look at OpenCPN a few years ago and thought it would be ideal to have an e-ink with it, but back then I didn't have such a display (and they weren't really available anyway). I'm interested in knowing how your project turns out!

gdkrmr commented 4 years ago

> @gdkrmr Good question! I was going to compile it first (py-spy is mostly written in rust) but then I got tired of satisfying compilation dependencies and just downloaded the armv7 binary from their github releases page.
>
> https://github.com/benfred/py-spy/releases

Thanks, works now! I didn't realize that this was a standalone program, I just assumed that this worked somehow like the debugger, python -m pdb ....

Just for the record: I had to run sudo py-spy record -o profile.svg --pid xxx, because running python as a child process, py-spy -o profile.svg -- ~/.virtualenvs/..., crashed, left the child process alive, and the display unusable until killing the python process manually.

chi-lambda commented 4 years ago

Right now, the SPI uses a speed of 2 MHz (search for self.SPI.max_speed_hz in driver_it8951.py), which I guess was chosen rather arbitrarily. I've managed to raise it to 18 MHz (20 doesn't work), and hoo boy does that speed up the transfer step. Could you other IT8951 owners test what works for you?

@C-Rothnie @jeLee6gi @math85360

gdkrmr commented 4 years ago

> Right now, the SPI uses a speed of 2 MHz (search for self.SPI.max_speed_hz in driver_it8951.py), which I guess was chosen rather arbitrarily. I've managed to raise it to 18 MHz (20 doesn't work), and hoo boy does that speed up the transfer step. Could you other IT8951 owners test what works for you?
>
> @C-Rothnie @jeLee6gi @math85360

I did some testing with the 4.2 inch monochrome display and times seemed the same, https://github.com/joukos/PaperTTY/pull/40#issuecomment-578510436, maybe I did something wrong. I could crank it up to 40MHz on a Pi4.

joukos commented 4 years ago

I was also wondering if perhaps there was something wrong with either the measurement or actually setting the speed, since the flamegraphs seemed peculiarly near-identical with the 4.2"...

In any case, great to hear that at least the IT8951 may benefit from that! @chi-lambda, which Pi version (assuming a RPi) did you use and can you give some rough numbers on the speed-up?

gdkrmr commented 4 years ago

> I was also wondering if perhaps there was something wrong with either the measurement or actually setting the speed, since the flamegraphs seemed peculiarly near-identical with the 4.2"...

It did "something", because it actually started failing at 50MHz, maybe it's the display.

chi-lambda commented 4 years ago

I tested on a Pi Zero W, speed-up from 1–1.5 seconds to about .2–.3 seconds for a full (800x600) update. The initializing update is ridiculously slow (over 10 seconds at 2 MHz) for some reason, but also scales about (inverse) linearly with frequency. Pi Zero has so far been about 50 percent slower sending data than my Pi4, which is also just about the ratio of the CPU frequency. Haven't checked the higher frequency on the Pi4 yet.
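As a rough sanity check on those numbers, here is a back-of-the-envelope estimate. It assumes the frame is sent at 4 bits per pixel with no protocol or Python overhead, which is an idealization, and `transfer_seconds` is just an illustrative helper name.

```python
def transfer_seconds(width, height, bits_per_pixel, spi_hz):
    """Lower bound on the time needed to clock one frame out over SPI."""
    total_bits = width * height * bits_per_pixel
    return total_bits / spi_hz


for mhz in (2, 8, 18):
    seconds = transfer_seconds(800, 600, 4, mhz * 1_000_000)
    print(f"{mhz:>2} MHz: {seconds:.3f} s")
```

Under these assumptions a full 800x600 frame needs about 0.96 s at 2 MHz and about 0.11 s at 18 MHz. The first figure is close to the observed 1 second; the gap between 0.11 s and the observed .2–.3 s suggests that at higher clock rates something other than the raw bit rate starts to matter.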

chi-lambda commented 4 years ago

In unrelated news, there's a new IT8951 display: https://www.waveshare.com/6inch-hd-e-paper-hat.htm

joukos commented 4 years ago

Well, that's certainly a huge speedup, especially for a measly Zero. We've come quite far from the early video in the first comment in this issue, where a Zero W seems to struggle quite a bit to update the display (though it's probably bogged down by the browser too)...

joukos commented 4 years ago

> In unrelated news, there's a new IT8951 display: https://www.waveshare.com/6inch-hd-e-paper-hat.htm

I ordered one a week ago ;)

jeLee6gi commented 4 years ago

I've been meaning to mess with the clock speed and other SPI related things because in my measurements it looked like most time was spent in the SPI library. I was going to follow this blog post which has lots of technical stuff about how to efficiently use SPI.

If I remember correctly, the only thing I tried back then was to remove the max_transfer_size and the loop that calls SPI.writebytes on each chunk and replace it with a big numpy array containing the frame and SPI.writebytes2 which resulted in ~60% fewer writes. It seemed to help but I didn't test it too much.
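For comparison, a minimal sketch of the two approaches. `write_chunked` and `FakeSPI` are made-up names so the example runs without hardware, and the 4096-byte chunk size is an assumed value; the loop in the actual driver may differ. In py-spidev, `writebytes2` accepts arbitrarily large buffers (including numpy arrays) and splits them internally.

```python
def write_chunked(spi, data, max_transfer_size=4096):
    """Send data in slices of at most max_transfer_size bytes.

    Returns the number of Python-level write calls that were made.
    """
    calls = 0
    for start in range(0, len(data), max_transfer_size):
        spi.writebytes(list(data[start:start + max_transfer_size]))
        calls += 1
    return calls


class FakeSPI:
    """Stand-in that records writes; a real spidev.SpiDev goes here."""

    def __init__(self):
        self.sent = bytearray()

    def writebytes(self, values):
        self.sent.extend(values)

    def writebytes2(self, values):
        self.sent.extend(values)


frame = bytes(240_000)  # 800x600 pixels at 4 bits per pixel
spi = FakeSPI()
print(write_chunked(spi, frame))  # one Python-level call per chunk
spi.writebytes2(frame)            # a single call for the whole frame
```

With the chunked loop, each slice is converted to a list and passed through a separate call, so most of the saving from `writebytes2` comes from doing that bookkeeping once instead of per chunk.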

chi-lambda commented 4 years ago

After letting it run for about half a day, I wasn't able to restart it at higher frequencies. 8 MHz would work, but not 12 or more. Restarting the Pi seems to have fixed it. Just something to keep in mind.

joukos commented 4 years ago

I finally got an IT8951 display of my own (6" HD) and it works too. For anyone interested, a couple of poor quality pics until I have a chance to try it out some more:

[Image: sq2]

[Image: fluxbox]

(The Blake Stone window there is 640x400...)

joukos commented 4 years ago

Since the IT8951 support seems to be working pretty good and the boxes are ticked, I'll close this issue now. Thanks all!

chi-lambda commented 3 years ago

There should only be one application controlling the display at a time. What's this other driver you are running?

joukos commented 3 years ago

> However, even sudo papertty --driver IT8951 scrub results in no change on the new display, but it seems to identify it

You may want to try something else besides the scrub (which is a bit of a flaky/useless feature with some bugs pending a fix - unfortunately it's also used in the examples so this isn't very obvious) to test the display operation. Might not be the issue here, but best to make sure at least - simply pipe some text to the stdin feature or perhaps try the image viewer function, both should be good to verify basic functionality.

Did you try VNC already, and did it report image updates to terminal log even if nothing was shown on the display?

Also, you should first make sure the display at least works with Waveshare example code for it, ie. it's not just some loose cable issue or a switch in the wrong position on the controller board, or something similar.

chi-lambda commented 3 years ago

Refresh can't be faster than about 300ms, plus the time it takes to transfer the data, which linearly depends on the number of pixels to refresh. You can try to increase the transfer speed in driver_it8951.py:183, but I've found that this can occasionally make the display stop working temporarily.

I'm not even sure you can use the USB interface under Linux. Definitely not with the drivers we have.

Greyscale (as used in X) is also significantly slower than black and white (as used in terminal mode), so much that I've given up on it.

You can find the C packing algorithm on my github. 🙂 But it only speeds up the packing, not the transfer to the display, which is its own bottleneck.