llandsmeer / inkvt

Experimental VT100 terminal emulator for Kobo e-readers
GNU General Public License v3.0

Semi freezes after a period of time #5

Closed: rien333 closed this issue 4 years ago

rien333 commented 4 years ago

I run inkvt over SSH (for some reason, I can't get KFMon to make inkvt show up in Nickel, though the KFMon logs do show that inkvt is being registered, without errors). Inkvt runs fine for a while, but after a seemingly random, but short, period of time, it freezes the screen. Besides freezing the screen, I can't access the underlying application anymore (e.g., if I swipe while inkvt is running, the Plato reader would normally update the screen and show the next page). No errors are shown in my SSH terminal, and I can still start new SSH sessions on the Kobo after the freeze occurs. Oddly, while inkvt is in its frozen state, I can still send keypresses over HTTP and SSH, but it takes a while for them to arrive.

To my knowledge, I followed your instructions carefully. Do you have any idea of what might be going wrong, and where/how to look for errors?

I run inkvt as follows:

$ ssh $KOBO_ADDRESS  # log in to the Kobo
$ cd /mnt/onboard/.adds/inkvt
$ ./inkvt.armhf # the same happens if I run ./inkvt.sh
[FBInk] Detected a Kobo Clara HD (376 => Nova @ Mark 7)
[FBInk] Enabled Kobo Mark 7 quirks
[FBInk] Clock tick frequency appears to be 100 Hz
[FBInk] Screen density set to 300 dpi
[FBInk] Variable fb info: 1072x1448, 32bpp @ rotation: 3 (Counter Clockwise, 270°)
[FBInk] Fontsize set to 24x48 (Terminus base glyph size: 8x16)
[FBInk] Line length: 44 cols, Page size: 30 rows
[FBInk] Vertical fit isn't perfect, shifting rows down by 4 pixels
[FBInk] Fixed fb info: ID is "mxc_epdc_fb", length of fb mem: 6782976 bytes & line length: 4352 bytes
[FBInk] Pen colors set to #000000 for the foreground and #FFFFFF for the background
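The column, row, and shift numbers in this log follow from simple integer arithmetic on the reported geometry. The C sketch below reproduces them; compute_layout and struct term_layout are hypothetical helpers for illustration, not FBInk functions, and the centering rule (split the leftover vertical pixels evenly) is an assumption that happens to match the 4-pixel shift reported above.

```c
/* Hypothetical helper (not FBInk's API): derive the text grid from the
 * visible framebuffer size and the scaled glyph size. */
struct term_layout {
    int cols;
    int rows;
    int y_shift; /* pixels to shift rows down so the grid is vertically centered */
};

static struct term_layout compute_layout(int screen_w, int screen_h,
                                         int glyph_w, int glyph_h)
{
    struct term_layout l;
    l.cols = screen_w / glyph_w;                   /* partial columns are dropped */
    l.rows = screen_h / glyph_h;                   /* partial rows are dropped */
    l.y_shift = (screen_h - l.rows * glyph_h) / 2; /* leftover pixels split top/bottom */
    return l;
}
```

With the Clara HD values above, compute_layout(1072, 1448, 24, 48) gives 44 columns (1056 px used), 30 rows (1440 px used), and a shift of (1448 - 1440) / 2 = 4 pixels.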

Generally, no errors are printed at the moment it freezes. Once, though, inkvt did print the following (this could be completely unrelated):

[FBInk] MXCFB_SEND_UPDATE_V2: Invalid argument!
[FBInk] update_region={top=340, left=0, width=1072, height=1104}!
[FBInk] Failed to refresh the screen!
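This region does fit the 1072x1448 portrait geometry printed at startup. One plausible reading, assuming the framebuffer had been rotated to landscape (1448x1072) behind inkvt's back, is that the same region no longer fits: 340 + 1104 = 1444 rows, which exceeds 1072. The sketch below shows only that bounds check; region_fits is a hypothetical helper, and the EPDC driver may reject requests for other reasons as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field order mirrors FBInk's log line: top, left, width, height. */
struct region {
    uint32_t top, left, width, height;
};

/* True if the region lies entirely within a screen of the given size.
 * The sums are done in 64 bits so large values cannot wrap around. */
static bool region_fits(struct region r, uint32_t screen_w, uint32_t screen_h)
{
    return (uint64_t)r.left + r.width <= screen_w &&
           (uint64_t)r.top + r.height <= screen_h;
}
```

For example, region_fits((struct region){340, 0, 1072, 1104}, 1072, 1448) is true, while the same region against 1448x1072 is false.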

System info

I used the compiler you suggested to compile inkvt. I have a Kobo Clara HD with the latest 2019 firmware (I haven't checked for firmware updates since 2020).

llandsmeer commented 4 years ago

Hi rien333,

Thanks for the detailed issue!

I do not have time to dive into this right now, but maybe the problem is interference between Plato and inkvt? When you start inkvt.sh in Nickel, inkvt kills Nickel and takes control of the framebuffer. Plato, on the other hand, stays alive in the background. It might be that Plato updates the framebuffer state while inkvt is running. However, that has never been a problem for me with KOReader. I'll try to reproduce the bug when I'm back home.

You may also have noticed that keyboard over HTTP is a bit unstable in general. I'm still trying to figure out a better way to fix this (maybe something more in the direction of using the Kobo as a VNC screen).

For this:

MXCFB_SEND_UPDATE_V2: Invalid argument!

Maybe @NiLuJe knows what that means? The update region seems to be within the screen resolution. Sorry to rope you in if you don't want to be fixing problems here :)

Kind regards, Lennart

rien333 commented 4 years ago

maybe the problem is interference between Plato and inkvt

Unfortunately, the same thing happens with Nickel.

When you start inkvt.sh in Nickel, inkvt kills Nickel and takes control over the framebuffer ... It might be that plato updates the framebuffer state while inkvt is running.

Strangely, it seems to freeze faster with Nickel (though this could be purely subjective). I'll experiment with making sure inkvt runs with full control of the framebuffer. So far this has failed, probably because I can't get inkvt to work properly with KFMon (it doesn't show up in my books, for some reason). I'll try to sort that out first.

You may also have noticed that keyboard over HTTP is a bit unstable in general. I'm still trying to figure out a better way to fix this

SSH seems pretty stable and fairly straightforward, actually (though it has some obvious drawbacks). It even forwards key combos correctly.

NiLuJe commented 4 years ago

Plato (EDIT: and Nickel, duh.) can definitely do hardware rotations, which will definitely break inkvt right now. Dealing with that would require, at key places, a checked fbink_reinit() + a state refresh + whatever else might be needed to resync the new state with libvterm's state; but dealing with it could arguably be construed as out of scope ^^.

That would explain that one specific error log (and the subsequent lack of inkvt refreshes ;)).

(FWIW, for legacy reasons, KOReader, on the other hand, doesn't do hardware rotation. It'll set up the fb at startup and on exit, but that's it.)


I haven't actually tried the KFMon script, but it looked sane, I'll double-check ;).


Sidebar on compilers: https://github.com/koreader/koxtoolchain (TL;DR: use the kobo target if you want a non-sucky GCC version, at the expense of having to link against the STL statically; use the nickel target if you want a sucky GCC version that lets you link against the STL dynamically).

Ubuntu TCs might work in theory, but target a far too recent glibc, which will likely lead to stuff randomly breaking in fun and interesting ways at load or runtime.

The official TC binaries provided by Kobo are essentially an old binary build of what koxtoolchain's nickel target will do, but will obviously do the job, if you can get them working on your system (never tried).

NiLuJe commented 4 years ago

As usual, I'd check what dmesg & htop have to say about whatever's happening next time you can replicate this.

(It's conceivable that a broken MXCFB request could softlock the device. I should be dropping all the known offenders inside FBInk, but, who knows. dmesg should be helpful if that's actually the case).

rien333 commented 4 years ago

As usual, I'd check what dmesg & htop have to say about whatever's happening next time you can replicate this.

Good one! Totally forgot that dmesg is available on the Kobo. As far as htop goes, ps does show inkvt[.sh] as still running after the freeze. I could also kill it, but the screen's contents remained the same.

NiLuJe commented 4 years ago

I was mainly wondering about something stuck in a busy-loop, since you mentioned some stuff behaving slower ;).

llandsmeer commented 4 years ago

@rien333 How fast is fast? :)

I've been typing things in inkvt and frantically rotating my Libra H2O for quite a while now, over SSH in KOReader, and everything is still working fine (it did hang shortly after I issued a dmesg, but it started working again after a few seconds).

Could you maybe still send the output of dmesg | tail -n 20? A photo would be fine too. And attach your inkvt.armhf binary, so I can test it on my device. Maybe the build environment is doing something strange (?)

Firmware Mark 7

Looks good, that's what I use during development too.

SSH seems pretty stable and fairly straightforward, actually (though it has some obvious drawbacks). It even forwards key combos correctly.

Oh, I thought you were using my keyboard-over-HTTP hack :) Yeah, SSH is quite straightforward :)

[...] In addition to freezing the screen, I can't access the underlying application anymore (e.g., if I swipe while inkvt is running, the Plato reader would normally update the screen and show the next page).

That doesn't sound like only inkvt is hanging (it's not hooking into evdev, unless you edited main.cpp to do that).

Oddly, if inkvt is in its frozen state, I can still send keypresses over http and ssh, but it takes a while for them to arrive.

So the display still updates? Very strange.

NiLuJe commented 4 years ago

@llandsmeer: KOReader doesn't do hardware rotation, as such, it won't trash the state behind inkvt's back like Nickel or Plato can ;).

I alluded a bit to it in my original answer, but FBInk has a mechanism to deal with that:

fbink_reinit, which basically does an ioctl to see if the fb state changed. If it did in a significant way (depth/rotate), it updates its internal state, and (since fairly recently) returns a specific value to the caller to mention that fact. (Otherwise, it does nothing more than that ioctl & an early successful return).

In inkvt's case, in such instances, you'd also have to reissue an fbink_state_dump to update inkvt's own copy of that, and then make libvterm aware of the new layout.

That said, whether you want to deal with that or not to begin with is debatable: in "normal" conditions, killing Nickel ensures something like that won't happen ;).
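A minimal sketch of the check-and-resync pattern described here, assuming inkvt keeps a cached copy of the framebuffer geometry. The struct, the flag names, and on_before_draw are hypothetical stand-ins, not FBInk's API: in real code the fresh values would come from fbink_reinit() and FBInk's state, and the new size would be pushed to libvterm (for example with vterm_set_size()) before a full redraw. The comparison itself is only a couple of integer checks.

```c
#include <stdint.h>

/* Hypothetical snapshot of the fb state that matters for the terminal grid. */
struct fb_snapshot {
    uint32_t xres, yres;
    uint32_t bpp;
    uint32_t rotate;
};

enum {
    FB_UNCHANGED   = 0,
    FB_BPP_CHANGED  = 1 << 0,
    FB_ROTA_CHANGED = 1 << 1,
};

/* Compare the cached state with a freshly queried one. */
static int fb_state_diff(const struct fb_snapshot *cached,
                         const struct fb_snapshot *fresh)
{
    int flags = FB_UNCHANGED;
    if (cached->bpp != fresh->bpp)
        flags |= FB_BPP_CHANGED;
    if (cached->rotate != fresh->rotate)
        flags |= FB_ROTA_CHANGED;
    return flags;
}

/* Called before drawing: on any change, refresh the cached copy and
 * recompute the grid size the terminal should be resized to. */
static int on_before_draw(struct fb_snapshot *cached,
                          const struct fb_snapshot *fresh,
                          int glyph_w, int glyph_h, int *cols, int *rows)
{
    int flags = fb_state_diff(cached, fresh);
    if (flags != FB_UNCHANGED) {
        *cached = *fresh;
        *cols = (int)fresh->xres / glyph_w;
        *rows = (int)fresh->yres / glyph_h;
        /* A real implementation would now resize libvterm and redraw everything. */
    }
    return flags;
}
```

With 24x48 glyphs, a switch from the Clara HD's 1072x1448 portrait mode to a 1448x1072 landscape mode would report FB_ROTA_CHANGED and resize the grid from 44x30 to 60x22.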

rien333 commented 4 years ago

@rien333 How fast is fast? :)

Think in units of 10 seconds: no freeze is immediate, and I have used inkvt without problems for what felt like more than a minute, almost 2 minutes.

Could you maybe still send the output of dmesg | tail -n 20? A photo would be fine too. And attach your inkvt.armhf binary, so I can test it on my device. Maybe the build environment is doing something strange (?)

I was planning on reinstalling and recompiling everything in accordance with the recently updated instructions. Do you think you'll still find logs and binaries from my failed installation useful?

rien333: Oddly, if inkvt is in its frozen state, I can still send keypresses over http and ssh, but it takes a while for them to arrive.
llandsmeer: So the display still updates? Very strange.

No, sorry for the confusion. The Kobo's display itself stays frozen. But if I host a tmux session on my PC and connect the Kobo to it over SSH, that SSH session still receives keypresses sent through inkvt, albeit with a significant delay and some other glitchiness. (I found this out because the tmux session on my PC suddenly started to move a bit when I tried to send keypresses during a freeze.) It doesn't matter much, though; just a random observation.

I'll post some logs/new results tomorrow!

NiLuJe commented 4 years ago

I was planning on reinstalling and recompiling everything in accordance with the recently updated instructions. Do you think you'll still find logs and binaries from my failed installation useful?

If it's no bother, it certainly can't hurt ;).

rien333 commented 4 years ago

I ran htop and dmesg on my failed installation (basically, the one I described in my opening post).

This is what dmesg outputs after running inkvt.sh (while Plato is running, and with messages from RTL871X stripped out):

PMU:STATUS= 6: IBAT= -2: VSYS= 4337500: VBAT= 4105250: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
tps6518x_get_temperature():temperature = 25
# this is probably where I started inkvt
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4331500: VBAT= 4101000: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
# the message above is repeated a ton of times
...
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D [∞x]
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4333250: VBAT= 4100700: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
...
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: TCE underrun! Will continue to update panel
imx_epdc_v2_fb 20f4000.epdc: TCE underrun! Will continue to update panel
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Ignoring collision withnewer update.
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
...
# maybe this is where I rebooted?
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4335500: VBAT= 4100400: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: collision detected, can not do REAGl/-D
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
PMU: ricoh61x_displayed_work Full-Clear CC, PSWR(100)
PMU:STATUS= 6: IBAT= -2: VSYS= 4328500: VBAT= 4094600: DSOC= 10000: RSOC= 9800: cc_delta=72: rrf= 1
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion
imx_epdc_v2_fb 20f4000.epdc: Timed out waiting for update completion

This seems to be largely what you guys predicted: inkvt has trouble drawing while Plato is also using the screen (btw: I used this hacky Kobo terminal before, and I was able to run it alongside Plato). Practically all the messages were the "collision detected" one, though I also included the other messages I was able to find. htop wasn't too interesting either: CPU and memory usage were low both before and after inkvt froze (~1% iirc).

Gonna try the new and improved installation instructions now!

NiLuJe commented 4 years ago

The collisions are somewhat to be expected given the patterns of ioctl generated by InkVT, so those don't worry me too much.

The timeouts, on the other hand, are where things start to get interesting. InkVT itself never waits for update completion, which tells me that those are actually Plato's refreshes going wonky.

I'm not quite sure how that could come to be, unless something really upset the EPDC, leading to one of the softlocks I mentioned earlier. (In that case, reboot to recover; an mxcfb uninit/init might also work, but I've never tried it.) If that's what happened, I'd be very interested in more details, ideally with an exact trace of the offending ioctl (i.e., wrapping inkvt in strace -fitv -e trace=ioctl).
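The triage here (collisions are expected noise, update-completion timeouts are the interesting part) can be applied mechanically to a long dmesg dump. A small C sketch, assuming the message strings quoted in the dmesg output above; classify_dmesg_line and count_epdc_messages are hypothetical helpers, and other kernels may word these messages differently.

```c
#include <stdio.h>
#include <string.h>

enum epdc_msg { EPDC_OTHER, EPDC_COLLISION, EPDC_TIMEOUT, EPDC_TCE_UNDERRUN };

/* Classify one dmesg line by substring match on the EPDC driver's messages. */
static enum epdc_msg classify_dmesg_line(const char *line)
{
    if (strstr(line, "Timed out waiting for update completion"))
        return EPDC_TIMEOUT;
    if (strstr(line, "TCE underrun"))
        return EPDC_TCE_UNDERRUN;
    if (strstr(line, "collision detected") || strstr(line, "Ignoring collision"))
        return EPDC_COLLISION;
    return EPDC_OTHER;
}

/* Tally every line read from `in` into counts[], indexed by enum epdc_msg.
 * The caller must zero counts[] first. Lines longer than the buffer are
 * read in pieces, which is fine for the short lines dmesg prints. */
static void count_epdc_messages(FILE *in, unsigned counts[4])
{
    char line[512];
    while (fgets(line, sizeof line, in) != NULL)
        counts[classify_dmesg_line(line)]++;
}
```

A small wrapper that calls count_epdc_messages(stdin, counts) and prints the totals could be fed with dmesg | ...; a nonzero EPDC_TIMEOUT count flags the case worth tracing with strace.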

llandsmeer commented 4 years ago

@llandsmeer: KOReader doesn't do hardware rotation, as such, it won't trash the state behind inkvt's back like Nickel or Plato can ;).

Yeah, yesterday I tried launching inkvt with Nickel or Plato running, since I thought that would make the difference, but I couldn't even get SSH access working in a day (I always just use KOReader's dropbear).

I alluded a bit to it in my original answer, but FBInk has a mechanism to deal with that:

fbink_reinit, which basically does an ioctl to see if the fb state changed. If it did in a significant way (depth/rotate), it updates its internal state, and (since fairly recently) returns a specific value to the caller to mention that fact. (Otherwise, it does nothing more than that ioctl & an early successful return).

In inkvt's case, in such instances, you'd also have to reissue an fbink_state_dump to update inkvt's own copy of that, and then make libvterm aware of the new layout.

That said, whether you want to deal with that or not to begin with is debatable: in "normal" conditions, killing Nickel ensures something like that won't happen ;).

Thank you very much (again, and also for the pull requests! :smile:). A single extra ioctl() per draw request doesn't sound that bad. I think I'll hide it behind an env variable/argv flag for inkvt.sh (for people launching it in the right :tm: way).
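One way such an opt-in could look, sketched in C. Both the INKVT_CHECK_ROTATION variable and the --check-rotation flag are hypothetical names chosen for illustration; inkvt's actual switch may differ.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if per-draw rotation checks were requested, either with the
 * hypothetical --check-rotation argument or by setting the hypothetical
 * INKVT_CHECK_ROTATION environment variable to "1". */
static bool rotation_checks_enabled(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--check-rotation") == 0)
            return true;
    }
    const char *env = getenv("INKVT_CHECK_ROTATION");
    return env != NULL && strcmp(env, "1") == 0;
}
```

inkvt.sh would then pass the flag (or export the variable) only for the SSH use case, and the draw path would issue the extra fbink_reinit() ioctl only when this returns true, so the Nickel/KFMon launch path pays nothing extra.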

llandsmeer commented 4 years ago

The timeouts, on the other hand, are where things start to get interesting. InkVT itself never waits for update completion, which tells me that those are actually Plato's refreshes going wonky.

Maybe this is the problem for Plato (e.g., it doesn't expect to be running in 8-bit mode)?

echo "Restoring original fb bitdepth @ ${ORIG_FB_BPP}bpp & rotation @ ${ORIG_FB_ROTA}" >>crash.log 2>&1
./fbdepth -d "${ORIG_FB_BPP}" -r "${ORIG_FB_ROTA}" >>crash.log 2>&1

NiLuJe commented 4 years ago

Oh, if @rien333 launched inkvt via the script, definitely ^^.

(I'd naively assumed he was running the binary directly by hand... :D).

llandsmeer commented 4 years ago

This is what dmesg outputs after running inkvt.sh (while Plato is running, and with messages from RTL871X stripped out):

I guess so... Maybe inkvt.sh should print an error message if it finds Plato running.
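In inkvt.sh itself, a shell check (for instance with pidof, if the device's busybox provides it) would be the natural fit. The equivalent check in C scans /proc; process_running is a hypothetical helper, it is Linux-specific, and it assumes Plato's process name as reported in /proc/<pid>/comm is plato.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if some process's /proc/<pid>/comm equals `name`.
 * The kernel truncates comm to 15 characters. */
static bool process_running(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return false;

    bool found = false;
    struct dirent *de;
    while (!found && (de = readdir(proc)) != NULL) {
        /* Only the numeric entries in /proc are PIDs. */
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/comm", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process may have exited in the meantime */

        char comm[32] = {0};
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0'; /* drop the trailing newline */
            found = (strcmp(comm, name) == 0);
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

A launcher could then refuse to start, or at least warn on stderr, when process_running("plato") returns true.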

rien333 commented 4 years ago

Oh, if @rien333 launched inkvt via the script, definitely ^^. (I'd naively assumed he was running the binary directly by hand... :D).

Basically, I did something like this: from SSH, I ran /mnt/onboard/.adds/inkvt/inkvt.sh with Plato active (I'm not 100% sure about the path, but I think you get what I mean).

I guess so... Maybe inkvt.sh should print an error message if it finds Plato running.

Maybe. Though hopefully most users will just launch inkvt normally.

NiLuJe commented 4 years ago

Okay, then, yeah, that'll definitely break Plato ;).

llandsmeer commented 4 years ago

I could reproduce a freeze on my Kobo by issuing fbdepth -r 1 inside inkvt:

[...]
[FBInk] MXCFB_SEND_UPDATE_V2: Invalid argument!
[FBInk] update_region={top=8, left=0, width=1264, height=1664}!
[FBInk] Failed to refresh the screen!
[...]

But that's after the fbink_reinit() patch... still working on it :)

llandsmeer commented 4 years ago

Ah, that was a deployment problem (the old inkvt.armhf binary wasn't overwritten by the new one). With 1cd991c, inkvt handles rotations kind of OK (only inversions, which do not change the WxH screen size).

Now I'll have to figure out how to fix the deployment.
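The "inversions only" limitation comes down to parity. With the Linux fbdev rotate values 0 to 3 (FB_ROTATE_UR, FB_ROTATE_CW, FB_ROTATE_UD, FB_ROTATE_CCW), a change by an odd number of quarter turns swaps width and height, while an even number keeps them. A tiny sketch of that check follows; rotation_swaps_dims is a hypothetical helper. Comparing xres/yres directly is the more robust test, since Kobo panels don't necessarily use rotate 0 as their native orientation (the Clara HD log above reports rotation 3 in portrait).

```c
#include <stdbool.h>

/* Rotate values count quarter turns: 0 = UR, 1 = CW, 2 = UD, 3 = CCW.
 * (a - b) mod 2 equals the low bit of (a ^ b), so an odd difference,
 * i.e. a 90 or 270 degree change, is exactly the case that swaps W and H. */
static bool rotation_swaps_dims(unsigned old_rota, unsigned new_rota)
{
    return ((old_rota ^ new_rota) & 1u) != 0;
}
```

For example, going from rotation 3 to 1 (an inversion) keeps the screen size, while going from 3 to 0 swaps it.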

rien333 commented 4 years ago

Okay, I got everything to work, including KFMon integration. Maybe my mistake last time was not doing the import from Nickel. CPU usage seems to be a lot higher than in my last attempt (htop shows inkvt.armhf at 23%, as opposed to ~1%).

I've seen one freeze very similar to the one described in this thread, but now everything has been running smoothly for at least a few minutes.

Thanks again for this cool project. I hope it will see other interesting improvements!

I guess all my problems outlined in this thread have been resolved by the new and improved instructions (I like the auto-generated zip, very easy to install!). If I have any new, concrete problems I'll open a new issue.

llandsmeer commented 4 years ago

Those last two commits should make inkvt handle arbitrary screen rotations when started from SSH, while functioning as before when launched from Nickel with KFMon (maybe that fixes the increased CPU usage, but I'm not sure). I also think that a constantly refreshing, htop-type payload will cause higher CPU usage regardless.

I guess all my problems outlined in this thread have been resolved by the new and improved instructions (I like the auto-generated zip, very easy to install!). If I have any new, concrete problems I'll open a new issue.

Yes, I'm very happy with that too (thanks @NiLuJe :smile:). I think this thread brought some nice improvements to inkvt. Thanks for showing your interest in this project :)

NiLuJe commented 4 years ago

The actual userland logic in fbink_reinit is pretty minimal (basically two integer comparisons), so the main bottleneck will be the ioctl itself.

Not quite sure there's any better 'spot' to put it in inkvt, though ;).