nexus511 / gpd-ubuntu-packages

This repository shall provide the base for building ubuntu packages from most of the patches currently used to get linux on the gpd-pocket.
GNU General Public License v3.0

spontaneous display corruption, fixed by standby/wakeup #30

Open sobukus opened 7 years ago

sobukus commented 7 years ago

It happens from time to time that the display suddenly enters a funky state during normal operation: the picture moves up a bit, with the upper part reappearing at the bottom. The colors get shifted, too. I can continue working with keyboard and trackpoint (not the touchscreen, of course), mentally adjusting for the shift. This is fixed by a standby/resume cycle.

So there is some corruption in the video driver that is overwritten when resuming (reinitialising) … or is this an issue with my hardware? Can anyone confirm?

This is using the xubuntu 17.04 image, kernel 4.13.0-2-generic, intel xorg driver 2:2.99.917+git20170309-0ubuntu1. I'll try just switching to the linux console next time … not sure if that should make a difference in times of KMS.

nexus511 commented 7 years ago

I have not seen anything like that.

As it is not clear whether this is a driver, firmware or hardware issue, we should just wait and see whether someone else has input on this.

Does your device show any other display issues like the screen flickering that is caused by the WLAN connector touching the shielding case?

sobukus commented 7 years ago

The screen does not flicker. It switches into this different state. I think I remember it changing the color hue again later on. But generally, it is one sudden change and then it stays like that for an extended period of time. No flickering. It does strike me as a driver issue … I had lots of them with Intel graphics in past laptops. It looks like the framebuffer is not scanned from the beginning but from a line somewhere in between. The color shift could be similar. RGB → GBR?

nexus511 commented 7 years ago

@sobukus Well yes, I understood that scanning is somewhat broken from your first post.

I just wanted to understand whether the issue might have been caused by some misconfiguration due to electrical interference on the display connector.

All I can do for now is mark this as a bug, as we are just using Hans's kernel here.

sobukus commented 7 years ago

One thing that might influence this (not sure if it happened before those settings): I have TearFree enabled. That's not the default in your GPD images, or is it? (Hm, checking the timestamps, I guess this came as the default setup from your image … it could still play a role, as this is not the default mode of the intel driver for some reason.)

Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
  Option      "AccelMethod"     "sna"
  Option      "TearFree"        "true"
  Option      "DRI"             "3"
EndSection

Without tearfree, display behaviour of intel chips is just too horrid.

psrb191921 commented 7 years ago

Same issues here, also with disappearing lines of text in gedit.

stephen-hocking commented 7 years ago

Happens here too, although flipping from one virtual screen to another in the MATE desktop environment fixes it.

nexus511 commented 6 years ago

Okay. I guess we have to wait and hope that it disappears with the ongoing kernel development.

I will mark this as a bug then.

sobukus commented 6 years ago

Just some confirmation about the nature of the issue. I cannot fix it by switching to a console and back. I do not clearly remember if switching Xfce virtual desktops would influence it, but doubt it. I tried to nail down the nature of the color shift using GIMP on a screenshot and during that, I had to suspend/resume multiple times to fix the issue. I wonder if this was coincidence or if the screen changes with GIMP are triggering it more easily than, say, web browsing.

Anyhow, I can confirm that the color change nicely fits a cyclic swap of RGB to BRG. So it might be explained by an offset in some buffer … not sure if I would notice an off-by-one error with some shifted pixels. I guess one should. I will watch more closely next time.
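To make the cyclic swap concrete, here is a minimal shell sketch. The helper name `rotate_rgb_to_brg` is hypothetical, and the mapping is an assumption about what "RGB to BRG" means here: the displayed (R, G, B) triple is taken to consist of the original (B, R, G) values. It only illustrates the rotation and says nothing about where in the driver such an offset would arise.

```shell
#!/bin/sh
# Hypothetical helper: apply a cyclic channel rotation to a 6-digit hex color.
# Assumption: displayed R = original B, displayed G = original R,
# displayed B = original G (one reading of "RGB to BRG").
rotate_rgb_to_brg() {
  color=${1#\#}                           # strip an optional leading '#'
  r=$(printf '%s' "$color" | cut -c1-2)   # original red byte
  g=$(printf '%s' "$color" | cut -c3-4)   # original green byte
  b=$(printf '%s' "$color" | cut -c5-6)   # original blue byte
  printf '%s%s%s\n' "$b" "$r" "$g"
}

# Example: pure orange (ff8000) would show up as 00ff80 under this mapping.
# rotate_rgb_to_brg ff8000
```

Because the rotation is cyclic, applying it three times gives back the original color.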

sobukus commented 6 years ago

I witnessed a new variant of this with kernel 4.14 now: The picture moved a little bit up, wrapping the Xfce panel from the top to the bottom, and after some time actually moved back to normal. The colors were not changed this time. So the effect might be different with the new kernel (DRM module), but it is still there. Could be in the intel video driver after all. Should one try the modesetting driver instead? I fear that it will give me tearing again.

Herst commented 6 years ago

BTW, the display is VRBG (vertical and the more rare subpixel configuration).

sobukus commented 6 years ago

I remember one incident with the 4.14 kernel. A small shift this time, no color change.

sobukus commented 6 years ago

I tried my luck with a newer kernel: 4.16.0-rc5+ (git ae718bc from Hans).

This issue is still a major showstopper with that one. I tried using the GPD for two days straight and finally chickened out of using it for a presentation (even after buying the micro HDMI adapter, having realized that it is a pipe dream to get working HDMI/DP/VGA out of a USB C hub) … because, after using it for a certain time (an hour or so?), this corruption creeps up increasingly.

I can temporarily fix it by changing the resolution via xrandr to 1024x768 and back. This fix only lasts a short time, though. Display corruption creeps up again soon, with and without color shift.
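The resolution round trip can be sketched as a small shell function. This is only an illustration under assumptions: the helper name `bounce_mode` is hypothetical, the output name has to be looked up with `xrandr --query` on the machine in question, and `--auto` is assumed to bring the panel back to its preferred mode.

```shell
#!/bin/sh
# Hypothetical helper: switch an output to a temporary mode and back.
# Assumptions: the caller passes the right output name (see `xrandr --query`),
# and `--auto` restores the preferred mode on this panel.
bounce_mode() {
  output=$1
  temp_mode=${2:-1024x768}      # temporary mode; 1024x768 as in the comment
  xrandr --output "$output" --mode "$temp_mode" || return 1
  sleep 1                       # give the mode switch a moment to settle
  xrandr --output "$output" --auto
}

# Example (output name is an assumption):
# bounce_mode DSI1
```

As described above, this only hides the corruption for a short while, so it is a workaround rather than a fix.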

I also observed something that reminds me of corruption issues I had on a Toshiba laptop with Intel video in the chipset about 10 years ago: Certain glyphs in a font get replaced with a different one. Fast-forward many years, and stability of the Intel video support sucks even more! Does Intel simply not care for proper support of their hardware? The people working on the display drivers seem friendly enough, but I suppose they are just too few / not given proper time.

Of course, I am also angry at GPD for not ensuring stable support for their device when boasting it as Linux laptop. And it's not like they are making it easier with their design decisions.

Anyhow, this comment shall just emphasize that kernel 4.16 apparently does not improve this issue. It enabled me to successfully do a suspend to disk cycle, but I got troubles after that. Display crashing. I2C errors streaming on the console …

I wonder: Is there anyone really using the device under Linux and not experiencing these issues? I had some play with BIOS settings (RAM clock, for example, DP over USB on/off, before giving up on that one) … but they did not seem to influence the un-usableness of the device. I remember presenting a slideshow with the thing for about 2 hours. I guess I was lucky that it did not freak out during this time. Note that some activity seems to be needed to trigger this. Usually I quickly got the corruption when compiling a LaTeX document and then switching to the PDF viewer, perhaps scrolling a bit.

stockmind commented 6 years ago

I use Linux daily without any graphics issues. Could you try one of my respins and report back? I have never had feedback about an issue like this.

https://github.com/stockmind/gpd-pocket-ubuntu-respin

This doesn't happen on Windows?

sobukus commented 6 years ago

@stockmind I suppose your respins do not use different drivers/xorg versions. My wild guess is that I am one of those people irritated by video tearing (a lot!!!) and have this:

$ cat /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
  Option      "AccelMethod"     "sna"
  Option      "TearFree"        "true"
  Option      "DRI"             "3"
EndSection

Probably TearFree and/or SNA is too buggy. I will disable these settings and try if I can trigger the corruption then. Maybe you could try enabling them to reproduce? Loading a PDF in evince and some switching between windows has a good chance, but it could take an hour if you're lucky.

For me, I see differing variants of corruption prior to the eventual vertical shift and color shift. Maybe the TearFree option is simply triggering bad buffer writes. It would help if we could narrow it down to that option.

PS: Just realized the USB A ports on my USB C hub don't work for real: devices come and go. Damn, what is not broken? I guess I should get myself a normal USB hub and consider USB C a charging port only. Sorry, OT here … opening another ticket titled "USB C is crap"?

sobukus commented 6 years ago

Some detail out of the Xorg log … after I noticed that I actually have heavy tearing during scrolling right now.

[    32.742] (**) intel(0): TearFree enabled
[    32.743] (==) intel(0): Using gamma correction (1.0, 1.0, 1.0)
[    32.743] (==) intel(0): DPI set to (96, 96)
[    32.753] (II) intel(0): SNA initialized with Cherryview (gen8) backend
...
[  8651.543] (II) intel(0): resizing framebuffer to 1920x1200
[  8651.563] (II) intel(0): switch to mode 1200x1920@60.4 on DSI1 using pipe 1, position (0, 0), rotation right, reflection none
[  8652.229] (EE) intel(0): Page flipping failed, disabling TearFree

So it's even being disabled by the driver at some point. Fun. I wonder why page flipping should fail.
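A quick way to check for this on another machine is to search the Xorg log for the two messages quoted above. The following is a minimal sketch: the helper name `tearfree_status` is hypothetical, and the default log path is an assumption that differs between distributions and between root and rootless X sessions.

```shell
#!/bin/sh
# Hypothetical helper: report what the intel driver logged about TearFree.
# The matched strings are the two messages quoted in this thread; the default
# log path is an assumption.
tearfree_status() {
  logfile=${1:-/var/log/Xorg.0.log}
  if grep -q 'Page flipping failed, disabling TearFree' "$logfile" 2>/dev/null; then
    echo "disabled-by-driver"   # TearFree was on, then the driver turned it off
  elif grep -q 'TearFree enabled' "$logfile" 2>/dev/null; then
    echo "enabled"
  else
    echo "not-reported"         # neither message found (or log not readable)
  fi
}

# Example:
# tearfree_status /var/log/Xorg.0.log
```

The check for the failure message comes first because a log like the one above contains both lines, and the later one reflects the current state.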

sobukus commented 6 years ago

I tried UXA instead of SNA, briefly, but that is unusable. I can watch the lines being drawn for screen updates. No point in further testing.

Now I removed the intel config and this switches to the modeset driver. Is this what you all are using? This may work, but it has HORRIBLE tearing. Just open the Xfce dropdown menu and move the mouse to highlight the entries in turn. This is not funny. I will have to wait if the corruption appears here, too. But I guess it is rooted in the intel driver, though possibly it could be hidden in the kernel part of that, too.

I guess I could learn to live with the tearing if the video is at least stable, but I will not be happy. Is it too much to ask to get a smooth desktop experience comparable to a Windows 95 PC with a 32 bit PCI video card? Heck, I could try to spin up a K6-III box with some old Linux on it to prove the point, even.

stockmind commented 6 years ago

sobukus wrote: "@stockmind I suppose your respins do not use different drivers/xorg versions."

Sure, but the Nexus packages don't seem to provide these lines in the environment file (correct me if I'm wrong, I've only had a quick look at the packages):

COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer
LIBGL_DRI3_DISABLE=1

Easy way to add them:

echo "COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer" >> /etc/environment
echo "LIBGL_DRI3_DISABLE=1" >> /etc/environment

Those seem to help prevent graphical glitches and distortions. In my environment, without those lines, I sometimes had missing characters after wake, and other users reported that they also help with glitches in other apps. I have had those lines in my configuration almost from the start of the project and have never had a report about the issue you are describing.
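Running the two `echo` commands more than once appends duplicate lines, and they need root privileges to write to /etc/environment. A small idempotent variant could look like the sketch below; the helper name `ensure_line` is hypothetical and not part of any of the packages discussed here.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not there yet.
ensure_line() {
  file=$1
  line=$2
  # -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root, since /etc/environment is not user-writable):
# ensure_line /etc/environment "COGL_ATLAS_DEFAULT_BLIT_MODE=framebuffer"
# ensure_line /etc/environment "LIBGL_DRI3_DISABLE=1"
```

Changes to /etc/environment normally take effect only for new login sessions, so a logout or reboot is needed before testing.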

sobukus wrote: "My wild guess is that I am one of those people irritated by video tearing (a lot!!!)"

My wild guess is that your device is faulty or its configuration is messed up. I don't like video tearing either, and that is why I chose the intel driver, as suggested everywhere and probably used by almost all GPD Pocket users. The modesetting driver is just not ready for natively rotated displays.

I repeat my question: does this behavior happen on Windows too? Never? If I'm not wrong, I remember a post on Reddit about something like this on Windows, and it was fixed by sending the unit in under warranty. I've also seen behavior like this on lots of smartphones with broken LCDs (image that shifts to the top and reappears at the bottom, shifted colors, etc.).

If this problem were so common, you can be sure there would be tickets about it everywhere, but there are not. Lots of people enjoy Linux on this device without tearing and graphical glitches. Are all of them ignoring this not-so-tiny problem and staying quiet about it?

If the problem is in the system configuration, install from scratch a pre-made ISO that you know is working and see what happens. If the problem persists, the cause must be somewhere else. If the problem is in the hardware, ask for a replacement and you are good to go.

There is no need to get upset and send a roundup of hastily thought-out posts that don't provide any evidence that the problem is in Linux (or the GPD Pocket), or to blame all the users who are not fussy enough to see all the problems you are facing.

The support from GPD (the company) wasn't great from the start, but the community effort has brought this device to a good 90% of a pain-free Linux experience. There are still small things that need a patch, like video over USB-C or audio over HDMI (it seems that it just needs an updated alsa-lib), but everything else works, and works pretty well.

The device itself also has only limited ability to power external devices, like almost any other tiny ARM Windows tablet. That is not much to be upset about; it is an understandable limitation given the nature of the device.

Let me know how your experiments go, and please try a clean install of a working system and a full week of Windows use to exclude (almost) any hardware issue. (There could be hardware issues that arise only on Linux due to different driver handling, but this gets pretty specific and ... random/not reproducible?)

sobukus commented 6 years ago

Sorry if I got too carried away with my frustration. My recent flood of comments on the GPD issues simply results from the first time I actually tried to rely on the device for a few days. I wanted to give all data I got during that period, before switching back to my main device again. My remarks towards GPD were also fueled by comments in kernel patches by Hans where he noted certain design choices by GPD making things hard unnecessarily. My remark about the world maybe not caring about tearing as much as I do also stems from some frustration that this is an issue at all nowadays, as it used to be a solved problem for video playback, and even then, it seems to have spread into the desktop more than ever before. There are long-lived bug reports about tearing for various drivers … and then decisions like ‘switch vsync off for fullscreen mode for better performance’. People do seem to value FPS more.

I'm trying to calm down.

I cannot say anything about running Windows on this machine. I'm a crowdfunding backer and got the device last year with Linux (at least I never ran it with Windows, not sure right now if there was an image lurking on the disk initially). I don't have Windows at hand and don't intend to change that. Also please note that there are two people in this issue thread who confirmed the issue. Maybe @psrb191921 and @stephen-hocking can comment if they have seen this recently?

Also, I do not intend to run Ubuntu once I really switch to using the GPD. Therefore, I am thankful for detailed information like what you provided with the hint about disabling DRI3 and this cogl setting. Judging from the description of the latter, I presume it only deals with corruption in certain applications and would not cause this kind of full-screen corruption with shifted colors. But then, as evince is good at triggering the issue, it may be related. May I ask why you have LIBGL_DRI3_DISABLE=1 globally instead of just

Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
  Option      "AccelMethod"     "sna"
  Option      "TearFree"        "true"
  Option      "DRI"             "2"
EndSection

Is that not enough?

It may be possible that there is a hardware fault. But it feels just like another intel driver issue.

I am so far unable to reproduce this issue using the modesetting driver. I don't observe any of the glitches that serve as hint before the big shift. I will test different configurations like disabling DRI3 with the intel driver before wiping the system here. I want to know what the fix is.

sobukus commented 6 years ago

@stockmind I now made a range of tests after half-automating a procedure to trigger the issue. Basically, I load a PDF file (http://sobukus.de/gpd/displayglitch/slides.pdf works well) in evince and repeatedly scroll up and down through the pages. I do it with this basic automation:

while true
do
  xdotool key --repeat 50 --repeat-delay 20  Page_Up
  sleep 1
  xdotool key --repeat 50 --repeat-delay 20  Page_Down
  sleep 1
done

See http://sobukus.de/gpd/displayglitch/ with the README.txt in there and also screenshots. I tested with and without your environment settings, differing setup of the intel driver. I also tested on the live system of one of your respins, namely gpdpocket-20180306-4.16.0-rc3-ubuntu-17.10.1-unity-desktop-amd64.iso .

Net result: Any instance of the intel driver is susceptible and will yield to the glitch most likely in less than 10 minutes. The modesetting driver seems to be immune.

I have trouble believing that the hardware is faulty, rather some subtle difference is more or less easily triggering the bug in the intel driver. It messes up something talking to the DRM part in the kernel, so that the display stays distorted also in the boot splash screen and on the linux console. The modesetting driver avoids that, but it is no good choice as long as there is no cure for the tearing.

This is unfortunate, as the intel driver seems pretty much abandoned in favour of the modesetting driver. So I am not sure there will be energy spent on fixing this issue. But anyway, could you, @stockmind , @stephen-hocking, @psrb191921 try my test for an hour or so to see if you can trigger the issue and also how long it takes?
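For testers who want to report how long it takes, a time-limited variant of the scroll loop may be convenient. This is a sketch under assumptions: the function name `scroll_stress` and the elapsed-time output are additions for illustration, while the `xdotool` invocations are the same as in the loop above. Evince (or another PDF viewer) has to have keyboard focus while it runs.

```shell
#!/bin/sh
# Hypothetical variant of the scroll loop above: stop after a time limit and
# print the elapsed seconds after each round, so the moment a glitch appears
# can be read off the terminal.
scroll_stress() {
  limit=${1:-3600}              # seconds to run; one hour by default
  start=$(date +%s)
  while :; do
    now=$(date +%s)
    elapsed=$((now - start))
    [ "$elapsed" -ge "$limit" ] && break
    xdotool key --repeat 50 --repeat-delay 20 Page_Up
    sleep 1
    xdotool key --repeat 50 --repeat-delay 20 Page_Down
    sleep 1
    echo "elapsed: ${elapsed}s"
  done
}

# Example: run for ten minutes.
# scroll_stress 600
```

The last "elapsed" value printed before the display goes wrong gives a rough time-to-glitch that can be compared across driver settings.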

stephen-hocking commented 6 years ago

Left it running for a few hours in an icewm session and experienced no corruption. On the other hand, this 4.16.0-rc3-custom-matlala-01-03-2018 kernel and the 20170628 BIOS don't seem to let the pocketbook charge up. I've had it connected to power for a while, and the battery level keeps on dropping. I am going to check whether Windows behaves better in this regard.

sobukus commented 6 years ago

Thanks for testing. To make sure: You had the intel driver configured in /etc/X11/xorg.conf.d ? And a bare icewm without compositor, right? I'll check if turning off the compositor makes a difference here.
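Whether Xorg actually picked up the intel driver can also be read from the log. The sketch below is only a heuristic: the helper name `active_x_driver` is hypothetical, and it assumes that the intel driver prefixes its messages with "intel(0)" (as in the log excerpt earlier in this thread), that the modesetting driver uses "modeset(0)", and that the log is at the given default path.

```shell
#!/bin/sh
# Hypothetical helper: guess the active X driver from the log message prefixes.
# Assumptions: "intel(0)" marks the intel driver, "modeset(0)" marks the
# modesetting driver, and the default log path may differ on your system.
active_x_driver() {
  logfile=${1:-/var/log/Xorg.0.log}
  if grep -q ' intel(0): ' "$logfile" 2>/dev/null; then
    echo "intel"
  elif grep -q ' modeset(0): ' "$logfile" 2>/dev/null; then
    echo "modesetting"
  else
    echo "unknown"
  fi
}

# Example:
# active_x_driver /var/log/Xorg.0.log
```

A result of "modesetting" despite a config file in /etc/X11/xorg.conf.d would suggest that the file was not read or that the intel driver package is not installed.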

It's becoming more crucial now for me to find someone who can reproduce this … at least intel gfx folks did not suspect hardware failure but pointed me towards a possible intel drm driver bug.

sobukus commented 6 years ago

There's a bug report against the intel DRM now: https://bugs.freedesktop.org/show_bug.cgi?id=105834

stephen-hocking commented 6 years ago

Yup, I had the intel driver configured thusly.

Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
  Option      "AccelMethod"     "sna"
  Option      "TearFree"        "true"
  Option      "DRI"             "3"
EndSection

sobukus commented 6 years ago

Hm, so how did you experience this before? Did you change the BIOS (or UEFI firmware …) in the meantime? Also, I figured that it's actually easier to trigger the issue with

mplayer -loop 0 http://sobukus.de/gpd/displayglitch/kenjo_vidtest_60fps.mp4

Things happily jump between different states (including jumping back to normal) when playing this. Maybe any video would do, even. But then, maybe not.

Your charging issue seems more serious. I also have the 20170628 BIOS. Charging works. The recent kernel fan driver annoys me with fan noise when charging, though. I wonder if that is really necessary, especially when I'm charging from a powerbank and the fan (able to work with 2W or so) wastes energy in addition to being annoying. I'd have bought a somewhat weaker GPD without a fan to begin with …

stephen-hocking commented 6 years ago

I've reverted back to a 4.14 kernel, and that seems to be charging just fine. I'll try your mplayer thing, but it's getting on in time here in Oz, so that'll wait until tomorrow.

stockmind commented 6 years ago

@stephen-hocking There was a regression in the kernel regarding charging; it was fixed around 06/03/2018 by Hans - https://www.reddit.com/r/GPDPocket/comments/820i3e/abnormally_long_charging_times_on_linux/dv9cp48/

This kernel build should be fine: https://bitbucket.org/simone_nunzi/gpdpocket-kernel/downloads/gpdpocket-20180306-4.16.0-rc3-kernel-files.zip

I still haven't had time to test the screen corruption; I will post my results asap.

stephen-hocking commented 6 years ago

Have rebuilt the kernel, as per instructions at https://github.com/petrmatula190/gpd-pocket-kernel, installed it & it seems to be charging just fine.
