raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Garbled screen when running an OpenGLESv2 app on X11 and fkms without compositor with 5.4 kernel #3665

Open imbens opened 4 years ago

imbens commented 4 years ago

Describe the bug

This has been discussed in https://github.com/raspberrypi/firmware/issues/1382 as a follow-up to an unrelated camera issue. I now have more precise steps to reproduce the problem. It happens only when using kernel 5.4 (including the latest 32-bit image after you rpi-update to the 5.4 kernel), with fkms, with X11 without lightdm (or with lightdm but with the compositor disabled), with a fullscreen X window, and with vsync enabled (via eglSwapInterval (display, 1)).

To reproduce

Start from the 64-bit image at https://downloads.raspberrypi.org/raspios_arm64/images/raspios_arm64-2020-05-28/2020-05-27-raspios-buster-arm64.zip. Boot the image; it will boot to the desktop.

In a terminal or via ssh enter (this fixes the raspberrypi-dev package):

sudo apt update
sudo apt -q -y full-upgrade

At this point you should probably reboot. In a terminal or via ssh enter:

sudo apt -q -y install libegl1-mesa-dev libgles2-mesa-dev libx11-dev xterm
git clone git@github.com:imbens/opengltest
cd opengltest
gcc opengltest.c -lGLESv2 -lEGL -lX11 -o opengltest
export DISPLAY=:0 && ./opengltest

EDIT: fixed the link to the git repository above.

The application will show a red rectangle on a background that cycles between green and blue. In a terminal or via ssh enter (this will disable lightdm):

sudo systemctl set-default multi-user.target

Reboot. The Pi will now boot to a console. In the console or via ssh enter:

sudo xinit &

This will start plain X11 without a desktop manager and it will open an xterm. In xterm or via ssh enter:

cd opengltest
export DISPLAY=:0 && ./opengltest

The application will show a garbled red rectangle. Edit opengltest.c and change eglSwapInterval (mEGLDisplay, 1); to eglSwapInterval (mEGLDisplay, 0); (this will disable vsync). In xterm or via ssh enter:

gcc opengltest.c -lGLESv2 -lEGL -lX11 -o opengltest
export DISPLAY=:0 && ./opengltest

Now the rectangle is ok again.

Expected behaviour

I expect a red rectangle on a background that cycles between green and blue.

Actual behaviour

The red rectangle is garbled.

Additional context

The problem occurs with both the 32-bit and 64-bit versions of Raspberry Pi OS. It occurs only with kernel 5.4, not with kernel 4.19. It occurs when using fkms. It occurs when using OpenGL ES 2 with EGL and X11, but only when xcompmgr is disabled. It occurs only when vsync is enabled with eglSwapInterval (mEGLDisplay, 1), not when it is disabled with eglSwapInterval (mEGLDisplay, 0). It occurs only for a fullscreen window.

The 'garbling' is an 'invertible' operation: when I display a fullscreen picture that has been pre-processed with the following code, it looks ok on the screen.

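        // thePic1 is the original 1920x1080 picture, thePic2 the pre-processed copy.
        uint32_t *p = (uint32_t *)thePic2->GetPixels (); // the pre-process destination
        // Each iteration copies one group of 4 pixels (assuming 4 bytes per pixel).
        for (int i = 0 ; i < 1920*1080 ; i += 4) {
            // 776666 4433221111
            int a = i % (60*32); // destination column (60*32 == 1920)
            int b = i / (60*32); // destination row
            // split the destination row and column into bit fields
            int i8 = (b>>3);
            int i7 = (b>>2)&1;
            int i6 = b&3;
            int i5 = (a>>5);
            int i4 = (a>>4)&1;
            int i3 = (a>>3)&1;
            int i2 = (a>>2)&1;
            int i1 = a&3; // always 0 here because i advances in steps of 4
            // recombine the bit fields into the source position
            int theRow = 18*i5 + (i4+2*i8)/15;
            int theCol = i1 + 16*i2 + 64*i3 + 4*i6 + 32*i7 + 128*((i4 + 2*i8)%15);
            uint32_t *q = (uint32_t *)thePic1->GetPixels (theRow, theCol); // the original picture
            for (int n = 0 ; n < thePic1->mBytesPerPixel ; n++) {
               *p++ = *q++;
            }
        }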

6by9 commented 4 years ago

Can you provide the output of vcgencmd dispmanx_list and sudo cat /sys/kernel/debug/dri/1/state? I have a suspicion that one side is trying to render to UIF, and the display pipeline can't display that.

Sorry, I'm not going to decode that code for how/where you get the pixels from. Is that totally from reverse engineering, or have you based that on some doc?

imbens commented 4 years ago

The output from vcgencmd dispmanx_list:

display:2 format:XRGB8888 transform:0 layer:-127 1920x1080 src:0,0,1920,1080 dst:0,0,1920,1080 cost:1156 lbm:0

The output from sudo cat /sys/kernel/debug/dri/1/state:

plane[31]: plane-0
    crtc=crtc-0
    fb=65
        allocated by = Xorg
        refcount=2
        format=XR24 little-endian (0x34325258)
        modifier=0x700000000000006
        size=1920x1080
        layers:
            size[0]=1920x1080
            pitch[0]=7680
            offset[0]=0
            obj[0]:
                name=0
                refcount=3
                start=000113cc
                size=8294400
                imported=no
    crtc-pos=1920x1080+0+0
    src-pos=1920.000000x1080.000000+0.000000+0.000000
    rotation=1
    normalized-zpos=0
    color-encoding=ITU-R BT.601 YCbCr
    color-range=YCbCr limited range
plane[38]: plane-1
    crtc=(null)
    fb=0
    crtc-pos=0x0+0+0
    src-pos=0.000000x0.000000+0.000000+0.000000
    rotation=1
    normalized-zpos=0
    color-encoding=ITU-R BT.601 YCbCr
    color-range=YCbCr limited range
plane[45]: plane-2
    crtc=(null)
    fb=0
    crtc-pos=0x0+0+0
    src-pos=0.000000x0.000000+0.000000+0.000000
    rotation=1
    normalized-zpos=0
    color-encoding=ITU-R BT.601 YCbCr
    color-range=YCbCr limited range
crtc[52]: crtc-0
    enable=1
    active=1
    self_refresh_active=0
    planes_changed=1
    mode_changed=0
    active_changed=0
    connectors_changed=0
    color_mgmt_changed=0
    plane_mask=1
    connector_mask=1
    encoder_mask=1
    mode: "": 0 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x0 0x5
connector[54]: HDMI-A-1
    crtc=crtc-0
    self_refresh_aware=0

imbens commented 4 years ago

Sorry, I'm not going to decode that code for how/where you get the pixels from. Is that totally from reverse engineering, or have you based that on some doc?

It is totally from looking at the screen and trying to understand how the bits are mixed up. The gist is that all pixels are there, they are just in the wrong location.
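The claim that every pixel is still present can be checked mechanically. The following is a minimal sketch, not part of the original report: it reuses the index arithmetic from the pre-processing snippet and counts how many 4-pixel source groups are out of range or hit more than once. The function names source_position and count_mapping_errors are made up for this illustration, and it assumes a 1920x1080 picture with 4 bytes per pixel.

```c
#include <stdint.h>
#include <stdlib.h>

enum { WIDTH = 1920, HEIGHT = 1080, GROUP = 4 };

/* Same index arithmetic as the pre-processing snippet: map destination
 * pixel index i (a multiple of 4) to a source row and column. */
void source_position(int i, int *row, int *col)
{
    int a = i % WIDTH;   /* destination column */
    int b = i / WIDTH;   /* destination row */
    int i8 = b >> 3, i7 = (b >> 2) & 1, i6 = b & 3;
    int i5 = a >> 5, i4 = (a >> 4) & 1, i3 = (a >> 3) & 1;
    int i2 = (a >> 2) & 1, i1 = a & 3;

    *row = 18 * i5 + (i4 + 2 * i8) / 15;
    *col = i1 + 16 * i2 + 64 * i3 + 4 * i6 + 32 * i7
         + 128 * ((i4 + 2 * i8) % 15);
}

/* Returns the number of source groups that are out of range, misaligned,
 * or referenced more than once. 0 means the mapping is a permutation of
 * the 4-pixel groups. Returns -1 if the bookkeeping array cannot be
 * allocated. */
int count_mapping_errors(void)
{
    const int groups_per_row = WIDTH / GROUP;
    uint8_t *seen = calloc((size_t)groups_per_row * HEIGHT, 1);
    int errors = 0;

    if (!seen)
        return -1;

    for (int i = 0; i < WIDTH * HEIGHT; i += GROUP) {
        int row, col;
        source_position(i, &row, &col);
        if (row < 0 || row >= HEIGHT || col < 0 ||
            col + GROUP > WIDTH || col % GROUP != 0) {
            errors++;
            continue;
        }
        uint8_t *slot = &seen[row * groups_per_row + col / GROUP];
        if (*slot)
            errors++;   /* this source group was already used */
        *slot = 1;
    }
    free(seen);
    return errors;
}
```

Because the loop visits exactly as many destination groups as there are source groups, a result of 0 means each source group is used exactly once, which is what "invertible" means here.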

timg236 commented 4 years ago

Does this happen at all display resolutions, e.g. 1024x768?

6by9 commented 4 years ago

It is totally from looking at the screen and trying to understand how the bits are mixed up. The gist is that all pixels are there, they are just in the wrong location.

Thanks. The 3D hardware in each generation has had an internal tiled format - T-format in Pi0-3, and UIF in Pi4. They tile the RGB images in a way to make it more efficient to access the image for 3D purposes. The rough description of UIF is at https://elixir.bootlin.com/linux/latest/source/include/uapi/drm/drm_fourcc.h#L640
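For reference, the modifier=0x700000000000006 value in the state dump earlier in the thread can be split into a vendor and a vendor-specific code. The sketch below assumes the layout used by the fourcc_mod_code() macro in drm_fourcc.h (vendor in the top 8 bits, code in the low 56 bits), Broadcom's vendor id being 0x07, and UIF being Broadcom code 6; these constants are restated here for illustration and should be checked against the header. The helper names are made up for this example.

```c
#include <stdint.h>

/* Assumed to mirror drm_fourcc.h; verify against the real header. */
#define SKETCH_MOD_VENDOR_SHIFT    56
#define SKETCH_MOD_CODE_MASK       0x00ffffffffffffffULL
#define SKETCH_MOD_VENDOR_BROADCOM 0x07
#define SKETCH_MOD_CODE_UIF        6

/* Vendor id: the top 8 bits of the 64-bit modifier. */
uint8_t modifier_vendor(uint64_t modifier)
{
    return (uint8_t)(modifier >> SKETCH_MOD_VENDOR_SHIFT);
}

/* Vendor-specific code: the low 56 bits of the modifier. */
uint64_t modifier_code(uint64_t modifier)
{
    return modifier & SKETCH_MOD_CODE_MASK;
}

/* Non-zero if the modifier decodes to Broadcom vendor, code 6 (UIF). */
int is_broadcom_uif(uint64_t modifier)
{
    return modifier_vendor(modifier) == SKETCH_MOD_VENDOR_BROADCOM &&
           modifier_code(modifier) == SKETCH_MOD_CODE_UIF;
}
```

Under those assumptions, 0x700000000000006 decodes to vendor 0x07 and code 6, while a linear buffer would carry modifier 0.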

There was a patch for mesa that made sure that it chose an appropriate output format for the rendering pipe - I'm wondering if something in that path has gone wrong, but don't know enough of how that is configured.

Trying to clone your repo fails - it looks like it's set as a private repo.

imbens commented 4 years ago

Trying to clone your repo fails - it looks like it's set as a private repo.

I just tried without credentials with git clone https://github.com/imbens/opengltest.git and that worked.

imbens commented 4 years ago

Does this happen at all display resolutions, e.g. 1024x768?

It happens at 1920x1080 and 1280x720 and I just checked 1024x768 and it happens there as well.

6by9 commented 4 years ago

Original post

git clone git@gitlab.com:imbens/opengltest

Gitlab != Github

imbens commented 4 years ago

Gitlab != Github

You are absolutely right. I am very sorry.

There was a patch for mesa that made sure that it chose an appropriate output format for the rendering pipe - I'm wondering if something in that path has gone wrong

When using an OpenGL app on X11 with fkms, I would expect two planes at the HVS level with both planes showing up in vcgencmd dispmanx_list. That seems not to be the case. Does the drm driver create an extra plane for the OpenGL app at the HVS level without using dispmanx, or is there only a single plane at the HVS level?

6by9 commented 4 years ago

When using an OpenGL app on X11 with fkms, I would expect two planes at the HVS level with both planes showing up in vcgencmd dispmanx_list. That seems not to be the case. Does the drm driver create an extra plane for the OpenGL app at the HVS level without using dispmanx, or is there only a single plane at the HVS level?

What second plane are you expecting? Within X you'll have the desktop (as primary) and cursor planes active. If your app takes over fullscreen, then X is replaced in using the primary plane. All composition within X is done using GL which sucks, but otherwise every window/surface has to be an independent composition plane to get the Z ordering right.

imbens commented 4 years ago

The 3D hardware in each generation has had an internal tiled format - T-format in Pi0-3, and UIF in Pi4. They tile the RGB images in a way to make it more efficient to access the image for 3D purposes. The rough description of UIF is at https://elixir.bootlin.com/linux/latest/source/include/uapi/drm/drm_fourcc.h#L640

There was a patch for mesa that made sure that it chose an appropriate output format for the rendering pipe - I'm wondering if something in that path has gone wrong, but don't know enough of how that is configured.

You are probably right. I built mesa from git with the line DRM_FORMAT_MOD_BROADCOM_UIF, commented out of uint64_t available_modifiers[] = { (https://github.com/mesa3d/mesa/blob/master/src/gallium/drivers/v3d/v3d_screen.c#L638), and now the output looks fine. Obviously this is not a fix; it just amputates the part of the functionality where the bug is hiding.

imbens commented 4 years ago

When removing case DRM_FORMAT_MOD_BROADCOM_UIF: in vc4_fkms_format_mod_supported (https://github.com/raspberrypi/linux/blob/rpi-5.4.y/drivers/gpu/drm/vc4/vc4_firmware_kms.c#L721), the output also looks fine. Does fkms/dispmanx/firmware really support DRM_FORMAT_MOD_BROADCOM_UIF? Why did this work with 4.19? Does mesa have to convert from tiled to linear and is that a performance issue?

6by9 commented 4 years ago

Oh poot! Where did that come from?

No, none of the rendering pipeline supports UIF, only the 3D block (which uses it nearly exclusively). None of the DRM components should be advertising it as a format. As to how it worked on 4.19 I have no idea. I suspect it is to do with this being advertised through format_mod_supported when the planes then don't have it configured.
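To make the idea concrete, here is a hypothetical, standalone sketch of a format/modifier check in which the display side accepts only linear XRGB8888 and therefore never advertises UIF. It is not the kernel's vc4_fkms_format_mod_supported: the real callback takes a struct drm_plane pointer and handles more formats, and the function name, the SKETCH_ macros, and the simplified signature are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four characters into a little-endian fourcc code. */
#define SKETCH_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define SKETCH_FORMAT_XRGB8888  SKETCH_FOURCC('X', 'R', '2', '4')
#define SKETCH_MOD_LINEAR       0ULL
/* Assumed encoding: Broadcom vendor 0x07 in the top byte, UIF code 6. */
#define SKETCH_MOD_BROADCOM_UIF ((0x07ULL << 56) | 6ULL)

/* Hypothetical check: only combinations the display pipeline can scan
 * out are reported as supported. */
bool sketch_format_mod_supported(uint32_t format, uint64_t modifier)
{
    switch (format) {
    case SKETCH_FORMAT_XRGB8888:
        /* Linear only; a tiled layout such as UIF is rejected. */
        return modifier == SKETCH_MOD_LINEAR;
    default:
        return false;
    }
}
```

With a check of this shape, a client asking whether XR24 with the UIF modifier can be displayed gets false and has to fall back to a linear buffer.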

3D can render to linear buffers with minimal performance overhead, so it should be fine. The only niggle is when reconsuming the current frame to create the next as it has to convert the current frame to UIF first.

Thanks for investigating this - I'll create a PR in a few minutes.

popcornmix commented 4 years ago

rpi-update kernel should now have this fix

imbens commented 4 years ago

I can confirm that after sudo rpi-update, uname -a reports Linux 1080dots-DSPlayer 5.4.51-v7l+ #1329 SMP Wed Jul 29 10:26:41 BST 2020 armv7l GNU/Linux, and the issue is resolved.

bluestang2006 commented 4 years ago

Ok this is weird but I am experiencing this garbled screen issue with the vc4-kms-v3d driver as well. It is happening on RetroArch when I turn vsync off and using X11. Not sure if it happens in DRM too, I will have to check later.

This happened on a clean install of RPiOS 32bit that was fully updated via apt. I have a Pi 4 4GB.

6by9 commented 4 years ago

vc4-kms-v3d on Pi4 is still in the early stages - expect issues.

RomanValov commented 4 years ago

Hi, folks. I've run into my Xorg.log being aggressively spammed with messages like:

(WW) modeset(0): flip queue failed: Invalid argument
(WW) modeset(0): Page flip failed: Invalid argument
(EE) modeset(0): present flip failed

and determined that the log spam was introduced by the fix for this issue.

Filed a bug for Mesa. Please join the discussion.

Could you please tell me if you are also experiencing this log spam?

RomanValov commented 4 years ago

P.S.: it also seems the fix breaks the rainbow square logo shown on boot of the Raspberry Pi

imbens commented 4 years ago

Could you please tell me if you are also experiencing this log spam?

Yes, I see the exact same sequence of messages at each page flip at a rate of 10Kbyte/s in /var/log/Xorg.0.log

6by9 commented 4 years ago

P.S.: it also seems the fix breaks the rainbow square logo shown on boot of the Raspberry Pi

Breaks in what way? I'm going to say impossible as the fix is in the Linux kernel whilst the rainbow square is displayed by the firmware long before it's even loaded the kernel into RAM, let alone started executing it.

RomanValov commented 4 years ago

P.S.: it also seems the fix breaks the rainbow square logo shown on boot of the Raspberry Pi

Breaks in what way? I'm going to say impossible as the fix is in the Linux kernel whilst the rainbow square is displayed by the firmware long before it's even loaded the kernel into RAM, let alone started executing it.

I double-checked, and I was definitely wrong about the rainbow square logo being broken by the commit.

bluestang2006 commented 3 years ago

@6by9 @popcornmix @itoral @infapi00 @txenoo

This issue is still reproducible with a fullscreen OpenGL (vsync disabled) application using X11, with or without the compositor enabled, using the 5.10.7 kernel + the KMS video and audio driver (vc4-kms-v3d), and the latest Mesa drivers built from upstream that include Vulkan on the Pi4.

Enabling vsync in the OpenGL application will fix this issue. I've been using vkQuake3 to reproduce the issue this morning, but I've seen this exact issue happen in RetroArch as well when using the OpenGL renderer. I built vkQuake3 with make -j4 USE_LOCAL_HEADERS=0, since I am also building SDL2 from upstream (v2.0.15). To enable vsync in vkQuake3 you can set the console variable r_swapinterval 1; changing it back to the default value r_swapinterval 0 will result in the garbled screen.

This garbled screen issue does not occur with Vulkan at all.

popcornmix commented 3 years ago

@bluestang2006 are you sure you are seeing this issue? The fix for this issue (trying to display a UIF image as linear with fkms) is present in 5.10.7 kernel.

I suspect you are seeing an unrelated issue and should probably create a new issue. It may be worth trying a kernel with https://github.com/raspberrypi/linux/pull/4075

bluestang2006 commented 3 years ago

@bluestang2006 are you sure you are seeing this issue? The fix for this issue (trying to display a UIF image as linear with fkms) is present in 5.10.7 kernel.

Let me clarify:

  1. You are right, this specific issue is resolved when using the FKMS (dtoverlay=vc4-fkms-v3d) driver on the Pi4. I just tested it out FWIW.

  2. However, this exact problem - the garbled screen output, persists in the KMS (dtoverlay=vc4-kms-v3d) driver on the Pi4.

    a. Using vkQuake3 as a test program, using X11 & OpenGL in fullscreen, and disabling vsync will cause the garbled screen output.

    b. This is also reproducible in RetroArch, again using X11 & OpenGL in fullscreen and disabling vsync results in the same garbled output.

    c. Turning vsync back on fixes the output. In vkQuake3 you can switch back and forth between r_swapinterval 1/0 with vid_restart and watch the screen go from normal output to garbled output. (It might be a bit of a challenge switching back when the screen is garbled, but if you type the commands into the console from memory it will switch back =] )

    d. The other bit of nuance with this issue seems to be if you start the X11 & OpenGL app from a window (not fullscreen) and from within the app you change it to fullscreen and turn vsync off the garbled screen does not occur. Switching back and forth results in normal screen outputs.

KMS/DRM is not affected by this. This is exclusive to X11 and the dtoverlay=vc4-kms-v3d driver, at least from what I've seen thus far.

I suspect you are seeing an unrelated issue and should probably create a new issue. It may be worth trying a kernel with #4075

If this needs to be opened up as a separate issue, I am happy to do so. My initial thought was that these issues were related because of the screen behavior.

Once the firmware appears in rpi-update I will try it out.

bluestang2006 commented 3 years ago

I suspect you are seeing an unrelated issue and should probably create a new issue. It may be worth trying a kernel with #4075

With the latest kernel, this issue still persists as described in my last post.

popcornmix commented 3 years ago

I've had a look. I can reproduce with neverball when set to full screen with vsync disabled. This works with fkms but not with kms.

kms has a slightly different code path to fkms, but it doesn't have the original bug of fkms. Its vc4_format_mod_supported function does correctly return false when given an XR24 (DRM_FORMAT_XRGB8888) plane with modifier DRM_FORMAT_MOD_BROADCOM_UIF.

However, /sys/kernel/debug/dri/0/state does show that we have an XR24 with DRM_FORMAT_MOD_BROADCOM_UIF (0x700000000000006) active. I'm not currently sure how that is possible after vc4_format_mod_supported returned false.

plane[68]: plane-3
        crtc=crtc-3
        fb=226
                allocated by = Xorg
                refcount=2
                format=XR24 little-endian (0x34325258)
                modifier=0x700000000000006
                size=1280x720
                layers:
                        size[0]=1280x720
                        pitch[0]=5120
                        offset[0]=0
                        obj[0]:
                                name=0
                                refcount=3
                                start=00010997
                                size=3686400
                                imported=no
        crtc-pos=1280x720+0+0
        src-pos=1280.000000x720.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
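As a reading aid for the dump above, the format and size fields can be decoded with a few lines of C. This is a small sketch with made-up helper names: fourcc_to_string unpacks the little-endian format code into its four characters, and linear_size multiplies pitch by height. The pitch and size in the dump match what a linear 4-bytes-per-pixel buffer would need, so the modifier is the only field here that signals a tiled layout.

```c
#include <stdint.h>

/* Unpack a little-endian DRM fourcc code into four ASCII characters
 * followed by a terminating NUL. */
void fourcc_to_string(uint32_t fourcc, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}

/* Bytes needed for a buffer with the given pitch (bytes per row). */
uint64_t linear_size(uint32_t pitch, uint32_t height)
{
    return (uint64_t)pitch * height;
}
```

For example, 0x34325258 unpacks to "XR24", and a pitch of 5120 bytes over 720 rows gives the 3686400 bytes reported as the object size.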

bluestang2006 commented 3 years ago

I've had a look. I can reproduce with neverball when set to full screen with vsync disabled.

Two things:

  1. If you start neverball in windowed mode and then switch it to fullscreen and disable vsync, does the garbled screen still happen? Same output as above?

  2. Mesa still has UIF as an available format modifier: https://github.com/mesa3d/mesa/blob/master/src/gallium/drivers/v3d/v3d_screen.c#L634 There was discussion about switching the ordering, but it was never investigated further: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3601