lategoodbye / rpi-zero

Linux kernel source tree

VC4 DRM waiting for flip done makes UI freeze a while with kernel 5.15 #53

Closed starnight closed 2 years ago

starnight commented 2 years ago

I tested the Linux mainline kernel 5.15 (aarch64) with VC4 enabled on an RPi 4B. I noticed that the UI sometimes freezes for a while (about 10 seconds). The kernel shows these error messages during that time:

[   62.942964] [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
[   62.942984] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:68:crtc-3] flip_done timed out
[   62.943007] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:68:crtc-3] commit wait timed out
[   73.183055] [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
[   73.183098] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit

dmesg-5.15.log

It is easy to reproduce this issue by running anything GL-related, for example es2gears.
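The length of the freeze can be cross-checked against the dmesg timestamps quoted above. This is a minimal Python sketch, not part of the original report: the helper name `timeout_timestamps` is made up for illustration, and it assumes the usual `[ seconds.micros] message` dmesg prefix.

```python
import re

# The five log lines quoted in the report above.
LOG = """\
[   62.942964] [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
[   62.942984] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:68:crtc-3] flip_done timed out
[   62.943007] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:68:crtc-3] commit wait timed out
[   73.183055] [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
[   73.183098] vc4-drm gpu: [drm] *ERROR* Timed out waiting for commit
"""

# Matches the "[ seconds.micros] message" prefix printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def timeout_timestamps(log, func="drm_crtc_commit_wait"):
    """Return the timestamps (in seconds) of lines logged by `func` (hypothetical helper)."""
    stamps = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match and f"[drm:{func}]" in match.group(2):
            stamps.append(float(match.group(1)))
    return stamps


stamps = timeout_timestamps(LOG)
# Time between consecutive flip_done timeouts, rounded to microseconds.
gaps = [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]
print(gaps)
```

The two `drm_crtc_commit_wait` timeouts are a little over 10 seconds apart, which agrees with the observed freeze of about 10 seconds.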

starnight commented 2 years ago

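After detailed testing, I found that it is related to these commits:

* [f3c420fe19f8 ("drm/vc4: kms: Convert to atomic helpers")](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f3c420fe19f8cc39adf379365decf63167596dc3)

* [82faa3276012 ("drm/vc4: kms: Remove async modeset semaphore")](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=82faa3276012d272d930026e7666d978ef2c6fef)

This issue cannot be reproduced after I revert the commits.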

lategoodbye commented 2 years ago

@starnight This is not the right place to report VC4 issues. Please report to Maxime Ripard and dri-devel@lists.freedesktop.org

alien999999999 commented 1 year ago

> After detail test, I found it is related to these commits:
>
> * [f3c420fe19f8 ("drm/vc4: kms: Convert to atomic helpers")](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f3c420fe19f8cc39adf379365decf63167596dc3)
>
> * [82faa3276012 ("drm/vc4: kms: Remove async modeset semaphore")](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=82faa3276012d272d930026e7666d978ef2c6fef)
>
> This issue cannot be reproduced after I revert the commits.

I have the same issue on 5.15. Did you end up reporting this somewhere? Is it fixed in later versions? Do you have a follow-up link?

lategoodbye commented 1 year ago

Maxime made great progress with VC4 in mainline. I don't see a reason to stick to such an old kernel version. Even the Raspberry Pi folks switched to 6.1 recently.

alien999999999 commented 1 year ago

OK, thanks a lot, I'll try to get a newer kernel and see if this is improved...

alien999999999 commented 1 year ago

So, I'm not sure where to look, but with 6.1.6 instead of 5.15 I get a black screen. The changes I can see are that DRM now says no crtc or sizes were found, and the X log just says the EDID reports nothing, which was different in 5.15.

Is this something that changed since 5.15, so that I have to provide the EDID manually? Or is there actually some weird driver bug with the "crtc", whatever that is?

I tried both HDMI ports, and I tried setting hdmi_mode (and hdmi_group) in config.txt, but that made no difference. I thought the idea was that the graphics driver asks the monitor for the EDID?

The only thing that changed is the kernel; the rest of the system is exactly the same...

lategoodbye commented 1 year ago

Just to make sure, you are talking about the mainline kernel from here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/

and not the kernel from the Raspberry Pi folks?

Btw you should also update the devicetree blob

alien999999999 commented 1 year ago

No, I switched from mainline kernel 5.15 to 6.1 (a distro-built one).

alien999999999 commented 1 year ago

I saw another thread about someone having issues with EDID and creating one to use manually, but I thought it weird that in 5.15 the X.org log contained EDID modeline information. I've been using the mainline kernels in combination with the firmware I got from the Raspberry Pi folks, so I do see the rainbow screen every time, which means the TV can handle those. I also tried hdmi_group and hdmi_mode, but 6.1 complained about not finding any crtc or sizes, while 5.15 did not complain about that... I don't know if this is a regression or if I need to fix the EDID manually...

lategoodbye commented 1 year ago

Yes, this sounds like a regression. It would be good if you could find the last major release which works with your setup. There is a year of development work between 5.15 and 6.1.

alien999999999 commented 1 year ago

Myeah... in any case, I tried a manual EDID, built with edid-generator from the modeline from when it was working, but that made no difference. The only changes were that the kernel complained that I used drm_kms_helper.edid_firmware instead of drm.edid_firmware, and that bootup was 10x slower. I still got the crtc or sizes error, and the X log file did not report any EDID... (This is all on the 2nd HDMI port; I didn't try the first one yet.)

There are some more backported kernels in between, so I'll go check between the latest 5.x kernels and see if I can pinpoint it first. Am I correct in assuming I need to look for crtc driver issues somewhere? Or is that not where the problem lies?

lategoodbye commented 1 year ago

Sorry, I have no deeper knowledge about DRM. So my idea was to narrow it down to the major version and report it to the DRM kernel mailing list.

alien999999999 commented 1 year ago

OK, good idea, because 5.16 still worked but 5.19 didn't. There are a few more pre-built kernels in between, so I'm going to do that.

On a positive note, 5.16 even has 4K as default on HDMI2, while 5.15 didn't. It would be great to fix this regression, because I noticed 6.1 has v3d driver support.

alien999999999 commented 1 year ago

It broke in 5.17.11 (it worked in 5.16.18), and the output is a bit different:

Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fef05700.hdmi (ops vc4_hdmi_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fe004000.txp (ops vc4_txp_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fe206000.pixelvalve (ops vc4_crtc_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fe207000.pixelvalve (ops vc4_crtc_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fe20a000.pixelvalve (ops vc4_crtc_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fe216000.pixelvalve (ops vc4_crtc_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: bound fec12000.pixelvalve (ops vc4_crtc_ops [vc4])
Aug 06 13:09:35 rpimedia kernel: checking generic (3ea81000 12c000) vs hw (0 ffffffffffffffff)
Aug 06 13:09:35 rpimedia kernel: fb0: switching to vc4 from simple
Aug 06 13:09:35 rpimedia kernel: Console: switching to colour dummy device 80x25
Aug 06 13:09:35 rpimedia kernel: [drm] Initialized vc4 0.0.0 20140616 for gpu on minor 0
Aug 06 13:09:35 rpimedia kernel: vc4-drm gpu: [drm] Cannot find any crtc or sizes

alien999999999 commented 1 year ago

5.17.4 is also broken, and 5.16.18 worked; that's the closest I can pinpoint it.
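To keep track of the window, the results reported so far in this thread can be put in a small table and sorted numerically. This is only an illustrative Python sketch; the names `version_key` and `results` are made up here, and the table contains only versions mentioned above.

```python
def version_key(version):
    """Turn '5.16.18' into (5, 16, 18) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# True = display came up; False = black screen / "Cannot find any crtc or sizes".
# Values are the results reported in this thread.
results = {
    "5.15": True,
    "5.16.18": True,
    "5.17.4": False,
    "5.17.11": False,
    "5.19": False,
    "6.1.6": False,
}

good = [v for v, ok in results.items() if ok]
bad = [v for v, ok in results.items() if not ok]

last_good = max(good, key=version_key)
first_bad = min(bad, key=version_key)

# The window only makes sense if every good version is older than every bad one.
assert version_key(last_good) < version_key(first_bad)

print(f"last good: {last_good}, first bad: {first_bad}")
```

Comparing the version strings directly would sort 5.17.11 before 5.17.4, which is why the sketch converts them to integer tuples first.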

lategoodbye commented 1 year ago

Okay fine, please write a bug report via email to the following: Emma Anholt, Maxime Ripard, and dri-devel@lists.freedesktop.org

Addresses can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/MAINTAINERS?h=next-20230306

Thanks

alien999999999 commented 1 year ago

bug report sent: https://lists.freedesktop.org/archives/dri-devel/2023-March/394254.html

alien999999999 commented 1 year ago

So, any idea how long I need to wait for a reply to this email? I ended up replying to myself today to add more info and be more specific...

lategoodbye commented 1 year ago

On the kernel mailing list, you always need patience. The DRM developers are usually very busy. So the usual wait "timeout" should be 2 weeks before sending a "ping".

In case you have some time, you could try to git bisect this issue.

alien999999999 commented 1 year ago

There are quite a lot of commits. I looked through some of the crtc ones, but none of them seemed to fit. I also looked through the code a bit, trying to guess from the error message, but DRM has lots of files... If I had some pointers to regions or candidates, that would be a lot less time-consuming...

lategoodbye commented 1 year ago

Since you narrowed it down to between two major versions, bisecting should take ~14 steps, but I agree this takes a lot of time...
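The step estimate follows from bisection halving the candidate range on every build-and-test round. This is a rough Python sketch of that arithmetic; the commit counts in the loop are hypothetical placeholders, and the real number for a given range can be obtained with `git rev-list --count v5.16..v5.17`.

```python
def bisect_steps(num_commits):
    """Rough number of build-and-test rounds needed to bisect num_commits candidates.

    Each round halves the remaining range, so this is ceil(log2(num_commits)),
    computed with integer arithmetic to avoid floating-point rounding.
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (num_commits - 1).bit_length()


# Hypothetical commit counts, for illustration only.
for count in (1_000, 10_000, 16_384, 20_000):
    print(count, bisect_steps(count))
```

Any range between roughly 8,000 and 16,000 commits comes out at 14 rounds, so the estimate is not very sensitive to the exact count. Git may need a round more or fewer in practice because kernel history is not linear.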

alien999999999 commented 1 year ago

Am I correct in assuming the DRM changes all come at once in a new major version (except for some regression fixes)? I noticed in the git history that there are only a bunch of rc's.

lategoodbye commented 1 year ago

I don't think it's a good idea to assume this issue comes from drm-misc. Sometimes changes from other subsystems can also be the cause.

alien999999999 commented 1 year ago

Oh, so you would take 5.17, revert DRM to 5.16, and then from there revert less until you have the issue?

lategoodbye commented 1 year ago

As I said, https://git-scm.com/docs/git-bisect is your friend :-)

alien999999999 commented 1 year ago

OK, so you would bisect the whole kernel, not just the DRM subtree?

lategoodbye commented 1 year ago

yes

alien999999999 commented 1 year ago

For reference, the conclusion seems to be a bad EDID from the TV combined with bad HDMI hotplug behavior. So it is not really a kernel regression, other than that newer kernels are stricter about bad EDIDs, and for me it is solvable with kernel parameters.

alien999999999 commented 1 year ago

Stupid question: I was trying v4l2m2m with ffmpeg on kernel 6.1, and it seems all /dev/video* devices are gone? I have used these before, but I don't know why they are gone now (there should be a 10 and an 11 for h264 and hevc). Do you know which kernel module is responsible for these devices?

alien999999999 commented 1 year ago

Apparently people have bcm2835-codec that does this... but I just can't seem to find this kernel driver anywhere. Was it removed, obsoleted, or replaced?

nullr0ute commented 1 year ago

It's not in the upstream kernel as yet but it is in the rpi foundation downstream kernel.

lategoodbye commented 1 year ago

@alien999999999 nullr0ute is right, the bcm2835-codec isn't mainlined yet. As long as VCHIQ isn't out of staging there is no point adding more of these drivers to staging.

Btw: Please stop commenting on this issue and open new ones.