grate-driver / xf86-video-opentegra

X.Org video driver for NVIDIA Tegra

Driver causes memory-controller related issues #60

Closed KaiJan57 closed 3 years ago

KaiJan57 commented 3 years ago

Installing this driver on a Tegra30-based tablet (corresponding package for postmarketOS: link) apparently causes problems with the embedded memory controller. The Linux version is 5.12.0-rc2-00208-ga74e6a014c9d-dirty.

Examples of the error messages (addresses sometimes vary):

tegra-mc 7000f000.memory-controller: idxsrd2: read @0x00801000: EMEM address decode error (EMEM decode error)

and

tegra_mc_irq: n callbacks suppressed

(where n is a two-digit decimal number). A bunch of these are printed every now and then. The screen output also indicates that something is going wrong: on the desktop, some horizontal stripes of icons are rendered correctly while others are not (or are missing entirely). Any ideas on what might be the cause(s) of this issue?

digetx commented 3 years ago

Hi, I see that the packaged version is git20200423, and there have been fixes since that date which should address the problem you described.

@okias Is it possible to update the opentegra package?

KaiJan57 commented 3 years ago

Definitely a good idea to update the package. Maybe I can find some time tomorrow to do so, as that seems to be quite a lot of work… What do you think, is it more convenient to compile the driver on-device?

digetx commented 3 years ago

Compiling on the device should be the easiest option if you don't want to invest time and effort into creating a cross-compile environment.

KaiJan57 commented 3 years ago

Ok, so that is very interesting. I installed the latest commit. The screen appears to be sectioned into stripes of, say, about 20 px in height. Every even-numbered stripe renders correctly while every odd-numbered stripe is transparent or black. Dmesg:

[ 3482.214224] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x029bd000: EMEM address decode error (EMEM decode error)
[ 3482.639641] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02095000: EMEM address decode error (EMEM decode error)
[ 3482.662546] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x027dc000: EMEM address decode error (EMEM decode error)
[ 3482.670086] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x020e3000: EMEM address decode error (EMEM decode error)
[ 3482.678879] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02197000: EMEM address decode error (EMEM decode error)
[ 3482.711108] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x0297c000: EMEM address decode error (EMEM decode error)
[ 3482.721028] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02237000: EMEM address decode error (EMEM decode error)
[ 3482.725678] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02156000: EMEM address decode error (EMEM decode error)
[ 3482.734849] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x0231c000: EMEM address decode error (EMEM decode error)
[ 3482.760748] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02001000: EMEM address decode error (EMEM decode error)

The addresses at which the read errors occur are now in a different range, but the errors are still there. I can also provide the device tree I used; would that help?

digetx commented 3 years ago

T30 has two GPU units. By default the h/w is configured such that the first unit draws the even lines of 16x16 blocks and the second unit draws the odd lines.
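To illustrate why a single misbehaving unit would produce alternating stripes, here is a minimal sketch. It assumes that "even and odd lines of 16x16 blocks" means alternating 16-pixel-tall bands; the function names and the 0/1 unit numbering are hypothetical and not taken from the driver.

```python
BLOCK_HEIGHT = 16  # assumed band height in pixels, from the 16x16 block size


def gpu_unit_for_row(y: int, block_height: int = BLOCK_HEIGHT) -> int:
    """Return 0 for the first GPU unit, 1 for the second (hypothetical numbering)."""
    return (y // block_height) % 2


def missing_bands(screen_height: int, broken_unit: int = 1):
    """List (first_row, last_row) ranges left blank if one unit's output is lost."""
    bands = []
    for start in range(0, screen_height, BLOCK_HEIGHT):
        if gpu_unit_for_row(start) == broken_unit:
            last = min(start + BLOCK_HEIGHT, screen_height) - 1
            bands.append((start, last))
    return bands


if __name__ == "__main__":
    # Rows 16-31 and 48-63 belong to the second unit on a 64-row surface.
    print(missing_bands(64))
```

Under that assumption, every other band would be missing, which is roughly the pattern reported above (stripes of about 20 px, with every second one black or transparent).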

The EMEM address decode error tells us that the second GPU unit isn't programmed correctly, either by userspace (opentegra) or by the kernel driver. It also tells us that IOMMU is disabled for the GPU in the kernel, because physical memory addresses start from 0x80000000 on T30+ and this is not an SMMU fault. That is interesting, because IOMMU should be enabled by default.
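The address reasoning can be checked mechanically. The following is a rough sketch that parses `tegra-mc` lines like the ones quoted in this thread and flags addresses below 0x80000000. The regular expression is an assumption derived only from those sample lines, and the function and constant names are hypothetical.

```python
import re

DRAM_BASE = 0x80000000  # start of physical memory on T30+, per the comment above

# Assumed line layout, based only on the dmesg samples quoted in this thread.
MC_ERROR = re.compile(
    r"tegra-mc \S+: (?P<client>\w+): (?P<op>read|write) "
    r"@0x(?P<addr>[0-9a-fA-F]+): (?P<desc>.+)$"
)


def parse_mc_errors(log_text):
    """Return (client, op, address, below_dram_base) for each matching line."""
    results = []
    for line in log_text.splitlines():
        m = MC_ERROR.search(line)
        if m:
            addr = int(m.group("addr"), 16)
            results.append((m.group("client"), m.group("op"), addr, addr < DRAM_BASE))
    return results


SAMPLE = """\
[ 3482.214224] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x029bd000: EMEM address decode error (EMEM decode error)
[ 3482.639641] tegra-mc 7000f000.memory-controller: idxsrd2: read @0x02095000: EMEM address decode error (EMEM decode error)
"""

if __name__ == "__main__":
    for client, op, addr, below in parse_mc_errors(SAMPLE):
        print(f"{client} {op} 0x{addr:08x} below_dram_base={below}")
```

Every address reported in this issue falls below 0x80000000, so none of them can be a valid physical DRAM address on this SoC, which is consistent with the explanation above.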

Could you please give me this information:

  1. What tablet device you are using
  2. Show the device-tree
  3. Show Xorg.log
  4. Run this script https://gist.github.com/digetx/a9423a454f96711e3b61efd6c2e69233 and show the output

KaiJan57 commented 3 years ago

The tablet is a Lenovo IdeaTab A2109A (working Android device configuration: link). Here is the device tree I made from the specs (probably full of errors and inconsistencies, as I have very little experience with device trees): tegra30-kai.txt

Everything else I will provide later.

okias commented 3 years ago

https://gitlab.com/postmarketOS/pmaports/-/merge_requests/2102

Not tested though; waiting at least for CI to finish.

EDIT: blocked by https://github.com/grate-driver/xf86-video-opentegra/issues/61

digetx commented 3 years ago

@KaiJan57 Very nice! Could you please enumerate what works and what doesn't? We could add the device-tree to grate-kernel and try to finalize it. Please feel free to open a pull request with the DT, I'll merge it.

@okias Thank you!

KaiJan57 commented 3 years ago

For the sake of completeness, here are the requested files. Xorg.0.log basic-system-info.txt

That is a lot of system info; I hope you know where to look ;) NOTE: The kernel cmdline supplied by the bootloader is entirely different; I had to override it in the kernel configuration. If required, I can provide the original line too.

KaiJan57 commented 3 years ago

> @KaiJan57 Very nice! Could you please enumerate what works and what doesn't? We could add the device-tree to grate-kernel and try to finalize it. Please feel free to open a pull request with the DT, I'll merge it.
>
> @okias Thank you!

What about the kernel configuration? How should I add it to the PR? Enumeration of what works and what is broken:

Works:

Broken:

digetx commented 3 years ago

Good news, @KaiJan57! This is a known kernel IOMMU driver bug that is already fixed in a newer kernel version. All you need to do is update your kernel to a newer version and the problem will be fixed. I'd also recommend using grate-kernel, since this is a work-in-progress device and grate-kernel has all the most recent and pending fixes plus features that are not yet supported by mainline.

I see that the device-tree is incomplete and has some obvious issues, so please feel free to open the pull request and I'll help with fixing and improving it. A couple of missing drivers is normal for a new device coming to mainline; we can fix that later on. You could add a custom kernel config to https://github.com/grate-driver/linux/tree/master/arch/arm/configs along with the device-tree in a single commit.

KaiJan57 commented 3 years ago

That's really great and thank you two very much for your effort and all the interesting information! I am going to prepare the PR then. That will probably take some time, but it seems to be worth it.

digetx commented 3 years ago

Thank you, and please let me know whether the kernel update indeed fixes your problem.

KaiJan57 commented 3 years ago

Yep, grate-driver/linux works nicely. The error messages disappeared and there are no more glitches. Thank you very much! However, there are a lot of dmesg lines like this one (numbers vary):

f 21#948: signaled from irq context

I doubt that's a problem, but I wonder what they mean…