yuq / mesa-lima

Deprecated, new place: https://gitlab.freedesktop.org/lima
https://github.com/yuq/mesa-lima/wiki

MMU faults / pp error irq on Mali450 in higher resolutions #26

Closed mmind closed 6 years ago

mmind commented 6 years ago

I'm working on making a Rockchip pipe driver, but right now I'm running into some issues.

./kmscube -d -D /dev/dri/card1, i.e. directly on the lima device, works as expected.

./kmscube -d -D /dev/dri/card0, i.e. accessing the Rockchip card and going through the pipe driver, creates correct images, but at the end I get:
[ 205.133880] lima ff300000.gpu: still active bo inside vm

And finally, when running the main ./kmscube or ./kmscube -M rgba, I end up with:
[ 40.925679] lima ff300000.gpu: pp error irq state=200 status=41
[ 40.926342] [drm] lima worker wait task error
and half a frame on the display, as seen in the attached screenshot img_20180116_234656 (the frame is static and gets replaced by a fully black display a bit later).

After a bit there is some more output like [drm] lima worker wait dep fence error -110

As both the sunxi and exynos pipes seem to be working, it looks more like a fault in the connection to the Rockchip drm (something prime-related?), but right now I definitely lack the drm-related knowledge to know where to look first. So maybe someone can give me a pointer on where to start.

anarsoul commented 6 years ago

@mmind try disabling the iommu driver. It could be that the Rockchip DRM driver allocates a non-linear scanout buffer, and as far as I can tell, the lima driver doesn't support that at the moment.

yuq commented 6 years ago

Another possibility: what's your screen resolution? If it's 4K, note that lima currently hard-codes the limit to 2048x2048.

Also check whether the following two commits are in your trees:

  1. mesa: https://github.com/yuq/mesa-lima/commit/c6c2844c2a5f9877220f1b73b272ad72a5c26c30
  2. linux: https://github.com/yuq/linux-lima/commit/a826702396f059844dfebb9be0bcb02a4329c746
anarsoul commented 6 years ago

@yuq mesa-lima works in 2536x1440 for me, but it's clipped to 2048x1440

yuq commented 6 years ago

@anarsoul You mean that with a 2536x1440 screen, mesa-lima only renders the 2048x1440 region? Do you have this commit: https://github.com/yuq/mesa-lima/commit/c6c2844c2a5f9877220f1b73b272ad72a5c26c30

anarsoul commented 6 years ago

@yuq I think I do, I'm rebasing my repo pretty often.

yuq commented 6 years ago

This is where I assume the max w/h to be 2048: https://github.com/yuq/mesa-lima/blob/lima-17.3/src/gallium/drivers/lima/lima_context.h#L208

But it's a limit on the total size, not on each dimension, and 2536x1440 is still within that total. I don't know why a 2536x1440 screen would be clipped to 2048x1440.
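
The arithmetic behind that statement can be checked quickly. This is only a sketch, assuming the limit really behaves as a total pixel budget of 2048x2048; the function names and the constant are illustrative and are not taken from lima_context.h.

```python
# Illustrative only: models the limit as a total pixel budget, as described
# in the comment above. All names here are hypothetical.
MAX_PIXELS = 2048 * 2048  # assumed budget: 4,194,304 pixels


def fits_total_budget(width: int, height: int, budget: int = MAX_PIXELS) -> bool:
    """Return True if width * height fits within the total pixel budget."""
    return width * height <= budget


def fits_per_axis(width: int, height: int, limit: int = 2048) -> bool:
    """Return True only if each dimension is individually within the limit."""
    return width <= limit and height <= limit


if __name__ == "__main__":
    for w, h in [(1920, 1080), (2536, 1440), (3840, 2160)]:
        print(w, h, w * h, fits_total_budget(w, h), fits_per_axis(w, h))
```

Under these assumptions 2536x1440 passes the total-budget check but would fail a per-axis check, which is why clipping to 2048 in one dimension is surprising.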

anarsoul commented 6 years ago

Maybe it's not 2048x1440, but cube isn't centered and a bit stretched vertically. FWIW it's working fine in 1920x1080

yuq commented 6 years ago

Oh, it seems I reproduced your 2536x1440 problem with offscreen rendering in kmscube. Does your screen look like this? dump0

anarsoul commented 6 years ago

@yuq yes.

yuq commented 6 years ago

This commit should fix it: https://github.com/yuq/mesa-lima/commit/fcfbfee7612eb8bc113076a24ca935375f0e08b1

fourkbomb commented 6 years ago

ARM's website suggests that the max texture size is 4096x4096: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13314.html

yuq commented 6 years ago

OK, thanks for your info. I've set it to 4096 now: https://github.com/yuq/mesa-lima/commit/86c9c19c0a930aa9252d860534ae79a25084d9ac

mmind commented 6 years ago

@yuq screen resolution is a standard 1080p

I'm going to check what the drm driver does with the iommu disabled. In theory it should support that but I don't think anybody has tested this with mainline yet :-)

superna9999 commented 6 years ago

@mmind @yuq I have the exact same result with Amlogic Meson DRM driver, 1080p

mmind commented 6 years ago

Just to fill in some things I was able to check today. First of all, disabling the iommu works for the general output but does not also lead to scrambled screen output. Secondly I was able to wire up my rk3036-based kylin board (arm32) again which has a nice Mali400MP1 and got this nice output on it - both with and without the iommu: img_20180118_172105

So it looks like there is no general issue in the Rockchip drm driver, but something a bit more special somewhere.

@superna9999 for completeness' sake: I gathered from the IRC log that you're also on an arm64 board, right? And out of curiosity, does the dump option of kmscube work for you? As you can see in my initial report at the top, dumping both on lima directly and via the pipe driver through the rockchip-drm produces correct png images. Is this the case for you as well?

yuq commented 6 years ago

@mmind here are some guesses about your problem:

  1. Mali4xx can only address a 32-bit address space, but ARM64 boards may have some RAM above that range.
  2. linux-lima uses dma_alloc_xxx for all buffers, but on some chips the IOMMU only covers the display engine, so dma_alloc_xxx may not be implemented properly, in which case the returned memory would not be physically contiguous for the Mali GPU.
  3. A special hardware/cache problem. When I switched from offscreen to onscreen rendering, there was an MMU cache problem, which is fixed by https://github.com/yuq/linux-lima/commit/a826702396f059844dfebb9be0bcb02a4329c746, so there may be other places that need to change for your HW. Comparing the official Mali kernel driver for exactly your chip against linux-lima should show them.
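
Guess 1 comes down to a simple range check. The sketch below is illustrative only: the helper name and the example addresses are hypothetical, and it says nothing about how linux-lima actually allocates or maps memory.

```python
# Illustrative check for guess 1: does a physical buffer stay below 4 GiB?
# The helper and the example addresses are hypothetical.
ADDR_LIMIT_32BIT = 1 << 32  # first address a 32-bit bus master cannot reach


def reachable_by_32bit_master(phys_addr: int, size: int) -> bool:
    """Return True if [phys_addr, phys_addr + size) lies entirely below 4 GiB."""
    if phys_addr < 0 or size <= 0:
        raise ValueError("phys_addr must be >= 0 and size must be > 0")
    return phys_addr + size <= ADDR_LIMIT_32BIT


if __name__ == "__main__":
    # A page well inside the low 4 GiB.
    print(reachable_by_32bit_master(0x7000_0000, 0x1000))
    # A two-page buffer that straddles the 4 GiB boundary.
    print(reachable_by_32bit_master(0xFFFF_F000, 0x2000))
    # A page located entirely above 4 GiB.
    print(reachable_by_32bit_master(0x1_0000_0000, 0x1000))
```

A board whose DRAM sits entirely below the 4 GiB mark can never fail this check, which is one way to rule the first guess in or out.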

Another experiment I can think of is changing the kmscube dump FB size to 1080p and seeing whether it succeeds.

BTW, the buffer displayed on screen is also allocated by lima and exported to the display drm.

superna9999 commented 6 years ago

For Amlogic GX, there is no IOMMU and the dram does not cross the 32bit boundary. The official Mali driver works without any specific changes :

superna9999 commented 6 years ago

https://github.com/superna9999/meson_gx_mali_450/blob/DX910-SW-99002-r6p1-01rel0_meson_gx/driver/src/devicedrv/mali/platform/meson/meson.c

superna9999 commented 6 years ago

@mmind when possible I’ll try to render offscreen (I’m flying to Sydney for LCA2018)

yuq commented 6 years ago

So it seems the problem is Mali450-specific?

mmind commented 6 years ago

So I found the time to dig around a bit more and the issue really seems unrelated to the rockchip display-engine, as it also happens when using the kmscube dumping function at higher resolutions on the mali450.

For reference, dumping at 1920x1080 on my arm32 rk3036 with a Mali400MP1 (64MB cma area) worked just fine and produced correct images with mesa binaries built from the same (most recent) source version.

Dumping on the arm64 Mali450MP2 (the rock64 board has 2GB of RAM) I get:
1280x720: dump works as expected
1366x768: dump works as expected
1600x900 and 1920x1080: one of the following
[ 68.552686] lima ff300000.gpu: mmu page fault at 0x75cad9c0 from bus id 0 of type read on gpmmu
[ 68.553649] [drm] lima worker wait task error
[ 68.554145] lima ff300000.gpu: mmu resume
[ 69.569276] [drm] lima worker wait dep fence error -110
[ 69.585639] lima ff300000.gpu: still active bo inside vm

OR

[ 1018.992680] lima ff300000.gpu: pp error irq state=200 status=41
[ 1018.993348] [drm] lima worker wait task error
[ 1026.172479] [drm] lima worker wait dep fence error -110
[ 1026.172486] [drm] lima worker wait dep fence error -110
[ 1030.468945] lima ff300000.gpu: still active bo inside vm

With the GPU switching somewhat randomly between the two errors.
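
When comparing many runs, it can help to tally which of the two failure modes shows up. The following is a rough sketch whose patterns are derived only from the log lines quoted in this thread; the function names are made up for illustration, and other lima versions may print different messages.

```python
import re
from collections import Counter

# Patterns derived only from the dmesg lines quoted in this thread.
PATTERNS = {
    "mmu_page_fault": re.compile(
        r"mmu page fault at (0x[0-9a-f]+) from bus id (\d+) of type (\w+) on (\w+)"
    ),
    "pp_error_irq": re.compile(r"pp error irq state=([0-9a-f]+) status=([0-9a-f]+)"),
    "fence_error": re.compile(r"wait dep fence error (-?\d+)"),
    "active_bo": re.compile(r"still active bo inside vm"),
}


def classify(line):
    """Return (kind, captured_fields) for a known lima message, else None."""
    for kind, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return kind, match.groups()
    return None


def summarize(lines):
    """Count how often each known message kind appears in the given lines."""
    results = (classify(line) for line in lines)
    return Counter(result[0] for result in results if result is not None)


SAMPLE = [
    "[ 68.552686] lima ff300000.gpu: mmu page fault at 0x75cad9c0 from bus id 0 of type read on gpmmu",
    "[ 68.553649] [drm] lima worker wait task error",
    "[ 68.554145] lima ff300000.gpu: mmu resume",
    "[ 69.569276] [drm] lima worker wait dep fence error -110",
    "[ 69.585639] lima ff300000.gpu: still active bo inside vm",
]

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

Feeding it the output of dmesg from several runs gives a per-kind count. On Linux, the -110 in the fence error corresponds to -ETIMEDOUT.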

Limiting the kms drm-mode to 720p makes kmscube run without errors on the rockchip display-engine, so I would assume I'll see real output in that mode, but I can only check the actual output tomorrow when I'm back home. [only remote board access right now]

I've also tried to make sure I'm not running into out-of-memory errors and tried cma sizes of 64, 128 and even 256MB.

So it looks like it's either arm64 or Mali450 specific ... especially as @superna9999 seems to see the same issue.

yuq commented 6 years ago

It sounds very much like a memory or cache problem. After comparing your mali450 support commit with the official mali driver, it seems you always flush the GP and PP L2 caches at the same time, for both the GP and the PP task worker. But the mali driver only flushes the GP L2 cache for the GP worker and the PP L2 cache for the PP worker.

mmind commented 6 years ago

Another observation:

As long as kmscube hasn't run successfully, I always seem to get the mmu pagefault:
[ 70.608826] lima_gp_start_task
[ 70.613766] lima ff300000.gpu: mmu page fault at 0x7dcadaa0 from bus id 0 of type read on gpmmu
[ 70.614746] [drm] lima worker wait task error
[ 70.615294] lima ff300000.gpu: mmu resume
[ 71.618574] [drm] lima worker wait dep fence error -110

and after kmscube has run successfully (with a smaller resolution, for example) it seems to switch to the pp error irq for all future kmscube starts:
[ 935.068815] lima_gp_start_task
[ 935.071314] lima_gp_end_task
[ 935.073764] lima_pp_start_task pp0
[ 935.076260] lima_pp_start_task pp1
[ 935.080365] lima ff300000.gpu: pp error irq state=200 status=41
[ 935.084390] [drm] lima worker wait task error

superna9999 commented 6 years ago

Offscreen rendering works as expected: dump0

Same with 1024x768 resolution with the meson driver in rgba mode: dump0

anarsoul commented 6 years ago

@superna9999 looks like you have R and B channels swapped in textured cube.

superna9999 commented 6 years ago

@anarsoul I have the same inversion with the offscreen rendering:

$ kmscube -d -D /dev/dri/card0 (lima is on card0)

Execution log: http://pastebin.baylibre.com/view/8e034245

anarsoul commented 6 years ago

@superna9999 probably texture descriptor differs on mali450.

mmind commented 6 years ago

@superna9999 can you also try offscreen rendering at 1920x1080 please? (Size settings at the bottom of kmscube/common.h)

superna9999 commented 6 years ago

@mmind Done, here is the log:

Using display 0x36cc04d0 with EGL version 1.4
===================================
EGL information:
  version: "1.4 (DRI2)"
  vendor: "Mesa Project"
  client extensions: "EGL_EXT_client_extensions EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_KHR_debug EGL_EXT_platform_x11 EGL_MESA_platform_gbm"
  display extensions: "EGL_EXT_buffer_age EGL_EXT_image_dma_buf_import EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export "
===================================
lima_resource_create: pres=0x36cdcee0 width=65536 height=1 depth=1 target=0 bind=10
lima_resource_create: pres=0x36cdda10 width=1920 height=1080 depth=1 target=2 bind=18000a
[   48.311568] lima d00c0000.gpu: mmu page fault at 0x51c740e0 from bus id 0 of type read on gpmmu
[   48.320091] [drm] lima worker wait task error
[   48.324433] lima d00c0000.gpu: mmu resume
lima_surface_create: pres=0x36cdda10 psurf=0x36cddb70
OpenGL ES 2.x information:
  version: "OpenGL ES 2.0 Mesa 17.3.0"
  shading language version: "OpenGL ES GLSL ES 1.0.16"
  vendor: "lima"
  renderer: "Mali450"
  extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_npot GL_OES_vertex_half_float GL_OES_EGL_image GL_OES_depth_texture GL_OES_packed_depth_stencil GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_separate_shader_objects GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_draw_elements_base_vertex GL_EXT_texture_border_clamp GL_KHR_context_flush_control GL_OES_draw_elements_base_vertex GL_OES_texture_border_clamp GL_KHR_no_error "
===================================
lima_screen_get_compiler_options
lima_screen_get_compiler_options
lima_resource_create: pres=0x37141840 width=864 height=1 depth=1 target=0 bind=10
lima_transfer_map: pres=0x37141840
lima_transfer_map: pres=0x37141840
lima_transfer_map: pres=0x37141840
lima_set_framebuffer_state
lima_set_framebuffer_state: psurf color=0x36cddb70 z=(nil)
fb dim change tiled=120/68 block=15/17 shift=3/2
lima_clear
lima_clear
lima_create_depth_stencil_alpha_state
depth enable=0 min_b=0.000000 max_b=0.000000
lima_bind_depth_stencil_alpha_state
lima_create_fs_state
lima_bind_fs_state
lima_create_vs_state
lima_bind_vs_state
lima_set_polygon_stipple
lima_create_blend_state
lima_bind_blend_state
lima_create_rasterizer_state
lima_bind_rasterizer_state
lima_set_viewport_states
viewport scale=960.000000/-540.000000/0.500000 translate=960.000000/540.000000/0.500000
glViewport x/y/w/h = 0.000000/0.000000/1920.000000/1080.000000
glDepthRange n/f = 0.000000/1.000000
lima_set_constant_buffer
shader 0 index 0 cb buffer (nil) offset 0 size d0
lima_create_vertex_elements_state
lima_bind_vertex_elements_state
lima_set_vertex_buffers
lima_draw_vbo
lima_draw_vbo
lima_draw_vbo
lima_draw_vbo
lima_draw_vbo
lima_draw_vbo
lima_flush_resource
lima_flush
lima_flush: flags=1
lima_resource_create: pres=0x37188e10 width=65536 height=1 depth=1 target=0 bind=10
lima_transfer_map: pres=0x36cdda10
[   49.344526] [drm] lima worker wait dep fence error -110
kmscube: dump.c:128: dump_run: Assertion `result' failed.
Aborted
# 

Then:

# dmesg | grep lima
[    2.322168] lima d00c0000.gpu: bus rate = 166666667
[    2.324805] lima d00c0000.gpu: mod rate = 666666666
[    2.329906] lima d00c0000.gpu: found 3 PPs
[    2.333689] lima d00c0000.gpu: l2 cache 8K, 4-way, 64byte cache line, 128bit external bus
[    2.341829] lima d00c0000.gpu: gp - mali450 version major 0 minor 0
[    2.348072] lima d00c0000.gpu: l2 cache 64K, 4-way, 64byte cache line, 128bit external bus
[    2.356215] lima d00c0000.gpu: pp0 - mali450 version major 0 minor 0
[    2.362520] lima d00c0000.gpu: pp1 - mali450 version major 0 minor 0
[    2.368925] lima d00c0000.gpu: mmu command 6 timeout
[    2.373691] lima d00c0000.gpu: bringup pp 2/3
[    2.378300] [drm] Initialized lima 1.0.0 20170325 for d00c0000.gpu on minor 0
[   48.311568] lima d00c0000.gpu: mmu page fault at 0x51c740e0 from bus id 0 of type read on gpmmu
[   48.320091] [drm] lima worker wait task error
[   48.324433] lima d00c0000.gpu: mmu resume
[   49.344526] [drm] lima worker wait dep fence error -110
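
The "fb dim change tiled=120/68 block=15/17 shift=3/2" line in the log above can be reproduced with a little arithmetic. This is a sketch under two assumptions that the log supports but does not prove: tiles are 16x16 pixels, and the block count is the tile count right-shifted by the shift value. How the driver picks the shift, and whether it rounds up, is not visible here. The function names are hypothetical.

```python
# Assumption: 16x16-pixel tiles, consistent with tiled=120/68 at 1920x1080.
TILE_SIZE = 16


def tiled_dims(width: int, height: int, tile: int = TILE_SIZE):
    """Number of tiles needed to cover the framebuffer (rounded up)."""
    return (-(-width // tile), -(-height // tile))


def block_dims(tiled, shift):
    """Assumed relation: block count is the tile count shifted right."""
    return (tiled[0] >> shift[0], tiled[1] >> shift[1])


if __name__ == "__main__":
    tiles = tiled_dims(1920, 1080)
    print("tiled:", tiles)
    # The shift pair is taken from the log as an input, not derived.
    print("block:", block_dims(tiles, (3, 2)))
    print("720p tiled:", tiled_dims(1280, 720))
```

At 1080p the height is not a multiple of 16, so the last tile row is only partly covered, whereas 1280x720 divides evenly. Whether that difference matters for this bug is not established by the thread.
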
mmind commented 6 years ago

@superna9999 cool in so far as it looks like we're really hitting the same issue, which in turn seems to be Mali450 specific and somehow tied to the output resolution.

superna9999 commented 6 years ago

@mmind good I can reproduce the same

yuq commented 6 years ago

@mmind I hit the same problem with offscreen rendering at 1920x1080 on mali450 and fixed it with the latest two mesa commits, but I haven't had a chance to try onscreen rendering. Does it work for you?

mmind commented 6 years ago

@yuq summary: better but still some issues, detail below :-)

I'm living on the bleeding edge, so the lima code is sitting on top of what was in torvalds master yesterday. And I did the upgrade to mesa 18.0 on my Debian-based test systems.

Dumping kmscube at 1080p does still work (extended to 400 frames to make sure), running on the rk3328-display at 720p also still works for at least 5 minutes. So everything that worked before seems to still work :-) .

Running on the rk3328-display at 1080p does start nicely, kmscube looks great, but always after around 30 seconds or so I end up with something like:
[ 80.337843] ------------[ cut here ]------------
[ 80.338588] kernel BUG at ../mm/vmalloc.c:1621!
[ 80.339256] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP
[ 80.340049] Modules linked in: lima rockchip_io_domain dw_hdmi_i2s_audio rockchip_thermal gpu_sched dw_wdt ip_tables ipv6 smsc95xx smsc75xx ax88179_178a rtc_rk808
[ 80.342171] CPU: 3 PID: 1719 Comm: kmscube Tainted: G D 4.16.0-08669-g3d5c2b089303 #834
[ 80.343485] Hardware name: Pine64 Rock64 (DT)
[ 80.344125] pstate: 00000005 (nzcv daif -PAN -UAO)
[ 80.344833] pc : vunmap+0x30/0x38
[ 80.345330] lr : __dma_free+0x78/0xb0
[ 80.345866] sp : ffff00000d5cbb60
[ 80.346354] x29: ffff00000d5cbb60 x28: 0000000079008000
[ 80.347132] x27: ffff000008a67000 x26: 000000000183d000
[ 80.347910] x25: ffff0000096a5000 x24: 0000000000000004
[ 80.348687] x23: 0000000079008000 x22: 0000000000000000
[ 80.349463] x21: 0000000000000000 x20: ffff80007718a410
[ 80.350239] x19: 0000000000001000 x18: 0000000004400000
[ 80.351016] x17: 0000ffff8ad55c50 x16: ffff000008252a10
[ 80.351792] x15: 0000000002000000 x14: 0000000000000000
[ 80.352569] x13: 6000000000010002 x12: 6000000000010001
[ 80.353345] x11: 5000000000000000 x10: 0000000000000040
[ 80.354121] x9 : ffff8000773ac038 x8 : ffff80007600e238
[ 80.354898] x7 : ffff80007600e260 x6 : 0000000000000000
[ 80.355675] x5 : ffff000008dac000 x4 : 0000000000000004
[ 80.356314] x3 : ffff00000954cfff x2 : ffff800077130508
[ 80.356874] x1 : 00000000ffffffff x0 : ffff0000096a5000
[ 80.357435] Process kmscube (pid: 1719, stack limit = 0x0000000068f68c62)
[ 80.358145] Call trace:
[ 80.358408] vunmap+0x30/0x38
[ 80.358726] __dma_free+0x78/0xb0
[ 80.359087] lima_vm_unmap_page_table.part.0+0xd4/0x128 [lima]
[ 80.359705] lima_vm_unmap+0x48/0xb8 [lima]
[ 80.360151] lima_gem_va_unmap+0xec/0x120 [lima]
[ 80.360643] lima_ioctl_gem_va+0x3c/0x60 [lima]
[ 80.361124] drm_ioctl_kernel+0x6c/0xf0
[ 80.361530] drm_ioctl+0x188/0x390
[ 80.361894] do_vfs_ioctl+0xa4/0x8d8
[ 80.362272] ksys_ioctl+0x84/0xb8
[ 80.362625] SyS_ioctl+0xc/0x18
[ 80.362960] el0_svc_naked+0x30/0x34
[ 80.363341] Code: 97ffff8a a8c17bfd d65f03c0 d503201f (d4210000)
[ 80.363982] ---[ end trace d453994c58b74cb0 ]---
[ 80.364470] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 80.365174] SMP: stopping secondary CPUs
[ 80.365592] Kernel Offset: disabled
[ 80.365962] CPU features: 0x40802004
[ 80.366339] Memory Limit: none
[ 80.366666] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---

mmind commented 6 years ago

And some more that I just saw:

yuq commented 6 years ago

Thanks for your summary, seems the original problem is gone but there's some new problem. I'll try to reproduce it when I'm able to try the onscreen rendering.

superna9999 commented 6 years ago

(new comment, last comment was with -M smooth) On LePotato - Amlogic S905X using:

yuq commented 6 years ago

Good to see it works on at least one mali450 SoC. Maybe the new problem is due to the kernel difference between linux-4.16 and linux-master, @mmind?

mmind commented 6 years ago

Hmm, hard to check, as the current head includes a lot of Rockchip-specific stuff needed for actual display on the rk3328. So it's not really that easy.

The bug in the dump above is caused by BUG_ON(in_interrupt()); in vunmap, so possibly some locking issue?

mmind commented 6 years ago

Also, when running kmscube (non-rgba variant) and not waiting for the vunmap issue above but instead killing it with Ctrl+C, I always seem to get this at the end:
[ 2600.465103] lima ff300000.gpu: mmu page fault at 0x474800 from bus id 0 of type read on ppmmu1
[ 2600.466394] lima ff300000.gpu: mmu page fault at 0x474800 from bus id 0 of type read on ppmmu0
[ 2600.469451] lima ff300000.gpu: mmu resume
[ 2600.470465] lima ff300000.gpu: mmu resume

anarsoul commented 6 years ago

@yuq with latest mesa I get a lot of these while running kmscube -M rgba

[ 472.894739] lima 1c40000.gpu: mmu page fault at 0x121020 from bus id 0 of type read on ppmmu0
[ 472.913009] lima 1c40000.gpu: mmu page fault at 0x121000 from bus id 0 of type read on ppmmu1
[ 472.930897] lima 1c40000.gpu: mmu resume
[ 472.944033] lima 1c40000.gpu: mmu resume

yuq commented 6 years ago

@mmind @anarsoul While reworking the MM to use TTM, I found that the previous VM implementation had a bug: it would unmap the BOs of unfinished tasks, which causes the MMU fault / PP error. It should be fixed with the latest kernel commits. The Ctrl+C problem should also be fixed, by waiting for tasks to finish before freeing BOs. I can't reproduce your problems, so maybe you can try on your side to see if it's OK now.
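
The invariant described here (a BO must not be unmapped while a task that uses it is still running) can be shown with a toy model. This is not the kernel code. It defers the unmap until the last task finishes, rather than blocking and waiting as the actual fix is described, and every name in it is hypothetical.

```python
class BufferObject:
    """Toy model of a BO that must stay mapped while tasks still use it."""

    def __init__(self, name: str):
        self.name = name
        self.pending_tasks = 0        # tasks submitted but not yet finished
        self.mapped = True            # whether the BO is still in the GPU VM
        self.unmap_requested = False  # userspace asked to unmap/free it

    def task_started(self) -> None:
        self.pending_tasks += 1

    def task_finished(self) -> None:
        if self.pending_tasks == 0:
            raise RuntimeError("task_finished() without a matching task_started()")
        self.pending_tasks -= 1
        self._maybe_unmap()

    def request_unmap(self) -> None:
        # The buggy behaviour would be to set mapped = False right here,
        # even while a task may still read from the BO.
        self.unmap_requested = True
        self._maybe_unmap()

    def _maybe_unmap(self) -> None:
        if self.unmap_requested and self.pending_tasks == 0:
            self.mapped = False


if __name__ == "__main__":
    bo = BufferObject("framebuffer")
    bo.task_started()
    bo.request_unmap()   # e.g. the process is killed with Ctrl+C
    print("mapped while task runs:", bo.mapped)
    bo.task_finished()
    print("mapped after task ends:", bo.mapped)
```

In this model a GPU read after request_unmap() still hits a valid mapping until the task completes, which is the behaviour whose absence would show up as an MMU page fault.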

anarsoul commented 6 years ago

I'm currently on vacation. Will check in 1.5 weeks


mmind commented 6 years ago

@yuq it looks like your recent changes really did improve things. The kmscube is now rotating here on the mali450 in 1080p for like 5 minutes without any glitches, where it previously failed after some seconds.

And it also exits cleanly now.

yuq commented 6 years ago

@mmind Nice to hear it's OK for you now.

anarsoul commented 6 years ago

It seems to work for me now. Issue can be closed.

mmind commented 6 years ago

Everybody seems to be happy now. So let's clean up the issue list a bit :-)