With kernel 6.2 the following possible deadlock gets reported:
[ 765.679269] ====================================================== [ 765.679270] WARNING: possible circular locking dependency detected
[ 765.679271] 6.2.0-rc5+ #40 Tainted: G O
[ 765.679272] ------------------------------------------------------
[ 765.679272] camerasrc0:src/6354 is trying to acquire lock:
[ 765.679273] ffff992aa6ad8158 (&mm->mmap_lock){++++}-{3:3}, at: ipu_dma_buf_map+0x16c/0x400 [intel_ipu6_psys]
[ 765.679281]
but task is already holding lock:
[ 765.679282] ffff992aa6059dc8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: dma_buf_map_attachment_unlocked+0x3d/0x90
[ 765.679286]
which lock already depends on the new lock.
[ 765.679287]
the existing dependency chain (in reverse order) is:
[ 765.679287]
-> #1 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 765.679289] __ww_mutex_lock.constprop.0+0xbc/0xfb0
[ 765.679292] ww_mutex_lock_interruptible+0x38/0xa0
[ 765.679293] vm_fault_cpu+0x32/0x1a0 [i915]
[ 765.679396] __do_fault+0x30/0x160
[ 765.679399] do_fault+0x2bf/0x440
[ 765.679400] __handle_mm_fault+0x671/0xfb0
[ 765.679401] handle_mm_fault+0x16b/0x410
[ 765.679403] do_user_addr_fault+0x1e0/0x6b0
[ 765.679406] exc_page_fault+0x7e/0x2b0
[ 765.679408] asm_exc_page_fault+0x22/0x30
[ 765.679411]
-> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 765.679412] __lock_acquire+0x12fd/0x1fd0
[ 765.679415] lock_acquire+0xbf/0x2b0
[ 765.679416] down_read+0x3e/0x50
[ 765.679418] ipu_dma_buf_map+0x16c/0x400 [intel_ipu6_psys]
[ 765.679423] __map_dma_buf+0x1e/0x90
[ 765.679424] dma_buf_map_attachment+0xc3/0x120
[ 765.679426] dma_buf_map_attachment_unlocked+0x47/0x90
[ 765.679427] ipu_psys_mapbuf_locked+0x116/0x430 [intel_ipu6_psys]
[ 765.679431] ipu_psys_ioctl+0x1b1/0x4f0 [intel_ipu6_psys]
[ 765.679435] __x64_sys_ioctl+0x8d/0xd0
[ 765.679436] do_syscall_64+0x58/0x80
[ 765.679438] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 765.679440]
other info that might help us debug this:
[ 765.679441] Possible unsafe locking scenario:
[ 765.679441] CPU0 CPU1
[ 765.679442] ---- ----
[ 765.679442] lock(reservation_ww_class_mutex);
[ 765.679443] lock(&mm->mmap_lock);
[ 765.679444] lock(reservation_ww_class_mutex);
[ 765.679445] lock(&mm->mmap_lock);
[ 765.679446]
*** DEADLOCK ***
This is caused by ipu_psys_mapbuf_locked() calling dma_buf_map_attachment_unlocked() which takes reservation_ww_class_mutex and then calls back into ipu_dma_buf_map() which calls ipu_psys_get_userpages() which takes mm->mmap_lock.
So the IPU6 code takes locks in the following order:
lock(reservation_ww_class_mutex)
lock(&mm->mmap_lock)
But the core memory-management code takes these in the reverse oder causing an ABBA deadlock.
Fix this by moving the ipu_psys_get_userpages() call to attach time (to ipu_dma_buf_attach()) like how other dmabuf code in the kernel does this.
As a bonus this also allows properly propagating the error code from ipu_psys_get_userpages().
With kernel 6.2 the following possible deadlock gets reported:
This is caused by ipu_psys_mapbuf_locked() calling dma_buf_map_attachment_unlocked() which takes reservation_ww_class_mutex and then calls back into ipu_dma_buf_map() which calls ipu_psys_get_userpages() which takes mm->mmap_lock.
So the IPU6 code takes locks in the following order:
But the core memory-management code takes these in the reverse oder causing an ABBA deadlock.
Fix this by moving the ipu_psys_get_userpages() call to attach time (to ipu_dma_buf_attach()) like how other dmabuf code in the kernel does this.
As a bonus this also allows properly propagating the error code from ipu_psys_get_userpages().