Closed. kuon closed this issue 2 years ago.
Having the same issue on Arch with 5.6.19-rt11-1-rt, using the linux-rt AUR package.
I think the culprit is:
/var/tmp/portage/sys-fs/zfs-kmod-0.8.4-r1/work/zfs-0.8.4/include/spl/sys/mutex.h:63:41: error: ‘struct mutex’ has no member named ‘m_owner’
   63 | #define mutex_owner(mp) (READ_ONCE((mp)->m_owner))
      |                                        ^~
It seems that include/linux/mutex.h in the kernel has been modified so that, when CONFIG_PREEMPT_RT is defined, struct mutex is taken from include/linux/mutex_rt.h in a much reduced form.
mutex.h:
#ifdef CONFIG_PREEMPT_RT
# include <linux/mutex_rt.h>
#else
/*
* Simple, straightforward mutexes with strict semantics:
*
* - only one task can hold the mutex at a time
* - only the owner can unlock the mutex
* - multiple unlocks are not permitted
* - recursive locking is not permitted
* - a mutex object must be initialized via the API
* - a mutex object must not be initialized via memset or copying
* - task may not exit with mutex held
* - memory areas where held locks reside must not be freed
* - held mutexes must not be reinitialized
* - mutexes may not be used in hardware or software interrupt
* contexts such as tasklets and timers
*
* These semantics are fully enforced when DEBUG_MUTEXES is
* enabled. Furthermore, besides enforcing the above rules, the mutex
* debugging code also implements a number of additional features
* that make lock debugging easier and faster:
*
* - uses symbolic names of mutexes, whenever they are printed in debug output
* - point-of-acquire tracking, symbolic lookup of function names
* - list of all locks held in the system, printout of them
* - owner tracking
* - detects self-recursing locks and prints out all relevant info
* - detects multi-task circular deadlocks and prints out all affected
* locks and tasks (and only those tasks)
*/
struct mutex {
    atomic_long_t owner;
    spinlock_t wait_lock;
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
    struct optimistic_spin_queue osq; /* Spinner MCS lock */
#endif
    struct list_head wait_list;
#ifdef CONFIG_DEBUG_MUTEXES
    void *magic;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
};
mutex_rt.h:
struct mutex {
    struct rt_mutex lock;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
};
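To make the observation above concrete, here is a minimal userspace sketch of a struct whose members depend on a build-time switch. All names (MODEL_PREEMPT_RT, model_mutex, MODEL_OWNER and so on) are hypothetical stand-ins, the real rt_mutex layout is not reproduced, and this is not the ZFS or kernel code. It only shows that code reaching for a fixed member name compiles under one layout and not the other, so any accessor has to be chosen per configuration.

```c
#include <stddef.h>

/*
 * Userspace model only. Build with -DMODEL_PREEMPT_RT to select the
 * reduced layout, mirroring how the kernel header switches on
 * CONFIG_PREEMPT_RT. Every name here is hypothetical.
 */
#ifdef MODEL_PREEMPT_RT
struct model_rt_mutex {
    void *owner;               /* placeholder for the RT lock internals */
};
struct model_mutex {
    struct model_rt_mutex lock; /* no direct "owner" member here */
};
#define MODEL_OWNER(m) ((m)->lock.owner)
#else
struct model_mutex {
    void *owner;               /* direct member, as in the non-RT layout */
    int wait_lock;
};
#define MODEL_OWNER(m) ((m)->owner)
#endif

/* Works under both layouts because it goes through the accessor macro. */
void *model_mutex_owner(const struct model_mutex *m)
{
    return MODEL_OWNER(m);
}

/* Reports which layout this translation unit was built with. */
int model_is_rt_layout(void)
{
#ifdef MODEL_PREEMPT_RT
    return 1;
#else
    return 0;
#endif
}
```

Writing `m->owner` directly instead of `MODEL_OWNER(m)` would build only without -DMODEL_PREEMPT_RT, which is the same class of failure as a "has no member named" error.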
The linux-rt-devel git contains an unmodified mutex.h version:
struct mutex {
    atomic_long_t owner;
    spinlock_t wait_lock;
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
    struct optimistic_spin_queue osq; /* Spinner MCS lock */
#endif
    struct list_head wait_list;
#ifdef CONFIG_DEBUG_MUTEXES
    void *magic;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
};
/*
* This is the control structure for tasks blocked on mutex,
* which resides on the blocked task's kernel stack:
*/
struct mutex_waiter {
    struct list_head list;
    struct task_struct *task;
    struct ww_acquire_ctx *ww_ctx;
#ifdef CONFIG_DEBUG_MUTEXES
    void *magic;
#endif
};
However, all recent RT kernels I compiled also had this mutex_rt.h, and I have kernel 5.6.10-rt5 running with zfs-kmod-0.8.4-r0.
The -r0 and -r1 packages differ only negligibly, in ways related to the Gentoo package management system. Both pull zfs-kmod-0.8.4 without any additional patches.
I confirm that I can build zfs-kmod against linux-5.6.10-rt5 and linux-5.6.14-rt7, but the build fails with linux-5.6.17-rt10.
I attach the diff for .config. All changes were applied silently and automatically by make oldconfig.
home64 /usr/src/linux # diff -u ../linux-5.6.14-rt7/.config .config
--- ../linux-5.6.14-rt7/.config 2020-08-19 22:14:48.186694577 +0300
+++ .config 2020-08-19 22:18:27.806446945 +0300
@@ -1,6 +1,6 @@
#
# Automatically generated file; DO NOT EDIT.
-# Linux/x86 5.6.14-rt Kernel Configuration
+# Linux/x86 5.6.17-rt Kernel Configuration
#
#
@@ -755,8 +755,12 @@
CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
# end of GCOV-based kernel profiling
-CONFIG_PLUGIN_HOSTCC=""
+CONFIG_PLUGIN_HOSTCC="g++"
CONFIG_HAVE_GCC_PLUGINS=y
+CONFIG_GCC_PLUGINS=y
+# CONFIG_GCC_PLUGIN_CYC_COMPLEXITY is not set
+# CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
+# CONFIG_GCC_PLUGIN_RANDSTRUCT is not set
# end of General architecture-dependent options
CONFIG_RT_MUTEXES=y
@@ -835,6 +839,7 @@
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
+CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE=y
CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE=y
CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
CONFIG_FREEZER=y
@@ -1992,6 +1997,7 @@
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
+# CONFIG_BCACHE is not set
CONFIG_BLK_DEV_DM_BUILTIN=y
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
@@ -4687,6 +4693,10 @@
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
+# CONFIG_GCC_PLUGIN_STRUCTLEAK_USER is not set
+# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF is not set
+# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is not set
+# CONFIG_GCC_PLUGIN_STACKLEAK is not set
# CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
# CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
# end of Memory initialization
Since I identified the good and bad tags, I did the bisect. It landed on:
9a193af85af7a9b3493fe2cbf5bcacd3b17deb23 is the first bad commit
commit 9a193af85af7a9b3493fe2cbf5bcacd3b17deb23
Author: Ahmed S. Darwish <a.darwish@linutronix.de>
Date: Mon Jun 8 02:57:15 2020 +0200
seqlock: Extend seqcount API with associated locks
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.
There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.
Add new sequence counter types which allows to associate a lock to the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.
For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.
Introduce the following seqcount types with associated locks:
seqcount_spinlock_t
seqcount_raw_spinlock_t
seqcount_rwlock_t
seqcount_mutex_t
seqcount_ww_mutex_t
Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids
kernel API explosion per each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.
Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.
If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Documentation/locking/seqlock.rst | 64 +++++-
MAINTAINERS | 2 +-
include/linux/seqlock.h | 354 ++++++++++++++++++++++++++++-----
include/linux/seqlock_types_internal.h | 187 +++++++++++++++++
4 files changed, 555 insertions(+), 52 deletions(-)
create mode 100644 include/linux/seqlock_types_internal.h
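For readers unfamiliar with the idea in this commit message, here is a small userspace sketch of a sequence counter that remembers which lock serializes its writers. It uses C11 atomics and invented names (model_lock, model_seqcount, model_write, model_read); it is not the kernel's seqcount API. The "held" check is a crude stand-in for a lockdep assertion: it only verifies that someone holds the lock, not that the caller does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical writer lock; it only tracks whether it is held. */
struct model_lock { atomic_bool held; };

/* Sequence counter that remembers its associated writer lock. */
struct model_seqcount {
    atomic_uint seq;
    struct model_lock *lock;
};

struct model_data {
    struct model_lock lock;
    struct model_seqcount sc;
    atomic_int a, b;            /* payload protected by the counter */
};

void model_init(struct model_data *d)
{
    atomic_init(&d->lock.held, false);
    atomic_init(&d->sc.seq, 0u);
    d->sc.lock = &d->lock;
    atomic_init(&d->a, 0);
    atomic_init(&d->b, 0);
}

void model_lock_acquire(struct model_lock *l)
{
    while (atomic_exchange(&l->held, true))
        ;                       /* spin until the previous value was false */
}

void model_lock_release(struct model_lock *l)
{
    atomic_store(&l->held, false);
}

/* Refuses to write (returns false) if the associated lock is not held. */
bool model_write(struct model_data *d, int a, int b)
{
    unsigned s;

    if (!atomic_load(&d->sc.lock->held))
        return false;
    s = atomic_load_explicit(&d->sc.seq, memory_order_relaxed);
    atomic_store_explicit(&d->sc.seq, s + 1, memory_order_relaxed); /* odd: write in progress */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->a, a, memory_order_relaxed);
    atomic_store_explicit(&d->b, b, memory_order_relaxed);
    atomic_store_explicit(&d->sc.seq, s + 2, memory_order_release); /* even: write done */
    return true;
}

/* Lock-free read: retry while a write is in progress or has intervened. */
void model_read(struct model_data *d, int *a, int *b)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&d->sc.seq, memory_order_acquire);
        *a = atomic_load_explicit(&d->a, memory_order_relaxed);
        *b = atomic_load_explicit(&d->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&d->sc.seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

A writer calls model_lock_acquire(), model_write(), then model_lock_release(); readers call model_read() without taking the lock.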
I cloned the linux-rt-devel
I missed the fact that not every commit in the kernel tree is buildable. Some commits fail to build the kernel, so please disregard my previous git bisect result. I am running another bisect now with a corrected script.
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
7661b6a8d21f577fa7c0462bfc2bd52c1ded650b
a068dfcf5c8c7572a836dd63354d808b238cdda3
ca9dcef5342284cae6acf2d6a7a17b73c60e31aa
cb167d22bc247d72a10f4496f129624b81ab00dc
91008b6fb0311c7665d71671abfc32782c02f0b8
c0ec30a148906827091cb01d9cd5a692aee63ce7
43fb0dd9b9e0916d636b7e7c23556f7cdace1385
d84a737eaca84b592ca30cd0947d24a914dafdcf
9f087bac2d9b3be177dd0207b57659fa81403395
c07c01230d3f027e1f6f5b8025bfc77910149cf2
00b57404b507dc5fd4597ace03cd4133ba8a5fa1
f78bb7e0b5946a3e278fbbdd8a92222f490ed1ea
0cd32a8017b46a11b3f66abfb8655f2c4995d6ab
8b02cb9102914f663fc42c6484605c1f554101d6
4d7a7901686f077e9587d2d37d902c3149fa6b32
9a193af85af7a9b3493fe2cbf5bcacd3b17deb23
4a90fbefff00ade907bad97c44c448976943b665
We cannot bisect more!
bisect run cannot continue any more
PF16W6Y2 /usr/src/linux #
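The output above, a list of candidates rather than a single commit, is what happens when untestable commits sit between the last known-good and the first known-bad one. The following sketch models this on a simplified linear history; it is not git's actual algorithm, and the names (verdict, first_bad_candidates) are made up for illustration. It assumes the history is ordered oldest to newest and that every good commit precedes every bad one.

```c
#include <stddef.h>

typedef enum { VERDICT_GOOD, VERDICT_BAD, VERDICT_SKIP } verdict;

/*
 * Given verdicts for a linear history (oldest first), compute the
 * inclusive index range [*first, *last] of commits that could be the
 * first bad one: everything after the last known-good commit, up to
 * and including the earliest known-bad commit.
 * Returns the number of candidates, or 0 if no commit tested bad
 * (in which case *first and *last are left untouched).
 */
size_t first_bad_candidates(const verdict *v, size_t n,
                            size_t *first, size_t *last)
{
    size_t i, bad = n, start = 0;

    /* Earliest commit known to be bad. */
    for (i = 0; i < n; i++) {
        if (v[i] == VERDICT_BAD) {
            bad = i;
            break;
        }
    }
    if (bad == n)
        return 0;

    /* Candidates begin right after the latest known-good commit before it. */
    for (i = 0; i < bad; i++) {
        if (v[i] == VERDICT_GOOD)
            start = i + 1;
    }

    *first = start;
    *last = bad;
    return bad - start + 1;
}
```

With verdicts good, good, skip, skip, bad, bad, the candidates are indices 2 through 4: the two skipped commits plus the first known-bad one. The culprit can only be narrowed further by making the skipped commits testable.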
I am trying to build the zfs dkms for the following kernel:
5.6.17-rt10-1-rt-bfq ( https://aur.archlinux.org/packages/linux-rt-bfq/ )
But it fails with this output:
I don't understand why, because VERIFY3P is defined in sys/debug.h (/var/lib/dkms/zfs/0.8.4/build/include/spl/sys/debug.h).