xaiki opened 3 years ago
Thanks for trying this out and reporting the results! I had a look at the code in kernel/futex.c, and the only path from futex_init() to cmpxchg_futex_value_locked() is through futex_detect_cmpxchg(). That call passes a NULL pointer, and the error looks like a NULL dereference, but the NULL is clearly intentional and has been there for a long time. I would probably try git bisect to find the patch that makes it stop working. That can be a slow, laborious process.
I've tried your 5.6 branch, but it fails at the same point (I remember successfully running it before), so I looked into it a bit more. The function that gets called with NULL, cmpxchg_futex_value_locked(), does:
```c
pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();
```
I couldn't find where pagefault_disable() is defined.
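For reference, in 5.x mainline kernels pagefault_disable() is a static inline in include/linux/uaccess.h. Roughly, it increments a per-task counter, and while that counter is nonzero the page-fault handler does not try to resolve the fault. Instead it looks up the exception-table fixup registered for the faulting instruction, which makes the accessor return -EFAULT. If no fixup entry is found, the kernel oopses. The following is a simplified userspace model of that decision, not kernel code; struct task_model, model_fault() and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; the kernel keeps a comparable nesting
 * counter in current->pagefault_disabled. */
struct task_model {
    int pagefault_disabled;
};

static void model_pagefault_disable(struct task_model *t)
{
    t->pagefault_disabled++;           /* disable/enable pairs may nest */
}

static void model_pagefault_enable(struct task_model *t)
{
    assert(t->pagefault_disabled > 0); /* catches an unbalanced enable in this model */
    t->pagefault_disabled--;
}

static bool model_pagefault_disabled(const struct task_model *t)
{
    return t->pagefault_disabled != 0;
}

enum fault_outcome {
    FAULT_RESOLVED,       /* page mapped in, access retried */
    FAULT_RETURNS_EFAULT, /* fixup makes the accessor return -EFAULT */
    FAULT_OOPS            /* no fixup entry: kernel oops */
};

/* Simplified decision for a fault on a user address taken inside the kernel.
 * addr_mappable: whether normal fault handling could map the page
 *                (never true for NULL).
 * has_fixup:     whether the faulting instruction has an exception-table entry. */
static enum fault_outcome model_fault(const struct task_model *t,
                                      bool addr_mappable, bool has_fixup)
{
    if (!model_pagefault_disabled(t) && addr_mappable)
        return FAULT_RESOLVED;
    if (has_fixup)
        return FAULT_RETURNS_EFAULT;
    return FAULT_OOPS;
}
```

With a NULL address addr_mappable is false, so in this model the outcome depends only on has_fixup: -EFAULT when the fixup entry exists, an oops when it does not.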
The page fault here is intentional and should not crash the kernel; the comment says:
```c
/*
 * This will fail and we want it. Some arch implementations do
 * runtime detection of the futex_atomic_cmpxchg_inatomic()
 * functionality. We want to know that before we call in any
 * of the complex code paths. Also we want to prevent
 * registration of robust lists in that case. NULL is
 * guaranteed to fault and we get -EFAULT on functional
 * implementation, the non-functional ones will return
 * -ENOSYS.
 */
```
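The probe this comment describes can be modelled in userspace. This is a minimal sketch, not kernel code: detect_cmpxchg(), mock_functional_cmpxchg() and mock_missing_cmpxchg() are hypothetical stand-ins, and the -EFAULT from a working implementation is faked with a NULL check instead of a real fault plus exception-table fixup. It follows the rule in futex_detect_cmpxchg() that only -EFAULT turns futex_cmpxchg_enabled on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Signature modelled on futex_atomic_cmpxchg_inatomic(): the old value is
 * written to *curval, and newval is stored only if *uaddr == oldval. */
typedef int (*cmpxchg_fn)(uint32_t *curval, uint32_t *uaddr,
                          uint32_t oldval, uint32_t newval);

/* Hypothetical working implementation. The NULL check stands in for the
 * real mechanism (fault + exception-table fixup returning -EFAULT). */
static int mock_functional_cmpxchg(uint32_t *curval, uint32_t *uaddr,
                                   uint32_t oldval, uint32_t newval)
{
    if (uaddr == NULL)
        return -EFAULT;
    *curval = *uaddr;
    if (*curval == oldval)
        *uaddr = newval;
    return 0;
}

/* Hypothetical non-functional implementation: never touches memory. */
static int mock_missing_cmpxchg(uint32_t *curval, uint32_t *uaddr,
                                uint32_t oldval, uint32_t newval)
{
    (void)curval;
    (void)uaddr;
    (void)oldval;
    (void)newval;
    return -ENOSYS;
}

/* Probe with a deliberate NULL address; returns the value that would be
 * stored in futex_cmpxchg_enabled (1 only for -EFAULT). */
static int detect_cmpxchg(cmpxchg_fn op)
{
    uint32_t curval = 0;
    int ret = op(&curval, NULL, 0, 0);

    /* This model only expects the two outcomes named in the comment. */
    assert(ret == -EFAULT || ret == -ENOSYS);
    return ret == -EFAULT;
}
```

In the real kernel the NULL access is a genuine fault, so the probe only returns -EFAULT if that fault is caught and fixed up; a crash at this point means the fault escaped that path.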
I've hacked that call so that futex_cmpxchg_enabled is always 0, but 5.9.1 then hung later in the boot. I'm now retrying with your 5.7.2.
The same happens with 5.7.2: with my hack it boots, but hangs before dropping me into the busybox shell.
There is a report of a similar problem at http://groups.google.com/group/gnubee/t/b21f65a820e43b62, which was resolved by using a different compiler version. I'm using gcc-7.2.0 and binutils 2.29.1.20170915 without problems, and I can build and boot 5.10.1 without this crash. What versions are you using?
I've tried building 5.7, and it panics on boot: