Open johnny-mnemonic opened 1 year ago
The problem for IP22 also happens with non-ports binaries:
indy# ls /bin
[ cp domainname kill mt rm sleep
cat cpio echo ksh mv rmdir stty
chgrp csh ed ln pax sh sync
chio date eject ls ps sha1 tar
chmod dd expr md5 pwd sha256 test
cksum df hostname mkdir rksh sha512
indy# ls
.Xdefaults .config .cshrc .cvsrc .login .profile .ssh
indy# ls /
.cshrc bin dev mnt swap usr
.profile bsd etc root sys var
altroot bsd.booted home sbin tmp
indy# time sha256 /bsd
panic: kernel diagnostic assertion "!ISSET(bp->b_flags, B_DMA)" failed: file "/
usr/src/sys/kern/vfs_bio.c", line 388
Stopped at db_enter+0x4: jr ra
db_enter+0x8: nop
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*473577 31906 0 0x100003 0 0 sha256
db_enter+0x4 (56daa729608a8de9,900000001fbd9880,900000001fbd9830,0) ra 0xfffff
fff889393d8 sp 0xffffffff8fc38ed0, sz 0
panic+0x178 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b488d
0) ra 0xffffffff8893b1d8 sp 0xffffffff8fc38ed0, sz 112
__assert+0x38 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b48
8d0) ra 0xffffffff88814404 sp 0xffffffff8fc38f40, sz 32
buf_flip_high+0x2d4 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,fffffff
f88b488d0) ra 0xffffffff8881519c sp 0xffffffff8fc38f60, sz 48
bufcache_recover_dmapages+0xec (56daa729608a8de9,ffffffff88b48408,ffffffff88b48
ae8,ffffffff88b488d0) ra 0xffffffff88815630 sp 0xffffffff8fc38f90, sz 96
bufadjust+0x168 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b
488d0) ra 0xffffffff88817674 sp 0xffffffff8fc38ff0, sz 48
bufbackoff+0xcc (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b
488d0) ra 0xffffffff88acde48 sp 0xffffffff8fc39020, sz 80
buf_realloc_pages+0x108 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,fff
fffff88b488d0) ra 0xffffffff888141c8 sp 0xffffffff8fc39070, sz 96
buf_flip_high+0x98 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff
88b488d0) ra 0xffffffff8881519c sp 0xffffffff8fc390d0, sz 48
bufcache_recover_dmapages+0xec (56daa729608a8de9,ffffffff88b48408,ffffffff88b48
ae8,ffffffff88b488d0) ra 0xffffffff88815630 sp 0xffffffff8fc39100, sz 96
bufadjust+0x168 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b
488d0) ra 0xffffffff88817674 sp 0xffffffff8fc39160, sz 48
bufbackoff+0xcc (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff88b
488d0) ra 0xffffffff88acde48 sp 0xffffffff8fc39190, sz 80
buf_realloc_pages+0x108 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,fff
fffff88b488d0) ra 0xffffffff888141c8 sp 0xffffffff8fc391e0, sz 96
buf_flip_high+0x98 (56daa729608a8de9,ffffffff88b48408,ffffffff88b48ae8,ffffffff
88b488d0) ra 0xffffffff8881519c sp 0xffffffff8fc39240, sz 48
User-level: pid 31906
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb>
After checking various configurations, it looks like the IP22 problem is not new: older kernels (7.2, 7.1 and 7.0 with matching octeon FSes, and 6.9 with a matching sgi FS) are also affected and show the same result. So this may just be an effect of using an NFS root FS, or a bug that has existed for a while.
UPDATE: Testing on a R4600 Indy showed that it is not affected by this, so it could be that this issue is specific to the R4400 of the Indy I originally tested. I am unsure if it is related to the errata mentioned in https://github.com/the-machine-hall/openbsd-src/commit/64ac1c5a7e13fbe4130b9b53f956c4ebff13c665 and https://github.com/the-machine-hall/openbsd-src/commit/833ab59f79f5195f7dcd0b5b888b8d2f3335eac5.
An update for IP28:
After bisecting the problem for IP28 it turned out that it is actually two-fold and related to the introduction of the clockintr(9) subsystem and specifically to the two following commits:
While the first one can be worked around by not using clockintr for IP28:
diff --git a/sys/arch/mips64/include/_types.h b/sys/arch/mips64/include/_types.h
index 535abead1de..3b30986770c 100644
--- a/sys/arch/mips64/include/_types.h
+++ b/sys/arch/mips64/include/_types.h
@@ -35,7 +35,9 @@
#ifndef _MIPS64__TYPES_H_
#define _MIPS64__TYPES_H_
+#if !( defined(TGT_INDIGO2) && defined(CPU_R10000) )
#define __HAVE_CLOCKINTR
+#endif
/*
* _ALIGN(p) rounds p (pointer or byte index) up to a correctly-aligned
diff --git a/sys/arch/mips64/mips64/mips64_machdep.c b/sys/arch/mips64/mips64/mips64_machdep.c
index be07540f045..f4412f73c9c 100644
--- a/sys/arch/mips64/mips64/mips64_machdep.c
+++ b/sys/arch/mips64/mips64/mips64_machdep.c
@@ -349,12 +349,14 @@ cpu_initclocks(void)
(*md_startclock)(ci);
}
void
setstatclockrate(int newhz)
{
+#ifdef __HAVE_CLOCKINTR
clockintr_setstatclockrate(newhz);
+#endif
}
/*
* Decode instruction and figure out type.
*/
...for the second one neither a workaround nor a solution is available yet.
It is also unclear why the other machines I can test (Indy (IP22), Origin200 (IP27), Octane/Octane2 (IP30), O2 (IP32)) are unaffected by the two commits mentioned above.
Comparing the kernel configuration files for IP22 and IP28 uncovered a small difference between the two, namely the existence of a clock0 "device" on IP22 and none on IP28. Actually, all of the other SGI systems I have available for testing and that are supported by OpenBSD/sgi use a clock0 device. Checking the history of those files I came across:
commit 64ac1c5a7e13fbe4130b9b53f956c4ebff13c665
Author: miod <miod@openbsd.org>
Date: Sat Jul 14 19:53:27 2012 +0000
A known errata of R4000 and R4400 processors, is that reading the internal
counter register close to a trigger of the counter interrupt, may cause the
interrupt not to be generated. This makes it a bad idea to use the internal
counter both for the scheduling clock and for delay().
Therefore, on IP22 systems (and IP28 because it makes my life easier), use
one of the two 8254 timers connected to the onboard interrupt controller as
the scheduling clock source.
Adapted from NetBSD.
...which switched both IP22 and IP28 from using clock0 to the timers connected to int0. This was reverted for the Indy soon after with:
commit 833ab59f79f5195f7dcd0b5b888b8d2f3335eac5
Author: miod <miod@openbsd.org>
Date: Wed Jul 18 19:56:02 2012 +0000
According to Linux, and just verified the hard way, the 8254 timer does not
interrupt on Indy; do not use it on such systems. Then, bring back a clock0 at
mainbus attachment to IP22 kernels, and attach it late in the autoconf process
if no other device has claimed the clock yet.
This means R4000 and R4400 based Indy may experience the lost clock interrupt
processor errata again, until a better way to skirt it is found.
And "bring[ing] back a clock0" to IP28 with:
diff --git a/sys/arch/sgi/conf/GENERIC-IP28 b/sys/arch/sgi/conf/GENERIC-IP28
index 9918a08414c..afcae927626 100644
--- a/sys/arch/sgi/conf/GENERIC-IP28
+++ b/sys/arch/sgi/conf/GENERIC-IP28
@@ -37,6 +37,7 @@ config bsd swap generic
#
mainbus0 at root
cpu* at mainbus0
+clock0 at mainbus0
int0 at mainbus0 # Interrupt Controller and scheduling clock
imc0 at mainbus0 # Memory Controller
diff --git a/sys/arch/sgi/conf/RAMDISK-IP28 b/sys/arch/sgi/conf/RAMDISK-IP28
index e07ea14fbe7..389b0d3655d 100644
--- a/sys/arch/sgi/conf/RAMDISK-IP28
+++ b/sys/arch/sgi/conf/RAMDISK-IP28
@@ -31,6 +31,7 @@ config bsd root on rd0a swap on rd0b
mainbus0 at root
cpu* at mainbus0
+clock0 at mainbus0
int0 at mainbus0 # Interrupt Controller and scheduling clock
imc0 at mainbus0 # Memory Controller
diff --git a/sys/arch/sgi/localbus/int.c b/sys/arch/sgi/localbus/int.c
index c76df00762d..09c06291ce5 100644
--- a/sys/arch/sgi/localbus/int.c
+++ b/sys/arch/sgi/localbus/int.c
@@ -375,8 +375,7 @@ int2_attach(struct device *parent, struct device *self, void *aux)
/*
* The 8254 timer does not interrupt on (some?) IP24 systems.
*/
- if (sys_config.system_type == SGI_IP20 ||
- sys_config.system_subtype == IP22_INDIGO2)
+ if (sys_config.system_type == SGI_IP20)
int_8254_cal();
}
...fixes/works around the breakage caused by the two commits mentioned in https://github.com/the-machine-hall/openbsd-src/issues/2#issuecomment-1510429607. Together with the fix/workaround from #1 the IP28 kernel boots fine again, see https://dmesgd.nycbug.org/index.cgi?do=view&id=7100 for details.
IP22_INDIGO2 actually includes all Indigo²s (i.e. IP22, IP26 and IP28), so removing it from the clause might be a little too much, but as I only rebuilt the kernel for IP28 with this patch applied, it makes no difference for IP22 and IP26. I might later enable the timers again for IP22 and IP26, but I expect them to break similarly to IP28 w/o the patch. So in the end the better solution might be to use a clock0 on IP20, IP22 and IP26, too, or to fix the 8254 related code with regard to the commits mentioned in https://github.com/the-machine-hall/openbsd-src/issues/2#issuecomment-1510429607.
Something broke between OpenBSD/sgi 7.2 and 7.3 for IP22 (tested on Indy) and IP28 (R10000 Indigo²).
When running 7za on the Indy, the kernel panics:
login: [...]
indy# 7za b -md=2m
panic: kernel diagnostic assertion "!ISSET(bp->b_flags, B_DMA)" failed: file "/
usr/src/sys/kern/vfs_bio.c", line 388
Stopped at db_enter+0x4: jr ra
db_enter+0x8: nop
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*521961 35108 0 0x3 0 0 7za
db_enter+0x4 (56daa729608a8de9,900000001fbd9880,900000001fbd9830,0) ra 0xfffff
fff88972788 sp 0xffffffff88460cb0, sz 0
panic+0x178 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88bae0b
0) ra 0xffffffff88974578 sp 0xffffffff88460cb0, sz 112
__assert+0x38 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88bae
0b0) ra 0xffffffff88b319b4 sp 0xffffffff88460d20, sz 32
buf_flip_high+0x2d4 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,fffffff
f88bae0b0) ra 0xffffffff88b3274c sp 0xffffffff88460d40, sz 48
bufcache_recover_dmapages+0xec (56daa729608a8de9,ffffffff88b46380,ffffffff88bae
2a8,ffffffff88bae0b0) ra 0xffffffff88b32be0 sp 0xffffffff88460d70, sz 96
bufadjust+0x168 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88b
ae0b0) ra 0xffffffff88b34c24 sp 0xffffffff88460dd0, sz 48
bufbackoff+0xcc (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88b
ae0b0) ra 0xffffffff88a6dff8 sp 0xffffffff88460e00, sz 80
buf_realloc_pages+0x108 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,fff
fffff88bae0b0) ra 0xffffffff88b31778 sp 0xffffffff88460e50, sz 96
buf_flip_high+0x98 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff
88bae0b0) ra 0xffffffff88b3274c sp 0xffffffff88460eb0, sz 48
bufcache_recover_dmapages+0xec (56daa729608a8de9,ffffffff88b46380,ffffffff88bae
2a8,ffffffff88bae0b0) ra 0xffffffff88b32be0 sp 0xffffffff88460ee0, sz 96
bufadjust+0x168 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88b
ae0b0) ra 0xffffffff88b34c24 sp 0xffffffff88460f40, sz 48
bufbackoff+0xcc (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff88b
ae0b0) ra 0xffffffff88a6dff8 sp 0xffffffff88460f70, sz 80
buf_realloc_pages+0x108 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,fff
fffff88bae0b0) ra 0xffffffff88b31778 sp 0xffffffff88460fc0, sz 96
buf_flip_high+0x98 (56daa729608a8de9,ffffffff88b46380,ffffffff88bae2a8,ffffffff
88bae0b0) ra 0xffffffff88b3274c sp 0xffffffff88461020, sz 48
User-level: pid 35108
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb>
[...]
OpenBSD 7.3 (GENERIC-IP28) #0: Thu Mar 30 17:57:07 CEST 2023
root@octane.machine-hall.org:/usr/src/sys/arch/sgi/compile/GENERIC-IP28
real mem = 268435456 (256MB)
rsvd mem = 1064960 (2MB)
avail mem = 259670016 (247MB)
warning: no entropy supplied by boot loader
random: boothowto does not indicate good seed
mainbus0 at root: POWER Indigo2 R10000
cpu0 at mainbus0: MIPS R10000 CPU rev 2.5 194 MHz, R10000 FPU rev 0.0
cpu0: cache L1-I 32KB D 32KB 2 way, L2 1024KB 2 way
[...]
dsclock0 at hpc0 offset 0x00060000
eisa0 at imc0 irq 27
Trap cause = 13 Frame 0x9800000020007e18
Trap PC 0xa8000000200991d8 RA 0xa80000002029b4fc fault 0xc000000000808648
0xa8000000200990e0 (a8000000204a85c0,28,0,0) ra 0xa80000002029b4fc sp 0x9800000020007f70, sz 0
0xa80000002029ab28 (a8000000204a85c0,28,0,0) ra 0x0 sp 0x9800000020007f70, sz 0
User-level: pid 0
stopped on non ddb fault
Stopped at 0xa8000000200991d8: teq v1,zero
ddb>