Open tsutsui opened 2 years ago
Seems to be triggered by write(2) system calls against a file system?
syscall: 293
syscall: 116
syscall: 121
syscall: 4
Data access fault (Write Violation) v = 0x13ae48, frame 0x6888b48
R00-05: 0x00000000 0x000bbbb0 0xffffd188 0x062c2e00 0x000001f8 0x726f6f74
R06-11: 0x00000000 0x00000004 0x00000000 0xf9d3ff88 0x00000000 0x00000998
R12-17: 0x060627d0 0x00000001 0x00000000 0x06888e88 0x00100000 0x0606c1a4
R18-23: 0x00000000 0x000b0000 0x060627d0 0x062c2e00 0x00000200 0x06888f00
R24-29: 0x06888f20 0x00000200 0x00000000 0x00000000 0x00000000 0x00000000
R30-31: 0x06887058 0x06888c48
sxip 13ae4a snip 13ae4e sfip 13ae52
dmt0 42bf dmd0 726f6f74 dma0 62c2e00
dmt1 4be dmd1 40 dma1 cff94
dmt2 0 dmd2 ccccccc dma2 ccccccc
fault type 7
[DMT0=42bf: st.s 726f6f74 to 62c2e00 as 15 not double not xmem]
fpsr 100000 fpcr 0 epsr 900003f0 ssbr 0
fpecr 0 fphs1 abb84 fpls1 0 fphs2 ae00 fpls2 6887058
fppt 7fad8 fprh 105358 fprl 606b4d0 fpit 0
vector 3 mask 0 mode 4 scratch1 1738d4 cpu 0x1a3480
panic: Data Access Exception
Stopped in pid 274.1 (login) at netbsd:cpu_Debugger+0x4: tb0 0, r0, 0x84
db>
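For reading the dmt0 line in the dump above, here is a minimal sketch of a DMT decoder. The bit masks are an assumption (the MC88100 layout as recalled from the m88k headers, not copied from this tree), and the names dmt_fields, dmt_decode() and dmt_print() are made up for this illustration. The layout does agree with the trap output: 0x42bf decodes to a valid supervisor store from r5, and R5 holds 0x726f6f74, the same value as dmd0.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed MC88100 data memory transaction (DMT) register layout.
 * These masks are written from memory of the m88k headers; check them
 * against the real machine-dependent header before relying on them.
 */
#define DMT_DAS     0x4000u /* data access space: 1 = supervisor */
#define DMT_DOUB1   0x2000u /* first half of a double-word access */
#define DMT_LOCKBAR 0x1000u /* bus lock, i.e. xmem */
#define DMT_DREG    0x0f80u /* register number of the transfer */
#define DMT_EN      0x003cu /* byte enables */
#define DMT_WRITE   0x0002u /* 1 = store, 0 = load */
#define DMT_VALID   0x0001u /* transaction is pending */

/* Hypothetical container for the decoded fields. */
struct dmt_fields {
	unsigned valid, write, supervisor, dreg, byte_enables;
	unsigned is_double, locked;
};

struct dmt_fields
dmt_decode(uint32_t dmt)
{
	struct dmt_fields f;

	f.valid = (dmt & DMT_VALID) != 0;
	f.write = (dmt & DMT_WRITE) != 0;
	f.supervisor = (dmt & DMT_DAS) != 0;
	f.dreg = (dmt & DMT_DREG) >> 7;
	f.byte_enables = (dmt & DMT_EN) >> 2;
	f.is_double = (dmt & DMT_DOUB1) != 0;
	f.locked = (dmt & DMT_LOCKBAR) != 0;
	return f;
}

/* Print one transaction in a form similar to the trap output. */
void
dmt_print(uint32_t dmt, uint32_t dmd, uint32_t dma)
{
	struct dmt_fields f = dmt_decode(dmt);

	if (!f.valid) {
		printf("DMT=%x: not valid\n", (unsigned)dmt);
		return;
	}
	printf("DMT=%x: %s%s r%u data %x addr %x en %u%s%s\n",
	    (unsigned)dmt, f.write ? "st" : "ld", f.supervisor ? ".s" : ".u",
	    f.dreg, (unsigned)dmd, (unsigned)dma, f.byte_enables,
	    f.is_double ? " double" : "", f.locked ? " xmem" : "");
}
```

Calling dmt_print(0x42bf, 0x726f6f74, 0x62c2e00) would describe the faulting transaction as a supervisor-space store of r5 with all four byte enables set.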
Maybe we should also check vfsops or buf pages etc.
Per investigation of stack addresses, the fault seems to be triggered via
ffs_write() -> uiomove() -> copyin() -> copyin_right_aligned_to_doubleword().
It doesn't help (i.e. the access fault still occurs) to make copyin() always use copyin_byte_only().
The following test diff changes the fault address from the copyin() variants to memset(),
so the buffer pages returned from ubc_alloc() are not writable or not properly mapped?
diff --git a/sys/ufs/ufs/ufs_readwrite.c b/sys/ufs/ufs/ufs_readwrite.c
index e862f1bf9b90..ac1c4b37e006 100644
--- a/sys/ufs/ufs/ufs_readwrite.c
+++ b/sys/ufs/ufs/ufs_readwrite.c
@@ -353,6 +353,7 @@ WRITE(void *v)
win = ubc_alloc(&vp->v_uobj, uio->uio_offset, &bytelen,
ubc_alloc_flags);
+memset(win, 0, bytelen);
error = uiomove(win, bytelen, uio);
if (error && extending) {
/*
Smells like an MD pmap issue.
Now I can reproduce the panic at the same va:
Data access fault (Write Violation) v = 0x17c768, frame 0x688eb68
R00-05: 0x00000000 0x0007fb40 0x060d2000 0x00000000 0x00000c00 0x00000000
R06-11: 0x00000000 0x060d2000 0x000002ff 0x00000004 0x00000001 0x00000c00
R12-17: 0x00000000 0x00000000 0x00000000 0x0688ee68 0x0607f294 0x00000002
R18-23: 0x00000000 0x00000000 0x0607f290 0x00000c00 0x0688eee0 0x0607ae70
R24-29: 0x00000000 0x060d2000 0x00000000 0x00000000 0x00000000 0x00000000
R30-31: 0x0688d058 0x0688ec68
sxip 17c76a snip 17c76e sfip 17c772
dmt0 433f dmd0 0 dma0 60d2000
dmt1 4bc dmd1 ffffc150 dma1 de004
dmt2 0 dmd2 ccccccc dma2 ccccccc
fault type 7
[DMT0=433f: st.s 0 to 60d2000 as 15 not double not xmem]
fpsr 607ae70 fpcr 7f6e8 epsr 900003f0 ssbr 1053a8
fpecr 0 fphs1 d800 fpls1 0 fphs2 0 fpls2 7fb14
fppt 688ee68 fprh 607f294 fprl 2 fpit 0
vector 3 mask 0 mode 4 scratch1 172864 cpu 0x1a2480
panic: Data Access Exception
db>
In that case, va=0x60d2000 is mapped by the following call:
pmap_enter(0x1a2700, 60d2000, 3b11000, 3, 22)
i.e. pmap_enter(9) is called with prot=VM_PROT_READ | VM_PROT_WRITE and flags=PMAP_CANFAIL | VM_PROT_WRITE.
Inconsistent prot and flags seem wrong, but why does this cause a Write Violation!?
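To double-check how the raw arguments 3 and 22 (hex) break down, here is a small sketch that names the bits. The constant values are assumptions based on the usual NetBSD definitions (VM_PROT_* as 1/2/4, PMAP_WIRED as 0x10, PMAP_CANFAIL as 0x20); pmap_bits_describe() and flags_within_prot() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed values; verify against the uvm headers of the tree in use. */
#define VM_PROT_READ    0x01u
#define VM_PROT_WRITE   0x02u
#define VM_PROT_EXECUTE 0x04u
#define VM_PROT_ALL     0x07u
#define PMAP_WIRED      0x10u
#define PMAP_CANFAIL    0x20u

struct bitname {
	unsigned bit;
	const char *name;
};

static const struct bitname bitnames[] = {
	{ PMAP_CANFAIL, "PMAP_CANFAIL" },
	{ PMAP_WIRED, "PMAP_WIRED" },
	{ VM_PROT_READ, "VM_PROT_READ" },
	{ VM_PROT_WRITE, "VM_PROT_WRITE" },
	{ VM_PROT_EXECUTE, "VM_PROT_EXECUTE" },
};

/* Format the set bits of v as "A | B" into buf (len must be > 0). */
char *
pmap_bits_describe(unsigned v, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(bitnames) / sizeof(bitnames[0]); i++) {
		if ((v & bitnames[i].bit) == 0)
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
		    used ? " | " : "", bitnames[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop here */
		used += (size_t)n;
	}
	return buf;
}

/* True if the access type carried in flags does not exceed prot. */
int
flags_within_prot(unsigned prot, unsigned flags)
{
	return (flags & VM_PROT_ALL & ~prot) == 0;
}
```

With these assumed values, 3 describes as VM_PROT_READ | VM_PROT_WRITE and 0x22 as PMAP_CANFAIL | VM_PROT_WRITE, and the access type in flags stays within prot.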
The pmap_enter() is called from ubc_fault():
pmap_enter(0x1a2700, 60d2000, 3b10000, 3, 22)
panic: pmap_enter at 0x60d2000
Stopped in pid 14.1 (vi) at netbsd:cpu_Debugger+0x4: tb0 0, r0, 0x84
db> bt
stack base = 0x688e850
(0) netbsd:cpu_Debugger+0x4(stackless)
(1) netbsd:panic+0x170(?, 0xd, 0, 0, 0x84000000, 0x65000000, 18e434, 18e444)
(2) netbsd:pmap_enter+0x280
(3) netbsd:ubc_fault+0x2b8
(4)?0x688ea1c
db>
pmap_enter(9) man page says:
int pmap_enter(pmap_t pmap, vaddr_t va, paddr_t pa, vm_prot_t prot,
u_int flags)
Create a mapping in physical map pmap for the physical
address pa at the virtual address va with protection
specified by bits in prot:
VM_PROT_READ The mapping must allow reading.
VM_PROT_WRITE The mapping must allow writing.
VM_PROT_EXECUTE The page mapped contains
instructions that will be executed
by the processor.
The flags argument contains protection bits (the same bits
as used in the prot argument) indicating the type of access
that caused the mapping to be created. This information
may be used to seed modified/referenced information for the
page being mapped, possibly avoiding redundant faults on
platforms that track modified/referenced information in
software.
So passing args as prot = VM_PROT_READ | VM_PROT_WRITE and flags = PMAP_CANFAIL | VM_PROT_WRITE is a valid op.
Anyway, the Data access fault (Write Violation) still occurs even if VM_PROT_READ is added to flags,
so we should check whether the PTE for the va is (unintentionally) modified after the pmap_enter(9) call.
ubc_fault() in sys/uvm/uvm_bio.c
https://github.com/tsutsui/netbsd-src/blob/50aefa500ed0bb6425721f724205bd28dd95a3b7/sys/uvm/uvm_bio.c#L370-L371
ubc_alloc() in sys/ufs/ufs/ufs_readwrite.c
https://github.com/tsutsui/netbsd-src/blob/50aefa500ed0bb6425721f724205bd28dd95a3b7/sys/ufs/ufs/ufs_readwrite.c#L354-L355
VM_PROT_READ | VM_PROT_WRITE first so the PTE is set as PG_M|PG_U|PG_RW|PG_V, but PG_RW and PG_M are cleared later by pmap_changebit() from pmap_page_protect() from genfs_putpage() in sys/miscfs/genfs/genfs_vnops.c
https://github.com/tsutsui/netbsd-src/blob/50aefa500ed0bb6425721f724205bd28dd95a3b7/sys/miscfs/genfs/genfs_vnops.c#L1297-L1299
Smells like some inconsistency between MI UBC and MD m88k pmap, as in #5, but it needs more investigation into what UBC and genfs actually do.
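The sequence described above can be sketched as a toy model: a PTE entered with write permission and a write access type, then downgraded by a changebit-style call, after which a store faults. This is not the m88k pmap code. The PG_* bit positions are placeholders, and model_enter(), model_changebit() and model_write_ok() are hypothetical names that only mirror the behaviour reported in this issue.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_PROT_READ  0x01u
#define VM_PROT_WRITE 0x02u
#define PMAP_CANFAIL  0x20u	/* assumed value */

/* Placeholder PTE bits; the real m88k encoding differs. */
#define PG_V  0x01u	/* valid */
#define PG_RW 0x02u	/* writable */
#define PG_U  0x04u	/* used (referenced) */
#define PG_M  0x08u	/* modified */

typedef uint32_t pt_entry_t;

/* Build a PTE the way the issue reports it: prot grants, flags seed U/M. */
pt_entry_t
model_enter(unsigned prot, unsigned flags)
{
	pt_entry_t pte = PG_V;

	if (prot & VM_PROT_WRITE)
		pte |= PG_RW;
	if (flags & (VM_PROT_READ | VM_PROT_WRITE))
		pte |= PG_U;
	if (flags & VM_PROT_WRITE)
		pte |= PG_M;
	return pte;
}

/* Simplified stand-in for a changebit operation: set, then clear. */
pt_entry_t
model_changebit(pt_entry_t pte, pt_entry_t set, pt_entry_t clear)
{
	return (pte | set) & ~clear;
}

/* A store succeeds only on a valid, writable mapping. */
bool
model_write_ok(pt_entry_t pte)
{
	return (pte & (PG_V | PG_RW)) == (PG_V | PG_RW);
}
```

In this model, model_enter(3, 0x22) yields PG_M|PG_U|PG_RW|PG_V, and clearing PG_RW and PG_M afterwards leaves a valid but read-only entry. The next kernel store to that va would then raise a write violation unless something re-enters the mapping first, which is the part that still needs investigation.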
Looks to be triggered on login attempts, but not 100% reproducible:
v = 0x13b2d8 of the kernel is in copyin_right_aligned_to_doubleword()?
(should be confirmed again)