Kristoff-starling closed this issue 2 years ago
I'm sorry: in the latest version of xv6, this bug has already been fixed by an if statement:
if (recovering == 0)
  bunpin(dbuf);
I referred to an old version (2020) of xv6 and mistakenly opened this issue. I apologize for my carelessness.
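To illustrate why the guard balances the reference count, here is a minimal user-space sketch. It is not xv6 code: struct guard_buf and guarded_install_one() are hypothetical stand-ins, and bread(), bunpin(), and brelse() are each modeled as a plain increment or decrement of the count.

```c
/* Hypothetical stand-in for xv6's struct buf: only the reference count. */
struct guard_buf {
  unsigned int refcnt;
};

/* Models one destination block passing through the install loop.
   recovering != 0 models the call made from recovery after a reboot,
   where no earlier bpin() has been issued for the block. */
unsigned int guarded_install_one(struct guard_buf *dbuf, int recovering) {
  dbuf->refcnt++;            /* models bread(): take a reference */
  if (recovering == 0)
    dbuf->refcnt--;          /* models bunpin(): drop the pin taken earlier */
  dbuf->refcnt--;            /* models brelse(): drop bread()'s reference */
  return dbuf->refcnt;
}
```

In this model, a block that starts with refcnt 1 (pinned earlier) and is installed with recovering == 0 ends at 0, and a block that starts with refcnt 0 and is installed with recovering == 1 also ends at 0, so neither path underflows.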
Hello, I've recently been studying the design of xv6 and have benefited a lot from the clear comments and the textbook. I think there may be a bug in the logging layer of the file system. I describe the details below.
Consider install_trans() in /kernel/log.c. The line bunpin(dbuf) unpins dbuf in the buffer cache so that the block can be evicted from memory after brelse(dbuf). The code works correctly in the normal case: log_write() calls bpin(), which pins the block in the buffer cache by incrementing its reference count, and the bunpin() here decrements it.

However, if a crash happens while a transaction is being installed, then after rebooting xv6 calls recover_from_log() from initlog(), which calls install_trans() to write the logged blocks to their home blocks. At that point no bpin() has been called beforehand, so the bunpin() here decrements the reference count to zero, and the decrement in brelse() then underflows it to UINT_MAX, because refcnt is a uint. As a result, the block stays in the buffer cache "forever".

We can expose this bug by adding two lines to xv6:
First, we emulate a crash by adding a panic() in install_trans(), guarded by the condition log.lh.n != 0, which ensures that the panic() is not triggered during the recovery procedure.

Second, we add an assertion in brelse(): before decrementing the reference count, b->refcnt >= 1 should always hold.

When xv6 boots for the first time, it hits the emulated "crash"; after rebooting, it panics at panic("bug"). I made a disk image, fs-bug.img, that captures xv6 after a crash produced by the first step. Booting from this image reproduces the bug directly, and the assertion in brelse() detects it. The disk image is available here.

A possible solution for this problem is to add a bpin() in read_head() so that the bunpin() during the recovery procedure has a corresponding bpin().

I hope these suggestions are helpful. Further discussion is welcome. Thanks!
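The reference-count arithmetic described above can be checked outside the kernel with a small user-space model. This is only a sketch under stated assumptions: struct fake_buf and every model_* function are hypothetical stand-ins rather than xv6 code, each xv6 call is reduced to its effect on the count, and the checked variant returns -1 where the kernel would panic.

```c
#include <limits.h>

/* Hypothetical stand-in for xv6's struct buf: only the reference count. */
struct fake_buf {
  unsigned int refcnt;
};

/* Each helper models only the refcnt effect of the corresponding xv6 call. */
void model_bread(struct fake_buf *b)  { b->refcnt++; }  /* take a reference */
void model_bpin(struct fake_buf *b)   { b->refcnt++; }  /* pin the block */
void model_bunpin(struct fake_buf *b) { b->refcnt--; }  /* unpin the block */
void model_brelse(struct fake_buf *b) { b->refcnt--; }  /* drop a reference */

/* brelse() with the proposed check: refuse to decrement a zero count.
   Returns -1 where the kernel would panic, 0 otherwise. */
int model_brelse_checked(struct fake_buf *b) {
  if (b->refcnt < 1)
    return -1;
  b->refcnt--;
  return 0;
}

/* One destination block going through the unguarded install loop.
   pinned != 0 models the normal path, where log_write() pinned the block
   earlier; pinned == 0 models recovery after a reboot. */
unsigned int model_install(int pinned) {
  struct fake_buf dbuf = {0};
  if (pinned)
    model_bpin(&dbuf);
  model_bread(&dbuf);
  model_bunpin(&dbuf);
  model_brelse(&dbuf);   /* wraps around to UINT_MAX when the count is 0 */
  return dbuf.refcnt;
}

/* Same sequence, but the final release uses the checked variant. */
int model_install_checked(int pinned) {
  struct fake_buf dbuf = {0};
  if (pinned)
    model_bpin(&dbuf);
  model_bread(&dbuf);
  model_bunpin(&dbuf);
  return model_brelse_checked(&dbuf);
}
```

In this model, model_install(1) returns 0 and model_install(0) returns UINT_MAX, which matches the underflow described in the report. model_install_checked(0) returns -1, corresponding to the point where the added assertion would fire.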