Original comment by daniel.r...@gmail.com
on 28 Jan 2008 at 6:28
Here is what DanP wrote when we were tracking a bug related to btree delete
last March:
"We need to enter a bug to implement the full solution, which is: fix up
range delete so it exits after dirtying some maximum number of blocks,
commit the transaction after recording the resume point in the commit
block, and continue deleting at the resume point. On restart after
interruption, check whether a delete was in progress and continue the
delete if so.
Range delete already supports resume at a given logical address, it just
needs the logic to exit on maximum dirty blocks, returning the resume
point. Figuring out the exact resume point is a little tricky because the
btree delete algorithm itself is fairly difficult, which is why I would like
to defer this work a little until I have a chance to give it the care and
attention it needs. This kind of development absolutely relies on unit
testing, since the corner case can be exceedingly rare and unlikely to
be caught by full system testing."
We haven't had the chance to implement the full solution, and have just relied on
the interim fix (set_buffer_dirty_check). Maybe it is time to implement the full fix
and its unit test.
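Just to illustrate the shape of that full fix, a minimal sketch might look like the
following. This is only an illustration with made-up names (delete_tree_range_bounded,
delete_resume, commit_transaction), not the actual ddsnap code:

    /*
     * Sketch only: delete a logical range in bounded chunks, committing
     * after each chunk and recording a resume point in the commit block
     * so an interrupted delete can be continued after restart.
     */
    static int delete_range_resumable(struct superblock *sb, u64 start, u64 end,
                                      unsigned max_dirty)
    {
            u64 resume = start;

            while (resume < end) {
                    /* stop after dirtying at most max_dirty blocks and
                       report where to pick up next time */
                    int err = delete_tree_range_bounded(sb, resume, end,
                                                        max_dirty, &resume);
                    if (err)
                            return err;

                    /* record the resume point, then commit, so a crash here
                       lets the restart path continue the delete at 'resume' */
                    sb->image.delete_resume = resume;
                    commit_transaction(sb);
            }

            /* delete finished: clear the resume point so restart does not
               try to continue a delete that already completed */
            sb->image.delete_resume = 0;
            commit_transaction(sb);
            return 0;
    }

On restart, the server would check the recorded resume point and, if a delete was in
progress, call the same routine starting from it.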
Jiaying
Original comment by jiahotc...@gmail.com
on 28 Jan 2008 at 7:04
In the attachment is a patch that hopefully solves the problem. The idea is to check
whether we are within a defined threshold of the journal buffer limit and, if so, to
commit the pending dirty buffers. The current code does a similar check in
delete_tree_range, but it does not do the check at every place where the pending
dirty buffers can grow without bound. I am rerunning the big-copy test with the fix,
which means we will know whether the patch fixes the problem after two days, so any
test that can trigger the problem faster would be a great help.
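For reference, the check is along these lines (a sketch only, with made-up names like
dirty_buffer_count and journal_buffer_limit; the real change is in the attached patch):

    /* Sketch only: once the number of pending dirty buffers gets within
       a threshold of the journal's capacity, commit the current
       transaction so the dirty buffer count stays bounded. */
    #define DIRTY_FLUSH_THRESHOLD 16

    static void maybe_commit_dirty_buffers(struct superblock *sb)
    {
            if (dirty_buffer_count(sb) + DIRTY_FLUSH_THRESHOLD >=
                journal_buffer_limit(sb))
                    commit_transaction(sb);
    }

The point is simply to call such a check from every loop that can keep dirtying
buffers, not only from delete_tree_range.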
Jiaying
Original comment by jiahotc...@gmail.com
on 29 Jan 2008 at 12:17
Attachments:
I think I found a bug in snapshot deleting/squashing that may cause the fatal
problem. Here is what happened in my new test.

Since I used a small snapshot store, old snapshots were automatically squashed to
free more space when I copied a large volume to zumastor. The snapshot structures of
these snapshots were not freed, though, and the recorded number of snapshots was NOT
decreased. Later, when zumastor reached the limit on the number of hourly snapshots,
it tried to delete the oldest hourly snapshot, say snapshot 0. The problem is that
the current zumastor code first checks the usecount of that snapshot before calling
'ddsnap delete'. Because that snapshot was already squashed, 'ddsnap usecount'
returned 0 (see the usecount function in ddsnapd.c). As a result, zumastor skipped
calling 'ddsnap delete' to actually free that snapshot structure.
Now we relied on the auto_delete feature of 'ddsnap server' when we reached the
maximum of 64 snapshots. But there is also a bug there. Here are the related lines of
code in ddsnapd.c:create_snapshot:

    /* check if we are out of snapshots */
    if ((snapshots >= MAX_SNAPSHOTS) && auto_delete_snapshot(sb))
            return -EFULL;

We call auto_delete_snapshot when we are at or beyond the maximum number of snapshots,
without checking again whether we are below the limit after the function returns.
auto_delete_snapshot returns 0 when it successfully deletes or squashes a snapshot,
and here we are in the squashing case, so the number of snapshots was NOT actually
decreased.
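One possible shape for a tighter check (a sketch only; the actual quick fix is in the
attached patch, and count_snapshots here is a hypothetical helper) is to re-check the
snapshot count after auto_delete_snapshot returns, since a return value of 0 can mean
a snapshot was merely squashed:

    /* check if we are out of snapshots; re-count afterwards, because
       auto_delete_snapshot() may have only squashed a snapshot, which
       frees space but does not reduce the number of snapshots */
    if (snapshots >= MAX_SNAPSHOTS) {
            if (auto_delete_snapshot(sb))
                    return -EFULL;           /* nothing left to delete or squash */
            snapshots = count_snapshots(sb); /* hypothetical helper: recount */
            if (snapshots >= MAX_SNAPSHOTS)
                    return -EFULL;           /* only a squash happened */
    }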
Now with these two bugs, the number of snapshots can go beyond the 64 limit and the
snapshot list/bitmap gets corrupted. I guess that is why we saw the invalid recorded
snapshot number in the ddsnap superblock. I am not quite sure how it leads to the
failed btree check 'probe: Failed assertion "((struct eleaf *)nodebuf->data)->magic ==
0x1eaf"', but with the snapshot bitmap corrupted, it is possible that the btree was
also corrupted.
So the question now is whether we want to remove the buggy snapshot squashing code
now and have that in 0.6. Snapshot squashing is handled in a lot of places, so it may
take several days to clean up the code and even more time to test it. As a quick fix,
we can just fix the two bugs mentioned above. Any suggestions?
Jiaying
Original comment by jiahotc...@gmail.com
on 29 Jan 2008 at 8:01
In the attachment is the patch for the quick fix. I can now reproduce the problem in
about an hour, with a test that basically takes a snapshot every minute while
generating a lot of writes at the same time to cause some old snapshots to be
squashed. The test with the patch has passed, so it looks like the patch solves the
problem.
Jiaying
Original comment by jiahotc...@gmail.com
on 30 Jan 2008 at 3:39
Attachments:
Fix committed in r1317 - r1320 to 0.6 and trunk.
Original comment by daniel.r...@gmail.com
on 3 Feb 2008 at 12:22
Original issue reported on code.google.com by
jiahotc...@gmail.com
on 28 Jan 2008 at 6:22