junneyang / zumastor

Automatically exported from code.google.com/p/zumastor

Replication may stop when snapshots are squashed on downstream #121

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Since a change introduced a couple of months ago, the ddsnap code calculates
the usecount of a snapshot by adding the on-disk persistent usecount to the
transient usecount (which counts devices using the snapshot). However, it does
not reset the transient usecount to zero when the snapshot is squashed. As a
result, a snapshot created later that reuses the same snapshot bit may start
with an invalid usecount. This can cause problems in replication because we
now rely on the usecount to keep track of kept snapshots on the downstream.
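
To make the failure mode concrete, here is a minimal standalone model of the bookkeeping described above. It is only a sketch, not the ddsnapd code: the struct layouts, the `transient_usecount` and `disk_usecount` field names, the `squash` and `reused_bit_usecount` helpers, and the numeric value of `SNAPSHOT_SQUASHED` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SNAPSHOTS 64
#define SNAPSHOT_SQUASHED 255 /* sentinel value assumed for this sketch */

/* Hypothetical, simplified stand-ins for the real ddsnapd structures. */
struct snapshot {
    uint32_t tag;
    uint8_t bit;            /* slot in the transient table, or SNAPSHOT_SQUASHED */
    uint16_t disk_usecount; /* persistent part of the usecount */
};

struct superblock {
    uint16_t transient_usecount[MAX_SNAPSHOTS]; /* in-memory part, indexed by bit */
};

/* Total usecount = persistent count + transient count for the snapshot's bit. */
unsigned usecount(const struct superblock *sb, const struct snapshot *snap)
{
    unsigned transient = 0;
    if (snap->bit != SNAPSHOT_SQUASHED) {
        assert(snap->bit < MAX_SNAPSHOTS);
        transient = sb->transient_usecount[snap->bit];
    }
    return snap->disk_usecount + transient;
}

/* Squash a snapshot; apply_fix selects whether the transient slot is cleared. */
void squash(struct superblock *sb, struct snapshot *victim, int apply_fix)
{
    if (apply_fix)
        sb->transient_usecount[victim->bit] = 0;
    victim->bit = SNAPSHOT_SQUASHED;
}

/* Usecount seen by a brand-new snapshot that reuses a squashed snapshot's bit. */
unsigned reused_bit_usecount(int apply_fix)
{
    struct superblock sb = {{0}};
    struct snapshot old = { .tag = 1, .bit = 3, .disk_usecount = 1 };

    sb.transient_usecount[old.bit] = 2; /* two devices hold the old snapshot */
    squash(&sb, &old, apply_fix);

    /* A new snapshot is handed the freed bit and should start unused. */
    struct snapshot fresh = { .tag = 2, .bit = 3, .disk_usecount = 0 };
    return usecount(&sb, &fresh);
}
```

In this model, `reused_bit_usecount(0)` returns the stale count of 2 inherited from the squashed snapshot, while `reused_bit_usecount(1)` returns 0, which is what a freshly created snapshot should report.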

Here is a proposed fix.

Index: ddsnapd.c
===================================================================
--- ddsnapd.c   (revision 1544)
+++ ddsnapd.c   (working copy)
@@ -2280,6 +2280,7 @@
        warn("releasing snapshot %u", victim->tag);
        if (usecount(sb, victim)) {
                err = delete_tree_range(sb, 1ULL << victim->bit, 0);
+               sb->usecount[victim->bit] = 0;
                victim->bit = SNAPSHOT_SQUASHED;
        } else
                err = delete_snap(sb, victim);

A temporary workaround until this change is applied is to restart zumastor if
replication stops.

Jiaying

Original issue reported on code.google.com by jiayin...@gmail.com on 22 Apr 2008 at 10:04

GoogleCodeExporter commented 9 years ago

Original comment by jiahotc...@gmail.com on 22 Apr 2008 at 10:07

GoogleCodeExporter commented 9 years ago
This fix is good and correct.

Original comment by Daniel.R...@gmail.com on 29 Apr 2008 at 11:28

GoogleCodeExporter commented 9 years ago
Fixed with revision 1615.

Original comment by jiahotc...@gmail.com on 29 Apr 2008 at 11:54