Replication may stop when snapshots are squashed on downstream

GoogleCodeExporter commented 9 years ago

Since a change introduced a couple of months ago, the current ddsnap code
calculates the usecount of a snapshot by adding the on-disk persistent
usecount to the transient usecount (for devices using the snapshot).
However, it does not reset the transient usecount to zero when the snapshot
is squashed. As a result, a snapshot created later that reuses the same
snapshot bit may start with an invalid usecount. This can cause problems in
replication because we now rely on the usecount to keep track of kept
snapshots on the downstream.

Here is a proposed fix.

Index: ddsnapd.c
===================================================================
--- ddsnapd.c   (revision 1544)
+++ ddsnapd.c   (working copy)
@@ -2280,6 +2280,7 @@
        warn("releasing snapshot %u", victim->tag);
        if (usecount(sb, victim)) {
                err = delete_tree_range(sb, 1ULL << victim->bit, 0);
+               sb->usecount[victim->bit] = 0;
                victim->bit = SNAPSHOT_SQUASHED;
        } else
                err = delete_snap(sb, victim);

Until this change is applied, a temporary workaround is to restart zumastor
if replication stops.

Jiaying

Original issue reported on code.google.com by jiayin...@gmail.com on 22 Apr 2008 at 10:04

GoogleCodeExporter commented 9 years ago

Original comment by jiahotc...@gmail.com on 22 Apr 2008 at 10:07

Added labels: Milestone-Release8

GoogleCodeExporter commented 9 years ago

This fix is good and correct.

Original comment by Daniel.R...@gmail.com on 29 Apr 2008 at 11:28


GoogleCodeExporter commented 9 years ago

Fixed with revision 1615.

Original comment by jiahotc...@gmail.com on 29 Apr 2008 at 11:54

Changed state: Fixed
