Confirmed with DanP that all but the most recent snapshots are gone on the downstream, both from zumastor's and ddsnap's viewpoints. zumastor should not have been throwing away any snapshots; ddsnap should have been garbage collecting them as space was exhausted instead.
zumastor status --usage
ddsnap status --verbose /var/run/zumastor/servers/smohome4

Snap    Creation time             Usecnt  Prio  Chunks  Unshared  1X
103638  Wed Jan 2 16:00:05 2008   1       0     0       0         0
103639  Wed Jan 2 16:00:05 2008   1       0     42      42        0
totals                                          42      42        0
A few minutes earlier, the pair of snapshots 103636 and 103637 was present on this downstream replica.
Due to the 64-snapshot limit, though, it would be more useful to have zumastor involved in the decision of which snapshots to remove, so that longer-term backups are preserved as intended (weekly and monthly, rather than just 64 hours of backups).
Bumping to higher priority, since we need this or something similar to be useful as a storage solution with integrated backup.
Original comment by drake.di...@gmail.com
on 3 Jan 2008 at 12:03
New test to try this using the existing interfaces. Since this is not supported at present, the test of course fails, but it describes one natural interface for doing this. It currently fails by timeout at step 4, defining the slave source after snapshots have already been set up for the slave. Patch based off replication-zumastor.sh:
--- cbtb/tests/2/replication-snapshots-zumastor.sh (revision 1262)
+++ cbtb/tests/2/replication-snapshots-zumastor.sh (working copy)
@@ -2,14 +2,12 @@
#
# $Id$
#
-# Set up origin and snapshot store on master and secondary machine on
-# raw disks. Begin replication cycle between machines
-# and wait until it arrives and can be verified.
-# Modify the origin and verify that the modification also arrives at the
-# backup.
+# A test of simultaneous snapshots and replication. This is not
+# supported, and an issue is filed, so this test is expected to fail.
+# Most likely the test will need to change to match the new configuration
+# commands that will be part of adding this feature.
#
-# Requires that the launch environment (eg. test-zuma-dapper-i386.sh) export
-# both $IPADDR and $IPADDR2 to the paramter scripts.
+# http://code.google.com/p/zumastor/issues/detail?id=26
#
# Copyright 2007 Google Inc. All rights reserved
# Author: Drake Diedrich (dld@google.com)
@@ -22,6 +20,10 @@
HDBSIZE=4
HDCSIZE=8
+# Feature request. http://code.google.com/p/zumastor/issues/detail?id=26
+EXPECT_FAIL=1
+
+
slave=${IPADDR2}
SSH='ssh -o StrictHostKeyChecking=no -o BatchMode=yes'
@@ -76,6 +78,7 @@
${SCP} ${HOME}/.ssh/known_hosts root@${slave}:${HOME}/.ssh/known_hosts
${SSH} root@${slave} hostname slave
${SSH} root@${slave} zumastor define volume testvol /dev/sdb /dev/sdc --initialize
+${SSH} root@${slave} zumastor define master testvol -h 24 -d 7
${SSH} root@${slave} zumastor status --usage
echo ok 2 - slave testvol set up
@@ -108,8 +111,25 @@
echo ok 6 - replication manually from master
fi
+# take a snapshot of the empty volume on the slave and wait for it
+$SSH root@${slave} 'sync ; zumastor snapshot testvol hourly'
+slavehourly0=/var/run/zumastor/snapshot/testvol/hourly.0
+if timeout_file_wait 30 root@${slave} $slavehourly0
+then
+ $SSH root@${slave} "df -h ; mount"
+ $SSH root@${slave} ls -alR /var/run/zumastor
+ $SSH root@${slave} zumastor status --usage
+ $SSH root@${slave} tail -200 /var/log/syslog
+
+ echo not ok 7 - first slave snapshot
+ exit 7
+else
+ slavesnap0=`readlink $slavehourly0`
+ echo ok 7 - first slave snapshot
+fi
+
date >>/var/run/zumastor/mount/testvol/testfile
sync
zumastor snapshot testvol hourly
@@ -119,13 +139,13 @@
ls -alR /var/run/zumastor
zumastor status --usage
tail -200 /var/log/syslog
- echo not ok 7 - testfile written, synced, and snapshotted
- exit 7
+ echo not ok 8 - testfile written, synced, and snapshotted
+ exit 8
else
- echo ok 7 - testfile written, synced, and snapshotted
+ echo ok 8 - testfile written, synced, and snapshotted
fi
-hash=`md5sum /var/run/zumastor/mount/testvol/testfile`
+hash=`md5sum /var/run/zumastor/mount/testvol/testfile|cut -f1 -d\ `
#
# schedule an immediate replication cycle
@@ -141,10 +161,10 @@
$SSH root@${slave} zumastor status --usage
$SSH root@${slave} tail -200 /var/log/syslog
- echo not ok 8 - testvol has migrated to slave
- exit 8
+ echo not ok 9 - testvol has migrated to slave
+ exit 9
else
- echo ok 8 - testvol has migrated to slave
+ echo ok 9 - testvol has migrated to slave
fi
# check separately for the testfile
@@ -154,13 +174,13 @@
$SSH root@${slave} zumastor status --usage
$SSH root@${slave} tail -200 /var/log/syslog
- echo not ok 9 - testfile has migrated to slave
- exit 9
+ echo not ok 10 - testfile has migrated to slave
+ exit 10
else
- echo ok 9 - testfile has migrated to slave
+ echo ok 10 - testfile has migrated to slave
fi
-rhash=`${SSH} root@${slave} md5sum /var/run/zumastor/mount/testvol/testfile` || \
+rhash=`${SSH} root@${slave} md5sum /var/run/zumastor/mount/testvol/testfile|cut -f1 -d\ ` || \
${SSH} root@${slave} <<EOF
mount
df
@@ -170,9 +190,9 @@
if [ "$rhash" = "$hash" ] ; then
- echo ok 10 - origin and slave testfiles are in sync
+ echo ok 11 - origin and slave testfiles are in sync
else
- echo not ok 10 - origin and slave testfiles are in sync
+ echo not ok 11 - origin and slave testfiles are in sync
mount
df
ls -lR /var/run/zumastor/
@@ -184,7 +204,34 @@
ls -lR /var/run/zumastor/
tail -200 /var/log/syslog
EOF
- exit 10
+ exit 11
fi
+
+# take a new snapshot on the slave and wait for it
+$SSH root@${slave} zumastor snapshot testvol hourly
+if timeout_remote_file_wait 30 root@${slave} \
+ /var/run/zumastor/snapshot/testvol/hourly.1
+then
+ $SSH root@${slave} "df -h ; mount"
+ $SSH root@${slave} ls -alR /var/run/zumastor
+ $SSH root@${slave} zumastor status --usage
+ $SSH root@${slave} tail -200 /var/log/syslog
+
+ echo not ok 12 - second slave snapshot
+ exit 12
+else
+ echo ok 12 - second slave snapshot
+fi
+
+
+rhash0=`${SSH} root@${slave} md5sum /var/run/zumastor/snapshot/testvol/hourly.0/testfile|cut -f1 -d\ ` || \
+ ${SSH} root@${slave} <<EOF
+ mount
+ df
+ ls -lR /var/run/zumastor/
+ tail -200 /var/log/syslog
+EOF
+
+
exit 0
Original comment by drake.di...@gmail.com
on 10 Jan 2008 at 11:39
It would be nice if we could simply run 'zumastor master' on the downstream to take hourly/weekly snapshots. But one problem is that there may be some ongoing replication when we take a snapshot. Since we apply all the delta changes to the origin volume on the downstream, it is quite possible that the snapshot we take is not valid.
As a quick solution, maybe we can add a '--backup <snapshots>' option to 'zumastor define source' that specifies how many replicated snapshots we want to keep on the downstream. It is not as desirable as preserving snapshots as intended (weekly, monthly, etc.), but it may be enough for users who run replication as cron jobs. We can also add --hourly, --daily, and --weekly options to the 'zumastor define target' command to handle setting up the cron jobs, just as we have for 'zumastor define master'.
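For concreteness, a hypothetical invocation of the proposed options could look like the following; none of these flags exist yet, and the hostnames and numbers are purely illustrative:

# Hypothetical; '--backup' and the scheduling options are only proposed here.
# On the downstream: keep the 4 most recently replicated snapshots.
zumastor define source testvol master.example.com --backup 4
# On the upstream: set up the replication cron jobs, mirroring 'zumastor define master'.
zumastor define target testvol slave.example.com --hourly 24 --daily 7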
Original comment by jiahotc...@gmail.com
on 16 Jan 2008 at 10:58
A week ago flips and I had a conversation about this on IRC. I'm attaching the
interesting part of the log. A few extracts:
[21:55] <pgquiles> I have 4 servers
[21:56] <pgquiles> two at the central office, 1 at a branch office, and 1 at another branch office
[21:56] <pgquiles> I'd like everybody to see exactly the same data, regardless of where they are
[21:57] <flips> what if your churn on the volume creates a delta that takes more than an hour to send?
[21:57] <pgquiles> if someone deletes a file in office 1 and it is available in office 2 but not in office 3 because of different snapshots, it's a bit of a mess
[...]
(about "are named snapshots needed for remote snapshot replication to work?")
[22:29] <pgquiles> say B and C replicate from A
[22:29] <pgquiles> "A" is taking hourly snapshots
[22:29] <pgquiles> B and C replicate from A every 6 hours
[22:30] <pgquiles> you'd need to replicate revisions N, N+1, ..., N+5
[22:30] <pgquiles> regardless of whether they have a name or not, you just ask A for revision number N
[22:30] <pgquiles> or N+1
Original comment by pgqui...@gmail.com
on 17 Jan 2008 at 2:57
We (Drake, Dan, and I) had a short discussion here about how to allow the downstream to keep some old replicated snapshots. I see Drake's point about making it flexible for the downstream to choose how many snapshots it wants to keep for a particular kind (hourly, weekly, etc.) of replicated snapshot. E.g., we may want to replicate hourly but only keep the four most recently replicated snapshots on the downstream.
Thinking about how to support this, we may add a '--schedule' option to the 'zumastor define source' command and the 'zumastor replicate' command. The new commands would look like this:
zumastor define source $volume $host --schedule $kind:$limit
zumastor replicate volume --schedule $kind
$kind can be any user-defined name, e.g. 'hourly', 'daily', or 'per-20minutes'. $limit will be the number of replicated snapshots we want to keep for this replication schedule, and we can take multiple schedule options. The 'zumastor replicate --schedule $kind' command will let the new snapshot be kept for that schedule, and zumastor will automatically delete the oldest replicated snapshot of that schedule. This extension shouldn't change the default behavior and shouldn't be hard to code.
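For concreteness, a hypothetical use of the proposed interface could look like this; the volume name, host, and limits are illustrative only:

# Hypothetical example of the proposed --schedule interface.
zumastor define source testvol master.example.com --schedule hourly:24 --schedule daily:7
# Run from cron every hour; keeps the newly replicated snapshot under the
# 'hourly' schedule and lets zumastor drop the oldest of the 24 retained.
zumastor replicate testvol --schedule hourly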
Any thoughts about the proposal?
Jiaying
Original comment by jiahotc...@gmail.com
on 18 Jan 2008 at 11:12
The discussion between Shapor, Jiaying, and me this afternoon produced a simple model for downstream snapshot retention.
We will support --hourly/--daily/--weekly/--monthly/--custom parameters to the "zumastor define source" command, in the same form as for "zumastor define master". The meaning is that we try to retain the specified number of snapshots downstream of each indicated kind. If the snapshot store fills up, ddsnap deals with it by auto-deleting, starting with the oldest snapshot.
Each incoming replication cycle will be classified by zumastor as belonging to zero or more of the different periods. Replication cycles not classified as any of the periods (because they fell between the cracks, for example several 5-minute cycles in a row) get a use count of 0; the others get a use count of 1. Ddsnap will then choose the snapshots with zero use count for autodeletion when the snapshot store fills up, before falling back to deleting snapshots with use count 1. So, if the snapshot store is large enough and the churn on the upstream small enough, exactly as many snapshots of each specified kind will be retained downstream.
The criterion for classifying a completed replication cycle is that, for each kind of interval, the incoming cycle is more than the respective period younger than the last snapshot retained for that kind of interval.
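A minimal shell sketch of that classification rule, assuming per-kind bookkeeping of the period length in seconds and the completion time of the last snapshot retained for that kind (variable names are hypothetical, not from the implementation):

# Sketch only: decide whether the cycle that just completed counts as this kind.
now=`date +%s`
if [ $((now - last_of_kind)) -gt $period_seconds ]; then
  usecount=1              # retained as the next snapshot of this kind
  last_of_kind=$now
else
  usecount=0              # unclassified; first candidate for autodeletion
fi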
Original comment by Daniel.R...@gmail.com
on 31 Jan 2008 at 2:41
Last I heard, Shapor had volunteered to do this for release 0.7.
Dan P. noted that one can manage the lifetime of
downstream snapshots manually with 'ddsnap usecount'.
Jiaying is throwing together an example shell script for this, for those who
can't wait.
Original comment by daniel.r...@gmail.com
on 13 Feb 2008 at 6:18
I had a short discussion with Chris yesterday. He was working on keeping snapshots by increasing the snapshot usecount. But we still need some way to configure how to keep snapshots on the downstream. The syntax is very similar to 'zumastor define master', so I added a new command: 'zumastor define schedule <vol> [-h|--hourly <n>] [-d|--daily <n>] [-w|--weekly <n>] [-m|--monthly <n>] [-c|--custom <name> <n>] [--help]'. The command is supposed to be run after 'zumastor define master' or 'zumastor define source'.
I also changed the code of 'zumastor snapshot' so that it can run on both upstream and downstream. With the change, 'zumastor snapshot' will create a temporary 'kind' file if it is run on the downstream, and the newly replicated snapshot will be kept as that 'kind'. The idea is similar to the patch Shapor posted before.
The attached patch is what I described. It removes the --hourly, --daily, etc. options from the 'zumastor define master' cli, so we also need to change the cbtb tests. I.e., at the place where 'zumastor define master -h 24 -d 7' is called, we need to call 'zumastor define master; zumastor define schedule -h 24 -d 7' instead.
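So, for example, a volume that should retain 24 hourly and 7 daily replicated snapshots would be configured roughly like this under the proposed cli (volume and host names are illustrative):

# Proposed cli, not yet merged; names are illustrative.
# On the upstream:
zumastor define master testvol
zumastor define schedule testvol --hourly 24 --daily 7
# On the downstream:
zumastor define source testvol master.example.com
zumastor define schedule testvol --hourly 24 --daily 7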
Please let me know if you like the proposal or not.
Chris, could you post the patch you have? I can merge them together if you want.
Jiaying
Original comment by jiayin...@gmail.com
on 13 Mar 2008 at 10:03
I like the idea of moving the --hourly etc. from 'zumastor define master'
to 'zumastor define schedule'.
I don't understand the 'zumastor snapshot' proposal yet; what do you
mean by temp kind file?
Original comment by daniel.r...@gmail.com
on 13 Mar 2008 at 10:26
We want to keep the newly replicated snapshot upon 'zumastor snapshot'. We could simply increase the usecount of the currently mounted snapshot and keep it as hourly, daily, etc., but there is a race condition with the drop_snapshot at the end of replication_cycle. So instead, the patch just creates a temporary file, 'hourly' or 'daily' etc., under /var/lib/zumastor/$vol/source/pending.
At the end of each replication, zumastor checks the files under the pending directory and keeps the newly replicated snapshot accordingly.
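A rough sketch of the downstream side of that handshake, for illustration only; the keep_snapshot call and its arguments are a guess based on the review below, not the actual patch:

# 'zumastor snapshot testvol hourly' on the downstream just drops a marker:
echo yes >$VOLUMES/testvol/source/pending/hourly
# At the end of the next replication cycle, zumastor walks the pending
# directory, keeps the just-replicated snapshot for each marked kind,
# and clears the markers:
for marker in $VOLUMES/testvol/source/pending/*; do
  [ -e "$marker" ] || continue
  keep_snapshot testvol `basename $marker`   # hypothetical signature
  rm "$marker"
done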
Original comment by jiayin...@gmail.com
on 13 Mar 2008 at 10:39
Can anyone please review the patch? Thanks!
Original comment by jiayin...@gmail.com
on 14 Mar 2008 at 1:21
Nice work. I think separating the scheduling from the define master/source commands is a great idea. Why didn't I think of that? :)
A few comments:
1) keep_snapshot is a copy and paste of new_snapshot; it should be generalized.
2) cli changes should be reflected in the man page.
3) $CRONS/$vol gets removed in stop_nag but never gets created anywhere that I can tell. This would cause the cron jobs to ignore the volume.
4) 358 + echo yes > $VOLUMES/$vol/source/pending/$kind
The style we've been going for is no space between the redirect operator and the file, like >$VOLUMES/$vol/source/pending/$kind
Shapor
Original comment by sha...@gmail.com
on 14 Mar 2008 at 5:08
Fixed with revision 1459.
Original comment by jiahotc...@gmail.com
on 17 Mar 2008 at 7:19
Original issue reported on code.google.com by
daniel.r...@gmail.com
on 22 Dec 2007 at 2:47