junneyang / zumastor

Automatically exported from code.google.com/p/zumastor

snapshot deletion causes all I/O to block for up to three minutes #138

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Here is a ddsnap usage scenario:

/dev/mapper/vol is a ddsnap device. The origin device is an LVM "linear"
volume mapping a RAID 1 array on top of two SATA drives.

That device is 100 GB. The snapstore, on one SATA drive, is 50 GB.

The device contains an ext3 filesystem mounted with data=ordered and without
noatime, which means reads imply writes (atime updates) to the block device.

The system is in use but not heavily loaded: it runs a few services
(including bind and apache) and serves as a development platform for the
occasional software build by a few people.

One snapshot of that device is taken every hour after the oldest snapshot
has been removed.
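
For context, the hourly rotation is essentially the following (a sketch only:
the ddsnap delete/create syntax is written from memory to match the ddsnap
status invocation shown below, and the snapshot-number handling is an
assumption):

# Hourly cron job, roughly: drop the oldest snapshot, then take a new one
# reusing its number.  SOCK matches the server socket used for ddsnap status.
SOCK=/dev/ddsnap/server
# The status listing appears to be oldest-first, so the first numeric row
# is the snapshot to delete.
OLDEST=$(ddsnap status "$SOCK" | awk '$1 ~ /^[0-9]+$/ { print $1; exit }')
ddsnap delete "$SOCK" "$OLDEST"   # this is the step that blocks I/O
ddsnap create "$SOCK" "$OLDEST"   # new hourly snapshot under the freed number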

It's the removal of the oldest snapshot that causes concerns.

Looking at over 130 deletions (so roughly 130 hours), the average deletion
time was 36 seconds, and it could reach up to 3 minutes.

During those periods, if I understand correctly, the ddsnap server is busy
and can't service any write requests.

That means that any I/O to that filesystem (since reads imply writes) is
delayed until the snapshot is eventually removed.
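
One way to observe the stall from the filesystem side (a rough sketch;
/mnt/vol and the probe file name are placeholders, not taken from this
setup):

# Time a small synchronous write once a second; around the top of the hour,
# while the delete runs, the reported latency should jump from ~0s to tens
# of seconds.
while sleep 1; do
    start=$(date +%s)
    dd if=/dev/zero of=/mnt/vol/.stall-probe bs=4k count=1 conv=fsync 2>/dev/null
    echo "$(date '+%H:%M:%S')  write took $(( $(date +%s) - start ))s"
done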

Here is a ddsnap status output that gives an indication of the allocation
of chunks per snapshot:

$ sudo ddsnap status /dev/ddsnap/server
Snapshot store block size = 4096; 9,742,071 of 13,107,200 chunks free
Origin size: 102,395,543,552 bytes
Write density: 0
Creation time: Tue Apr 22 13:29:31 2008
  Snap            Creation time Usecnt Prio   Chunks Unshared   Shared
    13 Fri Apr 25 00:00:13 2008      1    0  1600267  1030551   569716
     5 Sun Apr 27 00:01:36 2008      1    0   903211   295765   607446
     1 Sun May  4 00:01:23 2008      1    0   387785   200010   187775
     0 Tue May  6 00:00:57 2008      1    0   382601   187937   194664
     4 Wed May  7 00:00:55 2008      1    0   376664   181553   195111
    10 Thu May  8 00:00:19 2008      1    0   339974   185791   154183
     6 Fri May  9 00:01:28 2008      1    0   202366    33803   168563
     8 Fri May  9 01:00:28 2008      1    0   202293    33723   168570
     9 Fri May  9 02:00:28 2008      1    0   202228    33692   168536
    12 Fri May  9 03:00:26 2008      1    0   202174    33683   168491
    18 Fri May  9 04:00:22 2008      1    0   202119    33716   168403
    15 Fri May  9 05:00:16 2008      1    0   202059    33722   168337
    17 Fri May  9 06:00:15 2008      1    0   201950    33706   168244
     3 Fri May  9 07:01:07 2008      1    0   122176    37351    84825
    11 Fri May  9 08:00:50 2008      1    0    42987    33711     9276
    14 Fri May  9 09:00:48 2008      1    0    42894    33783     9111
    16 Fri May  9 10:00:47 2008      1    0    42711    33921     8790
     2 Fri May  9 11:00:51 2008      1    0    36844    35029     1815
     7 Fri May  9 12:00:50 2008      1    0    33974    33833      141
totals                                       3340457  2525280   815177

The creation time of each snapshot gives an indication of how long it took
to delete the snapshot removed just before it (each snapshot is taken on the
hour, right after the oldest one is deleted).
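
Since the new snapshot is created right after the delete finishes, the
minutes and seconds past the hour in each creation time roughly equal the
preceding delete's duration. A quick way to pull those numbers out of the
status output above (a sketch; it assumes the timestamp is the fifth
whitespace-separated field, as in the listing):

$ sudo ddsnap status /dev/ddsnap/server |
  awk '$1 ~ /^[0-9]+$/ { split($5, t, ":"); printf "snap %s: ~%ds\n", $1, t[2]*60 + t[3] }'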

Original issue reported on code.google.com by s.chaze...@gmail.com on 13 May 2008 at 4:25

GoogleCodeExporter commented 9 years ago

Original comment by daniel.r...@gmail.com on 14 May 2008 at 6:47

GoogleCodeExporter commented 9 years ago
Stephane, can you update us on this?  According to your post
http://groups.google.com/group/zumastor/msg/d2acb62302350b49
mounting with noatime and doubling cache size to 256MB helped
greatly in an initial test.  Is it still helping in more realistic
tests?

Original comment by daniel.r...@gmail.com on 15 May 2008 at 8:15

GoogleCodeExporter commented 9 years ago
To really fix this, we'll need to change the snapshot delete code
so it processes I/O while doing deletes.

Daniel P. has volunteered to tackle that for release 0.10.

Original comment by daniel.r...@gmail.com on 15 May 2008 at 11:07

GoogleCodeExporter commented 9 years ago
Hi Daniel, to answer your question in a previous comment, I'm not under the
impression that increasing the cache size to 256 MB helped a lot. As I had
mentioned earlier, I had to wipe out the snapstore, and it's not as full as
it was before yet, but deleting a snapshot already takes about 25 seconds.
Here's a ddsnap status output with Jiaying's patch for displaying the
metadata usage:

Snapshot store block size = 4096; 11,962,287 of 13,107,200 chunks free
Origin size: 102,395,543,552 bytes
Write density: 0
Creation time: Mon May 12 18:31:19 2008
  Snap            Creation time Usecnt Prio   Chunks Unshared   Shared
     0 Mon May 12 15:00:54 2008      1    0   357796    36597   321199
     9 Tue May 13 00:00:01 2008      1    0   355304    33838   321466
     7 Wed May 14 00:00:12 2008      1    0   346879    37292   309587
     5 Thu May 15 00:00:13 2008      1    0   321111    66198   254913
     1 Thu May 15 22:00:18 2008      1    0   220450    33382   187068
     2 Thu May 15 23:00:18 2008      1    0   220365    33333   187032
     3 Fri May 16 00:00:17 2008      1    0   220247    33326   186921
     4 Fri May 16 01:00:13 2008      1    0   220154    33319   186835
    16 Fri May 16 02:00:13 2008      1    0   220067    33323   186744
     6 Fri May 16 03:00:13 2008      1    0   219955    33321   186634
    15 Fri May 16 04:00:12 2008      1    0   219859    33309   186550
     8 Fri May 16 05:00:12 2008      1    0   219757    33307   186450
    14 Fri May 16 06:00:12 2008      1    0   219663    33326   186337
    10 Fri May 16 07:01:15 2008      1    0   175708    33464   142244
    11 Fri May 16 08:00:22 2008      1    0   138155    33350   104805
    12 Fri May 16 09:00:20 2008      1    0   137995    33351   104644
    13 Fri May 16 10:00:28 2008      1    0    18436     4214    14222
totals                                       1136551   578250   558301
Metadata usage: 34,250,752 bytes

I believe noatime had a great impact on usability, as I haven't experienced
any hangs yet, but I'm not using the machine interactively as heavily as I
was before, when every now and then vim/zsh... would hang, I'd look at the
clock, and it would be xx:00.

In any case, I'd consider noatime a workaround rather than a solution, as it
trades one feature (access time on files) for another (snapshots).

Original comment by s.chaze...@gmail.com on 16 May 2008 at 9:15

GoogleCodeExporter commented 9 years ago
Yes, noatime is a workaround (though it'll be a common one).
Mounting with atime is a good way to convert a read-only
stress test into a read-write stress test :-)

We plan to address this bug for real in release 0.10.

Original comment by daniel.r...@gmail.com on 16 May 2008 at 12:47