junneyang / zumastor

Automatically exported from code.google.com/p/zumastor
0 stars 1 forks source link

switch to O_DIRECT read in ddsnap replication #92

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The current ddsnap replication code reads snapshot devices in normal mode,
so the read data is kept in the kernel page cache. This seems to be
unnecessary because most of the read data won't be re-read in the future.
It is also bad for performance because a large amount of normal memory is
used for file page caching and other operations become slow. We may want to
use O_DIRECT read instead for replication. Here is a proposed patch for
solving the problem.

Index: ddsnap.c
===================================================================
--- ddsnap.c    (revision 1497)
+++ ddsnap.c    (working copy)
@@ -763,21 +763,23 @@ static int generate_delta_extents(u32 mo
        char *dev1name = NULL, *dev2name = NULL, *progress_tmpfile = NULL;
        int snapdev1 = -1, snapdev2 = -1;
        int err = -ENOMEM;
-       unsigned char *dev1_extent   = malloc(MAX_MEM_SIZE);
-       unsigned char *dev2_extent   = malloc(MAX_MEM_SIZE);
+       unsigned char *dev1_extent;
+       unsigned char *dev2_extent;
        unsigned char *extents_delta = malloc(MAX_MEM_SIZE);
        unsigned char *gzip_delta    = malloc(MAX_MEM_SIZE + 12 +
(MAX_MEM_SIZE >> 9));
        unsigned char *dev2_gzip_extent = malloc(MAX_MEM_SIZE + 12 +
(MAX_MEM_SIZE >> 9));

-       if (!fullvolume && (!(dev1name = malloc_snapshot_name(devstem,
src_snap)) || ((snapdev1 = open(dev1name, O_RDONLY)) < 0))) {
+       if (!fullvolume && (!(dev1name = malloc_snapshot_name(devstem,
src_snap)) || ((snapdev1 = open(dev1name, O_DIRECT|O_RDONLY)) < 0))) {
                warn("unable to open source snapshot: %s", strerror(errno));
                goto out;
        }
-       if (!(dev2name = malloc_snapshot_name(devstem, tgt_snap)) ||
((snapdev2 = open(dev2name, O_RDONLY)) < 0)) {
+       if (!(dev2name = malloc_snapshot_name(devstem, tgt_snap)) ||
((snapdev2 = open(dev2name, O_DIRECT|O_RDONLY)) < 0)) {
                warn("unable to open target snapshot: %s", strerror(errno));
                goto out;
        }

+       posix_memalign(&dev1_extent, SECTOR_SIZE, MAX_MEM_SIZE);
+       posix_memalign(&dev2_extent, SECTOR_SIZE, MAX_MEM_SIZE);
        if (!dev1_extent || !dev2_extent || !extents_delta || !gzip_delta
|| !dev2_gzip_extent) {
                warn("variable memory allocation failed: %s", strerror(-err));
                goto out;

Original issue reported on code.google.com by jiayin...@gmail.com on 25 Mar 2008 at 10:41

GoogleCodeExporter commented 9 years ago

Original comment by daniel.r...@gmail.com on 26 Mar 2008 at 6:12

GoogleCodeExporter commented 9 years ago
The patch looks good for switching to O_DIRECT, but shouldn't we just use 
fadvise
instead if our only goal is preventing us from trashing the page cache?  We'd 
have to
call posix_fadvise(POSIX_FADV_DONTNEED) for every range we read in each 
iteration of
the loop to evict the pages, because sadly Linux doesn't remember the setting 
for
future file operations.

Original comment by sha...@gmail.com on 1 Apr 2008 at 10:28

GoogleCodeExporter commented 9 years ago
Yes. We can just use posix_fadvise for this. O_DIRECT and posix_fadvise seem to 
have
the same performance results according to my measurement. Thanks for your 
suggestion.

In the attachment is the updated patch.

Original comment by jiayin...@gmail.com on 5 Apr 2008 at 1:10

Attachments:

GoogleCodeExporter commented 9 years ago
Looks good to me.

Original comment by sha...@gmail.com on 5 Apr 2008 at 1:17

GoogleCodeExporter commented 9 years ago
Fixed with revision 1525.

Original comment by jiahotc...@gmail.com on 8 Apr 2008 at 1:12