openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

MMAP Optimization #225

Closed behlendorf closed 12 years ago

behlendorf commented 13 years ago

The current implementation fully supports mmap, but there is some additional cleanup of the implementation which is desirable.

First off, it would be best to update the readpage and writepage hooks to use zfs_fillpage and zfs_putapage respectively. This was not done initially because it was easier to use the existing zpl_common_read/write() calls. Updating zfs_fillpage and zfs_putapage to be Linux VM friendly will take a bit of work, but the end result will be cleaner, faster code, so it's worth doing.
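If the hooks were wired through the standard Linux `address_space_operations` table, the end state might look like the sketch below. This is illustrative only: `zpl_writepage` and the `zfs_putapage()` signature shown here are hypothetical, not the actual in-tree interfaces.

```c
/* Sketch: route the VM hooks to the ZFS page helpers instead of
 * the generic zpl_common_read/write() paths. Names other than
 * zpl_readpage are illustrative. */
static int
zpl_writepage(struct page *pp, struct writeback_control *wbc)
{
        struct inode *ip = pp->mapping->host;

        /* hypothetical signature: push the dirty page to the DMU */
        return (zfs_putapage(ip, pp, wbc));
}

const struct address_space_operations zpl_address_space_operations = {
        .readpage  = zpl_readpage,   /* backed by zfs_fillpage() */
        .writepage = zpl_writepage,  /* backed by zfs_putapage() */
};
```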

Second, the Linux port still uses the same trick as Solaris for mmap'ed files. That is, two copies of the mmap'ed region are kept, one in the page cache and one in the ARC. The ZFS code is very careful to keep the two synchronized, but it is still a wart on the overall design. I believe it may be possible to unify the Linux page cache with the ZFS ARC, removing this design issue. This would hopefully clean up the mmap() code considerably by removing the need to synchronize the caches.

prasad-joshi commented 13 years ago

A proof-of-concept change to address the first optimization mentioned in this issue. The following code only handles reads of an mmap'ed file, but similar changes can easily be made for the writepage, readpages, and writepages interfaces.


FILE NAME: zfs/module/zfs/zpl_file.c
--------------------------------------------------

```c
static int
zpl_readpage(struct file *filp, struct page *pp)
{
        struct inode *ip       = filp->f_mapping->host;
        struct page  *pages[1] = { pp };
        cred_t       *cr       = CRED();
        int          error     = 0;

        ASSERT(PageLocked(pp));

        crhold(cr);

        error = zfs_getpage(ip, pages, 1, cr);

        if (error) {
                SetPageError(pp);
                ClearPageUptodate(pp);
        } else {
                ClearPageError(pp);
                SetPageUptodate(pp);
                flush_dcache_page(pp);
        }

        crfree(cr);
        unlock_page(pp);

        return (error);
}
```

FILE NAME: zfs/module/zfs/zfs_vnops.c
-----------------------------------------------------

```c
int
zfs_getpage(struct inode *ip, struct page *pl[], unsigned nr_pages, cred_t *cr)
{
    znode_t     *zp  = ITOZ(ip);
    zfs_sb_t    *zsb = ITOZSB(ip);
    int         err  = 0;
    offset_t    off;

    /* we do our own caching, faultahead is unnecessary */
    if (pl == NULL)
        return (0);

    off = page_offset(pl[0]);

    ZFS_ENTER(zsb);
    ZFS_VERIFY_ZP(zp);

    err = zfs_fillpage(ip, off, pl, nr_pages);
    if (!err)
        ZFS_ACCESSTIME_STAMP(zsb, zp);

    ZFS_EXIT(zsb);
    return (err);
}
```

```c
int
zfs_fillpage(struct inode *ip, offset_t off, struct page *pl[], int nr_pages)
{
    znode_t     *zp  = ITOZ(ip);
    zfs_sb_t    *zsb = ITOZSB(ip);
    objset_t    *os  = zsb->z_os;
    struct page *cur_pp;
    u_offset_t  io_off, total;
    size_t      io_len;
    int         page_idx;
    loff_t      i_size;
    int         err;

    i_size = i_size_read(ip);
    io_off = page_offset(pl[0]);

    io_len = nr_pages << PAGE_CACHE_SHIFT;
    if (io_off + io_len > i_size)
        io_len = i_size - io_off;

    /* fill each page individually */
    page_idx = 0;
    cur_pp   = pl[0];
    for (total = io_off + io_len; io_off < total; io_off += PAGESIZE) {
        caddr_t va;

        ASSERT3U(io_off, ==, page_offset(cur_pp));
        va  = kmap(cur_pp);
        err = dmu_read(os, zp->z_id, io_off, PAGESIZE, va,
            DMU_READ_PREFETCH);
        kunmap(cur_pp);
        if (err) {
            /* convert checksum errors into IO errors */
            if (err == ECKSUM)
                err = EIO;
            return (err);
        }
        /* advance to the next page before the next iteration */
        if (++page_idx < nr_pages)
            cur_pp = pl[page_idx];
    }

    /* TODO: zero-fill any tail of the last page beyond EOF */

    return (0);
}
```

Here is an example test. A similar test was carried out with a file of size 10 MB.


```
root@prasad-desktop:~/programs/c/mmap# ./mmap_read /tank/mmap_read.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main (int argc, char *argv[])
{
    struct stat sb;
    off_t len;
    char *p;
    int fd;

    if (argc < 2) {
        fprintf (stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    fd = open (argv[1], O_RDONLY);
    if (fd == -1) {
        perror ("open");
        return 1;
    }

    if (fstat (fd, &sb) == -1) {
        perror ("fstat");
        return 1;
    }

    if (!S_ISREG (sb.st_mode)) {
        fprintf (stderr, "%s is not a file\n", argv[1]);
        return 1;
    }

    p = mmap (0, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror ("mmap");
        return 1;
    }

    if (close (fd) == -1) {
        perror ("close");
        return 1;
    }

    for (len = 0; len < sb.st_size; len++)
        putchar (p[len]);

    if (munmap (p, sb.st_size) == -1) {
        perror ("munmap");
        return 1;
    }

    return 0;
}
```
prasad-joshi commented 13 years ago

I tried hard to make the code snippet render as a code block, but some parts still don't show as code. /me goes off to read up on Markdown

behlendorf commented 13 years ago

Hi Prasad,

It's good to hear from you! I'm glad to see you're still working on ZFS.

Your prototype patch looks like a reasonable first step. If you're interested in working on this I'd suggest expanding the scope a little. I'd love to see the functions in module/zfs/zfs_vnops.c which are #ifdef'ed out with the HAVE_MMAP macro updated to be Linux friendly. They could then be used cleanly as the Linux mmap helpers. I'm happy to iterate with you on a patch and review your proposed changes.

prasad-joshi commented 13 years ago

Sent you a pull request for the first optimization mentioned in this issue. Please review the code. Pull Request: https://github.com/behlendorf/zfs/pull/300

behlendorf commented 12 years ago

This work was actually done a while ago and merged. Closing the issue.

pgassmann commented 4 years ago

> Sent you a pull request for first optimization mentioned on this issue. Please review the code Pull Request: https://github.com/behlendorf/zfs/pull/300

Working link to the pull request: #300