openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.55k stars 1.74k forks source link

When a ZFS File system is set to CASE INSENSITIVE we should be calling d_add_ci on the dentry #4136

Closed RichardSharpe closed 8 years ago

RichardSharpe commented 8 years ago

Hi folks,

We just ran into an issue that caused a rename deadlock in Linux.

Kernel 3.10.0, but the same will happen, I believe with the latest kernel.

We use the casesensitivity=insensitive attribute.

Because ZoL does not call d_add_ci with the canonical case (at least, I cannot find any such calls) Linux ends up with two dentries pointing to the same inode if you present two paths with different case.

This happened to us and causes a self-deadlock in lock_rename because it is passed two different dentries that point to the same inode.

I think the fix has to be in zpl_inode.c:zpl_lookup where we check if the file system is case insensitive and if so call d_add_ci(dentry, ip, realpnp) and add a realpnp parameter to the zfs_lookup call we make, or something like that.

For now, we can work around the issue in Samba, I believe, however, it would be good to have a real solution to this problem.

behlendorf commented 8 years ago

@RichardSharpe nice analysis. It looks like the fix your proposing can be made entirely in the Linux specific zpl_lookup() function. You can check how the filesystem is mounted by consulting zsb->z_case and because it's set once at mount time you know it can't change unless the filesystem is remounted. Then as you suggest you just need to use d_add_ci() to add a case-exact name.

It would be great if you could propose a patch for this and open a pull request.

RichardSharpe commented 8 years ago

@behlendorf

I will work something up over the next few days.

RichardSharpe commented 8 years ago

Simple test case:

  1. Create an fs with casesensitivity=insensitive
  2. mkdir /mnt-point/test
  3. touch /mnt-point/test/test.txt
  4. mv /mnt-point/test/test.txt /mnt-point/Test/test1.txt

Hung and deadlocked at this point.

RichardSharpe commented 8 years ago

Here is a first attempt at this. It is untested (although it compiles), and I have at least one reservation:

diff --git a/module/zfs/zpl_inode.c b/module/zfs/zpl_inode.c
index e0e8ee3..ccbe68a 100644
--- a/module/zfs/zpl_inode.c
+++ b/module/zfs/zpl_inode.c
@@ -44,13 +44,24 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
        struct inode *ip;
        int error;
        fstrans_cookie_t cookie;
+       pathname_t *ppn = NULL;
+       pathname_t pn;
+       zfs_sb_t *zsb = dentry->d_sb->s_fs_info;
+
+       /* What about mixed case? */
+       /* If we are a case insensitive fs, we need the real name */
+       if (zsb->z_case == ZFS_CASE_INSENSITIVE) {
+               pn.pn_bufsize = ZFS_MAXNAMELEN;
+               pn.pn_buf = kmem_zalloc(ZFS_MAXNAMELEN, KM_SLEEP);
+               ppn = &pn;
+       }

        if (dlen(dentry) > ZFS_MAXNAMELEN)
                return (ERR_PTR(-ENAMETOOLONG));

        crhold(cr);
        cookie = spl_fstrans_mark();
-       error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, NULL);
+       error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, ppn);
        spl_fstrans_unmark(cookie);
        ASSERT3S(error, <=, 0);
        crfree(cr);
@@ -69,7 +80,23 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
                        return (ERR_PTR(error));
        }

-       return (d_splice_alias(ip, dentry));
+       /*
+        * If we are case insensitive, call the correct function
+        * to install the name.
+        */
+       if (ppn) {
+               struct dentry *new_dentry;
+               struct qstr real_name;
+
+               real_name.name = pn.pn_buf;
+               real_name.len = strlen(pn.pn_buf);
+               real_name.hash = full_name_hash(real_name.name, real_name.len);
+               new_dentry = d_add_ci(dentry, ip, &real_name);
+               kmem_free(pn.pn_buf, ZFS_MAXNAMELEN);
+               return (new_dentry);
+       }
+       else
+               return (d_splice_alias(ip, dentry));
 }

 void

My reservation is that we are adding a kmem_alloc and kmem_free in the lookup path.

It would have been nicer if zfs_lookup could return a pointer to the actual name, since we are not going to touch the name.

An additional reservation is that I have not thought about or handled caseensitivity=mixed.

I will try to test this later today.

RichardSharpe commented 8 years ago

With a small change (MAXNAMELEN rather than ZFS_MAXNAMELEN) seems to work.

By work I mean:

  1. No OOPSes.
  2. Lookup still works (ls works etc)
  3. The rename test case no longer deadlocks.

More testing required.

RichardSharpe commented 8 years ago

@ptx0: It might turn up with Roaming Profiles but I don't think it is specifically related to roaming profiles. We found it running FSCT.

RichardSharpe commented 8 years ago

I have run fsstress many times against the changes and dozens of renames where the case differs in the from and to paths (not the last component) and no deadlocks seen.

I created a pull request.

behlendorf commented 8 years ago

@RichardSharpe thanks for tackling this and opening a PR. The heart of the fix looks good, I've added some review comments so we can get it polished for inclusion. Please go ahead and refresh it when you can.

RichardSharpe commented 8 years ago

@behlendorf:

Re:

You mentioned ZFS_MAXNAMELEN caused a panic. What was the cause of it? These values differ only by 1 and ZFS_MAXNAMELEN (255) is smaller compared to MAXNAMELEN (256) so I'm surprised. This should be ZFS_MAXNAMELEN as you originally had it.

Hmmm, I can't remember a panic.

However, I was printing out some names for debugging and was concerned that there might not be enough space for the trailing NULL.

behlendorf commented 8 years ago

@RichardSharpe I must have misinterpreted your comment above.

RichardSharpe commented 8 years ago
  1. I have created a new pull request. I might have mistakenly created two. Can I close one or more?
  2. Using the github interface is there any way to pull the ZoL repository into my fork so I am always at the top?
behlendorf commented 8 years ago

@RichardSharpe when you open a pull request it's recommended that you use a branch per change. This way you can easily pull master from the ZoL repository and rebase your branch on top of those changes. Github provides some nice documentation, I'd suggest closing out #4147 and opening a new PR using a branch if you need to refresh this again.

RichardSharpe commented 8 years ago

OK, I will try to close out 4147.

We hit that memory leak in QA a couple of times. It is quite impressive. 733MB leaked in the malloc-256 slab.

RichardSharpe commented 8 years ago

I created a new pull request with what I think are the fixes but I noticed that two of the checks failed.

However, that seems to be because wget failed ... what can I do about it?

behlendorf commented 8 years ago

@RichardSharpe looks like an unrelated hiccup in the test framework. I'll resubmit them for the sake of completeness but the patch looks good, and tests well, for me. Thanks for pushing this over the finish line, I'll get it merged it master.