Closed RichardSharpe closed 8 years ago
@RichardSharpe nice analysis. It looks like the fix your proposing can be made entirely in the Linux specific zpl_lookup()
function. You can check how the filesystem is mounted by consulting zsb->z_case
and because it's set once at mount time you know it can't change unless the filesystem is remounted. Then as you suggest you just need to use d_add_ci()
to add a case-exact name.
It would be great if you could propose a patch for this and open a pull request.
@behlendorf
I will work something up over the next few days.
Simple test case:
Hung and deadlocked at this point.
Here is a first attempt at this. It is untested (although it compiles), and I have at least one reservation:
diff --git a/module/zfs/zpl_inode.c b/module/zfs/zpl_inode.c
index e0e8ee3..ccbe68a 100644
--- a/module/zfs/zpl_inode.c
+++ b/module/zfs/zpl_inode.c
@@ -44,13 +44,24 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
struct inode *ip;
int error;
fstrans_cookie_t cookie;
+ pathname_t *ppn = NULL;
+ pathname_t pn;
+ zfs_sb_t *zsb = dentry->d_sb->s_fs_info;
+
+ /* What about mixed case? */
+ /* If we are a case insensitive fs, we need the real name */
+ if (zsb->z_case == ZFS_CASE_INSENSITIVE) {
+ pn.pn_bufsize = ZFS_MAXNAMELEN;
+ pn.pn_buf = kmem_zalloc(ZFS_MAXNAMELEN, KM_SLEEP);
+ ppn = &pn;
+ }
if (dlen(dentry) > ZFS_MAXNAMELEN)
return (ERR_PTR(-ENAMETOOLONG));
crhold(cr);
cookie = spl_fstrans_mark();
- error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, NULL);
+ error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, ppn);
spl_fstrans_unmark(cookie);
ASSERT3S(error, <=, 0);
crfree(cr);
@@ -69,7 +80,23 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
return (ERR_PTR(error));
}
- return (d_splice_alias(ip, dentry));
+ /*
+ * If we are case insensitive, call the correct function
+ * to install the name.
+ */
+ if (ppn) {
+ struct dentry *new_dentry;
+ struct qstr real_name;
+
+ real_name.name = pn.pn_buf;
+ real_name.len = strlen(pn.pn_buf);
+ real_name.hash = full_name_hash(real_name.name, real_name.len);
+ new_dentry = d_add_ci(dentry, ip, &real_name);
+ kmem_free(pn.pn_buf, ZFS_MAXNAMELEN);
+ return (new_dentry);
+ }
+ else
+ return (d_splice_alias(ip, dentry));
}
void
My reservation is that we are adding a kmem_alloc and kmem_free in the lookup path.
It would have been nicer if zfs_lookup could return a pointer to the actual name, since we are not going to touch the name.
An additional reservation is that I have not thought about or handled caseensitivity=mixed.
I will try to test this later today.
With a small change (MAXNAMELEN rather than ZFS_MAXNAMELEN) seems to work.
By work I mean:
More testing required.
@ptx0: It might turn up with Roaming Profiles but I don't think it is specifically related to roaming profiles. We found it running FSCT.
I have run fsstress many times against the changes and dozens of renames where the case differs in the from and to paths (not the last component) and no deadlocks seen.
I created a pull request.
@RichardSharpe thanks for tackling this and opening a PR. The heart of the fix looks good, I've added some review comments so we can get it polished for inclusion. Please go ahead and refresh it when you can.
@behlendorf:
Re:
You mentioned ZFS_MAXNAMELEN caused a panic. What was the cause of it? These values differ only by 1 and ZFS_MAXNAMELEN (255) is smaller compared to MAXNAMELEN (256) so I'm surprised. This should be ZFS_MAXNAMELEN as you originally had it.
Hmmm, I can't remember a panic.
However, I was printing out some names for debugging and was concerned that there might not be enough space for the trailing NULL.
@RichardSharpe I must have misinterpreted your comment above.
@RichardSharpe when you open a pull request it's recommended that you use a branch per change. This way you can easily pull master from the ZoL repository and rebase your branch on top of those changes. Github provides some nice documentation, I'd suggest closing out #4147 and opening a new PR using a branch if you need to refresh this again.
OK, I will try to close out 4147.
We hit that memory leak in QA a couple of times. It is quite impressive. 733MB leaked in the malloc-256 slab.
I created a new pull request with what I think are the fixes but I noticed that two of the checks failed.
However, that seems to be because wget failed ... what can I do about it?
@RichardSharpe looks like an unrelated hiccup in the test framework. I'll resubmit them for the sake of completeness but the patch looks good, and tests well, for me. Thanks for pushing this over the finish line, I'll get it merged it master.
Hi folks,
We just ran into an issue that caused a rename deadlock in Linux.
Kernel 3.10.0, but the same will happen, I believe with the latest kernel.
We use the casesensitivity=insensitive attribute.
Because ZoL does not call d_add_ci with the canonical case (at least, I cannot find any such calls) Linux ends up with two dentries pointing to the same inode if you present two paths with different case.
This happened to us and causes a self-deadlock in lock_rename because it is passed two different dentries that point to the same inode.
I think the fix has to be in zpl_inode.c:zpl_lookup where we check if the file system is case insensitive and if so call d_add_ci(dentry, ip, realpnp) and add a realpnp parameter to the zfs_lookup call we make, or something like that.
For now, we can work around the issue in Samba, I believe, however, it would be good to have a real solution to this problem.