openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

bad error : disappearing / reappearing file visibility #3224

Closed tom71-zz closed 9 years ago

tom71-zz commented 9 years ago

We have a problem where files are reported as "No such file or directory" when we try to access them. After a few access attempts, they are accessible again. The error shows up both locally and via NFS. It happens after about 10 hours of run time of our software. Because we have lots of data, we cannot provide a standalone reproducer script. However, we have now created a test ZFS pool and are trying to shorten the time until the error occurs to a matter of minutes by raising the workload by a factor of 10. We will also run zdb -ddddd on the test pool before and after the error. We can provide a login to our machines so that developers can gather more information.

behlendorf commented 9 years ago

@tom71 if you could provide a reproducer of the issue that would be the most helpful. It would also be helpful to know what version of ZFS you're able to reproduce this issue on.

tom71-zz commented 9 years ago

We are working hard on a reproducer. With our current software it takes about 10 hours for the error to occur. We multiplied the workload by 10 in order to trigger the error earlier. But we work on loads of data, and it is difficult to construct a reproducer without that data. We are using ubuntu-zfs 8 trusty. The kernel is 3.13.0-39-generic. We made a special ZFS test pool where we put the data. We ran zdb -ddddd testpool before the run of the software and will run the same command afterwards. Are there other zdb commands we should run? We can provide a test account on our machines.


Dipl.-Ing. Thomas Grossmann Chair for Clinical Bioinformatics Saarland University, University Hospital Building E2.1 66123 Saarbrücken, Germany Ph. +49 681 302 68603
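The before/after zdb comparison described above could be automated along these lines. This is only a sketch: it assumes the output of zdb -ddddd testpool was redirected into two text files (the file names are passed on the command line and are hypothetical), and it makes no assumption about the zdb output format beyond it being line-oriented text. Expect the diff to be noisy, since many fields change between runs regardless of the bug.

```python
import difflib
import sys
from pathlib import Path


def diff_dumps(before_text: str, after_text: str, context: int = 2) -> list[str]:
    """Return unified-diff lines between two captured text dumps."""
    return list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            fromfile="before",
            tofile="after",
            n=context,
            lineterm="",
        )
    )


# Usage (hypothetical file names): python diff_dumps.py before.txt after.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    before = Path(sys.argv[1]).read_text(errors="replace")
    after = Path(sys.argv[2]).read_text(errors="replace")
    print("\n".join(diff_dumps(before, after)))
```

Keeping the comparison as a pure function on strings makes it easy to filter out known-noisy lines before diffing, if that turns out to be necessary.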

tom71-zz commented 9 years ago

What we found out:

On our machines main2, main3 and main4:

If we create a file, e.g. test.txt, edit it (e.g. insert a line "echo 2222"), save it, and run "ls" on the file, it works. If we then change the file again (e.g. add a line "echo 3333"), save it, and run "ls", it does not work: "No such file or directory". Before, we had the file systems mounted both locally and via NFS, and the error occurred. Now we have only NFS mounts and the error still occurs. The error does not always show up.

I can provide a login to our servers. Please mail to thomas.grossmann@ccb.uni-saarland.de for the login data.

Thomas
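The edit, save, ls sequence described in this comment might be turned into a small reproducer like the sketch below. It rests on an assumption the thread does not confirm: that the editor saves by writing a temporary file and renaming it over the original. The function names and the ".save-" prefix are made up for illustration. On a healthy local file system the loop should report no failures; to match the report it would have to be pointed at a path on the affected NFS mount.

```python
import errno
import os
import tempfile


def save_like_editor(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it over path.

    Assumption: this mimics how the editor in the report saves files.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if the save did not complete.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def edit_and_stat(path: str, rounds: int) -> list[tuple[int, str]]:
    """Save the file repeatedly and stat it after each save.

    Returns (round number, errno name) for every stat that failed.
    """
    failures = []
    for i in range(rounds):
        save_like_editor(path, f"echo {i}\n")
        try:
            os.stat(path)
        except OSError as exc:
            failures.append((i, errno.errorcode.get(exc.errno, str(exc.errno))))
    return failures
```

Recording the errno name rather than the message makes it easier to compare runs across machines with different locales.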

tom71-zz commented 9 years ago

It is hard to produce a reproducer, because the failure mostly shows up when editing a file and saving it: then the file is not there, then it is there again. We have a new case: a Python script is changed, then saved and executed. The interpreter named in that script is not found. The Python interpreter is on an ext4 file system. Maybe this error gives you more ideas. We never had problems like this when all file systems were purely ext4; they started exactly when we introduced ZFS. Is there a problem with the ZFS <-> VFS interaction? After executing the script a second time, everything is fine. We will update to the 0.6.4 ZFS release soon. Hopefully we get an improvement here.

tom71-zz commented 9 years ago

This is the file after editing it:

ls -la in the dir

-????????? ? ? ? ? ? xxxxx

try that again and it works.
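A listing with question marks in every column generally means ls could read the name from the directory but could not stat the entry itself. The sketch below looks for that condition directly by listing a directory and calling lstat on each name; the function name is hypothetical, and it says nothing about why the lstat fails.

```python
import errno
import os


def entries_failing_lstat(directory: str) -> list[tuple[str, str]]:
    """Return (name, errno name) for entries that are listed but cannot be lstat'ed."""
    failing = []
    for name in sorted(os.listdir(directory)):
        try:
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            failing.append((name, errno.errorcode.get(exc.errno, str(exc.errno))))
    return failing
```

Running this in a loop against the affected directory right after a save would show whether the name stays visible while the stat fails, which is what the ls output above suggests.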

tom71-zz commented 9 years ago

editing file y:

test10@main2:~$ ls y
ls: cannot access y: File exists

again: works

test10@main2:~$ ls g
ls: cannot access g: No such file or directory

again: works

tom71-zz commented 9 years ago

Thanks a lot for the patch. Our next maintenance window is next week. I have already prepared the servers and am looking forward to running the machines with it.

jorg85 commented 9 years ago

The patch works with version 0.6.4.1, starting at line 105 in module/zfs/zpl_export.c.

But at the step "Now back to the original client:", for about 10 seconds I see:

nfs_client1: cat junk
cat: junk: File exists

After about 10 seconds the error goes away and cat shows the content of junk.
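To put a number on the roughly 10 second window, a polling helper like the following could be run on the client right after the file is changed elsewhere. It is a sketch with made-up names and default values; it only measures how long opening the file keeps failing and which errno values appear in the meantime.

```python
import errno
import os
import time


def wait_until_readable(path: str, timeout: float = 30.0, interval: float = 0.5):
    """Poll path until it can be opened and read.

    Returns (elapsed seconds, errno names seen), with elapsed set to None
    if the file was still unreadable when the timeout expired.
    """
    start = time.monotonic()
    seen = []
    while True:
        try:
            with open(path, "rb") as handle:
                handle.read(1)
            return time.monotonic() - start, seen
        except OSError as exc:
            seen.append(errno.errorcode.get(exc.errno, str(exc.errno)))
        if time.monotonic() - start >= timeout:
            return None, seen
        time.sleep(interval)
```

For example, wait_until_readable("junk") on the NFS client would return the recovery time together with a list such as repeated EEXIST entries, if the behaviour matches what is described above.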

behlendorf commented 9 years ago

@kpande thanks for posting this from the list. I've opened pull request #3404 to get some review and feedback on this proposed fix.