Closed gregorygeller closed 5 years ago
Is this something in GPFS (e.g. some cache coherency thing), like what we see with a delay in visibility of xattrs?
How does the FUSE daemon handle files that were just written in terms of xattrs? If they aren't in place yet, does FUSE pass back the information differently?
As I recall, the process that writes the xattrs sees them immediately. I think that's why FUSE doesn't notice any issue with delayed xattrs.
Otherwise, I think the problems would be dramatically obvious. For example, the last thing we do when closing off a write is strip the restart xattr. If there were a delay in FUSE seeing this action, an open/read immediately after a write/close would sometimes fail.
Uh oh:
$ echo foo > /campaign.jti/admins/jti/test ; cat /campaign.jti/admins/jti/test
cat: /campaign.jti/admins/jti/test: Invalid argument
$ cat /campaign.jti/admins/jti/test
foo
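To reproduce this from a script rather than an interactive shell, something like the following might help. It is only a sketch: `write_then_read` and its retry parameters are hypothetical names, and it assumes nothing beyond the path living on the mount under test. It writes a file, closes it, immediately tries to read it back, and records the errno of every failed attempt.

```python
import errno
import os
import time


def write_then_read(path, payload=b"foo\n", retries=20, delay=0.1):
    """Write payload, close, then try to read it back right away.

    Returns (data, errors): data is the bytes read (or None if every
    attempt failed) and errors lists the errno names of failed attempts.
    """
    with open(path, "wb") as f:
        f.write(payload)
    # close() has returned here; on a FUSE mount release() may still be pending.

    errors = []
    for _ in range(retries):
        try:
            with open(path, "rb") as f:
                return f.read(), errors
        except OSError as e:
            errors.append(errno.errorcode.get(e.errno, str(e.errno)))
            time.sleep(delay)
    return None, errors
```

On a local filesystem the error list should stay empty. If the shell session above is representative, the affected mount would presumably report `EINVAL` for the first attempt or two.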
Can this be replicated outside of MarFS?
This doesn't seem to have problems:
for i in $(seq 1 10); do F=/gpfs/..../test/foo; attr -r foo $F; sleep 2; attr -q -s foo -V 1 $F; attr -q -g foo $F; done
Maybe there's a difference in behavior between attr(1) and lgetxattr(2) ?
We could have our save_xattrs() / stat_xattrs() do something to assure synchrony, if we knew what that thing was. It feels like a cache coherency issue.
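One way to check the attr(1) versus lgetxattr(2) question is to go through the syscalls directly. The sketch below is an assumption-laden probe, not MarFS code: `set_marker`, `wait_for_marker` and the `user.foo` name are made up for illustration. Python's `os.setxattr` and `os.getxattr` with `follow_symlinks=False` map to lsetxattr(2) and lgetxattr(2) on Linux.

```python
import errno
import os
import time

XATTR = "user.foo"  # hypothetical name; attr(1) uses the user namespace by default


def set_marker(path, value=b"1"):
    """Writer side: set the xattr. Returns False if xattrs are unavailable here."""
    if not hasattr(os, "setxattr"):
        return False
    try:
        os.setxattr(path, XATTR, value, follow_symlinks=False)  # lsetxattr(2)
    except OSError as e:
        if e.errno in (errno.ENOTSUP, errno.EPERM):
            return False
        raise
    return True


def wait_for_marker(path, value=b"1", timeout=5.0, step=0.05):
    """Reader side: poll until the xattr is visible.

    Returns seconds waited, or None if it never appeared within timeout.
    """
    if not hasattr(os, "getxattr"):
        return None
    start = time.monotonic()
    while True:
        try:
            if os.getxattr(path, XATTR, follow_symlinks=False) == value:  # lgetxattr(2)
                return time.monotonic() - start
        except OSError as e:
            if e.errno not in (errno.ENODATA, errno.ENOTSUP):
                raise
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(step)
```

Since the writing process reportedly sees its own xattrs immediately, the interesting run is `set_marker` in one process (or on one node) and `wait_for_marker` in another. A nonzero wait there would point at coherency rather than at the tool used.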
Is the issue with file-size just a variant of the same problem, regarding stat ?
On Aug 24, 2016, at 12:22 PM, David Bonnie notifications@github.com wrote:
How does the FUSE daemon handle files that were just written in terms of xattrs? If they aren't in place yet, does FUSE pass back the information differently?
New idea: The delay here is (maybe) not GPFS but rather that fuse calls release() with a delay, after returning from file-operations.
It seems likely at this point that this is not because of GPFS, but actually due to the implementation of fuse, which calls release() asynchronously ("documented" here and here). marfs_release() is where we remove the restart xattr, which allows the file to be opened, and where we truncate the metadata file to the correct size.
It appears the only fuse call that is made before close returns is fuse_flush(). We cannot, however, simply shift the logic from marfs_release to marfs_flush since flush is sometimes called more than once (such as in the case of dup'd fds or following a fork()).
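To make the flush/release asymmetry concrete, here is a toy model. It is not the FUSE API and not MarFS code; `OpenFileModel` is a hypothetical class that only counts calls. It assumes the behaviour described above: every close() of a descriptor triggers a flush, while release happens once, when the last descriptor for the open file goes away. The real release is asynchronous; the model runs it inline for simplicity.

```python
class OpenFileModel:
    """Toy model of one open file shared by several descriptors."""

    def __init__(self):
        self.refs = 1            # descriptors referring to this open file
        self.flush_calls = 0
        self.release_calls = 0

    def dup(self):
        """Model dup() or fork(): one more descriptor for the same open file."""
        self.refs += 1

    def close(self):
        """Model close(): flush every time, release only on the last one."""
        if self.refs == 0:
            raise ValueError("no descriptors left to close")
        self.flush_calls += 1
        self.refs -= 1
        if self.refs == 0:
            self.release_calls += 1
```

With one open followed by two dup() calls and three close() calls, the model records three flushes and one release, which is why one-time cleanup cannot simply live in flush.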
I believe we could fix the file size issue by truncating the MD file in flush, since repeated truncates should not cause problems as long as we always truncate based on the number of bytes written to the object stream (which we do).
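A minimal sketch of why repeated truncates are harmless, assuming the size always comes from the running byte count. `flush_md_size`, `md_path` and `bytes_written` are hypothetical names, not the actual MarFS code; the point is only that truncating to an absolute size is idempotent.

```python
import os


def flush_md_size(md_path, bytes_written):
    """Set the metadata file's size to the bytes written so far.

    Safe to call repeatedly: each call sets the same absolute size,
    so a second or third flush changes nothing.
    """
    os.truncate(md_path, bytes_written)
    return os.stat(md_path).st_size
```

Calling this from every flush would leave the MD file at the same size no matter how many dup'd descriptors get closed, as long as `bytes_written` only reflects data already sent to the object stream.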
The object stream and xattr operations are more challenging to move. We must only close an object stream once, so that code can't go here. Similarly, we must only remove the restart xattr on the last flush, but it is impossible to know whether any given call to flush will be the last one.
Old issue. Not a problem. :)
So, ran into this problem today:
When copying a file from MarFS to GPFS, the correct file size is reported immediately. However, when copying in the other direction (from GPFS to MarFS), I first get a zero byte file and then, within a few seconds (takes longer for bigger files), the correct file size is reported:
From gpfs to MarFS:
Doesn't matter if I do a sync first:
From MarFS to gpfs:
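To put a number on the delay in the GPFS-to-MarFS direction, a small polling helper could be run right after the copy returns. This is a sketch under the assumption that the expected size is known up front (for example from the source file); `wait_for_size` is a hypothetical name.

```python
import os
import time


def wait_for_size(path, expected, timeout=30.0, step=0.1):
    """Poll stat() until st_size equals expected or the timeout passes.

    Returns a list of (elapsed_seconds, size) samples, one for each
    change in the observed size.
    """
    samples = []
    start = time.monotonic()
    last = None
    while True:
        size = os.stat(path).st_size
        elapsed = time.monotonic() - start
        if size != last:
            samples.append((elapsed, size))
            last = size
        if size == expected or elapsed >= timeout:
            return samples
        time.sleep(step)
```

If the zero-byte window is real, the samples would start with a size of 0 and end with the expected size a few seconds later. A single sample with the right size means the stat was already correct when the copy returned.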