mar-file-system / marfs

MarFS provides a scalable, near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc.) as a scalable data component.

Incorrect file size when copying files from gpfs to MarFS #140

Closed: gregorygeller closed this issue 5 years ago.

gregorygeller commented 8 years ago

So, I ran into this problem today:

When copying a file from MarFS to GPFS, the correct file size is reported immediately. However, when copying in the other direction (from GPFS to MarFS), I first see a wrong size (0 bytes in the first example, 48 bytes in the second) and then, within a few seconds (longer for bigger files), the correct file size is reported:

From GPFS to MarFS:

-bash-4.1$ pwd
/campaign.gellergr/admins/gellergr/marfiles
-bash-4.1$ ls -l /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/unifile 
-rw-rw-r-- 1 gellergr gellergr 1048576 Jun 21 10:43 /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/unifile
-bash-4.1$ cp /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/unifile .; ls -l
total 0
-rw-r--r-- 1 gellergr gellergr 0 Jun 21 11:01 unifile
-bash-4.1$ ls -l
total 0
-rw-r--r-- 1 gellergr gellergr 1048576 Jun 21 11:01 unifile

Running sync between the cp and the ls doesn't help either:

-bash-4.1$ pwd
/campaign.gellergr/admins/gellergr/marfiles
-bash-4.1$ ls -al /gpfs/ccfs1/admins/mdfs/gellergr/test_file
-rw-rw-r-- 1 gellergr gellergr 1922039808 Jun 21 11:18 /gpfs/ccfs1/admins/mdfs/gellergr/test_file
-bash-4.1$ cp /gpfs/ccfs1/admins/mdfs/gellergr/test_file .; sync; ls -l 
total 0
-rw-r--r-- 1 gellergr gellergr 48 Jun 21 11:23 test_file
-bash-4.1$ ls -l
total 0
-rw-r--r-- 1 gellergr gellergr 48 Jun 21 11:23 test_file
-bash-4.1$ ls -l
total 512
-rw-r--r-- 1 gellergr gellergr 1922039808 Jun 21 11:23 test_file
-bash-4.1$ 

From MarFS to GPFS:

-bash-4.1$ pwd
/campaign.gellergr/admins/gellergr/marfiles
-bash-4.1$ ls -l
total 0
-rw-r--r-- 1 gellergr gellergr 1048576 Jun 21 11:01 unifile
-bash-4.1$ cp unifile /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/; ls -l /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/unifile
-rw-rw-r-- 1 gellergr gellergr 1048576 Jun 21 11:04 /gpfs/ccfs1/admins/mdfs/gellergr/pfdest/unifile
-bash-4.1$ 
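A minimal reproduction sketch, not part of the original report: it copies a file and then polls the size that stat reports for the copy until it matches the source, printing how long that took. It assumes a POSIX shell with GNU coreutils (cp, stat -c %s); file_size and copy_and_wait are hypothetical helper names.

```bash
# Print the size in bytes that stat reports for a path (GNU stat).
file_size() {
    stat -c %s -- "$1"
}

# copy_and_wait SRC DST [TIMEOUT_SECONDS]
# Copies SRC to DST, then polls DST once per second until its reported
# size equals SRC's. Returns 0 on a match, 1 on error or timeout.
copy_and_wait() {
    local src=$1 dst=$2 timeout=${3:-30}
    local want waited=0
    want=$(file_size "$src") || return 1
    cp -- "$src" "$dst" || return 1
    # On an affected MarFS mount the first samples may show 0 bytes
    # (or another wrong size) before the final size appears.
    while [ "$(file_size "$dst")" != "$want" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timeout: $dst still reports $(file_size "$dst") bytes, expected $want" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$dst reports $want bytes after ${waited}s"
}
```

For example, copy_and_wait /gpfs/ccfs1/admins/mdfs/gellergr/test_file ./test_file 60 run from the MarFS directory. On a file system without the delay it should report 0s.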
jti-lanl commented 8 years ago

Is this something in GPFS (e.g. some cache coherency thing), like what we see with a delay in visibility of xattrs?

thewacokid commented 8 years ago

How does the FUSE daemon handle files that were just written in terms of xattrs? If they aren't in place yet, does FUSE pass back the information differently?

jti-lanl commented 8 years ago

As I recall, the process that writes the xattrs sees them immediately. I think that's why FUSE doesn't notice any issue with delayed xattrs.

Otherwise, I think the problems would be dramatically obvious. For example, the last thing we do when closing off a write is strip off the restart xattr. If FUSE saw that change only after a delay, an open/read immediately after a write/close should sometimes fail.

Uh oh:

$ echo foo > /campaign.jti/admins/jti/test ; cat /campaign.jti/admins/jti/test
cat: /campaign.jti/admins/jti/test: Invalid argument

$ cat /campaign.jti/admins/jti/test
foo
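To estimate how often the failure above occurs, the write-then-read check can be repeated in a loop. This is a hypothetical helper (race_test is an illustrative name, not a MarFS tool), assuming a POSIX shell: it writes a fresh file per iteration, reads it back immediately after the write's close(), and counts errors or wrong contents.

```bash
# race_test PREFIX [ITERATIONS]
# Writes PREFIX.1 .. PREFIX.N, reading each back right after the write's
# close(). Prints how many reads failed or returned the wrong content.
race_test() {
    local f=$1 n=${2:-20} i=1 fails=0
    while [ "$i" -le "$n" ]; do
        # The redirection closes the file as soon as echo finishes.
        if echo "foo $i" > "$f.$i"; then
            # Read back immediately; an error (e.g. "Invalid argument")
            # yields empty output and is counted as a failure.
            if [ "$(cat -- "$f.$i" 2>/dev/null)" != "foo $i" ]; then
                fails=$((fails + 1))
            fi
        else
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails/$n reads failed"
}
```

For example, race_test /campaign.jti/admins/jti/test 50. A nonzero count on MarFS, and zero on plain GPFS, would point at the MarFS/FUSE close path rather than GPFS itself.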

Can this be replicated outside of MarFS?

This doesn't seem to have problems:

for i in $(seq 1 10); do echo; F=/gpfs/..../test/foo; attr -r foo $F; sleep 2; attr -q -s foo -V 1 $F ; attr -q -g foo $F; done

Maybe there's a difference in behavior between attr(1) and lgetxattr(2)?

We could have our save_xattrs() / stat_xattrs() do something to ensure synchrony, if we knew what that thing was. It feels like a cache coherency issue.

Is the file-size issue just a variant of the same problem, applied to stat()?

(Sent by email in reply to David Bonnie's comment of Aug 24, 2016, 12:22 PM.)

jti-lanl commented 8 years ago

New idea: the delay here may not come from GPFS at all, but from FUSE calling release() with a delay, after the file operation (close()) has already returned.

wfvining commented 8 years ago

It now seems likely that this is not caused by GPFS, but by the FUSE implementation, which calls release() asynchronously ("documented" here and here). marfs_release() is where we remove the restart xattr, which allows the file to be opened, and where we truncate the metadata file to the correct size.

It appears the only FUSE call made before close() returns is fuse_flush(). We cannot, however, simply move the logic from marfs_release() to marfs_flush(), since flush is sometimes called more than once (for example, with dup'd fds or after a fork()).
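The multiple-flush case can be produced from a shell. In this hypothetical sketch (dup_close_demo is an illustrative name), one open file is shared by three descriptors. Per the libfuse documentation, flush is called on each close() of a descriptor, while release() is issued once, after the last reference is dropped; as noted above, close() does not wait for it. The sketch only generates the calls; it does not observe FUSE requests.

```bash
# dup_close_demo FILE
# One open() shared by three file descriptors, closed one at a time.
dup_close_demo() {
    exec 3>"$1"   # one open() (a create, if FILE is new)
    exec 4>&3     # dup: fd 4 shares fd 3's open file
    exec 5>&3     # another dup of the same open file
    exec 3>&-     # close #1 -> flush
    exec 4>&-     # close #2 -> flush
    exec 5>&-     # close #3, the last reference -> flush, then release
}
```

On a MarFS mount, work placed in marfs_flush() would therefore run three times here, which is why the once-only steps cannot simply move there.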

I believe we could fix the file-size issue by truncating the MD file in flush, since repeated truncates should not cause problems as long as we always truncate to the number of bytes written to the object stream (which we do).
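A minimal model of that argument, assuming GNU coreutils truncate(1) and stat(1). flush_md_size and simulate_flushes are hypothetical names, not MarFS functions. Because the size passed to truncate is absolute rather than relative, applying it on every flush leaves the metadata file at the same size no matter how many flushes occur.

```bash
# flush_md_size MD_FILE BYTES_WRITTEN
# Hypothetical stand-in for the truncate step a flush handler would do.
flush_md_size() {
    # Absolute size (not "+N"), so repeating the call is harmless.
    truncate -s "$2" -- "$1"
}

# simulate_flushes MD_FILE BYTES_WRITTEN N
# Applies the same truncate N times (one per close of a dup'd fd)
# and prints the resulting size.
simulate_flushes() {
    local md=$1 bytes=$2 n=$3 i=1
    while [ "$i" -le "$n" ]; do
        flush_md_size "$md" "$bytes" || return 1
        i=$((i + 1))
    done
    stat -c %s -- "$md"
}
```

For example, simulate_flushes on an empty file with 1048576 and 3 prints 1048576, the same as a single call. Like MarFS metadata files, the result is sparse, which matches the "total 0" lines in the transcripts above.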

The object-stream and xattr operations are harder to move. We must close an object stream only once, so that code can't go in flush. Similarly, we must remove the restart xattr only on the last flush, but there is no way to know whether any given call to flush will be the last one.

thewacokid commented 5 years ago

Old issue. Not a problem. :)