mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

Source and destination files end up zero bytes when copying to DIRECT repository #141

Closed gregorygeller closed 5 years ago

gregorygeller commented 8 years ago

This one is kind of scary:

-bash-4.1$ pwd
/gpfs/ccfs1/admins/mdfs/gellergr/marfiles
-bash-4.1$ ls -l
total 0
-bash-4.1$ dd bs=1M count=$(($RANDOM%max + 1)) if=/dev/zero of=/gpfs/ccfs1/admins/mdfs/gellergr/marfiles/test_file
3+0 records in
3+0 records out
3145728 bytes (3.1 MB) copied, 0.00275883 s, 1.1 GB/s
-bash-4.1$ ls -al
total 4608
drwxr-xr-x 2 gellergr gellergr    4096 Jun 21 11:32 .
drwxr-xr-x 6 gellergr gellergr    4096 Jun 21 11:27 ..
-rw-rw-r-- 1 gellergr gellergr 3145728 Jun 21 11:32 test_file
-bash-4.1$ popd
/campaign.gellergr/admins/gellergr/marfiles
-bash-4.1$ cp /gpfs/ccfs1/admins/mdfs/gellergr/marfiles/test_file .
-bash-4.1$ ls -al
total 0
drwxr-xr-x 2 gellergr gellergr 4096 Jun 21 11:32 .
drwxr-xr-x 6 gellergr gellergr 4096 Jun 21 11:27 ..
-rw-rw-r-- 1 gellergr gellergr    0 Jun 21 11:32 test_file
-bash-4.1$ pushd /gpfs/ccfs1/admins/mdfs/gellergr/marfiles/
/gpfs/ccfs1/admins/mdfs/gellergr/marfiles /campaign.gellergr/admins/gellergr/marfiles
-bash-4.1$ ls -l
total 0
-rw-rw-r-- 1 gellergr gellergr 0 Jun 21 11:32 test_file

So in this case, when I copy test_file from gpfs to MarFS, both the destination AND the source become 0-byte files.

thewacokid commented 8 years ago

Be careful using empty files (I'm a bit surprised you didn't get an error, actually). 'cp' will try to create sparse files on the output side.

Try using cp --sparse=never [src] [dest] and see if this is repeatable.

gregorygeller commented 8 years ago

Just tried with cp --sparse=never and got the same behavior.

Just to be clear, the original file was NOT empty. I copied a NON-empty file from gpfs to MarFS and got empty files in both the source and destination.

jti-lanl commented 8 years ago

Hi Greg,

In your first test-case, you are writing directly into the metadata filesystem underlying MarFS. You won't be able to create MarFS files that way, though it's okay to go look at such files to see what MarFS has done.

You should delete any files you created that way, because MarFS won't like them. (Your files will be easy to find: just look for files without xattrs :-)

You want to first mount fuse, then dd into the corresponding file on the fuse mount (e.g. /campaign.gellergr/admins/gellergr/marfiles/test_file). Then you can go look at the MDFS (metadata filesystem) files.

Thanks, Jeff
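
For anyone cleaning up after a similar mistake, the "look for files without xattrs" step could be scripted roughly as follows. This is only a sketch: the `user.marfs_` prefix is an assumption about how MarFS names its attributes, the function name is made up for illustration, and `os.listxattr` is Linux-only. Check the result against your own MDFS before deleting anything.

```python
import os


def files_without_xattrs(root, prefix="user.marfs_", list_xattrs=None):
    """Return paths under `root` that carry no xattr starting with `prefix`.

    `prefix` is an ASSUMED naming convention, not confirmed by this thread.
    `list_xattrs` can be swapped out for testing; it defaults to
    os.listxattr, which exists only on Linux.
    """
    if list_xattrs is None:
        list_xattrs = os.listxattr
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Ignore unrelated attributes (e.g. security.*) and keep only
            # files that have nothing matching the assumed MarFS prefix.
            if not any(attr.startswith(prefix) for attr in list_xattrs(path)):
                found.append(path)
    return found
```

Something like `files_without_xattrs("/gpfs/ccfs1/admins/mdfs/gellergr/marfiles")` would then list the candidates for review.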


gregorygeller commented 8 years ago

:(

Sorry for the scare. I'll avoid writing anything to the mdfs directory from now on.

jti-lanl commented 8 years ago

On second thought: those should have been treated as DIRECT files. We've never tested that. Apparently, it's not working perfectly.

So, the upshot is that DIRECT files have this problem you reported.


brettkettering commented 8 years ago

I renamed the issue to be about DIRECT repos. We should not be supporting DIRECT repos right now. If a repo is listed as DIRECT, that should generate an error in the configuration parser. Once we do support them, we'll need to fix this error.

gransom commented 5 years ago

This is an ancient issue, but I happened to be looking through. It looks to me like the original example of this involved copying a gpfs direct file straight out of gpfs and overwriting the corresponding MarFS file. As MarFS is treating the file as DIRECT, this means it copies the file directly over the corresponding gpfs file, which was used as the source. I may be misinterpreting, as this appears to have been done on a system that doesn't exist any longer. However, if I'm correct, this is a non-issue. It's the equivalent of issuing 'cp' with src == dst, so it's not unexpected that it resulted in an overwrite of the src. I expect that truncating the source to zero (by overwriting the marfs file) then meant 'cp' had nothing to copy. Hence, both (really just one) files were left at zero length.
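
The src == dst explanation can be reproduced outside MarFS. In the sketch below, a hard link stands in for the FUSE path that maps onto the same underlying gpfs file, and `naive_copy` is a hypothetical helper that skips the same-file check `cp` normally performs. That the check did not fire across the FUSE mount in the original report is an assumption, as are the file names.

```python
import os
import tempfile


def naive_copy(src, dst, chunk=1 << 20):
    """Copy src to dst with no same-file check; return bytes copied."""
    copied = 0
    with open(src, "rb") as fin:
        # Opening dst for writing truncates it. If dst is just another
        # name for src, the source is now empty as well.
        with open(dst, "wb") as fout:
            while True:
                buf = fin.read(chunk)
                if not buf:
                    break
                fout.write(buf)
                copied += len(buf)
    return copied


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mdfs_test_file")    # stands in for the gpfs path
        alias = os.path.join(tmp, "fuse_test_file")  # stands in for the MarFS path
        with open(src, "wb") as f:
            f.write(b"\0" * 3145728)                 # same size as the dd output
        os.link(src, alias)                          # two names, one file
        before = os.path.getsize(src)
        copied = naive_copy(src, alias)
        return before, copied, os.path.getsize(src), os.path.getsize(alias)
```

Calling `demo()` returns the source size before the copy, the number of bytes copied, and the final sizes of both names. The truncation happens before the first read, so nothing is left to copy and both names end up at zero length.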