mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

PFTool N to N copies from marfs to marfs fail due to assertion violation #157

Closed wfvining closed 8 years ago

wfvining commented 8 years ago

When performing an N to N copy from a MarFS path to a MarFS path, an assertion is violated in s3_set_host_r. The files being copied are the maximum possible Uni size ( - - 1). We have recreated this error with both 7.5k and 20k files of this size in a single directory, running PFTool with 19 and 35 processes.

The error occurs at different times (as early as 20GB into the copy, or as late as 400GB in the test we ran). Below is the output from pftool (paths have been edited).

$ mpirun -np 35 ./pftool/install/bin/pftool -w 0 -r -p /marfs/foo/7NN -c /marfs/foo/7NN_cp
INFO  HEADER   ========================  TestJob  ============================
INFO  HEADER   Starting Path: /marfs/foo/7NN
INFO  HEADER   Source-type: MARFS_Path
INFO  HEADER   Dest-type:   MARFS_Path
INFO ACCUM  files/chunks:   0        data:      0 B / 119.2 GB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B / 357.6 GB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B / 596.0 GB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B / 834.5 GB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B / 953.7 GB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   1.2 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   1.4 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   1.5 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   1.7 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.0 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.1 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.3 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.6 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.7 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   2.9 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   3.1 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   3.4 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   3.5 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   3.7 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   4.0 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:   0        data:      0 B /   4.1 TB       avg BW:      0 B/s      errs: 0
INFO ACCUM  files/chunks:  11        data:  10.2 GB /   4.3 TB       avg BW:  47.4 MB/s      errs: 0
INFO ACCUM  files/chunks:  28        data:  26.1 GB /   4.5 TB       avg BW: 115.5 MB/s      errs: 0
INFO ACCUM  files/chunks:  30        data:  27.9 GB /   4.8 TB       avg BW: 118.3 MB/s      errs: 0
pftool: aws4c.c:905: s3_set_host_r: Assertion `!ctx->inside' failed.
--------------------------------------------------------------------------
mpirun noticed that process rank 34 with PID 21085 on node **** exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
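
For context on the failure itself: the assertion at aws4c.c:905 checks `!ctx->inside`, i.e. that no request is currently using the context when its host is reconfigured. The sketch below only illustrates that kind of guard; the `inside` field name comes from the assertion message, but the struct layout and logic are assumptions for illustration, not the actual aws4c code.

/* Illustrative sketch only -- not the actual aws4c implementation.
 * The `inside` flag name comes from the assertion text above; the
 * struct layout and control flow here are assumed for illustration. */
#include <assert.h>
#include <stddef.h>

typedef struct {
    int         inside;  /* nonzero while a request is using this context */
    const char *host;    /* host string the context is configured for */
} CtxSketch;

/* Stand-in for s3_set_host_r(): changing (or clearing) the host is only
 * legal when no transfer is currently "inside" the context. */
static void set_host_sketch(const char *str, CtxSketch *ctx)
{
    assert(!ctx->inside);   /* aws4c.c:905 aborts here if a transfer
                               is still in flight on this context */
    ctx->host = str;
}
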
gsparrow commented 8 years ago

Today @wfvining and I were able to crash the fuse mount. A stack trace in GDB showed that both PFTool and the fuse mount fail on the same assertion in s3_set_host_r() in the AWS4C library. When we first became able to reproduce the fuse crash reliably, we were copying 20TB with PFTool from MarFS to /dev/null with 19 processes. We then ran the MarFS test suite in Jenkins, which failed, and the transport endpoint became disconnected. We fully unmounted by hand, remounted, and reproduced the same error. PFTool was averaging 775MB/s at the time of the crash; the maximum we have seen on this system is 1017MB/s. The MarFS test suite puts a very light write load on MarFS through fuse (less than 10GB), and @wfvining was able to trigger the error by writing only 3 characters.

wfvining commented 8 years ago

I have been able to recreate the error, and capture core dumps, in gdb. When recreating it in FUSE (while marfs is under a read load from Jenkins), the bug occurs under the following conditions.

$ cd /campaign/namespace
$ echo foo >foo.txt
$

The write appears to have succeeded at this point and the shell prompt is returned.

A few seconds later (this doesn't always happen, and the timing varies), a breakpoint on marfs_release is triggered. Continuing through that leads to a SIGABRT from the same assertion violation described above. The stack trace follows (note that marfs_release has not returned yet).

(gdb) bt
#0  0x00000032c7032625 in raise () from /lib64/libc.so.6
#1  0x00000032c7033e05 in abort () from /lib64/libc.so.6
#2  0x00000032c702b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000032c702b810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000040b635 in s3_set_host_r (str=0x0, ctx=0x7fce1c000da0)
    at aws4c.c:905
#5  0x000000000040b07a in aws_context_release_r (ctx=0x7fce1c000da0)
    at aws4c.c:652
#6  0x000000000040b0c7 in aws_context_free_r (ctx=0x7fce1c000da0)
    at aws4c.c:659
#7  0x000000000040beb6 in aws_iobuf_reset_hard (b=0x7fce1c023f50)
    at aws4c.c:1253
#8  0x0000000000437776 in marfs_release (
    path=0x7fce1c000c90 "/namespace/foo.txt", fh=0x7fce1c022fc0)
    at marfs_ops.c:2055
#9  0x0000000000406433 in fuse_release ()

The other relevant thread (in streaming_readfunc or streaming_writefunc) is in a sem_wait when this occurs.

#0  0x00000032c740d9b0 in sem_wait () from /lib64/libpthread.so.0
#1  0x000000000042a5b0 in streaming_writefunc (ptr=0x7fce140011fd, size=1, 
    nmemb=152, stream=0x7fce1c023f50) at object_stream.c:563
#2  0x00000032dd418378 in Curl_client_write () from /usr/lib64/libcurl.so.4
#3  0x00000032dd42b076 in Curl_readwrite () from /usr/lib64/libcurl.so.4
#4  0x00000032dd42cc58 in Curl_perform () from /usr/lib64/libcurl.so.4
#5  0x000000000040ea5a in s3_do_get (b=0x7fce1c023f50, signature=0x0, 
...
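
Taken together, the two traces suggest a teardown race: the fuse thread in marfs_release resets the IOBuf and frees the aws4c context while the streaming thread is still blocked inside an active CURL transfer on the same context. The following is a minimal pthreads sketch of that shape of race; everything except the `inside` flag is hypothetical and is only meant to show why the assertion fires.

/* Minimal sketch of the apparent race (assumed semantics, hypothetical names).
 * Thread A plays the role of the CURL/streaming thread, blocked in sem_wait
 * with a transfer still in flight; the release path plays marfs_release,
 * which frees the context and trips assert(!ctx->inside). */
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

typedef struct {
    int   inside;       /* set while a transfer is using the context */
    sem_t data_ready;   /* stands in for the object_stream semaphore */
} StreamSketch;

static void *stream_thread(void *arg)          /* ~ streaming_writefunc */
{
    StreamSketch *s = arg;
    s->inside = 1;                             /* transfer in flight */
    sem_wait(&s->data_ready);                  /* blocked, as in the trace */
    s->inside = 0;                             /* not reached before teardown */
    return NULL;
}

static void release_path(StreamSketch *s)      /* ~ marfs_release teardown */
{
    /* aws_iobuf_reset_hard -> aws_context_free_r -> s3_set_host_r(NULL, ctx) */
    assert(!s->inside);                        /* aborts: transfer still live */
}

int main(void)
{
    StreamSketch s = { 0 };
    sem_init(&s.data_ready, 0, 0);
    pthread_t a;
    pthread_create(&a, NULL, stream_thread, &s);
    sleep(1);                                  /* let the stream thread block */
    release_path(&s);                          /* assertion fires here */
    return 0;
}
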
jti-lanl commented 8 years ago

Great bug report. Some sleuthing with Will seems to have revealed the problem. There's a tentative fix in the "issue-157" branch.
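
For readers following along, the general pattern for avoiding this class of race is to make the release path wait until the streaming thread is no longer inside a transfer before the context is torn down. The sketch below only illustrates that pattern; it is hypothetical and is not the actual change on the issue-157 branch.

/* Hypothetical illustration of the general pattern -- not the issue-157 fix.
 * The release path waits until no transfer is using the context before any
 * teardown that would end up in s3_set_host_r(). */
#include <pthread.h>

typedef struct {
    int             inside;  /* set/cleared by the streaming thread */
    pthread_mutex_t lock;
    pthread_cond_t  idle;    /* signalled when the transfer finishes */
} GuardSketch;

/* Called from the release/teardown path before freeing the context. */
static void wait_until_idle(GuardSketch *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->inside)
        pthread_cond_wait(&g->idle, &g->lock);
    pthread_mutex_unlock(&g->lock);
}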

wfvining commented 8 years ago

@gsparrow and I have run a fair number of tests on this fix, including a successful 7TB copy from marfs to marfs with 35 processes. Combined with the testing that @thewacokid has done, I am confident that this fixed the problem and didn't break anything else.

jti-lanl commented 8 years ago

Merged into master.