plfs / plfs-core

LANL no longer develops PLFS. Feel free to fork and develop as you wish.

PLFS/FUSE fails with fsx #326

Open lionkov opened 10 years ago

lionkov commented 10 years ago

Running the File System Exerciser, fsx (http://codemonkey.org.uk/projects/fsx/ltp-fsx.c), on a PLFS/FUSE mount fails:

hop0:~/tmp/plfs$ fsx -P /tmp -W -R te
mapped writes DISABLED
truncating to largest ever: 0x32740
READ BAD DATA: offset = 0xa935, size = 0x85fd
OFFSET GOOD BAD RANGE
0x10f13 0x0d02 0x016b 0x 1ff8
operation# (mod 256) for the bad data may be 107
LOG DUMP (13 total operations):
1(1 mod 256): WRITE 0x1f7e6 thru 0x2250e (0x2d29 bytes) HOLE *WWWW
2(2 mod 256): WRITE 0x10f13 thru 0x16b11 (0x5bff bytes) WWWW
3(3 mod 256): READ 0x1f48a thru 0x2250e (0x3085 bytes)
4(4 mod 256): READ 0x16511 thru 0x2250e (0xbffe bytes)
5(5 mod 256): READ 0x21699 thru 0x2250e (0xe76 bytes)
6(6 mod 256): READ 0x300 thru 0xe2e8 (0xdfe9 bytes)
7(7 mod 256): READ 0x891 thru 0x9083 (0x87f3 bytes)
8(8 mod 256): READ 0x19c62 thru 0x20cbd (0x705c bytes)
9(9 mod 256): TRUNCATE UP from 0x2250f to 0x32740
10(10 mod 256): READ 0xad21 thru 0xd152 (0x2432 bytes)
11(11 mod 256): WRITE 0x293c2 thru 0x34b07 (0xb746 bytes) EXTEND
12(12 mod 256): TRUNCATE DOWN from 0x34b08 to 0x1977a
13(13 mod 256): READ 0xa935 thru 0x12f31 (0x85fd bytes) *RRRR**
Correct content saved for comparison
(maybe hexdump "te" vs "te.fsxgood")
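The logged sequence can also be replayed outside fsx as a small reproduction. The sketch below is not fsx itself: the fill pattern is an assumption (fsx writes random data), `replay` is a hypothetical helper name, and the path is expected to live on the file system under test. It opens the file O_RDWR, applies the writes and truncates from the log above, and checks every read against an in-memory model.

```python
import os

# Operations copied from the LOG DUMP above: (kind, a, b).
# WRITE/READ: a = first offset, b = last offset (inclusive).
# TRUNCATE: a = old size (informational only), b = new size.
OPS = [
    ("WRITE", 0x1f7e6, 0x2250e),
    ("WRITE", 0x10f13, 0x16b11),
    ("READ", 0x1f48a, 0x2250e),
    ("READ", 0x16511, 0x2250e),
    ("READ", 0x21699, 0x2250e),
    ("READ", 0x300, 0xe2e8),
    ("READ", 0x891, 0x9083),
    ("READ", 0x19c62, 0x20cbd),
    ("TRUNCATE", 0x2250f, 0x32740),
    ("READ", 0xad21, 0xd152),
    ("WRITE", 0x293c2, 0x34b07),
    ("TRUNCATE", 0x34b08, 0x1977a),
    ("READ", 0xa935, 0x12f31),
]


def replay(path):
    """Replay OPS on path; return the 1-based number of the first bad READ, or 0."""
    model = bytearray()  # expected file contents
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for number, (kind, a, b) in enumerate(OPS, start=1):
            if kind == "WRITE":
                data = bytes([number]) * (b - a + 1)  # assumed fill pattern
                if len(model) < a:
                    # A write past EOF leaves a hole that must read as zeros.
                    model.extend(b"\0" * (a - len(model)))
                model[a:b + 1] = data
                written = 0
                while written < len(data):
                    written += os.pwrite(fd, data[written:], a + written)
            elif kind == "TRUNCATE":
                os.ftruncate(fd, b)
                if b < len(model):
                    del model[b:]
                else:
                    model.extend(b"\0" * (b - len(model)))
            else:  # READ
                got = os.pread(fd, b - a + 1, a)
                if got != bytes(model[a:b + 1]):
                    return number
        return 0
    finally:
        os.close(fd)
```

On a local file system such as ext4, `replay("te")` should return 0. A nonzero result on another mount names the first READ whose data did not match the model.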

brettkettering commented 10 years ago

So, after searching for various strings, I'm guessing that the failure was that it read bad data. The output doesn't make it obvious which test failed or how the expected result differed from the one obtained. Oh, how I wish people wrote some English explanation of what they're trying to do in their code rather than letting the source be the documentation. I don't enjoy playing compiler.

So, Lucho, what do you want us to do with this information?

Brett

In reply to lionkov's report of Tuesday, November 5, 2013, 2:53 PM.


lionkov commented 10 years ago

Download and run fsx. Unlike some other tests, it produces both the "bad" and "good" file contents, as well as a log of the file operations that produced the error.
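Because fsx leaves the expected contents next to the test file, the mismatch can be located by comparing the two files byte by byte. A minimal sketch, assuming the file names from the run above (`te` and `te.fsxgood`); `diff_ranges` and `compare_files` are hypothetical helpers, not part of fsx.

```python
def diff_ranges(bad, good):
    """Return (start, end) pairs, end exclusive, where two byte strings differ.

    Bytes present in only one of the inputs count as differences.
    """
    ranges = []
    start = None
    limit = max(len(bad), len(good))
    for i in range(limit):
        same = i < len(bad) and i < len(good) and bad[i] == good[i]
        if not same and start is None:
            start = i
        elif same and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges


def compare_files(bad_path, good_path):
    """Print each differing range in the same 'thru' style as the fsx log."""
    with open(bad_path, "rb") as f:
        bad = f.read()
    with open(good_path, "rb") as f:
        good = f.read()
    for start, end in diff_ranges(bad, good):
        print(f"differs: {start:#x} thru {end - 1:#x} ({end - start:#x} bytes)")
```

A call might look like `compare_files("te", "/tmp/te.fsxgood")`, if the good copy was saved under `/tmp`.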

In reply to Brett Kettering's comment of Tue, Nov 5, 2013 at 3:13 PM.


lionkov commented 10 years ago

Additional information:

Run on a single node; the underlying file system is ext4.

Configuration file .plfsrc:

thewacokid commented 10 years ago

Looks like a failure in truncate somewhere (most likely due to calling a partial truncate on a file that's open in read/write mode).

Do we care?

I get different results, btw, on my Macbook:

pn1245359:plfs_n1 dbonnie$ ./a.out -P /tmp -W -R test.file
mapped writes DISABLED
truncating to largest ever: 0x32740
Size error: expected 0x1977a stat 0x17000 seek 0x17000
LOG DUMP (12 total operations):
1(1 mod 256): WRITE 0x1f7e6 thru 0x2250e (0x2d29 bytes) HOLE
2(2 mod 256): WRITE 0x10f13 thru 0x16b11 (0x5bff bytes)
3(3 mod 256): READ 0x1f48a thru 0x2250e (0x3085 bytes)
4(4 mod 256): READ 0x16511 thru 0x2250e (0xbffe bytes)
5(5 mod 256): READ 0x21699 thru 0x2250e (0xe76 bytes)
6(6 mod 256): READ 0x300 thru 0xe2e8 (0xdfe9 bytes)
7(7 mod 256): READ 0x891 thru 0x9083 (0x87f3 bytes)
8(8 mod 256): READ 0x19c62 thru 0x20cbd (0x705c bytes)
9(9 mod 256): TRUNCATE UP from 0x2250f to 0x32740
10(10 mod 256): READ 0xad21 thru 0xd152 (0x2432 bytes)
11(11 mod 256): WRITE 0x293c2 thru 0x34b07 (0xb746 bytes) EXTEND
12(12 mod 256): TRUNCATE DOWN from 0x34b08 to 0x1977a
Correct content saved for comparison
(maybe hexdump "test.file" vs "test.file.fsxgood")

pn1245359:plfs_n1 dbonnie$ hexdump test.file
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0019770

pn1245359:plfs_n1 dbonnie$ hexdump /tmp//test.file.fsxgood
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0010f10 00 00 00 0d 02 6a 02 6b 02 17 02 df 02 79 02 e6
0010f20 02 4b 02 5e 02 4e 02 13 02 f0 02 93 02 2e 02 cd
0010f30 02 2a 02 fd 02 3c 02 4d 02 50 02 f2 02 6c 02 04
0010f40 02 fc 02 53 02 ff 02 6f 02 e1 02 43 02 34 02 2f
. . . SNIPPED FOR BREVITY . . .
0016af0 02 a0 02 f2 02 d7 02 82 02 71 02 21 02 31 02 91
0016b00 02 e9 02 9b 02 bd 02 04 02 a1 02 f5 02 fc 02 e2
0016b10 02 68 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0016b20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0019770
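The "Size error" line reports the file size as seen through both stat and seek. The same check can be sketched in isolation, assuming a hypothetical helper `sizes_after_truncate`. As a simplification, the file is grown with a truncate up rather than with the EXTEND write from the log.

```python
import os


def sizes_after_truncate(path, initial_size, new_size):
    """Grow a file to initial_size, truncate it to new_size on an O_RDWR
    descriptor, and return the size as reported by fstat and by lseek."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, initial_size)  # simplification: grow via truncate up
        os.ftruncate(fd, new_size)      # the TRUNCATE DOWN step
        stat_size = os.fstat(fd).st_size
        seek_size = os.lseek(fd, 0, os.SEEK_END)
        return stat_size, seek_size
    finally:
        os.close(fd)
```

With the sizes from operation 12, `sizes_after_truncate("test.file", 0x34b08, 0x1977a)` should return `(0x1977a, 0x1977a)` on a file system that handles truncate correctly.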

thewacokid commented 10 years ago

The test passed completely with the "-L" flag to ignore truncate operations.

brettkettering commented 10 years ago

We tell people to use O_RDONLY or O_WRONLY, but not O_RDWR. So, as long as we can open a file with O_TRUNC and then write it, I think we're OK.
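The access pattern described here can be sketched as follows: replace the contents through a descriptor opened with O_WRONLY and O_TRUNC, close it, then read through a separate O_RDONLY descriptor. The helper name `rewrite_then_read` is a placeholder for illustration.

```python
import os


def rewrite_then_read(path, payload):
    """Replace a file's contents via O_WRONLY|O_TRUNC, then read it back
    through a separate O_RDONLY descriptor. O_RDWR is never used."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested; keep going.
            view = view[os.write(fd, view):]
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

If O_TRUNC works as expected, the value returned equals `payload`, with no leftover bytes from an earlier, longer version of the file.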

Brett

In reply to David Bonnie's comment of Wednesday, November 6, 2013, 9:07 AM.


dshrader commented 10 years ago

We already have tests that test O_TRUNC, don't we? I'll check with Alfred. If we do, then I'm not sure what this test provides beyond what we have already (regression suite and posix test suite).

lionkov commented 10 years ago

If a test fails and the tests you already have don't, obviously it provides something beyond what you already have :)

In reply to David's comment of Wed, Nov 6, 2013 at 9:29 AM.


johnbent commented 10 years ago

I think we prefer to be as POSIX compliant as possible without sacrificing performance for our core workloads and without massive internal code complexity. This should be low on the priority list, but I think it's higher than zero.

In reply to Brett Kettering's comment of Nov 6, 2013, at 9:22 AM.


dshrader commented 10 years ago

We actually already have a lot of tests that fail. The POSIX test suite that we benchmark against produces many failures that we will never fix, because PLFS is not 100% POSIX compliant and has no plans to become so. The issue of O_RDWR and O_TRUNC is known and reproducible in our existing tests. It seems that, so far, fsx has told us nothing new. We need to analyze the remaining tests that fsx does to find out if it does anything new.

To follow up on the source, O_RDWR is always used when O_TRUNC is used on an open call in the ltp_fsx.c source. So, using the -L command line parameter that David B. pointed out is a good idea unless we modify the source.

lionkov commented 10 years ago

I don't think the issue is only with O_TRUNC; it is with truncate. I removed O_TRUNC from the fsx source and it still fails. The problem could be the same one that makes PLFS fail for O_TRUNC, though.

In reply to David's comment of Wed, Nov 6, 2013 at 9:39 AM.


thewacokid commented 10 years ago

Exactly - truncate, used in combination with O_RDWR, is a known issue.

dshrader commented 10 years ago

It looks like every file that ltp_fsx opens except for the log file is opened with O_RDWR. We still should take a look at what else ltp_fsx does to make sure we have all the functionality somewhere. I don't know if we can include fsx directly in our testing suite due to licensing (I really hate licensing red tape), but it would still be good to make sure we test the same things.

lionkov commented 10 years ago

BTW, I think the right way to "fix" this is to make truncate fail in case of O_RDWR, instead of producing a file with incorrect content.
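One way to picture this proposal: before truncating, check the descriptor's access mode and refuse if it is O_RDWR. The following is only a user-space illustration, not PLFS code. The helper name `guarded_ftruncate` and the choice of EINVAL as the error are assumptions, since the thread does not say which error should be returned.

```python
import errno
import fcntl
import os

# Mask covering the access-mode bits of the open flags.
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def guarded_ftruncate(fd, length):
    """Truncate fd to length, but fail if fd was opened O_RDWR."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if (flags & ACCESS_MODE_MASK) == os.O_RDWR:
        # Assumed error code; fail loudly instead of leaving bad contents.
        raise OSError(errno.EINVAL, "truncate refused on O_RDWR descriptor")
    os.ftruncate(fd, length)
```

A caller that opened the file O_WRONLY is unaffected, while an O_RDWR caller gets an explicit error and the file is left unchanged.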

In reply to Latchesar Ionkov's message of Wed, Nov 6, 2013 at 9:43 AM.
