plfs / plfs-core

LANL no longer develops PLFS. Feel free to fork and develop as you wish.

POSIX Suite truncate test can crash PLFS #82

Open brettkettering opened 12 years ago

brettkettering commented 12 years ago

/plfs_test/tools/pjd-fstest/tests/truncate/00.t: line 50: cd: /tmp/plfs: Not a directory
/plfs_test/tools/pjd-fstest/tests/truncate/00.t .. Failed 3/21 subtests
getconf: pathconf: .: Transport endpoint is not connected

This is a race condition and, as such, does not fail every time.
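
"Transport endpoint is not connected" is the message for errno ENOTCONN, which the kernel typically returns for a FUSE mount point whose daemon has gone away. As a minimal sketch of how a test wrapper might detect that state between runs (the function name is hypothetical, and the /tmp/plfs default is taken only from the log above):

```python
import errno
import os


def fuse_mount_disconnected(mount_point="/tmp/plfs"):
    """Return True if stat() on the mount point fails with ENOTCONN.

    ENOTCONN is the errno behind "Transport endpoint is not connected".
    Success, or any other error (for example ENOENT), returns False.
    """
    try:
        os.stat(mount_point)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False
```

Because the failure is intermittent, a check like this is only useful when called after every run, not once at the end.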

dshrader commented 10 years ago

Does this still happen with David B's thread-safe updates? eb708e2512dc5303dc5b3c3b5e7def5f6445cd36

atorrez commented 10 years ago

I have not run the POSIX suite since those updates. I will run it today.

Alfred

Sent by email in reply to David's comment of Wed, 6 Nov 2013 09:08:34 -0800.

thewacokid commented 10 years ago

I didn't test truncate so I doubt I fixed this issue - but it is possible I caught it when checking the various structures.

dshrader commented 10 years ago

I'm sure we'll still have problems with truncate, but I'm hoping we at least don't have the transport endpoint not connected failure anymore.

atorrez commented 10 years ago

Thinking about this particular issue: I do not recall ever seeing a transport endpoint problem when running the POSIX truncate tests. This probably originated when EMC ran the test. I will run it several times just to verify.

atorrez commented 10 years ago

Now I remember. I was not running this particular test because of hangs (which apparently turned out to be the transport endpoint problem). I will try it again and see what happens.

atorrez commented 10 years ago

I am still getting hangs on one particular test, but I am not sure whether it is the transport endpoint problem. I will look into it.

thewacokid commented 10 years ago

It probably is - I didn't specifically test for truncate and I wouldn't be surprised if it interacts poorly when running a multi-threaded FUSE mount.

johnbent commented 10 years ago

That's a good point. If we ever try to debug this, a good starting point would be to try running it with single-threaded FUSE.

atorrez commented 10 years ago

I found the source of the hang in the POSIX test suite's truncate/00.t test, and it was not related to PLFS. It came from a dd command that read from /dev/random. Apparently the entropy pool was empty, so the read blocked and never returned any random bytes. This is a common issue on cluster nodes, which have no keyboard or mouse attached to feed the pool. I changed the line to use /dev/urandom (lower-quality randomness) and it worked.
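
For anyone hitting the same hang, here is a rough sketch of how the entropy situation could be checked from Python. It assumes a Linux node that exposes the kernel's estimate at /proc/sys/kernel/random/entropy_avail. The helper names are hypothetical, and this is not how the test suite itself generates data.

```python
import os

# Linux-specific path to the kernel's entropy estimate (assumed to exist).
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def random_bytes(count):
    """Return `count` random bytes via os.urandom.

    On Linux this uses the same kernel source as /dev/urandom, so it
    does not stall when the entropy estimate is low.
    """
    return os.urandom(count)
```

A low value from `entropy_available()` on a headless node would be consistent with a read from /dev/random stalling as described above.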

With the latest master, I ran about 100 iterations of the POSIX test suite and did not see the transport endpoint problem documented in this issue. I am inclined to close this but let me know what you think.

thewacokid commented 10 years ago

If it passes (without hanging) for 100 iterations I would also be inclined to close the issue. Based on the changes I made it is likely that the test was crashing due to other actions within FUSE going on simultaneously.
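
Since the failure is intermittent, the "100 iterations without a hang" check can be automated. The following is a sketch only: the function name, the timeout value, and the choice to stop at the first failure are assumptions, not the procedure actually used in this thread.

```python
import subprocess

# Message that identifies the failure described in this issue.
ENOTCONN_TEXT = "Transport endpoint is not connected"


def run_repeatedly(cmd, iterations=100, timeout_s=600):
    """Run `cmd` up to `iterations` times, stopping at the first failure.

    Returns (completed_runs, reason), where reason is None if every run
    finished cleanly, "hang" if a run exceeded `timeout_s`, or
    "enotconn" if the ENOTCONN message appeared in the output.
    """
    for i in range(iterations):
        try:
            proc = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                errors="replace",  # tolerate non-UTF-8 bytes in test output
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return i, "hang"
        if ENOTCONN_TEXT in proc.stdout + proc.stderr:
            return i, "enotconn"
    return iterations, None
```

It could be called with whatever command runs the suite, for instance something like `run_repeatedly(["prove", "-r", "/plfs_test/tools/pjd-fstest/tests/truncate"])`, run from inside the mounted file system. That command line is an assumption based on how pjd-fstest is usually invoked.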