linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0
2.28k stars 999 forks source link

About the diotest4 failure on Android linux kernel 5.15 #998

Open Bad916 opened 1 year ago

Bad916 commented 1 year ago

Hello,I ported ltp to Android for testing. The diotest4 test on the Linux kernel 5.15 device failed. The file system was corrected when the test was run to test3. The correct values were passed to the next layer. The following are logs. How can I solve this problem? The same problem does not occur in other linux kernel versions.


Testcase Result Exit Value


dio01 PASS 0
dio02 PASS 0
dio03 PASS 0
dio04 FAIL 33
dio05 PASS 0
dio06 PASS 0
dio07 PASS 0
dio08 PASS 0
dio09 PASS 0
dio10 FAIL 33
dio11 PASS 0
dio12 PASS 0
dio13 PASS 0
dio14 PASS 0
dio15 PASS 0
dio16 PASS 0
dio17 PASS 0
dio18 PASS 0
dio19 PASS 0
dio20 PASS 0
dio21 PASS 0
dio22 PASS 0
dio23 PASS 0
dio24 PASS 0
dio25 PASS 0
dio26 PASS 0
dio27 PASS 0
dio28 PASS 0
dio29 PASS 0
dio30 PASS 0


Total Tests: 30 Total Skipped Tests: 0 Total Failures: 2 Kernel Version: 5.15.41-android13-8-o-g9e85b4b43c76 Machine Architecture: aarch64 Hostname: localhost

<<>> tag=dio04 stime=1640966855 cmdline="diotest4" contacts="" analysis=exit <<>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:114: read allows odd count. returns 1: Success diotest4 4 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 5 TFAIL : diotest4.c:180: Odd count of read and write diotest4 6 TPASS : Read beyond the file size diotest4 7 TPASS : Invalid file descriptor diotest4 8 TPASS : Out of range file descriptor diotest4 9 TPASS : Closed file descriptor diotest4 10 TPASS : removed diotest4 11 TCONF : diotest4.c:346: Direct I/O on /dev/null is not supported diotest4 12 TPASS : read, write to a mmaped file diotest4 13 TPASS : read, write to an unmapped file diotest4 14 TPASS : read from file not open for reading diotest4 15 TPASS : write to file not open for writing diotest4 16 TFAIL : diotest4.c:114: read allows nonaligned buf. returns 4096: Success diotest4 17 TFAIL : diotest4.c:129: write allows nonaligned buf.returns 4096: Success diotest4 18 TFAIL : diotest4.c:180: read, write with non-aligned buffer diotest4 19 TPASS : read, write buffer in read-only space diotest4 20 TPASS : read, write in non-existant space diotest4 21 TPASS : read, write for file with O_SYNC diotest4 0 TINFO : 2/15 test blocks failed <<>> initiation_status="ok" duration=0 termination_type=exited termination_id=33 corefile=no cutime=1 cstime=0 <<>> <<>> tag=dio10 stime=1640966859 cmdline="diotest4 -b 65536" contacts="" analysis=exit <<>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:114: read allows odd count. returns 1: Success diotest4 4 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 5 TFAIL : diotest4.c:180: Odd count of read and write diotest4 6 TPASS : Read beyond the file size diotest4 7 TPASS : Invalid file descriptor diotest4 8 TPASS : Out of range file descriptor diotest4 9 TPASS : Closed file descriptor diotest4 10 TPASS : removed diotest4 11 TCONF : diotest4.c:346: Direct I/O on /dev/null is not supported diotest4 12 TPASS : read, write to a mmaped file diotest4 13 TPASS : read, write to an unmapped file diotest4 14 TPASS : read from file not open for reading diotest4 15 TPASS : write to file not open for writing diotest4 16 TFAIL : diotest4.c:114: read allows nonaligned buf. returns 4096: Success diotest4 17 TFAIL : diotest4.c:129: write allows nonaligned buf.returns 4096: Success diotest4 18 TFAIL : diotest4.c:180: read, write with non-aligned buffer diotest4 19 TPASS : read, write buffer in read-only space diotest4 20 TPASS : read, write in non-existant space diotest4 21 TPASS : read, write for file with O_SYNC diotest4 0 TINFO : 2/15 test blocks failed <<>> initiation_status="ok" duration=20 termination_type=exited termination_id=33 corefile=no cutime=623 cstime=937 <<>> The following are related traces . ltpftrace.txt

metan-ucw commented 1 year ago

Which filesystem does the test run on? There are certain filesystems that fall back to regular I/O when O_DIRECT is not supported and of course the non-aligned cases does work fine with regular I/O, we already skip NFS, BTRFS and FUSE for the tests that are failing here, so perhaps we just need to add another special case there.

Bad916 commented 1 year ago

测试在哪个文件系统上运行?当不支持 O_DIRECT 时,某些文件系统会回退到常规 I/O,当然非对齐的情况在常规 I/O 下也能正常工作,我们已经跳过 NFS、BTRFS 和 FUSE 以进行此处失败的测试,所以也许我们只需要在那里添加另一个特例。

F2FS

metan-ucw commented 1 year ago

Looking at F2FS source code it indeed falls back to buffered I/O in cases that buffer or length is not aligned, see:

https://github.com/torvalds/linux/blob/master/fs/f2fs/file.c#L4242

So we indeed need to skip F2FS in these few cases. Will you send a patch?

metan-ucw commented 1 year ago

And looks like I've spoken too soon. That code only fall backs if request is aligned to logical block size and not filesystem block size, the diotest4 for test 3 we pass buffer size = 1, which is not aligned by any means and the same for the test 14 where we pass buffer address that is one off the aligned address. So I have no idea why this does not fail.

Bad916 commented 1 year ago

And looks like I've spoken too soon. That code only fall backs if request is aligned to logical block size and not filesystem block size, the diotest4 for test 3 we pass buffer size = 1, which is not aligned by any means and the same for the test 14 where we pass buffer address that is one off the aligned address. So I have no idea why this does not fail.

Is it possible that it is caused by the Linux kernel 5.15? How should we solve this problem?

Bad916 commented 1 year ago

And looks like I've spoken too soon. That code only fall backs if request is aligned to logical block size and not filesystem block size, the diotest4 for test 3 we pass buffer size = 1, which is not aligned by any means and the same for the test 14 where we pass buffer address that is one off the aligned address. So I have no idea why this does not fail.

We identified the problem maybe from Google. This link should be the submission of Google. Can u please help us debug it?https://android.googlesource.com/kernel/common/+/d5ab9ac09f96c55bedba481cd2ff3f6416b72eb0%5E%21/#F2

metan-ucw commented 1 year ago

Not sure what is left to debug, that is an upstream kernel commit that has changed the behavior. What is left is to ask kernel devs if that was an intentional change or not and if they care enough to restore the previous behavior. I guess that the likely outcome is that they do not and that we will end up patching the test here.

@ebiggers can you please have a look?

pevik commented 1 year ago

@patils FYI

ebiggers commented 1 year ago

On the latest upstream kernel (v6.1-rc8), I can reproduce this if I run TMPDIR=$PWD diotest4 in an encrypted directory on either ext4 or f2fs. This is because direct I/O always falls back to buffered I/O on encrypted files, which is a valid behavior of O_DIRECT, but the test assumes EINVAL on misaligned direct I/O.

I cannot reproduce this on either ext4 or f2fs in an unencrypted directory.

diotest4 could be updated to skip the EINVAL tests on encrypted files. It does raise the question of why the test is enforcing just one of the two allowed behaviors of misaligned O_DIRECT; that's going to continuously cause failures that need to be "fixed". But anyway, it could do that.

I don't understand how the kernel commit f2fs: use iomap for direct I/O is relevant. @Bad916, did you verify that that commit actually caused a regression? It was not intended to change user visible behavior. Is it possible that you just saw that that commit changed some of the relevant code, so you assumed that it caused the issue? I think it's more likely that you started running the test in an encrypted directory, whereas previously you were running it in an unencrypted directory.

Bad916 commented 1 year ago

On the latest upstream kernel (v6.1-rc8), I can reproduce this if I run TMPDIR=$PWD diotest4 in an encrypted directory on either ext4 or f2fs. This is because direct I/O always falls back to buffered I/O on encrypted files, which is a valid behavior of O_DIRECT, but the test assumes EINVAL on misaligned direct I/O.

I cannot reproduce this on either ext4 or f2fs in an unencrypted directory.

diotest4 could be updated to skip the EINVAL tests on encrypted files. It does raise the question of why the test is enforcing just one of the two allowed behaviors of misaligned O_DIRECT; that's going to continuously cause failures that need to be "fixed". But anyway, it could do that.

I don't understand how the kernel commit f2fs: use iomap for direct I/O is relevant. @Bad916, did you verify that that commit actually caused a regression? It was not intended to change user visible behavior. Is it possible that you just saw that that commit changed some of the relevant code, so you assumed that it caused the issue? I think it's more likely that you started running the test in an encrypted directory, whereas previously you were running it in an unencrypted directory.

Sorry,I didn't verify that that commit actually caused a regression.I only conducted this test on Android phones with linux kernel 5.15 to find this problem. However, this problem was not found before the Android with linux kernel 5.10 or lower kernel version.

metan-ucw commented 1 year ago

@ebiggers I guess that we can fix diotest4 to do something more sensible, e.g. pass unaligned buffer/offset and expect to get a sane result, either EINVAL or correct data in buffer. I suppose that we do not care that much if the operation falls back to buffered I/O or not that much, but rather than that that we either get error or correct result.

yep000 commented 1 year ago

static bool f2fs_should_use_dio(struct inode inode, struct kiocb iocb, struct iov_iter *iter) { unsigned int align;

if (!(iocb->ki_flags & IOCB_DIRECT))
    return false;

if (f2fs_force_buffered_io(inode, iocb, iter))
    return false;

/*
 * Direct I/O not aligned to the disk's logical_block_size will be
 * attempted, but will fail with -EINVAL.
 *
 * f2fs additionally requires that direct I/O be aligned to the
 * filesystem block size, which is often a stricter requirement.
 * However, f2fs traditionally falls back to buffered I/O on requests
 * that are logical_block_size-aligned but not fs-block aligned.
 *
 * The below logic implements this behavior.
 */
align = iocb->ki_pos | iov_iter_alignment(iter);
if (!IS_ALIGNED(align, i_blocksize(inode)) &&
    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
    return false;

+trace_printk("f2fs_should_use_dio: %s:%d dump out the related value xxxx", current->comm, current->pid); return true; }

This trace is printed out。 it mean the judgment result decide to use dio 。@ebiggers

yep000 commented 1 year ago

forgot to post the log:
diotest4-9146 [007] ..... 29.825001: f2fs_should_use_dio: f2fs_should_use_dio: diotest4:9146 dump out the related value xxxx diotest4-9146 [007] ..... 29.825013: f2fs_direct_IO_enter: dev = (254,52), ino = 32489 pos = -508714802176 len = 4096 ki_flags = 0 ki_hint = 0 ki_ioprio = 0 rw = 1

ebiggers commented 1 year ago

I verified that f2fs: use iomap for direct I/O did actually make a difference here, since it made the fallback to buffered I/O on encrypted files happen before the request is failed with EINVAL due to not being logical_block_size aligned. Previously these happened in the opposite order.

I don't think this is really interesting, though, since the new order is the same as what ext4 has always done, and it is a valid implementation of O_DIRECT.

So I think the test needs to be updated to not require errors on misaligned direct I/O.

yep000 commented 1 year ago

I verified that f2fs: use iomap for direct I/O did actually make a difference here, since it made the fallback to buffered I/O on encrypted files happen before the request is failed with EINVAL due to not being logical_block_size aligned. Previously these happened in the opposite order.

I don't think this is really interesting, though, since the new order is the same as what ext4 has always done, and it is a valid implementation of O_DIRECT.

So I think the test needs to be updated to not require errors on misaligned direct I/O.

@metan-ucw Did you think so ?This version has no other read and write problems. So I agree

metan-ucw commented 1 year ago

@yep000 I already outlined how I think the test should be fixed in https://github.com/linux-test-project/ltp/issues/998#issuecomment-1338956236

Bad916 commented 1 year ago

@yep000 I already outlined how I think the test should be fixed in #998 (comment) Thanks,and when will diotest4 be revised?

metan-ucw commented 1 year ago

LTP is a community project, the easiest way how to get things fixed fast is to send a patch. Otherwise someone may pick this up later on, however the backlog for LTP is huge, so it likely won't be fast.