linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

memcontrol04 failures on RHEL9 (s390x - LPAR & z/VM) #1117

Closed mpw5421 closed 8 months ago

mpw5421 commented 8 months ago

I opened a Bugzilla report against RHEL9 for the failures shown below in the memcontrol04 LTP testcase. The Red Hat team indicated that it appears to be an issue with the memcontrol04 testcase code.

This particular failure is occurring on s390x under LPAR and z/VM environments.

This was their response:

"I have looked at the latest memcontrol04.c file. I believe the problem here is because we have now enabled the memory_recursiveprot option when mounting the cgroup filesystem in RHEL9.

cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)

With this option on, even if memory.low=0, low memory protection will still be granted if memory.low of its parent is non-zero. This will cause low events to be triggered. The test_memcontrol self test has been modified to check for the presence of the memory_recursiveprot flag and choose a different course of action accordingly. So I believe the memcontrol04 ltp test should also be modified to do a similar thing.

There is some discussion upstream as to whether a child with memory.low=0 should record low event or not. However, there is no firm conclusion and the current behavior is that low events will still be recorded in this case. So this is more a problem in the test itself than the kernel."
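As a rough illustration of the check described in the quoted response, the sketch below looks for the memory_recursiveprot option on a cgroup2 mount by scanning a mounts table such as /proc/mounts. The function names `has_mount_opt` and `cgroup2_has_recursiveprot` are hypothetical; this is neither the kernel selftest's code nor an LTP helper, only one way such a check could look.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the comma-separated option list `opts`
 * contains `flag` as a whole token (not merely as a substring). */
static bool has_mount_opt(const char *opts, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = opts;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == flen && strncmp(p, flag, flen) == 0)
			return true;
		p += len;
		if (*p == ',')
			p++;
	}
	return false;
}

/* Hypothetical helper: scan a mounts table (assumed to use the
 * "device dir type options ..." layout of /proc/mounts) for a cgroup2
 * mount that carries memory_recursiveprot. Returns false if the file
 * cannot be opened. */
static bool cgroup2_has_recursiveprot(const char *mounts_path)
{
	char dev[256], dir[256], type[64], opts[1024];
	char line[2048];
	bool found = false;
	FILE *f = fopen(mounts_path, "r");

	if (!f)
		return false;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%255s %255s %63s %1023s",
			   dev, dir, type, opts) != 4)
			continue;
		if (strcmp(type, "cgroup2") == 0 &&
		    has_mount_opt(opts, "memory_recursiveprot")) {
			found = true;
			break;
		}
	}
	fclose(f);
	return found;
}
```

A test could call `cgroup2_has_recursiveprot("/proc/mounts")` during setup and, when it returns true, relax the expectation that leaf F records zero low events.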

I was wondering if someone could look into updating the memcontrol04 testcase to fix this issue. Thanks in advance.

Testcase output showing failures:

/ltp/testcases/kernel/controllers/memcg# ./memcontrol04
tst_device.c:93: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1558: TINFO: Timeout per run is 0h 00m 30s
tst_supported_fs_types.c:90: TINFO: Kernel supports ext2
tst_supported_fs_types.c:55: TINFO: mkfs.ext2 does exist
tst_supported_fs_types.c:90: TINFO: Kernel supports ext3
tst_supported_fs_types.c:55: TINFO: mkfs.ext3 does exist
tst_supported_fs_types.c:90: TINFO: Kernel supports ext4
tst_supported_fs_types.c:55: TINFO: mkfs.ext4 does exist
tst_supported_fs_types.c:90: TINFO: Kernel supports xfs
tst_supported_fs_types.c:55: TINFO: mkfs.xfs does exist
tst_supported_fs_types.c:116: TINFO: Filesystem btrfs is not supported
tst_supported_fs_types.c:157: TINFO: Skipping vfat as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping exfat as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping ntfs as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping tmpfs as requested by the test
tst_test.c:1634: TINFO: === Testing on ext2 ===
tst_test.c:1093: TINFO: Formatting /dev/loop0 with ext2 opts='' extra opts=''
mke2fs 1.46.5 (30-Dec-2021)
memcontrol04.c:118: TINFO: Child 2006 in leaf_C: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2007 in leaf_D: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2008 in leaf_F: Allocating pagecache: 52428800
memcontrol04.c:99: TINFO: Child 2009 in trunk_G: Allocating anon: 155189248
memcontrol04.c:170: TPASS: Expect: (A/B memory.current=54149120) ~= 52428800
memcontrol04.c:176: TPASS: Expect: (A/B/C memory.current=30810112) ~= 34603008
memcontrol04.c:178: TPASS: Expect: (A/B/D memory.current=22183936) ~= 17825792
memcontrol04.c:180: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol04.c:182: TINFO: A/B/F memory.current=716800
memcontrol04.c:99: TINFO: Child 2010 in trunk_G: Allocating anon: 174063616
memcontrol04.c:195: TINFO: A: low events=1523, oom events=0
memcontrol04.c:195: TINFO: B: low events=1523, oom events=0
memcontrol04.c:195: TINFO: G: low events=0, oom events=0
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=388) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=388) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=374) == 0
tst_test.c:1634: TINFO: === Testing on ext3 ===
tst_test.c:1093: TINFO: Formatting /dev/loop0 with ext3 opts='' extra opts=''
mke2fs 1.46.5 (30-Dec-2021)
memcontrol04.c:118: TINFO: Child 2018 in leaf_C: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2019 in leaf_D: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2020 in leaf_F: Allocating pagecache: 52428800
memcontrol04.c:99: TINFO: Child 2021 in trunk_G: Allocating anon: 155189248
memcontrol04.c:170: TPASS: Expect: (A/B memory.current=54140928) ~= 52428800
memcontrol04.c:176: TPASS: Expect: (A/B/C memory.current=30801920) ~= 34603008
memcontrol04.c:178: TPASS: Expect: (A/B/D memory.current=22188032) ~= 17825792
memcontrol04.c:180: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol04.c:182: TINFO: A/B/F memory.current=712704
memcontrol04.c:99: TINFO: Child 2022 in trunk_G: Allocating anon: 174063616
memcontrol04.c:195: TINFO: A: low events=1501, oom events=0
memcontrol04.c:195: TINFO: B: low events=1501, oom events=0
memcontrol04.c:195: TINFO: G: low events=0, oom events=0
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=385) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=385) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=366) == 0
tst_test.c:1634: TINFO: === Testing on ext4 ===
tst_test.c:1093: TINFO: Formatting /dev/loop0 with ext4 opts='' extra opts=''
mke2fs 1.46.5 (30-Dec-2021)
memcontrol04.c:118: TINFO: Child 2027 in leaf_C: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2028 in leaf_D: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2029 in leaf_F: Allocating pagecache: 52428800
memcontrol04.c:99: TINFO: Child 2030 in trunk_G: Allocating anon: 155189248
memcontrol04.c:170: TPASS: Expect: (A/B memory.current=54075392) ~= 52428800
memcontrol04.c:176: TPASS: Expect: (A/B/C memory.current=30818304) ~= 34603008
memcontrol04.c:178: TPASS: Expect: (A/B/D memory.current=22265856) ~= 17825792
memcontrol04.c:180: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol04.c:182: TINFO: A/B/F memory.current=552960
memcontrol04.c:99: TINFO: Child 2031 in trunk_G: Allocating anon: 174063616
memcontrol04.c:195: TINFO: A: low events=1525, oom events=0
memcontrol04.c:195: TINFO: B: low events=1525, oom events=0
memcontrol04.c:195: TINFO: G: low events=0, oom events=0
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=388) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=388) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=375) == 0
tst_test.c:1634: TINFO: === Testing on xfs ===
tst_test.c:1093: TINFO: Formatting /dev/loop0 with xfs opts='' extra opts=''
memcontrol04.c:118: TINFO: Child 2043 in leaf_C: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2044 in leaf_D: Allocating pagecache: 52428800
memcontrol04.c:118: TINFO: Child 2045 in leaf_F: Allocating pagecache: 52428800
memcontrol04.c:99: TINFO: Child 2046 in trunk_G: Allocating anon: 155189248
memcontrol04.c:170: TPASS: Expect: (A/B memory.current=54296576) ~= 52428800
memcontrol04.c:176: TPASS: Expect: (A/B/C memory.current=30453760) ~= 34603008
memcontrol04.c:178: TPASS: Expect: (A/B/D memory.current=22556672) ~= 17825792
memcontrol04.c:180: TPASS: Expect: (A/B/E memory.current=0) ~= 0
memcontrol04.c:182: TINFO: A/B/F memory.current=847872
memcontrol04.c:99: TINFO: Child 2047 in trunk_G: Allocating anon: 174063616
memcontrol04.c:195: TINFO: A: low events=1697, oom events=0
memcontrol04.c:195: TINFO: B: low events=1697, oom events=0
memcontrol04.c:195: TINFO: G: low events=0, oom events=0
memcontrol04.c:208: TPASS: Expect: (C oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (C low events=432) > 0
memcontrol04.c:208: TPASS: Expect: (D oom events=0) == 0
memcontrol04.c:211: TPASS: Expect: (D low events=432) > 0
memcontrol04.c:208: TPASS: Expect: (E oom events=0) == 0
memcontrol04.c:214: TPASS: Expect: (E low events=0) == 0
memcontrol04.c:208: TPASS: Expect: (F oom events=0) == 0
memcontrol04.c:214: TFAIL: Expect: (F low events=417) == 0
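
For readers unfamiliar with where the "low events" and "oom events" figures in the log come from: on cgroup v2 each cgroup exposes a memory.events file made of "key value" lines, and `low` and `oom` are two of its keys. The sketch below pulls a single counter out of such text. The name `events_counter` is hypothetical and this is not the helper memcontrol04 itself uses; it only assumes the "key value" per-line layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value stored under `key` in
 * memory.events-style text (one "key value" pair per line),
 * or -1 if the key is not present. */
static long events_counter(const char *text, const char *key)
{
	char name[64];
	long val;
	const char *p = text;

	while (*p) {
		/* Parse the "key value" pair at the start of this line. */
		if (sscanf(p, "%63s %ld", name, &val) == 2 &&
		    strcmp(name, key) == 0)
			return val;

		/* Advance to the beginning of the next line. */
		p += strcspn(p, "\n");
		if (*p == '\n')
			p++;
	}
	return -1;
}
```

For leaf F in the ext2 run above, the `low` counter read this way would be the 374 that the test expected to be 0.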

coolgw commented 8 months ago

https://lore.kernel.org/linux-mm/20220510174341.GC24172@blackbody.suse.cz/T/#m4366c0c20b6e5748a871929753e99b2f3f7b2f6c

mpw5421 commented 8 months ago

> https://lore.kernel.org/linux-mm/20220510174341.GC24172@blackbody.suse.cz/T/#m4366c0c20b6e5748a871929753e99b2f3f7b2f6c

Is there something specific you wanted me to look at on that link? It's quite a long thread and I'm not sure which part applies to the issue I raised.

coolgw commented 8 months ago

> > https://lore.kernel.org/linux-mm/20220510174341.GC24172@blackbody.suse.cz/T/#m4366c0c20b6e5748a871929753e99b2f3f7b2f6c
>
> Is there something specific you wanted me to look at on that link? It's quite a long thread and I'm not sure which part applies to the issue I raised.

If you follow the link above, you will see the comments below in the mail thread. They point to an issue in the current code that triggers low events > 0, which is what makes the LTP test case fail. I have also checked the self-test code in the current kernel (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git), and its logic is the same as the current LTP implementation. See the following patch for details: https://patchwork.kernel.org/project/linux-mm/patch/20220524162955.8635-3-mkoutny@suse.com/

On Mon, May 09, 2022 at 05:44:24PM -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> So I think we're OK with [2/5] now.  Unless there be objections, I'll
> be looking to get this series into mm-stable later this week.

I'm sorry, I think the current form of the test reveals an unexpected
behavior of reclaim and silencing the test is not the way to go.
Although, I may be convinced that my understanding is wrong.

On Mon, May 09, 2022 at 11:09:15AM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> My understanding of the issue you're raising, Michal, is that
> protected siblings start with current > low, then get reclaimed
> slightly too much and end up with current < low. This results in a
> tiny bit of float that then gets assigned to the low=0 sibling; 

Up until here, we're on the same page.

> when that sibling gets reclaimed regardless, it sees a low event.
> Correct me if I missed a detail or nuance here.

Here, I'd like to stress that the event itself is just a messenger (whom
my original RFC patch attempted to get rid of). The problem is that if
the sibling with recursive protection is active enough to claim it, it's
effectively stolen from the passive sibling. See the comparison of
'precious' vs 'victim' in [1].

mpw5421 commented 8 months ago

Thanks for providing the relevant comments from the link. I suppose this issue can be closed then based on your feedback.