linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

memcg_stress_test.sh: fix reserved mem calculate #1017

Closed GeHao01994 closed 1 year ago

GeHao01994 commented 1 year ago

When this test case runs on a machine with a large amount of memory (for example 200G) and no swap, the existing reserved memory of 8*150 M is too small and causes an OOM. This patch therefore changes the reserved memory calculation to ten percent of free memory.

Signed-off-by: Hao Ge <gehao@kylinos.cn>
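
To make the two sizing rules in the description concrete, here is a minimal sketch that computes both. The function names are hypothetical and are not taken from memcg_stress_test.sh. The 200 GB figure is the example size from the description, treated here as free memory, which is an assumption.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not the code from memcg_stress_test.sh.

# Fixed reserve described in the report: 8 * 150 MB.
fixed_reserve_mb()
{
	echo $((8 * 150))
}

# Proposed reserve: ten percent of free memory.
# $1: free memory in kB, the unit used by /proc/meminfo.
percent_reserve_mb()
{
	echo $(($1 / 1024 / 10))
}

# Assumed example: a machine with 200 GB of free memory.
free_kb=$((200 * 1024 * 1024))
echo "fixed reserve:       $(fixed_reserve_mb) MB"
echo "ten percent reserve: $(percent_reserve_mb "$free_kb") MB"
```

Under these assumptions the fixed rule reserves 1200 MB and the ten percent rule reserves 20480 MB.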

metan-ucw commented 1 year ago

And did the system actually survive the OOM? It's normal for a memcg test to trigger OOM, and as long as the test process that allocates the memory is killed, that is expected behavior. Looking at the memory computation, it does look sane even for swapless systems.

GeHao01994 commented 1 year ago

Hi metan, the system hangs and this test case fails to finish because memory cannot be allocated. The system itself is otherwise stable. The test should work on systems of all memory sizes, so the reserve should scale with the machine's memory size instead of being a fixed number.

metan-ucw commented 1 year ago

Does the test really fail? As far as I can see the test passes as long as the main process is not killed. The child processes that allocate memory can be killed by OOM and the test will pass without a problem, or at least I do not see any code that would fail the test if a child is killed. What exactly happens in your case?

I'm not against changing the heuristic that computes the memory that should be left for the kernel data structures; we did that, for example, in 3875aab599912b980dff6a57781b0f0386167ba1. However, so far you haven't described what happens on your system, why exactly this needs to be done, or why you chose this formula.

GeHao01994 commented 1 year ago

Hi metan, the main process exits because the system does not have enough memory to fork a process. The shell keeps reporting a message like "can't not fork due to can't alloc memory" (I am at home now, so I may not be quoting it accurately, but the meaning should be similar), and the user cannot do anything. As for ten percent, it should be safe for an OS with or without swap. It is just a value from experience, and it will not be much higher than the OS min watermark.

metan-ucw commented 1 year ago

Right, this may actually happen. I keep telling people that writing cgroup stress tests in shell is wrong, because once the system is under memory pressure, forking a subprocess to run a command will fail. This wouldn't happen if the test were written in C.
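
As a rough illustration of the forking problem, the sketch below reads a field from a meminfo-style file using only shell builtins. It stores the result in a variable instead of using command substitution, so the lookup itself does not need to start a new process. The helper name `meminfo_kb` and the variable `MEMINFO_KB` are hypothetical. The sketch assumes a shell such as dash or bash, where `read` and `[` are builtins and a redirected `while` loop runs in the current shell. It does not remove the forks a shell test needs to launch its worker processes.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of LTP.
# Looks up one field in a meminfo-style file without running external commands.
# $1: field name, for example MemFree
# $2: file to read, defaults to /proc/meminfo
# On success the value in kB is stored in MEMINFO_KB and 0 is returned.
meminfo_kb()
{
	MEMINFO_KB=
	while read -r key value rest; do
		if [ "$key" = "$1:" ]; then
			MEMINFO_KB=$value
			return 0
		fi
	done < "${2:-/proc/meminfo}"
	return 1
}
```

A caller would run `meminfo_kb MemFree` and then read `$MEMINFO_KB`, where a more common form such as `$(awk ...)` would need a fork for every lookup.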

GeHao01994 commented 1 year ago

Hi @metan-ucw, yes, that plan is correct. But given the current situation, can you help merge this patch? You may have concerns about the 10%, but I think it is a safe value that keeps systems from failing to fork a subprocess because memory is exhausted. Thanks.

metan-ucw commented 1 year ago

@GeHao01994 given that this test works perfectly fine on systems with swap, I would only change the formula for swapless systems. Also, I'm not sure that blindly taking 10% is a good solution. Have you tried different percentages? At which percentage did the test start to fail?

GeHao01994 commented 1 year ago

Hi @metan-ucw, changing the formula only for swapless systems may not be enough. For example, the server I have has close to 250G of available memory but only 3G of swap space. In general, 260 GB of memory is sufficient, so we don't need to waste much disk space on swap. I have tried 5%; unfortunately, that is still not enough. We need to reserve memory whether or not the machine has swap (it may have swap space that is too small), so that the test can adapt to all machines. That's why I put forward this patch.
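
For scale, the reserves being compared in this thread can be worked out with a short sketch. The helper `reserve_mb` is hypothetical, and "close to 250G" is treated as exactly 250 * 1024 MB, which is an assumption made only to get round numbers.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not the formula from the patch.
# $1: memory in MB, $2: percentage to reserve
reserve_mb()
{
	echo $(($1 * $2 / 100))
}

# Assumption: "close to 250G" taken as exactly 250 * 1024 MB.
mem_mb=$((250 * 1024))

echo "fixed 8*150 M: $((8 * 150)) MB"
echo "5 percent:     $(reserve_mb "$mem_mb" 5) MB"
echo "10 percent:    $(reserve_mb "$mem_mb" 10) MB"
```

With these inputs the fixed reserve is 1200 MB, 5% is 12800 MB, and 10% is 25600 MB, which shows how far apart the fixed value and the percentages tried in the thread are on a machine of this size.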