Closed: GeHao01994 closed this issue 1 year ago
And did the system actually survive the OOM? It's normal for the memcg test to trigger OOM, and as long as the test process that allocates the memory is the one that gets killed, that's expected behavior. Looking at the memory computation, it does look sane even for swapless systems.
Hi metan, the system hangs and this test case fails to finish because it can't allocate memory; the system itself is otherwise stable. We should adapt the test to systems with various memory sizes, so the reserve should be computed from the machine's memory size instead of being a fixed number.
Does the test really fail? As far as I can see, the test passes as long as the main process is not killed. The child processes that actually allocate memory can be killed by OOM and the test will pass without a problem; at least I do not see any code that would fail the test if a child is killed. What exactly happens in your case?
I'm not against changing the heuristic that computes the memory that should be left for kernel data structures; we did that for example in 3875aab599912b980dff6a57781b0f0386167ba1. However, so far you haven't described what happens on your system, why exactly this needs to be done, and why you chose this formula.
Hi metan, the main process exits because the system does not have enough memory to fork a process, and the shell keeps reporting messages like "can't fork: cannot allocate memory" (I am at home now, so I may not quote it exactly, but the meaning is similar) and the user can't do anything. As for the ten percent: it should be safe on systems with or without swap. It is just an empirical value, and it will not be much higher than the kernel's min watermark.
Right, this may actually happen. I keep telling people that writing cgroup stress tests in shell is wrong, because once the system gets under memory pressure, forking a subprocess to run a command will fail. This wouldn't happen if the test was written in C.
Hi @metan-ucw, yes, that plan is correct. But given the current situation, can you help merge this patch? You may have concerns about the 10%, but I think it is a safe value that keeps the system from being unable to fork a subprocess because memory is exhausted. Thanks.
@GeHao01994 given that this test works perfectly fine on systems with swap, I would only change the formula for swapless systems. Also, I'm not sure that blindly taking 10% is a good solution. Have you tried different percentages? At which percentage did the test start to fail?
Hi @metan-ucw, changing the formula only for swapless systems may not be enough. For example, on a server I have, available memory is close to 250 GB but swap space is only 3 GB. In general, 260 GB of memory is sufficient, and we don't want to waste too much disk on swap. I have tried 5%; unfortunately, that was still not enough. We need to reserve memory whether the machine has swap or not (it may have swap space, but too little), so that all machines can run this test. That's why I proposed this patch.
When running this test case on a machine with a large amount of memory (e.g. 200 GB) and without swap, the existing reserved memory of 8 * 150 MB is too small for such a machine and causes OOM, so change the reserved-memory calculation to ten percent of free memory.
Signed-off-by: Hao Ge gehao@kylinos.cn
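The proposed heuristic can be sketched in shell as follows. This is a hypothetical illustration, not the actual patch; the variable names are made up, and it assumes `MemFree` from `/proc/meminfo` is the basis of the computation.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed heuristic: instead of a fixed
# reserve of 8 * 150 MB, reserve ten percent of MemFree for the kernel
# and use the rest for the memcg allocations. Variable names are
# illustrative only.
mem_free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
reserved_kb=$((mem_free_kb / 10))
usable_kb=$((mem_free_kb - reserved_kb))
echo "MemFree: ${mem_free_kb} kB, reserved: ${reserved_kb} kB, usable: ${usable_kb} kB"
```

Because the reserve scales with the machine's free memory, a 200 GB swapless server reserves roughly 20 GB instead of the fixed 1200 MB, which is the behavior the patch description argues for.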