ncabatoff / process-exporter

Prometheus exporter that mines /proc to report on selected processes
MIT License

read /proc/pid/smaps_rollup will suspend the target process #246

Open kirbyzhou opened 2 years ago

kirbyzhou commented 2 years ago

When process-exporter reads /proc/pid/smaps_rollup, the kernel suspends the target process. The length of the pause depends on how much memory the process occupies.

I checked on CentOS 7.9 with kernel-3.10.0-1160.76.1.el7.x86_64 and on Oracle's UEK6 kernel, kernel-uek-5.4.17-2136.312.3.4.el7uek.x86_64.

I ran process-exporter against a process that uses 200 GB of memory. That process is suspended for about 4 seconds every 30 seconds, i.e. on every cycle in which process-exporter reads /proc.

Reading this file makes the kernel spend about 4 seconds walking the memory structures of the whole target process. During that time the target process is locked and stalls, especially when it makes memory-related syscalls such as mmap.

] time cat /proc/37315/smaps_rollup
...
Rss:           210410600 kB
...
real    0m4.475s
user    0m0.000s
sys     0m4.456s
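To take the same measurement from a script instead of `time cat`, here is a minimal Python sketch. It assumes a kernel that provides /proc/<pid>/smaps_rollup. It times one full read of the file and extracts the Rss line. The names `time_smaps_rollup` and `parse_rss_kb` are hypothetical. It reports wall-clock time on the reader's side only, which roughly tracks the "real" line above, not the target's stall directly.

```python
#!/usr/bin/env python3
"""Time one read of /proc/<pid>/smaps_rollup, a rough equivalent of `time cat`."""
import sys
import time
from typing import Optional, Tuple


def parse_rss_kb(text: str) -> Optional[int]:
    """Return the Rss value (kB) from smaps_rollup text, or None if there is no Rss line."""
    for line in text.splitlines():
        if line.startswith("Rss:"):
            # Expected shape: "Rss:           210410600 kB"
            return int(line.split()[1])
    return None


def time_smaps_rollup(path: str) -> Tuple[float, str]:
    """Read the whole file once and return (elapsed wall-clock seconds, contents)."""
    start = time.monotonic()
    with open(path) as f:
        text = f.read()
    return time.monotonic() - start, text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 time_rollup.py <pid>
    elapsed, text = time_smaps_rollup(f"/proc/{sys.argv[1]}/smaps_rollup")
    print(f"Rss: {parse_rss_kb(text)} kB, read took {elapsed:.3f} s")
```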

By this measurement (about 4.5 s for roughly 200 GB, or roughly 20 ms per GB), a process using more than 4 GB will be suspended for around 100 ms, which is unacceptable. So please stop gathering anything from /proc/pid/smaps_rollup, or at least check /proc/pid/status first and read smaps_rollup only if the target process uses 4 GB of memory or less.
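The suggested guard, and the arithmetic behind the 100 ms figure, can be sketched in Python. This is not process-exporter's code. `vmrss_kb`, `estimate_pause_s`, `should_read_smaps_rollup` and the 4 GiB `LIMIT_KB` are hypothetical names for the suggestion above. The estimate scales the single measurement linearly, an assumption the reporter later says may not hold.

```python
#!/usr/bin/env python3
"""Sketch of the suggested guard: consult VmRSS in /proc/<pid>/status before touching smaps_rollup."""
import sys
from typing import Optional

# Hypothetical threshold taken from the suggestion above: 4 GiB, in kB.
LIMIT_KB = 4 * 1024 * 1024

# Figures from the `time cat` measurement above.
MEASURED_S = 4.475
MEASURED_RSS_KB = 210410600


def vmrss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (kB) from /proc/<pid>/status text; None if absent (e.g. kernel threads)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def estimate_pause_s(rss_kb: int) -> float:
    """Linearly scale the single measurement above; assumes the cost per kB is constant."""
    return MEASURED_S * rss_kb / MEASURED_RSS_KB


def should_read_smaps_rollup(status_text: str, limit_kb: int = LIMIT_KB) -> bool:
    """Allow the expensive read only when VmRSS is known and at or below limit_kb."""
    rss = vmrss_kb(status_text)
    return rss is not None and rss <= limit_kb


if __name__ == "__main__":
    print(f"Estimated pause at 4 GiB: {estimate_pause_s(LIMIT_KB) * 1000:.0f} ms")
    if len(sys.argv) > 1:
        # Usage: python3 guard.py <pid>
        with open(f"/proc/{sys.argv[1]}/status") as f:
            print("read smaps_rollup:", should_read_smaps_rollup(f.read()))
```

Run without arguments, it prints an estimate of about 89 ms at the 4 GiB threshold, in line with the rough 100 ms figure. The premise of the guard is that VmRSS in the status file comes from counters the kernel already keeps, so reading it should not need the page-table walk that makes smaps expensive. This thread does not measure that.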

BTW: Reading /proc/pid/smaps and /proc/pid/numa_maps also suspends the target process.

There is a flag, "gather-smaps", described as "gather metrics from smaps file, which contains proportional resident memory size". It defaults to true; maybe it should default to false, or at least come with more warnings and explanation.

acelyc111 commented 2 years ago

Would it be better to add an option to decide whether to gather such info or not?

flixr commented 2 years ago

This sounds quite bad; I was not aware that reading that file would suspend the process momentarily. @kirbyzhou, do you have any more info on this? I didn't find any mention of it in the kernel docs or the original patch.

kirbyzhou commented 2 years ago

The smaps_rollup patch is irrelevant to this problem. The hang is caused by the older code that supports /proc/pid/smaps. The kernel docs say: "To see a precise snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table. It's slow but very precise." I think the reason is that the smaps code needs to make a snapshot of the pagemap.

The time cost of the read syscall does not seem to scale linearly.

You can easily reproduce the issue with the following code:

== holdmem.py ==

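#!/usr/bin/env python3
import sys
import time

count = int(sys.argv[1])  # number of large lists to hold; 1000 uses about 32 GB in the run below
# Each list holds about a million Python ints, so total memory grows with count.
big = [list(range((1 << 20) + i)) for i in range(count)]
print("memory holded")

old = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
i = 0
while True:
    now = time.monotonic()
    diff = now - old
    old = now
    # A normal iteration takes the 0.05 s sleep plus one list rebuild,
    # so a gap of 0.1 s or more means the process was stalled.
    if diff >= 0.1:
        print(f"I have been paused for {diff} seconds")
    i = (i + 1) % len(big)
    # The following line is important!! Replacing a list frees and allocates
    # large buffers, which likely triggers memory-related syscalls such as
    # mmap/munmap; those stall while the kernel reads smaps_rollup for us.
    big[i] = list(range((1 << 20) + int(now) % 123217))
    time.sleep(0.05)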

Then, run the following:

#] python3 holdmem.py 1000
memory holded

Wait a few seconds; it will consume about 32 GB of RAM and print "memory holded".

Then, in another terminal, run:

#] time cat /proc/$(pgrep -f holdmem.py)/smaps_rollup 
...
real    0m0.472s
user    0m0.007s
sys 0m0.451s

The first terminal will print a message like:

I have been paused for 0.43660974502563477 seconds
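To reproduce the periodic stall rather than a single one, the read can be repeated on a fixed interval in the second terminal. This roughly mimics the scrape cadence reported here (every 30 s in the original report, every 15 s in a later comment). A minimal Python sketch; `scrape_loop` and its parameters are hypothetical, and it only imitates the file read, not anything else process-exporter does. With holdmem.py running, each read should produce a "paused" line in the first terminal.

```python
#!/usr/bin/env python3
"""Hypothetical helper: read a smaps_rollup file on a fixed interval, like a periodic scrape."""
import sys
import time
from typing import List


def scrape_loop(path: str, interval_s: float, rounds: int) -> List[float]:
    """Read `path` `rounds` times, sleeping `interval_s` between reads; return each read's duration."""
    durations = []
    for n in range(rounds):
        start = time.monotonic()
        with open(path) as f:
            # Reading the file is what makes the kernel walk the target's mappings.
            f.read()
        durations.append(time.monotonic() - start)
        if n + 1 < rounds:
            time.sleep(interval_s)
    return durations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scrape.py <pid> [interval_s] [rounds]
    pid = sys.argv[1]
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
    rounds = int(sys.argv[3]) if len(sys.argv) > 3 else 5
    for d in scrape_loop(f"/proc/{pid}/smaps_rollup", interval, rounds):
        print(f"read took {d:.3f} s")
```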
kirbyzhou commented 1 year ago

Any good news?

skeetmtp commented 5 months ago

We were getting 4-6 s freezes every 15 s on our MongoDB instance. By adding -gather-smaps=false to the process-exporter command line, we were able to work around the issue.

Thanks to the OP, we were able to find the cause of the freeze.