ncabatoff / process-exporter

Prometheus exporter that mines /proc to report on selected processes
MIT License

Can we get the top 10 processes on the machine with the highest CPU usage? #273

Open alagusivakumar opened 1 year ago

alagusivakumar commented 1 year ago

I have a use case where the exporter has to find the top 10 processes with the highest CPU usage, i.e. the exporter has to filter out the processes with high CPU usage among all the processes running in the VM.

akshat5302 commented 1 year ago

@ncabatoff I have the same use case. Is it possible to do this using the process exporter's config file?

deajan commented 1 year ago

Perhaps have a look at my solution. There's no need to register process groups or anything else. The solution is a bit low-tech, but it has a very small footprint and works great for me: https://github.com/deajan/node-exporter-textfile-collector-scripts/blob/instant_cpu_usage/instant_per_process_cpu_mem_usage.sh

I can provide the corresponding Grafana dashboard.

akshat5302 commented 1 year ago

@deajan how can we integrate this with the process exporter or any other kind of exporter so that the metrics show up in /metrics? As of now it's only a script file.

deajan commented 1 year ago

@akshat5302 I'm using this script with node_exporter's textfile collector.

  1. Create the /var/lib/node_exporter/textfile_collector directory
  2. Set up node_exporter with --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
  3. Create the following entries in /etc/crontab (or in a file under /etc/cron.d, since the lines include a user field)
    * * * * * root           /usr/bin/bash /opt/instant_per_process_cpu_mem_usage.sh > /var/lib/node_exporter/textfile_collector/top_process.prom
    * * * * * root sleep 15; /usr/bin/bash /opt/instant_per_process_cpu_mem_usage.sh > /var/lib/node_exporter/textfile_collector/top_process.prom
    * * * * * root sleep 30; /usr/bin/bash /opt/instant_per_process_cpu_mem_usage.sh > /var/lib/node_exporter/textfile_collector/top_process.prom
    * * * * * root sleep 45; /usr/bin/bash /opt/instant_per_process_cpu_mem_usage.sh > /var/lib/node_exporter/textfile_collector/top_process.prom

The latter is a bit of an ugly hack, but it's a quick way to run the script once every 15 seconds. If you need better resolution, you could wrap the script in a while true; do <run_script>; sleep 1; done loop.
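
The loop variant could be sketched roughly as follows. This is only a minimal sketch and not part of the original script: write_prom_atomically is a hypothetical helper name, and the paths are the ones from the steps above. It assumes you want to write to a temporary file first and rename it into place, so that node_exporter never reads a half-written file. The textfile collector only picks up files ending in .prom, so the temporary file is ignored while it is being written.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): run a command and
# publish its output to a .prom file atomically.
write_prom_atomically() {
    local out_file="$1"
    shift
    local tmp_file
    # Create the temp file next to the target so that mv is a rename, not a copy.
    tmp_file="$(mktemp "${out_file}.XXXXXX")" || return 1
    if "$@" > "$tmp_file"; then
        chmod 644 "$tmp_file"   # mktemp creates 0600; let node_exporter read it
        mv -f "$tmp_file" "$out_file"
    else
        rm -f "$tmp_file"       # keep the previous .prom file on failure
        return 1
    fi
}

# Assumed usage, with the paths from the steps above:
# while true; do
#     write_prom_atomically /var/lib/node_exporter/textfile_collector/top_process.prom \
#         /usr/bin/bash /opt/instant_per_process_cpu_mem_usage.sh
#     sleep 1
# done
```

The same helper could also replace the plain > redirection in the four crontab entries if you prefer to keep cron.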

akshat5302 commented 1 year ago

Thanks @deajan. Is there any workaround to get the top 10 processes by read and write bytes as well using that script?

deajan commented 1 year ago

You could use much the same awk magic with iotop (you'll need to install it):

iotop -kbn 1 | cut -d"'" -f2 | awk '{
        # Skip the header lines and processes without any I/O
        if ((NR<4) || ($4=="0.00" && $6=="0.00")) { next };
        # Remove (buggy iotop) self process
        if ($1=="-f2\"") { next };
        # Emit the TYPE/HELP header once, before the first read sample
        if (readtype==0) { printf "# TYPE top_process_io_disk_read gauge\n# HELP top_process_io_disk_read iotop gathered disk read stats\n"; readtype=1 };
        printf "top_process_io_disk_read{pid=\""$1"\",process=\""$9"\"} " ($4 * 1000) "\n";
        # Same for the write metric
        if (writetype==0) { printf "# TYPE top_process_io_disk_write gauge\n# HELP top_process_io_disk_write iotop gathered disk write stats\n"; writetype=1 };
        printf "top_process_io_disk_write{pid=\""$1"\",process=\""$9"\"} " ($6 * 1000) "\n";
}'

deajan commented 1 year ago

Here's a dashboard example that comes with it

cpu_mem_io_usage.zip

deajan commented 1 year ago

Here's a better iotop parsing implementation (still one big one-liner):

iotop -kbn 1 | awk '{
        # Skip headers
        if (NR<4) { next };
        # Remove the Python bytes prefix printed by iotop. \047 is octal for byte 39 (the single quote)
        sub(/^b\047/,"");
        # Get all command arguments
        args=""; for(i = 10; i<= NF; i++) if ($i!="") {args=args" "$i};
        # Sanitize arguments: drop braces, double quotes and backslashes
        gsub(/[{}"\\]/, "", args);
        # Keep at most 30 chars of the arguments to limit the label size
        args=substr(args, 1, 30);
        # $9 is the command column here; adjust it (and the loop start above) if your iotop prints more columns
        # Values are passed as printf arguments so that a % in a command line cannot break the format string
        if ($4!="0.00") {
                if (readtype==0) { printf "# TYPE top_process_io_disk_read gauge\n# HELP top_process_io_disk_read iotop gathered disk read stats\n"; readtype=1 };
                printf "top_process_io_disk_read{pid=\"%s\",process=\"%s\",sanitized_args=\"%s\"} %.0f\n", $1, $9, args, $4 * 1000;
        }
        if ($6!="0.00") {
                if (writetype==0) { printf "# TYPE top_process_io_disk_write gauge\n# HELP top_process_io_disk_write iotop gathered disk write stats\n"; writetype=1 };
                printf "top_process_io_disk_write{pid=\"%s\",process=\"%s\",sanitized_args=\"%s\"} %.0f\n", $1, $9, args, $6 * 1000;
        }
}'
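
Since the question was about the top 10 processes, the samples emitted by the parser above could be trimmed before they are written out. The following is only a minimal sketch under a few assumptions: there is one sample per line, the value is the last space-separated field, and top_n_samples is a hypothetical helper name that is not part of the script. It drops the # TYPE and # HELP lines, so those would have to be printed separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text format on stdin and keep only the
# N samples of the given metric that have the highest values (default N=10).
top_n_samples() {
    local metric="$1"
    local n="${2:-10}"
    # Prefix every matching sample with its value (the last field), sort
    # numerically in descending order, keep N lines, then drop the prefix.
    awk -v m="$metric" 'index($0, m "{") == 1 { print $NF "\t" $0 }' \
        | sort -t "$(printf '\t')" -k1,1 -rn \
        | head -n "$n" \
        | cut -f2-
}

# Assumed usage: <parser output> | top_n_samples top_process_io_disk_read 10
```

Sorting on a separate value column keeps the sketch independent of spaces inside label values such as sanitized_args.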

akshat5302 commented 1 year ago

Thanks @deajan for sharing your insights, I'll look into it 👍

deajan commented 1 year ago

Keep in mind that this is just a low-tech solution for quick debugging. But feel free to leave some feedback.

ruey-cheng commented 10 months ago

Thanks @deajan for your script, it works great! But I ran into some issues when running the iotop parser. Can you help with that?

# TYPE top_process_io_disk_read gauge
# HELP top_process_io_disk_read iotop gathered disk read stats
top_process_io_disk_read{pid="#",process="",sanitized_args=""} 0
# TYPE top_process_io_disk_write gauge
# HELP top_process_io_disk_write iotop gathered disk read stats
top_process_io_disk_write{pid="#",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="#",process="byte",sanitized_args=" 39 (the singlequote)"} 0
top_process_io_disk_write{pid="#",process="byte",sanitized_args=" 39 (the singlequote)"} 0
top_process_io_disk_read{pid="sub(/^b\047/,"");",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="sub(/^b\047/,"");",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="#",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="#",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="args="";",process="",sanitized_args=""} 13000
top_process_io_disk_write{pid="args="";",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="#",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="#",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="gsub("{|}|\\|\"",",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="gsub("{|}|\\|\"",",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="#gsub("{|}|\"",",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="#gsub("{|}|\"",",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="#",process="limited",sanitized_args=" top -w size, we wont need thi"} 0
top_process_io_disk_write{pid="#",process="limited",sanitized_args=" top -w size, we wont need thi"} 30000
top_process_io_disk_read{pid="args=substr(args,",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="args=substr(args,",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="if",process="gathered",sanitized_args=" disk read statsn; readtype=1 "} 0
top_process_io_disk_write{pid="if",process="gathered",sanitized_args=" disk read statsn; readtype=1 "} 0
top_process_io_disk_read{pid="printf",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="printf",process="",sanitized_args=""} 1000000
top_process_io_disk_read{pid="}",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="}",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="if",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="if",process="gathered",sanitized_args=" disk read statsn; writetype=1"} 0
top_process_io_disk_write{pid="if",process="gathered",sanitized_args=" disk read statsn; writetype=1"} 0
top_process_io_disk_read{pid="printf",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="printf",process="",sanitized_args=""} 1000000
top_process_io_disk_read{pid="}",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="}",process="",sanitized_args=""} 0
top_process_io_disk_read{pid="}",process="",sanitized_args=""} 0
top_process_io_disk_write{pid="}",process="",sanitized_args=""} 0

deajan commented 10 months ago

@ruey-cheng Some context would be helpful here. What system are you running? Can you provide a couple of outputs of iotop -kbn 1 so I can see where the parser might be going wrong?

ruey-cheng commented 10 months ago

Hi @deajan, thanks for your help. When I moved the awk program into a separate file (iotop -bkn 1 | awk -f /opt/iotop.awk) and changed the process column from $9 to $12, it worked.

OS: CentOS 7.9 (iotop 0.6)

iotop -kbn 1 output sample:

21113 be/4 node_exp 0.00 K/s 0.00 K/s 0.00 % 0.00 % node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector
32762 be/4 root 0.00 K/s 0.00 K/s 0.00 % 0.00 % process-exporter --config.path /etc/process-exporter/all.yaml --web.listen-address=:9256
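
In the sample above the command is field 12 rather than field 9, which matches the change from $9 to $12 described earlier. The following is a minimal sketch that only handles this layout (eleven fields before the command). The function name parse_iotop_12col is hypothetical, other iotop versions may print a different number of columns, and unlike the scripts earlier in the thread it does not skip rows with 0.00 values or print the # TYPE and # HELP lines.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the 12-column layout shown in the sample above:
# TID PRIO USER READ K/s WRITE K/s SWAPIN % IO % COMMAND [ARGS...]
parse_iotop_12col() {
    awk '
    # Only handle data rows: a numeric TID and at least 12 columns
    $1 !~ /^[0-9]+$/ || NF < 12 { next }
    {
        # Collect the arguments that follow the command name
        args = ""
        for (i = 13; i <= NF; i++) args = args " " $i
        # Drop characters that would need escaping inside a label value
        gsub(/[{}"\\]/, "", args)
        args = substr(args, 1, 30)
        # K/s * 1000, as in the scripts earlier in the thread
        printf "top_process_io_disk_read{pid=\"%s\",process=\"%s\",sanitized_args=\"%s\"} %.0f\n", $1, $12, args, $4 * 1000
        printf "top_process_io_disk_write{pid=\"%s\",process=\"%s\",sanitized_args=\"%s\"} %.0f\n", $1, $12, args, $6 * 1000
    }'
}

# Assumed usage: iotop -kbn 1 | parse_iotop_12col
```

Checking for a numeric first field replaces the NR<4 header skip, so the sketch does not depend on how many header lines iotop prints.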