jue89 / node-mqtt-stats

Publishing server stats using MQTT
MIT License

Also interesting metrics #2

Open lemoer opened 7 years ago

lemoer commented 7 years ago
jue89 commented 7 years ago

Thank you! I would like to "solve" this issue with this information:

lemoer commented 7 years ago

I haven't looked at all the references in depth yet, but it looks good.

Have you noticed the difference between cat /proc/stat | grep processes (which gives the number of forks since the system booted) and the number of currently running processes?
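
For reference, a minimal Node.js sketch that reads both numbers straight from /proc/stat (the processes and procs_running lines are documented in proc(5)):

    // Compare total forks since boot with currently running processes.
    const fs = require('fs');

    const stat = fs.readFileSync('/proc/stat', 'utf8');
    const field = (name) => {
      const line = stat.split('\n').find((l) => l.startsWith(name + ' '));
      return line ? parseInt(line.split(/\s+/)[1], 10) : null;
    };

    console.log('forks since boot: ', field('processes'));     // monotonic counter
    console.log('currently running:', field('procs_running')); // instantaneous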

jue89 commented 7 years ago

Topic design:

/cpu/idle
    /user (=user + nice)
    /system
    /iowait
    /irq
    /softirq
    /steal (= steal + guest + guest_nice)
/mem/used
    /free
    /buffers
    /cached
    /swapused
    /swapfree
/processes/forkrate
          /ctxtrate
          /tasks
          /threads
          /kthreads
          /zombies

(Gonna attach some comments later)
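
For illustration, publishing one sample of this tree with the npm mqtt package could look like the sketch below (the broker URL, the stats/ prefix, and all values are placeholders, not part of the proposal):

    // Hypothetical sketch: publish a single sample of the proposed topic tree.
    const mqtt = require('mqtt');

    const client = mqtt.connect('mqtt://localhost');

    client.on('connect', () => {
      const sample = {
        'cpu/idle': 0.93,            // placeholder value
        'mem/used': 123456789,       // placeholder value (bytes)
        'processes/forkrate': 12.5   // placeholder value (forks/s)
      };
      for (const [topic, value] of Object.entries(sample)) {
        client.publish('stats/' + topic, String(value), { retain: true });
      }
      client.end();
    });

Retaining the messages is one design option here, so a new subscriber always sees the latest sample immediately.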

lemoer commented 7 years ago

more work for you:

jue89 commented 7 years ago

Some metrics are already implemented:

I think those can be analysed using bpfcountd:

Those are missing and easy to implement:

And those are not that easy to implement (i.e. I don't have a solution that is not a quick'n'dirty one):

lemoer commented 7 years ago

Very nice. I think the CPU usage of fastd is very important, but we should give up on trying to count the successful DHCP replies. I think it's really too complicated.

jue89 commented 7 years ago

Hmm, if we just publish the stats of each CPU core separately, we should get a quite accurate idea of how much CPU is consumed by fastd. Don't you think so?

lemoer commented 7 years ago

I'm not sure, because the fastd instance seems to hop between cores a lot on multi-CPU systems.

jue89 commented 7 years ago

I see. Gonna find something better to meter CPU usage.

lemoer commented 7 years ago

Maybe this excerpt from man 5 proc helps:

    (14) utime  %lu
              Amount of time that this process has been scheduled in
              user mode, measured in clock ticks (divide by
              sysconf(_SC_CLK_TCK)). This includes guest time,
              guest_time (time spent running a virtual CPU, see
              below), so that applications that are not aware of the
              guest time field do not lose that time from their
              calculations.

    (15) stime  %lu
              Amount of time that this process has been scheduled in
              kernel mode, measured in clock ticks (divide by
              sysconf(_SC_CLK_TCK)).

    (16) cutime  %ld
              Amount of time that this process's waited-for children
              have been scheduled in user mode, measured in clock
              ticks (divide by sysconf(_SC_CLK_TCK)). (See also
              times(2).) This includes guest time, cguest_time (time
              spent running a virtual CPU, see below).

    (17) cstime  %ld
              Amount of time that this process's waited-for children
              have been scheduled in kernel mode, measured in clock
              ticks (divide by sysconf(_SC_CLK_TCK)).
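
Building on that excerpt, a rough sketch of measuring a process's CPU share: sample utime + stime from /proc/$PID/stat twice and divide the delta by the elapsed wall time. The 100 Hz tick is an assumption here; strictly it comes from sysconf(_SC_CLK_TCK):

    // Estimate per-process CPU usage from fields (14) utime and (15) stime.
    const fs = require('fs');

    const CLK_TCK = 100; // assumption; the real value is sysconf(_SC_CLK_TCK)

    function cpuTicks(pid) {
      const stat = fs.readFileSync('/proc/' + pid + '/stat', 'utf8');
      // comm (field 2) may contain spaces, so split after the closing paren
      const fields = stat.slice(stat.lastIndexOf(')') + 2).split(' ');
      return parseInt(fields[11], 10) + parseInt(fields[12], 10); // utime + stime
    }

    function cpuUsage(pid, intervalMs, cb) {
      const before = cpuTicks(pid);
      setTimeout(() => {
        const delta = cpuTicks(pid) - before;
        cb(delta / CLK_TCK / (intervalMs / 1000)); // fraction of one core
      }, intervalMs);
    }

    // Example: print fastd's CPU share over one second (the PID is a placeholder)
    cpuUsage(1234, 1000, (share) => console.log('cpu:', share));

This would also sidestep the core-hopping problem, since utime/stime accumulate no matter which core the process runs on.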
jue89 commented 7 years ago

Yes sure. I'm familiar with that ;)

But I don't know how to project this information into MQTT topics. I would integrate this feature into the processes plugin. The trivial projection would be:

org/example/processes/fastd/cpu
org/example/processes/fastd/mem

But what to do if multiple fastd instances are running? Add the PID?

org/example/processes/fastd/1234/cpu
org/example/processes/fastd/1234/mem

But this would be complicated to put in a graph, wouldn't it?

You got an idea how to solve this?

lemoer commented 7 years ago

I also think that the PID is not the right way.

Another idea:

Maybe we could solve this by doing it for systemd services instead of processes. But since there is no 1:1 mapping between processes and services, I would say that we should aggregate all CPU and memory statistics per service. This would be nice for services that fork children.

This reveals the service name:

cat /proc/$PID/cgroup | grep name=systemd
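
A sketch of the same lookup from Node.js; the regex encodes my assumption about the usual cgroup-v1 path layout (e.g. 1:name=systemd:/system.slice/fastd@ffda.service):

    // Map a PID to its systemd unit via /proc/$PID/cgroup (cgroup v1 layout).
    const fs = require('fs');

    function systemdUnit(pid) {
      const cgroup = fs.readFileSync('/proc/' + pid + '/cgroup', 'utf8');
      const line = cgroup.split('\n').find((l) => l.includes('name=systemd'));
      if (!line) return null;
      const match = line.match(/\/([^\/]+\.service)$/);
      return match ? match[1] : null; // e.g. 'fastd@ffda.service'
    }

    console.log(systemdUnit(1234)); // the PID is a placeholder
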
lemoer commented 7 years ago

Maybe other interesting things: