jue89 / node-mqtt-stats

Publishing server stats using MQTT
MIT License

Also interesting metrics #2

Open lemoer opened 7 years ago

lemoer commented 7 years ago
jue89 commented 7 years ago

Thank you! I would like to "solve" this issue with this information:

lemoer commented 7 years ago

I haven't looked at all the references in depth yet, but it looks good.

Have you noticed the difference between cat /proc/stat | grep processes (which gives the number of forks since the system booted) and the number of currently running processes?
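
For reference, a minimal Node.js sketch that reads both numbers straight from /proc/stat (the processes and procs_running lines are documented in proc(5)):

    // Compare total forks since boot with currently running processes.
    const fs = require('fs');

    const stat = fs.readFileSync('/proc/stat', 'utf8');
    const field = (name) => {
      const line = stat.split('\n').find((l) => l.startsWith(name + ' '));
      return line ? parseInt(line.split(/\s+/)[1], 10) : null;
    };

    console.log('forks since boot: ', field('processes'));     // monotonic counter
    console.log('currently running:', field('procs_running')); // instantaneous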

jue89 commented 7 years ago

Topic design:

/cpu/idle
    /user (=user + nice)
    /system
    /iowait
    /irq
    /softirq
    /steal (= steal + guest + guest_nice)
/mem/used
    /free
    /buffers
    /cached
    /swapused
    /swapfree
/processes/forkrate
          /ctxtrate
          /tasks
          /threads
          /kthreads
          /zombies

(Gonna attach some comments later)
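
For illustration, publishing one sample of this tree with the npm mqtt package could look like the sketch below (the broker URL, the stats/ prefix, and all values are placeholders, not part of the proposal):

    // Hypothetical sketch: publish a single sample of the proposed topic tree.
    const mqtt = require('mqtt');

    const client = mqtt.connect('mqtt://localhost');

    client.on('connect', () => {
      const sample = {
        'cpu/idle': 0.93,            // placeholder value
        'mem/used': 123456789,       // placeholder value (bytes)
        'processes/forkrate': 12.5   // placeholder value (forks/s)
      };
      for (const [topic, value] of Object.entries(sample)) {
        client.publish('stats/' + topic, String(value), { retain: true });
      }
      client.end();
    });

Retaining the messages is one design option here, so a new subscriber always sees the latest sample immediately.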

lemoer commented 7 years ago

more work for you:

jue89 commented 7 years ago

Some metrics are already implemented:

I think those can be analysed using bpfcountd:

Those are missing and easy to implement:

And those are not that easy to implement (i.e. I don't have a solution that is not a quick'n'dirty one):

lemoer commented 7 years ago

Very nice. I think the CPU usage of fastd is very important, but we should give up on trying to count the successful DHCP replies. I think it's really too complicated.

jue89 commented 7 years ago

Hmm, if we just publish the stats of each CPU core separately, we should get a quite accurate idea of how much CPU is consumed by fastd. Don't you think so?

lemoer commented 7 years ago

I'm not sure, because the fastd instance seems to hop between cores a lot on multi-CPU systems.

jue89 commented 7 years ago

I see. Gonna find something better to meter CPU usage.

lemoer commented 7 years ago

Maybe this excerpt from man 5 proc helps:

    (14) utime  %lu
              Amount of time that this process has been scheduled in
              user mode, measured in clock ticks (divide by
              sysconf(_SC_CLK_TCK)). This includes guest time,
              guest_time (time spent running a virtual CPU, see
              below), so that applications that are not aware of the
              guest time field do not lose that time from their
              calculations.

    (15) stime  %lu
              Amount of time that this process has been scheduled in
              kernel mode, measured in clock ticks (divide by
              sysconf(_SC_CLK_TCK)).

    (16) cutime  %ld
              Amount of time that this process's waited-for children
              have been scheduled in user mode, measured in clock
              ticks (divide by sysconf(_SC_CLK_TCK)). (See also
              times(2).) This includes guest time, cguest_time (time
              spent running a virtual CPU, see below).

    (17) cstime  %ld
              Amount of time that this process's waited-for children
              have been scheduled in kernel mode, measured in clock
              ticks (divide by sysconf(_SC_CLK_TCK)).
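
Building on that excerpt, a rough sketch of measuring a process's CPU share: sample utime + stime from /proc/$PID/stat twice and divide the delta by the elapsed wall time. The 100 Hz tick is an assumption here; strictly it comes from sysconf(_SC_CLK_TCK):

    // Estimate per-process CPU usage from fields (14) utime and (15) stime.
    const fs = require('fs');

    const CLK_TCK = 100; // assumption; the real value is sysconf(_SC_CLK_TCK)

    function cpuTicks(pid) {
      const stat = fs.readFileSync('/proc/' + pid + '/stat', 'utf8');
      // comm (field 2) may contain spaces, so split after the closing paren
      const fields = stat.slice(stat.lastIndexOf(')') + 2).split(' ');
      return parseInt(fields[11], 10) + parseInt(fields[12], 10); // utime + stime
    }

    function cpuUsage(pid, intervalMs, cb) {
      const before = cpuTicks(pid);
      setTimeout(() => {
        const delta = cpuTicks(pid) - before;
        cb(delta / CLK_TCK / (intervalMs / 1000)); // fraction of one core
      }, intervalMs);
    }

    // Example: print fastd's CPU share over one second (the PID is a placeholder)
    cpuUsage(1234, 1000, (share) => console.log('cpu:', share));

This would also sidestep the core-hopping problem, since utime/stime accumulate no matter which core the process runs on.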
jue89 commented 7 years ago

Yes sure. I'm familiar with that ;)

But I don't know how to project this information into MQTT topics. I would integrate this feature into the processes plugin. The trivial projection would be:

org/example/processes/fastd/cpu
org/example/processes/fastd/mem

But what to do if multiple fastd instances are running? Add the PID?

org/example/processes/fastd/1234/cpu
org/example/processes/fastd/1234/mem

But this would be complicated to put in a graph, wouldn't it?

You got an idea how to solve this?

lemoer commented 7 years ago

I also think that the PID is not the right way.

Another idea:

Maybe we could solve this by doing it for systemd services instead of processes. But since there is no 1:1 mapping between processes and services, I would say that we should aggregate all CPU and memory statistics per service. This would be nice for services that fork children.

This reveals the service name:

cat /proc/$PID/cgroup | grep name=systemd
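
A sketch of the same lookup from Node.js; the regex encodes my assumption about the usual cgroup-v1 path layout (e.g. 1:name=systemd:/system.slice/fastd@ffda.service):

    // Map a PID to its systemd unit via /proc/$PID/cgroup (cgroup v1 layout).
    const fs = require('fs');

    function systemdUnit(pid) {
      const cgroup = fs.readFileSync('/proc/' + pid + '/cgroup', 'utf8');
      const line = cgroup.split('\n').find((l) => l.includes('name=systemd'));
      if (!line) return null;
      const match = line.match(/\/([^\/]+\.service)$/);
      return match ? match[1] : null; // e.g. 'fastd@ffda.service'
    }

    console.log(systemdUnit(1234)); // the PID is a placeholder
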
lemoer commented 7 years ago

Maybe other interesting things: