munin-monitoring / munin

Main repository for munin master / node / plugins
http://munin-monitoring.org

jstat munin graphs not updated in centos7 #979

Closed Tamaradmin closed 3 years ago

Tamaradmin commented 6 years ago

Hello,

The jstat munin graphs are not updated on any of my CentOS 7 machines. I can run the plugins manually and can spoolfetch them from the master.

All graphs should be updated regularly. That happens for every plugin except the jstat plugins.

[root@serverXX plugins]# ll jstat_tradingnode1_gccount
lrwxrwxrwx 1 root root 39 Apr 29 11:00 jstat_tradingnode1_gccount -> /usr/share/munin/plugins/jstat__gccount
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount
Young_GC.value 823
Full_GC.value 7
[root@serverXX plugins]# tail -f /var/log/munin-node/munin-node.log
2018/07/02-10:37:36 [9552] Error output from jstat_tradingnode6_gccount_full:
2018/07/02-10:37:36 [9552] 81190 not found
2018/07/02-10:37:36 [9552] Error output from jstat_tradingnode6_heap:
2018/07/02-10:37:36 [9552] 81190 not found
2018/07/02-10:37:37 [9552] Error output from jstat_tradingnode7_gccount:
2018/07/02-10:37:37 [9552] 81595 not found
2018/07/02-10:37:37 [9552] Error output from jstat_tradingnode7_gccount_full:
2018/07/02-10:37:37 [9552] 81595 not found
2018/07/02-10:37:38 [9552] Error output from jstat_tradingnode7_heap:
2018/07/02-10:37:38 [9552] 81595 not found

sumpfralle commented 6 years ago

What is the output of munin-run jstat_tradingnode1_gccount autoconf? How about echo fetch jstat_tradingnode1_gccount | nc localhost munin on the node? And: echo config jstat_tradingnode1_gccount | nc localhost munin?

Tamaradmin commented 6 years ago

Hello,

Thank you for your quick reply.

[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount autoconf
no (Java version is invalid)
[root@serverXX plugins]# echo fetch jstat_tradingnode1_gccount | nc localhost munin
Ncat: Invalid port number "munin". QUITTING.
[root@serverXX plugins]# echo config jstat_tradingnode1_gccount | nc localhost munin
Ncat: Invalid port number "munin". QUITTING.

I got the same output as the munin user as well.

Please note that all other plugins are working fine on the same machine.

sumpfralle commented 6 years ago

[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount autoconf
no (Java version is invalid)

OK, so it looks like your Java version is not detected properly. Could you take a look at the munin plugin script and see whether you can find the cause?

Regarding "invalid port number": it looks like your system (/etc/services) lacks the munin entry. Use 4949 instead.
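For reference, a minimal sketch of the two queries above with the port resolved explicitly. The helper names munin_port and munin_query are hypothetical; 4949 is the port suggested above, and the sketch assumes that getent, awk and nc are available on the node.

```shell
# Print the TCP port of munin-node: use the "munin" entry of the services
# database if there is one, otherwise fall back to 4949.
munin_port() {
    port=$(getent services munin 2>/dev/null |
        awk '{ split($2, a, "/"); print a[1]; exit }')
    echo "${port:-4949}"
}

# Send one command (e.g. "fetch" or "config") for one plugin to the local
# munin-node and print the reply.
munin_query() {
    printf '%s %s\nquit\n' "$1" "$2" | nc localhost "$(munin_port)"
}
```

It would be used as munin_query fetch jstat_tradingnode1_gccount, and likewise with config.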

Tamaradmin commented 6 years ago

The Java version is set up within the script (attached).

I'm also running the same script on different machines, and none of the CentOS 6 ones has the issue:
[root@serverYY plugins]# munin-run jstat_wingasWSOd_gccount autoconf
no (Java version is invalid)
[root@serverYY plugins]# cat /etc/redhat-release
CentOS release 6.9 (Final)

I ran the script in verbose mode to see the output:
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount

sumpfralle commented 6 years ago

Thank you for adding more details!

Please provide the output of the following commands:

Thanks!

Tamaradmin commented 6 years ago

Thank you!

Below is the configuration of this plugin:
[jstat*]
user superx
group superx

[jstat_tradingnode1*]
env.pidfilepath /www/pid/tradingnode1.pid
env.graphtitle tradingnode1

And this is the output of the commands:
[superx@edmz-int09 ~]$ java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

[superx@serverXX ~]$ cat /www/pid/tradingnode1.pid
93533
[superx@serverXX ~]$ jstat -gc 93533
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
2048.0 2048.0 0.0 1344.0 694784.0 145013.5 1398272.0 414857.1 102400.0 96308.0 12544.0 11240.1 979 20.997 5 2.705 23.702

sumpfralle commented 6 years ago

I prepared a set of changes in a separate branch: https://github.com/sumpfralle/contrib/tree/jstat-plugins/plugins/jvm

Please download the plugin that fails for you and check whether it produces a different result. You may need to set the javahome environment variable while running the plugin:

javahome=/usr/lib/jvm/default-java munin-run jstat_tradingnode1_gccount

(maybe the path of your java environment differs on your system)
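If the right path is unclear, a small sketch like the following may help to find a directory that actually contains jstat. The function names and the candidate directories are assumptions for illustration; the only thing taken from the plugin is that it calls "${JAVA_HOME}/bin/jstat".

```shell
# Succeed if the given directory contains an executable bin/jstat.
has_jstat() {
    [ -x "$1/bin/jstat" ]
}

# Print the first candidate directory that contains bin/jstat.
# Returns non-zero if none of the candidates qualifies.
find_javahome() {
    for dir in "$@"; do
        if has_jstat "$dir"; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

For example: javahome=$(find_javahome /usr/lib/jvm/default-java /tex/java/jdk) munin-run jstat_tradingnode1_gccount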

Tamaradmin commented 6 years ago

Thank you so much for the script.

I tried it (I named it jstat_tradingnode1_gccount_new). It returns a result when I run it manually, but the graph is still showing NaN values:

[root@serverXX plugins]# javahome=/tex/java/jdk munin-run jstat_tradingnode1_gccount_new
Young_GC.value 38548
Full_GC.value 14
[root@serverXX plugins]# javahome=/tex/java/jdk munin-run jstat_tradingnode1_gccount
Young_GC.value 38562
Full_GC.value 14

[root@serverXX plugins]# ll jstat_tradingnode1_gccount*
lrwxrwxrwx 1 root root 39 Apr 29 11:00 jstat_tradingnode1_gccount -> /usr/share/munin/plugins/jstat__gccount
lrwxrwxrwx 1 root root 40 Jul 4 19:15 jstat_tradingnode1_gccount_new -> /usr/share/munin/plugins/jstat__gccount2

sumpfralle commented 6 years ago

Interesting!

A few more attempts:

By the way: is /tex/java/jdk really the default path of the Java JDK on CentOS? Or did you install a custom package?

Tamaradmin commented 6 years ago

Here you go:
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount_new config
graph_title GC Count tradingnode1
graph_args -l 0
graph_vlabel GC Count(times)
graph_total total
graph_info GC Count
graph_category virtualization
Young_GC.label Young_GC
Young_GC.min 0
Full_GC.label Full_GC
Full_GC.min 0

munin-node.log (this output is the same on the other machines, where the graphs work correctly):
2018/07/04-20:07:30 [55955] Error output from jstat_tradingnode1_gccount:
2018/07/04-20:07:30 [55955] 4155 not found
2018/07/04-20:07:30 [55955] Error output from jstat_tradingnode1_gccount_full:
2018/07/04-20:07:30 [55955] 4155 not found
2018/07/04-20:07:31 [55955] Error output from jstat_tradingnode1_gccount_new:
2018/07/04-20:07:31 [55955] 4155 not found

munin-update.log:
2018/07/04 20:10:11 [INFO] starting work in 24185 for serverXX.int/ssh://serverXX:4949.
2018/07/04 20:10:11 [INFO] node serverXX.int advertised itself as localhost.localdomain instead.
.....
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount_new on serverXX.int/ssh://serverXX:4949 returned no data for label Full_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount_new on serverXX.int/ssh://serverXX:4949 returned no data for label Young_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount on serverXX.int/ssh://serverXX:4949 returned no data for label Full_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount on serverXX.int/ssh://serverXX:4949 returned no data for label Young_GC
....
2018/07/04 20:10:13 [INFO]: Munin-update finished for node int;serverXX.int (2.43 sec)
2018/07/04 20:10:14 [INFO] Reaping Munin::Master::UpdateWorker<int;serverXX.int>. Exit value/signal: 0/0

sumpfralle commented 6 years ago

What a mystery!

This line is probably the interesting one: 4155 not found.

I guess that 4155 is the PID stored in /www/pid/tradingnode1.pid? In that case the error is emitted by jstat. Could it be that the PID in that file is wrong or outdated? Is there a process with this ID currently running? Or could there be some security mechanism preventing the plugin from accessing this process (cgroups, SELinux, AppArmor, ...)?
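The first two questions can be checked with a short sketch like this one. The function name pid_from_file is hypothetical, and the /proc fallback assumes a Linux system; it only tells whether a process with the stored PID exists, not whether jstat is allowed to attach to it.

```shell
# Print the PID stored in a pidfile if a process with that PID exists.
# $1: path to the pidfile (e.g. the plugin's env.pidfilepath)
pid_from_file() {
    [ -r "$1" ] || return 1
    # read fails on a missing trailing newline but still fills the variable
    read -r pid < "$1" || [ -n "$pid" ] || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # kill -0 succeeds for processes we may signal; the /proc check covers
    # processes owned by other users on Linux.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "$pid"
    else
        return 1
    fi
}
```

For example: pid_from_file /www/pid/tradingnode1.pid || echo "stale or missing PID"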

Tamaradmin commented 6 years ago

I confirm :)

This is the right PID:
[root@serverXX plugins]# ps -ef|grep 4155
userx 4155 1 88 09:54 ? 09:39:40 tradingnode1 -Xmx2048M -Xms2048M -server -XX:+TieredCompilation - ....

This error also appears on the other machines and doesn't affect anything there: the plugins run correctly and the graphs are updated.

sumpfralle commented 6 years ago

OK. Do you have any idea which security mechanisms might be interfering?

For further testing, please change the following line:
"${JAVA_HOME}/bin/jstat" -gc "$pid_num" | tail -1 | awk "$awk_script"
into this one:
strace -o /tmp/jstat.strace -f "${JAVA_HOME}/bin/jstat" -gc "$pid_num" | tail -1 | awk "$awk_script"

You may need to install strace first. Please share the file /tmp/jstat.strace with us. Hopefully it will help us solve the mystery ...

sumpfralle commented 6 years ago

Ping?

Tamaradmin commented 6 years ago

Sorry I missed your last comment.

I attached the strace file: jstat.strace.txt

Thanks.

Tamaradmin commented 6 years ago

Did you find something wrong in the trace file?

Tamaradmin commented 5 years ago

Are the jstat graphs working for anyone on CentOS 7?

bi3lik commented 5 years ago

For anyone who struggles with this issue: the culprit is the PrivateTmp parameter in the munin-node systemd service config. By default it is set to true, which causes the munin-node process to use its own private directory beneath /tmp rather than just /tmp. Setting PrivateTmp to false makes jstat work correctly.

sumpfralle commented 5 years ago

@bi3lik: thank you for solving that weird mystery!

Now we need to know how to work around this. The PrivateTmp setting is a good thing in almost every situation, so I do not think we should remove it from munin-node.service. Do you see a way to trick jstat into not using /tmp? This directory should really never be used for inter-process communication.

Tamaradmin commented 5 years ago

@bi3lik: Thanks a lot for your help! I thought that this issue would never be resolved.

@Lars: Thanks, Lars, for your help so far. FYI, I already replied to your question in the GitHub post, but it looks like you didn't get a notification for that: https://github.com/munin-monitoring/munin/issues/979

sumpfralle commented 5 years ago

I already replied to your question in the GitHub post, but it looks like you didn't get a notification for that: #979

@Tamaradmin: I am not sure that I understand what you are aiming at. Could you please elaborate?

From my point of view, we reached the conclusion that the PrivateTmp setting in munin-node.service causes the failure of the jstat plugin.

The best approach for solving this would be to change jstat's behavior: it should clearly not use /tmp for inter-process communication.

The second best approach is a manual override of the service settings, e.g.:

  1. run systemctl edit munin-node
  2. add the following lines to the file that just opened up in your favorite text editor:
    [Service]
    PrivateTmp=false
  3. run systemctl daemon-reload
  4. restart the munin-node service
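The steps above can also be scripted. A minimal sketch, assuming the drop-in location that systemctl edit normally writes to (/etc/systemd/system/munin-node.service.d/override.conf); the function name and its optional second argument are hypothetical and exist mainly so the function can be tried out against a scratch directory first.

```shell
# Write a systemd drop-in that disables PrivateTmp for one service.
# $1: service name without suffix (e.g. munin-node)
# $2: systemd configuration root (default: /etc/systemd/system)
write_privatetmp_override() {
    unit=$1
    root=${2:-/etc/systemd/system}
    dropin_dir="$root/$unit.service.d"
    mkdir -p "$dropin_dir" &&
        printf '[Service]\nPrivateTmp=false\n' > "$dropin_dir/override.conf"
}
```

As root, this would be followed by the remaining steps: write_privatetmp_override munin-node && systemctl daemon-reload && systemctl restart munin-node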

miiichael commented 5 years ago

Having just encountered this issue myself (and was super confused by the "$PID not found" errors, because while munin-run worked, munin-node itself didn't)...

The choice to use /tmp isn't made by jstat, but rather by Java putting stuff (that jstat wants to use) in /tmp/hsperfdata_$USER/$PID, instead of somewhere sensible.

This was fixed in JDK-6938627 (albeit badly - see JDK-6944822) but unfixed in JDK-7009828 (on the sole basis that tools (eg. visualvm) were not updated to actually be told where to find hsperfdata. Go figure).

I therefore suspect this will be a bug that will annoy us until the end of time, and that if we want the jstat plugin to work, we will have to turn off munin-node's (and, if applicable, the Java application's) PrivateTmp so that munin-node sees the same /tmp that Java is using.
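A quick way to see whether a given environment can find the data jstat needs is to test for the file mentioned above. This is only a sketch: the function name is hypothetical, and the path layout /tmp/hsperfdata_$USER/$PID is the one described in this comment.

```shell
# Succeed if the HotSpot perf data file of a JVM is visible.
# $1: user that owns the JVM process
# $2: PID of the JVM process
# $3: temporary directory to look in (default: /tmp)
hsperfdata_visible() {
    [ -e "${3:-/tmp}/hsperfdata_$1/$2" ]
}
```

For example: hsperfdata_visible superx 4155 || echo "hsperfdata not visible from here". The result is only meaningful when the check runs in the same environment as the plugin, for instance when called temporarily from the plugin script itself; from an ordinary root shell it will see the shared /tmp, which is why munin-run worked.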

sumpfralle commented 3 years ago

Closing this issue, since Munin cannot do anything about the choice of temporary file storage used by the Java interpreter.

Please re-open if you have an idea how this could be fixed within Munin.