Closed Tamaradmin closed 3 years ago
What is the output of munin-run jstat_tradingnode1_gccount autoconf?
How about echo fetch jstat_tradingnode1_gccount | nc localhost munin on the node?
And: echo config jstat_tradingnode1_gccount | nc localhost munin?
Hello,
Thank you for your quick reply.
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount autoconf
no (Java version is invalid)
[root@serverXX plugins]# echo fetch jstat_tradingnode1_gccount | nc localhost munin
Ncat: Invalid port number "munin". QUITTING.
[root@serverXX plugins]# echo config jstat_tradingnode1_gccount | nc localhost munin
Ncat: Invalid port number "munin". QUITTING.
I got the same output when running as the munin user.
Please note that all other plugins are working fine on the same machine.
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount autoconf
no (Java version is invalid)
OK - thus it looks like your Java version is not detected properly. Could you take a look at the munin plugin script and see if you can find the cause?
Regarding "invalid port number": it looks like your system (/etc/services) lacks the munin entry. Use 4949 instead.
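A minimal sketch of that fallback: look up the munin entry in /etc/services and use 4949 when it is missing. The helper name munin_port is made up for this illustration and is not part of Munin.

```shell
# Hypothetical helper: print the TCP port registered for "munin" in a
# services file, or 4949 when no such entry exists.
munin_port() (
    services_file="${1:-/etc/services}"
    port="$(awk '$1 == "munin" && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }' \
        "$services_file" 2>/dev/null)"
    echo "${port:-4949}"
)

# Usage (needs a running munin-node, so it is left commented out here):
# echo "fetch jstat_tradingnode1_gccount" | nc localhost "$(munin_port)"
```

Passing the numeric port directly (nc localhost 4949) has the same effect; the lookup only matters if the port might be configured differently.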
The Java version is set up within the script (attached).
I'm also running the same script on different machines, and none of the CentOS 6 ones has the issue:
[root@serverYY plugins]# munin-run jstat_wingasWSOd_gccount autoconf
no (Java version is invalid)
[root@serverYY plugins]# cat /etc/redhat-release
CentOS release 6.9 (Final)
I ran the script in verbose mode to see the output:
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount
Thank you for adding more details!
Please provide the output of the following commands:
java --version
jstat -gc "$(cat /var/run/jsvc.pid)"
Thanks!
Thank you!
Below is the configuration of this plugin:
[jstat*]
user superx
group superx

[jstat_tradingnode1*]
env.pidfilepath /www/pid/tradingnode1.pid
env.graphtitle tradingnode1

And this is the output of the commands:
[superx@edmz-int09 ~]$ java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
[superx@serverXX ~]$ cat /www/pid/tradingnode1.pid
93533
[superx@serverXX ~]$ jstat -gc 93533
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
2048.0 2048.0 0.0 1344.0 694784.0 145013.5 1398272.0 414857.1 102400.0 96308.0 12544.0 11240.1 979 20.997 5 2.705 23.702
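For orientation, here is a rough sketch of how the two values of the gccount graph can be derived from that jstat -gc output, by looking up the YGC and FGC columns by name. This is only an illustration: the function name parse_gc_counts is made up, and the plugin's real awk script may work differently.

```shell
# Hypothetical helper: read "jstat -gc" output (header line plus one data
# line) on stdin and print the young and full GC counts in munin format.
parse_gc_counts() {
    awk '
        # Header line: remember the position of every column name.
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data line: pick the values by column name.
        NR == 2 {
            print "Young_GC.value " $(col["YGC"])
            print "Full_GC.value " $(col["FGC"])
        }
    '
}

# Usage (needs a running JVM, so it is left commented out here):
# jstat -gc "$(cat /www/pid/tradingnode1.pid)" | parse_gc_counts
```

Looking columns up by name rather than by position keeps the sketch independent of the exact column layout, which differs between Java versions.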
I prepared a set of changes in a separate branch: https://github.com/sumpfralle/contrib/tree/jstat-plugins/plugins/jvm
Please download the plugin that fails for you and check whether it produces a different result.
You may need to set the javahome environment variable while running the plugin:
javahome=/usr/lib/jvm/default-java munin-run jstat_tradingnode1_gccount
(the path of the Java environment may differ on your system)
Thank you so much for the script.
I tried it (I named it jstat_tradingnode1_gccount_new) and it returns a result when I run it manually, but the graph is still showing nan values:
[root@serverXX plugins]# javahome=/tex/java/jdk munin-run jstat_tradingnode1_gccount_new
Young_GC.value 38548
Full_GC.value 14
[root@serverXX plugins]# javahome=/tex/java/jdk munin-run jstat_tradingnode1_gccount
Young_GC.value 38562
Full_GC.value 14
[root@serverXX plugins]# ll jstat_tradingnode1_gccount*
lrwxrwxrwx 1 root root 39 Apr 29 11:00 jstat_tradingnode1_gccount -> /usr/share/munin/plugins/jstat__gccount
lrwxrwxrwx 1 root root 40 Jul 4 19:15 jstat_tradingnode1_gccount_new -> /usr/share/munin/plugins/jstat__gccount2
Interesting!
A few more attempts:
munin-run ... config?
/var/log/munin/munin-node.log?
/var/log/munin/munin-update.log? (on the master host)
btw: is /tex/java/jdk really the default path of the Java JDK on CentOS? Or did you install a custom package?
Here you go :+1:
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount_new config
graph_title GC Count tradingnode1
graph_args -l 0
graph_vlabel GC Count(times)
graph_total total
graph_info GC Count
graph_category virtualization
Young_GC.label Young_GC
Young_GC.min 0
Full_GC.label Full_GC
Full_GC.min 0
Munin-node.log (this output is the same on the other machines, where the graphs work correctly):
2018/07/04-20:07:30 [55955] Error output from jstat_tradingnode1_gccount:
2018/07/04-20:07:30 [55955] 4155 not found
2018/07/04-20:07:30 [55955] Error output from jstat_tradingnode1_gccount_full:
2018/07/04-20:07:30 [55955] 4155 not found
2018/07/04-20:07:31 [55955] Error output from jstat_tradingnode1_gccount_new:
2018/07/04-20:07:31 [55955] 4155 not found
Munin-update.log:
2018/07/04 20:10:11 [INFO] starting work in 24185 for serverXX.int/ssh://serverXX:4949.
2018/07/04 20:10:11 [INFO] node serverXX.int advertised itself as localhost.localdomain instead.
.....
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount_new on serverXX.int/ssh://serverXX:4949 returned no data for label Full_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount_new on serverXX.int/ssh://serverXX:4949 returned no data for label Young_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount on serverXX.int/ssh://serverXX:4949 returned no data for label Full_GC
2018/07/04 20:10:13 [WARNING] Service jstat_tradingnode1_gccount on serverXX.int/ssh://serverXX:4949 returned no data for label Young_GC
....
2018/07/04 20:10:13 [INFO]: Munin-update finished for node int;serverXX.int (2.43 sec)
2018/07/04 20:10:14 [INFO] Reaping Munin::Master::UpdateWorker<int;serverXX.int>. Exit value/signal: 0/0
What a mystery!
This line is probably the interesting one: 4155 not found.
I guess that 4155 is the PID stored in /www/pid/tradingnode1.pid?
Thus the error is emitted by jstat.
Could it just be that the PID in that file is wrong/outdated?
Is there a process with this ID currently running?
Or could there be some security mechanism preventing the plugin from accessing this process? (cgroups, SELinux, apparmor, ...)
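The first two questions can be answered with a quick check like the following sketch. The helper name pidfile_status is made up for this illustration, and it assumes a Linux /proc filesystem.

```shell
# Hypothetical helper: report whether the PID stored in a pid file belongs
# to a live process. Assumes a Linux /proc filesystem.
pidfile_status() (
    pidfile="$1"
    pid="$(cat "$pidfile" 2>/dev/null)"
    if [ -z "$pid" ]; then
        echo "unreadable or empty: $pidfile"
    elif [ -d "/proc/$pid" ]; then
        echo "running: $pid"
    else
        echo "not running: $pid"
    fi
)

# Usage:
# pidfile_status /www/pid/tradingnode1.pid
```

A "running" result only rules out a stale pid file. It says nothing about whether jstat, when started by munin-node, is allowed to see the process data.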
I confirm :)
This is the right PID:
[root@serverXX plugins]# ps -ef|grep 4155
userx 4155 1 88 09:54 ? 09:39:40 tradingnode1 -Xmx2048M -Xms2048M -server -XX:+TieredCompilation - ....
This error also appears on the other machines and doesn't affect anything there: the plugins run correctly and the graphs are updated.
OK. Do you have any idea regarding potentially disturbing security mechanisms?
For further testing: please change the following line:
"${JAVA_HOME}/bin/jstat" -gc "$pid_num" | tail -1 | awk "$awk_script"
into this one:
strace -o /tmp/jstat.strace -f "${JAVA_HOME}/bin/jstat" -gc "$pid_num" | tail -1 | awk "$awk_script"
You may need to install strace first.
Please share the file /tmp/jstat.strace with us. Hopefully it will help us solve the mystery ...
Ping?
Did you find something wrong in the trace file?
Are jstat graphs working for anyone on CentOS 7?
For anyone who has struggled with this issue: the culprit is the PrivateTmp parameter in the munin-node systemd service config. By default it is set to true, which causes the munin-node process to use /tmp/
@bi3lik: thank you for solving that weird mystery!
And now we need to know how to work around this.
The PrivateTmp setting is a good thing for almost every situation, thus I do not think we should remove it from munin-node.service.
Do you see a way to trick jstat into not using /tmp? This directory should really never be used for inter-process communication.
@bi3lik: Thanks a lot for your help! I thought that this issue would never be resolved.
@Lars: Thanks for your help so far. FYI, I already replied to your question in the GitHub post, but it looks like you didn't get a notification for it: https://github.com/munin-monitoring/munin/issues/979
@Tamaradmin: I am not sure that I understand what you are aiming at. Could you please elaborate?
From my point of view we reached the conclusion that the PrivateTmp setting in munin-node.service causes the failure of the jstat plugin.
The best approach for solving this is changing jstat's behavior: it should clearly not use /tmp for inter-process communication.
The second best approach is a manual override of the service settings, e.g.:
systemctl edit munin-node
[Service]
PrivateTmp=false
systemctl daemon-reload
munin-node service
Having just encountered this issue myself (and having been super confused by the "$PID not found" errors, because while munin-run worked, munin-node itself didn't)...
The choice to use /tmp isn't made by jstat, but rather by Java putting stuff (that jstat wants to use) in /tmp/hsperfdata_$USER/$PID, instead of somewhere sensible.
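To see the mismatch directly, one can check whether that file is visible from a given /tmp. The sketch below is only an illustration; the helper name hsperfdata_visible is made up, and the path layout is the one described above.

```shell
# Hypothetical helper: check whether the HotSpot performance data file for
# a given user and PID is visible under a given temp directory.
hsperfdata_visible() (
    user="$1"
    pid="$2"
    tmpdir="${3:-/tmp}"
    if [ -e "$tmpdir/hsperfdata_$user/$pid" ]; then
        echo "visible"
    else
        echo "missing"
    fi
)

# Usage, with the user and PID from the ps output earlier in this thread:
# hsperfdata_visible userx 4155
```

Run from a normal shell this should report "visible" for a running JVM, while the same check executed from within a service that has PrivateTmp enabled would be expected to report "missing", since that service sees its own private /tmp.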
This was fixed in JDK-6938627 (albeit badly - see JDK-6944822) but unfixed in JDK-7009828 (on the sole basis that tools (eg. visualvm) were not updated to actually be told where to find hsperfdata. Go figure).
I therefore suspect this will be a bug that will annoy us until the end of time, and that if we want the jstat plugin to work, we will have to turn off munin-node's (and, if applicable, the Java application's) PrivateTmp so that munin-node sees the same /tmp that Java is using.
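For anyone who wants to script this instead of using the interactive systemctl edit, here is a sketch that writes the same kind of drop-in file. The helper name write_privatetmp_override is made up, and the sketch assumes that /etc/systemd/system/UNIT.service.d/override.conf is where overrides belong on your system; the unit name of your Java application is whatever you pass in.

```shell
# Hypothetical helper: write a drop-in that disables PrivateTmp for a unit
# and print the path of the file it wrote.
# Assumption: administrator overrides live below /etc/systemd/system.
write_privatetmp_override() (
    unit="$1"
    dropin_root="${2:-/etc/systemd/system}"
    dir="$dropin_root/$unit.service.d"
    mkdir -p "$dir" || exit 1
    printf '[Service]\nPrivateTmp=false\n' > "$dir/override.conf"
    echo "$dir/override.conf"
)

# Usage (as root), followed by a reload and a restart of the unit:
# write_privatetmp_override munin-node
# systemctl daemon-reload
# systemctl restart munin-node
```

Keep in mind that this trades away the isolation that PrivateTmp provides, so it is a workaround rather than a fix.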
Closing this issue, since Munin cannot do anything about the choice of temporary file storage used by the Java interpreter.
Please re-open if you have an idea how this could be fixed within Munin.
Hello,
The jstat munin graphs are not being updated on the CentOS 7 machines. I can run the plugins manually and can spoolfetch them from the master.
All graphs should have been updated regularly. That happens for all plugins except the jstat ones.
[root@serverXX plugins]# ll jstat_tradingnode1_gccount
lrwxrwxrwx 1 root root 39 Apr 29 11:00 jstat_tradingnode1_gccount -> /usr/share/munin/plugins/jstat__gccount
[root@serverXX plugins]# munin-run jstat_tradingnode1_gccount
Young_GC.value 823
Full_GC.value 7
[root@serverXX plugins]# tail -f /var/log/munin-node/munin-node.log
2018/07/02-10:37:36 [9552] Error output from jstat_tradingnode6_gccount_full:
2018/07/02-10:37:36 [9552] 81190 not found
2018/07/02-10:37:36 [9552] Error output from jstat_tradingnode6_heap:
2018/07/02-10:37:36 [9552] 81190 not found
2018/07/02-10:37:37 [9552] Error output from jstat_tradingnode7_gccount:
2018/07/02-10:37:37 [9552] 81595 not found
2018/07/02-10:37:37 [9552] Error output from jstat_tradingnode7_gccount_full:
2018/07/02-10:37:37 [9552] 81595 not found
2018/07/02-10:37:38 [9552] Error output from jstat_tradingnode7_heap:
2018/07/02-10:37:38 [9552] 81595 not found