tableau / TabMon

A Tableau Server performance monitoring service
https://tableau.github.io/TabMon/
MIT License

Tabmon does not start if process is unavailable #246

Open razze76 opened 4 years ago

razze76 commented 4 years ago

I have just installed TabMon 1.4 on a 3-node cluster. One of the backgrounder processes has hung and is currently not responding. This non-responding process prevents TabMon from starting and makes it crash in a way that requires Task Manager to close it. When I exclude that process from tabmon.config, TabMon works perfectly. Will TabMon always fail if a process it is configured to monitor does not respond? Is there a way to set a timeout for a process and just ignore it if it does not respond?

Sorry for the incomplete logs; the log files are on a remote server with poor web access. After 16:22 I waited 3 minutes, then force-closed TabMon and restarted it.

Part of tabmon.log:

2020-06-08 16:22:24,932 [5] DEBUG TabMon.CounterConfig.CounterConfigLoader - Loading MBean counters..
2020-06-08 16:22:24,934 [5] DEBUG TabMon.Counters.MBean.MBeanClientFactory - Scanning JMX port(s) "8209, 8762, 8719, 8161" on tableauprod2..
2020-06-08 16:22:25,689 [5] DEBUG TabMon.Counters.MBean.JmxConnectorProxy - Opened connection to JMX server at tableauprod2:8209.
2020-06-08 16:22:25,689 [5] DEBUG TabMon.Counters.MBean.MBeanClientFactory - Created JMX client for tableauprod2:8209.
2020-06-08 16:22:25,724 [5] DEBUG TabMon.Counters.MBean.JmxConnectorProxy - Opened connection to JMX server at tableauprod2:8762.
2020-06-08 16:22:25,724 [5] DEBUG TabMon.Counters.MBean.MBeanClientFactory - Created JMX client for tableauprod2:8762.
2020-06-08 16:22:25,753 [5] DEBUG TabMon.Counters.MBean.JmxConnectorProxy - Opened connection to JMX server at tableauprod2:8719.
2020-06-08 16:22:25,753 [5] DEBUG TabMon.Counters.MBean.MBeanClientFactory - Created JMX client for tableauprod2:8719.
2020-06-08 16:26:56,472 [5] INFO TabMon.Config.TabMonConfigReader - Loading TabMon user configuration..

Relevant output from tsm status -v:

'Tableau Server Cluster Controller 0' is running.
'Tableau Server Search And Browse 0' is running.
'Tableau Server Backgrounder 0' is running.
'Tableau Server Backgrounder 1' is running.
'Tableau Server Backgrounder 2' is running.
'Tableau Server Backgrounder 3' status is unavailable.
'Tableau Server Non-Interactive Microservice Container 0' is running.
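
To illustrate the kind of timeout asked about above: the log shows connections opened on ports 8209, 8762 and 8719 but nothing for 8161 before the restart, which suggests the scan may be blocking on that last port. The following is only a rough Python sketch of a "connect with a deadline, skip on no response" pattern. It is not TabMon's code (TabMon is a C# application and its internals are not shown here), and `call_with_timeout`, `open_clients` and `fake_connect` are hypothetical names invented for this example.

```python
import threading
import time


def call_with_timeout(fn, args=(), timeout=5.0):
    """Run fn(*args) in a daemon thread and wait at most `timeout` seconds.

    Returns (True, value) on success, or (False, error) on failure or timeout.
    The worker thread is abandoned, not killed, if it never returns.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except Exception as exc:  # report connection errors to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return False, TimeoutError(f"no response within {timeout}s")
    if "error" in outcome:
        return False, outcome["error"]
    return True, outcome["value"]


def open_clients(host, ports, connect, timeout=5.0):
    """Try every port; keep the clients that respond and record the rest."""
    clients, skipped = {}, {}
    for port in ports:
        ok, result = call_with_timeout(connect, (host, port), timeout)
        if ok:
            clients[port] = result
        else:
            skipped[port] = result
    return clients, skipped


def fake_connect(host, port):
    # Hypothetical stand-in for a real JMX connect call.
    # Port 8161 simulates the hung backgrounder by never answering in time.
    if port == 8161:
        time.sleep(60)
    return f"client for {host}:{port}"


if __name__ == "__main__":
    clients, skipped = open_clients(
        "tableauprod2", [8209, 8762, 8719, 8161], fake_connect, timeout=0.5
    )
    print(sorted(clients))  # [8209, 8719, 8762]
    print(sorted(skipped))  # [8161]
```

With a pattern like this, an unresponsive process would cost one timeout per scan instead of blocking startup, and the skipped ports could be logged and retried later. Whether TabMon can be configured or patched to behave this way is not confirmed anywhere in this thread.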

razze76 commented 4 years ago

Addition: when a process goes down later and stops responding to JMX queries, TabMon stops working again as well.

danjrahm commented 4 years ago

Hello,

This sounds like a bug with TabMon. I'm not working on TabMon currently but will try to set up some time in the next couple of weeks to get this sorted out.

Thanks, Dan