tableau / TabMon

A Tableau Server performance monitoring service
https://tableau.github.io/TabMon/
MIT License

TabMon is too slow: all reports take more than 30 mins #229

Closed ravihoskote closed 5 years ago

ravihoskote commented 5 years ago

I want to create a TDE data source containing performance metrics that my team will eventually use to analyze server performance. I thought TabMon would be a good fit since it already does the same thing, but I'm stuck getting TabMon to work.

I installed TabMon last year to monitor my Tableau Server 10.4. Over the past 2 months, TabMon's performance has deteriorated and it now runs very slowly. The TabMon dashboards that come with the install take almost 45 minutes to return results.

Debugging: after some investigation, I suspect the PostgreSQL database is causing the slowness. Queries against PostgreSQL run for a long time.

Example: the query select count(*) from countersamples; has been running for more than 45 minutes now.

Fixes: I tried a few things, such as running VACUUM, but performance did not improve. The table is also set to auto-analyze.

Has anyone else faced this issue ?

danjrahm commented 5 years ago

Hello @ravihoskote,

This is a fairly common occurrence with TabMon. If the poll rate is too aggressive and all of the data is retained, the TabMon database can bloat, which makes queries slow.

To mitigate the issue I suggest using a mixture of the following performance strategies:

  1. Decrease the sample rate for TabMon.
  2. Enable the PurgeOldData option in the config. This deletes data once it passes a certain age.
  3. Create an extract and perform incremental refreshes to ensure that full queries aren't run.
  4. If multiple clusters are being polled by a single TabMon instance, break these up into multiple TabMon processes and databases.
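Strategy 3 relies on Tableau's incremental refresh, which appends only rows whose value in a chosen column is greater than the largest value already in the extract. A minimal sketch of that high-water-mark idea follows, using an in-memory SQLite table as a stand-in for the TabMon database. The column names sample_time and value and the helper functions are hypothetical; TabMon's real countersamples schema is not confirmed here.

```python
import sqlite3
from datetime import datetime, timedelta


def build_demo_db():
    """Stand-in for the TabMon DB. Hypothetical schema: (sample_time, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countersamples (sample_time TEXT, value REAL)")
    start = datetime(2019, 1, 1)
    # Ten samples, one every 5 minutes (a 300-second poll interval).
    rows = [((start + timedelta(minutes=5 * i)).isoformat(), float(i)) for i in range(10)]
    conn.executemany("INSERT INTO countersamples VALUES (?, ?)", rows)
    return conn


def incremental_refresh(conn, extract):
    """Append only rows newer than the newest timestamp already in the extract."""
    # ISO-8601 strings in one fixed format sort chronologically; "" sorts first,
    # so the first refresh pulls every row.
    high_water = max((ts for ts, _ in extract), default="")
    new_rows = conn.execute(
        "SELECT sample_time, value FROM countersamples "
        "WHERE sample_time > ? ORDER BY sample_time",
        (high_water,),
    ).fetchall()
    extract.extend(new_rows)
    return len(new_rows)


if __name__ == "__main__":
    conn = build_demo_db()
    extract = []
    print(incremental_refresh(conn, extract))  # 10: first run pulls every row
    conn.execute("INSERT INTO countersamples VALUES (?, ?)", ("2019-01-01T01:00:00", 99.0))
    print(incremental_refresh(conn, extract))  # 1: only the new row is fetched
```

After the first load, each refresh scans only the rows past the high-water mark instead of re-querying the whole table.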

Overall, this issue is likely due to an extreme amount of data in the DB. My general suggestion is to decrease the amount of data being queried.

If you would like more specific suggestions on how to mitigate this issue, please let me know.

Thanks, Dan

ravihoskote commented 5 years ago

Hi Dan,

Thanks for your reply. I have already done 1, 2 and 4, and I'm trying out 3 now.

Here are the changes I have put in place. Please let me know whether they look okay or need changes.

  1. Increased the poll interval to 300 seconds (5 minutes)
  2. Enabled purging with a data threshold of 60 days
  3. Polling only a single cluster (3 nodes)
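As a rough check on how much data these settings keep, the arithmetic below multiplies polls per day by retention days and node count. It assumes TabMon stores one row per counter, per node, per poll; that storage model and the counters_per_node figure are assumptions, not confirmed TabMon behavior.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_retained(poll_interval_s, retention_days, nodes, counters_per_node):
    """Approximate rows kept, assuming one row per counter, per node, per poll."""
    polls_per_day = SECONDS_PER_DAY // poll_interval_s
    return polls_per_day * retention_days * nodes * counters_per_node


if __name__ == "__main__":
    # Settings from the list above: 300 s interval, 60 days, 3 nodes.
    print(samples_retained(300, 60, 3, 1))    # 51840 rows per counter
    # Hypothetical: 100 counters per node.
    print(samples_retained(300, 60, 3, 100))  # 5184000 rows
```

At 288 polls per day, each counter contributes 51,840 rows over 60 days across 3 nodes, and the total scales linearly with the number of counters polled.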

Thanks, Ravi

danjrahm commented 5 years ago

Hi @ravihoskote,

If the queries are taking 30+ minutes with almost all of the suggested options enabled, there is a distinct possibility that the machine the DB is running on is itself experiencing performance issues.

Have you tried looking at the Postgres machine's performance to verify that it is within an acceptable range?

Thanks, Dan

danjrahm commented 5 years ago

Hello,

Due to inactivity, I am going to close this issue. If it is still occurring, feel free to reopen it.

Thanks, Dan

naveentalari commented 4 years ago

Hi Dan,

I have enabled PurgeOldData and set it to 60 days, but TabMon still holds the last 9 months of data, and I'm not sure why. Do we need to restart the host machine for the changes to take effect? I stopped TabMon before enabling PurgeOldData and changing the sample rate, and restarted it afterwards. Please help me fix this.
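If a 60-day purge were taking effect, no samples older than the cutoff (now minus 60 days) should remain. A minimal sketch of that check follows. The helper names are hypothetical, and the sample timestamps are passed in as a list; in practice they would be read from the TabMon database, whose exact timestamp column is not confirmed here. This only verifies the outcome and says nothing about when TabMon actually runs its purge.

```python
from datetime import datetime, timedelta


def purge_cutoff(now, retention_days):
    """Oldest timestamp a purge with this retention should keep."""
    return now - timedelta(days=retention_days)


def rows_past_retention(sample_times, now, retention_days):
    """Count samples older than the cutoff; a working purge should leave 0."""
    cutoff = purge_cutoff(now, retention_days)
    return sum(1 for t in sample_times if t < cutoff)


if __name__ == "__main__":
    now = datetime(2020, 10, 1)
    # Hypothetical data: one sample on the 1st of each month, Jan through Oct 2020.
    samples = [datetime(2020, month, 1) for month in range(1, 11)]
    print(purge_cutoff(now, 60))                   # 2020-08-02 00:00:00
    print(rows_past_retention(samples, now, 60))   # 8: Jan through Aug 1 are stale
```

A non-zero count against the real data would suggest the purge has not run yet or the setting was not picked up.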

Thanks, Naveen