tableau / community-tableau-server-insights

Community-built data sources for answering questions about Tableau Server
MIT License
129 stars 52 forks source link

Build mechanism for incremental extract refreshes on TS Background Tasks #54

Closed mcoles closed 3 years ago

mcoles commented 3 years ago

TS Background tasks is getting unwieldly due to the complex Custom SQL required to pull in all the information we want. A solution would be to extract the data, but because this data source is frequently used to check the status of extract refreshes, it must be refreshed fairly frequently to provide useful information (hourly, or faster). The issue there is that using the Completed At (or created at, or started at) date fields to increment on will either catch in-progress jobs, pull them in, then never update them, or if you combine the dates and increment based on that, it'll create duplicate records.

The solution I'm going with has two parts: (1) Combine the greatest of Created At, Started At, and Completed At, and use that as the incremental field. This will ensure that any updated record gets pulled in, but will create duplicates for a single job, so to solve that, I'm introducing (2) an LOD data source filter that hides all but the most recent records. Combined, this allows you to incrementally refresh quite frequently at minimal compute cost (7 to 15 second refreshes on my massive server!), and then run a full refresh weekly or so to purge the old records.