tableau / community-tableau-server-insights

Community-built data sources for answering questions about Tableau Server
MIT License

Add Prep flows into TS Data Connections #40

Open mcoles opened 4 years ago

mcoles commented 4 years ago

During the last release, I tried hard to integrate Prep flows into the TS Data Connections data source. After a long struggle, I couldn't find a logical way to integrate it. At this point I can't recall exactly why it was so hard to do. But the basic idea is that Prep flows have both input and output connections, and either can be a published data source on Tableau Server.

mcoles commented 3 years ago

Took another look at this today for an impact assessment we wanted to run. The trouble is that with the addition of Prep flows, what used to be a lineage structure of finite depth (Workbook->Published Data Source->Database Connection) now becomes one of (theoretically) infinite depth (Workbook->Published Data Source->Flow->Published Data Source->Flow...->Published Data Source->Database Connection). There's not a great way to represent that in the current TS Data Connection structure.

One approach to solving that problem would be to develop a recursive CTE within Custom SQL that iterates over each level of connection depth, then populates the "(underlying)" fields in the data source with the earliest (depth-0) connection information. The trouble with that is that, much like the relationship between Views and Data Sources, where the Postgres repo doesn't really hold information about how they're associated, we have the same problem with Flow inputs and outputs--we know the inputs and the outputs, but not which output depends on which input. So we can't associate the connections properly, other than at the Flow level, and one Flow may contain many separate segments doing completely unrelated things.
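To make the depth problem concrete, here's a toy Python model of what that recursive traversal would do. The node names and the `UPSTREAM` mapping are entirely hypothetical (they are not the workgroup-repository schema); the point is just that the walk has no fixed depth once flow outputs can be published as data sources, so a recursive query would need a cycle/depth guard.

```python
# Toy model of unbounded-depth lineage. Names are hypothetical, not
# the actual Tableau workgroup-repository schema.

# Each published data source maps to its upstream: either a database
# connection (the terminus) or a flow output published as a data source.
UPSTREAM = {
    "workbook_ds": "flow_out_2",
    "flow_out_2": "flow_out_1",   # flow output republished as a data source
    "flow_out_1": "db_connection::postgres_prod",
}

def resolve_underlying(node, upstream, max_depth=50):
    """Follow upstream edges until reaching a database connection.

    max_depth guards against cycles, which a real recursive CTE
    would also need to protect against.
    """
    depth = 0
    while not node.startswith("db_connection::"):
        if depth >= max_depth:
            raise RecursionError(f"lineage deeper than {max_depth}")
        node = upstream[node]
        depth += 1
    return node, depth

print(resolve_underlying("workbook_ds", UPSTREAM))
# -> ('db_connection::postgres_prod', 3)
```

Even with this walk working, the association problem above remains: when a flow has several inputs and several outputs, there is no recorded edge saying which output came from which input, so the mapping can only be built at the Flow level.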

Another, probably better, approach is not to try to construct a complete lineage, but instead break it off when Flows enter the equation. The way this would work is that we'd consider Flow inputs to be a Data Source just like in a Workbook--it might talk directly to a Data Connection, or it might talk to a Published Data Source, which in turn talks to a Data Connection. In the event that a Flow connects to a Published Data Source that was output by a separate Flow, the connection would just be to a local Hyper extract. That's where the lineage would break down. It might be worth considering a separate field to denote that the "(underlying)" extract connection originates from a Prep flow, though.
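The truncated-lineage approach could look something like this minimal Python sketch. All names (`classify_connection`, the sample source names, the field names in the returned record) are hypothetical illustrations of the idea, not proposed schema: lineage stops at the first flow output, which is reported as a local Hyper extract with an origin flag.

```python
# Sketch of the truncated-lineage approach: treat a flow input like a
# workbook data source, and stop lineage at the first flow output.
# All names here are hypothetical.

def classify_connection(source, flow_outputs):
    """Return the "(underlying)" connection record for one input.

    If the input is a published data source written by a Prep flow,
    report it as a local Hyper extract and flag the flow origin
    instead of chasing the flow's own inputs.
    """
    if source in flow_outputs:
        return {"connection": "hyper extract (local)",
                "from_prep_flow": True}
    return {"connection": source, "from_prep_flow": False}

flow_outputs = {"sales_cleaned"}  # data sources published by Prep flows

print(classify_connection("sales_cleaned", flow_outputs))
# -> {'connection': 'hyper extract (local)', 'from_prep_flow': True}
print(classify_connection("postgres_prod", flow_outputs))
# -> {'connection': 'postgres_prod', 'from_prep_flow': False}
```

The `from_prep_flow` flag is one way to implement the "separate field" suggested above, so downstream users at least know the lineage was cut short rather than genuinely terminating in an extract.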