For any given field in a query, Malloy understands what were all the inputs to that field, and thus we can fully understand the lineage of every column. We should write this lineage to query metadata.
For example, if we have a source:
source: order_items is order_items_table + {
primary_key: id
join_one:
users with user_id
measure:
revenue_per_user is sale_price.sum() / count(distinct users.id)
there should be metadata associated with each column in the resultset that describes the lineage of that column.
In this example, created_month comes from order_items.created_at, and revenue_per_user comes from order_items.sale_price and order_items.users.id. This is a simple example, but if one or more of those fields was another complex field that had multiple inputs, then this lineage forms a tree structure.
I'm not sure @carlineng, but it seems like many databases (e.g., DuckDB, Snowflake) have a COMMENT ON SQL statement that would allow this column lineage information to be added to the metadata for the column, too.
From a conversation today with @lloydtabb --
For any given field in a query, Malloy understands what were all the inputs to that field, and thus we can fully understand the lineage of every column. We should write this lineage to query metadata.
For example, if we have a source:
and a query:
there should be metadata associated with each column in the resultset that describes the lineage of that column.
In this example,
created_month
comes fromorder_items.created_at
, andrevenue_per_user
comes fromorder_items.sale_price
andorder_items.users.id
. This is a simple example, but if one or more of those fields was another complex field that had multiple inputs, then this lineage forms a tree structure.A few examples of tools that support this: