Write column lineage to query metadata

From a conversation today with @lloydtabb --

For any given field in a query, Malloy understands what were all the inputs to that field, and thus we can fully understand the lineage of every column. We should write this lineage to query metadata.

For example, if we have a source:

source: order_items is order_items_table + {
  primary_key: id

  join_one:
    users with user_id

  measure:
    revenue_per_user is sale_price.sum() / count(distinct users.id)

and a query:

query: order_items -> {
  group_by: created_month is created_at.month
  aggregate: revenue_per_user
}

there should be metadata associated with each column in the resultset that describes the lineage of that column.

In this example, created_month comes from order_items.created_at, and revenue_per_user comes from order_items.sale_price and order_items.users.id. This is a simple example, but if one or more of those fields was another complex field that had multiple inputs, then this lineage forms a tree structure.

A few examples of tools that support this:

Coalesce -- dbt competitor
SQLMesh -- another dbt competitor
Datafold -- data observability platform

malloydata / malloy

Write column lineage to query metadata #1141