If this is not a correct understanding of what we are trying to do, let's sort that out here.
It is a goal of the Vault to make it easy for providers to designate datasets for use by specific applications in specific ways: which tools have access to which tables, which calculations have access to which columns, and which user roles have access to which rows and, based on data lineage, to which derived data.
For the provider-side demo, sample datasets will consist of rows of fictitious companies with various sample metrics. Queries of these metrics can return data lineage (i.e., for each data element, which table the element came from and which permission granted access to it). Calculations that produce derived data from these elements can also provide derived lineage. For example:
Unmodified (or trivially modified) data passes up the lineage of the table from which it came, along with the permission that was used to access the data. Tools that do not have permissions at least as strong as those referenced in the lineage should report an error when attempting to access such data.
Data selected via a waterfall selection passes up the lineage of the table from which the data was selected, along with the permission used to access that data. The lineage of the branch(es) not selected is not passed up. Tools that do not have permissions at least as strong as those referenced in the surviving lineage should report an error when attempting to access such data.
A calculation that transforms data from one or more sources passes up a lineage that is the union of all sources. Alternatively, if the calculator is granted permission to issue its own lineage, it can declare a new lineage based on having performed that authorized calculation.
Others?
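To make the three cases above concrete for discussion, here is a minimal Python sketch. Everything in it is an assumption rather than Vault code: the names `LineageFact`, `Element`, `waterfall`, `calculate`, and `check_access` are hypothetical, and "at least as strong" is reduced to simple set membership (the tool must hold every permission named in the lineage), which may well be too naive for the real design.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class LineageFact:
    """Hypothetical lineage entry: source table plus the permission used to read it."""
    table: str
    permission: str


@dataclass(frozen=True)
class Element:
    """A data value together with the lineage it carries.

    Unmodified data is simply passed along as the same Element, lineage intact.
    """
    value: Optional[float]
    lineage: FrozenSet[LineageFact]


def waterfall(candidates: Sequence[Element]) -> Element:
    """Return the first candidate that has a value; only its lineage survives."""
    for cand in candidates:
        if cand.value is not None:
            return cand
    raise ValueError("no candidate in the waterfall had a value")


def calculate(
    func: Callable[..., float],
    sources: Sequence[Element],
    issued: Optional[LineageFact] = None,
) -> Element:
    """Derive a value; lineage is the union of sources unless a new fact is issued."""
    value = func(*(s.value for s in sources))
    if issued is not None:
        # Assumes the caller has already verified the calculator may issue this fact.
        return Element(value, frozenset({issued}))
    merged: FrozenSet[LineageFact] = frozenset().union(*(s.lineage for s in sources))
    return Element(value, merged)


def check_access(tool_permissions: Set[str], elem: Element) -> None:
    """Raise PermissionError if the tool lacks any permission named in the lineage."""
    missing = {fact.permission for fact in elem.lineage} - tool_permissions
    if missing:
        raise PermissionError(f"tool lacks permissions: {sorted(missing)}")
```

In this sketch a tool reading a derived element must hold the permissions of every surviving source, while a calculation that issues its own lineage fact replaces the source lineage entirely. Whether that replacement should also retain the sources for audit purposes is an open question.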
Providers can describe what columns are accessible to what calculations (which are somehow described in a consistent fashion between tool and provider). Column access so granted is part of the data lineage of access to data elements in that column.
Providers can describe what rows are accessible to what user roles (which are somehow described in a consistent fashion between tool and user authentication/authorization system). Row access so granted is part of the data lineage of access to data elements in that row.
Whatever rule first grants access to a given data element (e.g., a tool marks a given calculation as public, a row-based rule permits access to all columns, or a column-based rule grants access to the column of an accessible row), that rule is the sole basis of the lineage for that element. This does not change the requirement that a calculation drawing on multiple data elements must either union the lineage of its source elements or declare its own lineage fact, which it must be authorized to issue for that calculation.
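The "first rule that grants access" idea could be sketched roughly as below. This is an illustration under stated assumptions, not a proposed API: `Request`, `Rule`, and `first_grant` are hypothetical names, rules are assumed to be evaluated in list order, and each rule is modeled as an independent predicate (the sketch does not model a column rule depending on the row already being accessible).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical description of one attempted access to a data element."""
    row_id: str
    column: str
    calculation: str
    role: str


@dataclass(frozen=True)
class Rule:
    """A named access rule; `grants` returns True when the rule allows the request."""
    name: str
    grants: Callable[[Request], bool]


def first_grant(rules: Sequence[Rule], req: Request) -> Optional[str]:
    """Return the name of the first rule that grants access, or None if none does.

    The returned name would become the sole lineage basis for the element.
    """
    for rule in rules:
        if rule.grants(req):
            return rule.name
    return None


# Example rules; all identifiers here are made up for illustration.
EXAMPLE_RULES = [
    Rule("public:headcount_total", lambda r: r.calculation == "headcount_total"),
    Rule("row:analyst-emea", lambda r: r.role == "analyst" and r.row_id.startswith("emea-")),
    Rule("column:revenue-for-growth", lambda r: r.column == "revenue" and r.calculation == "growth_rate"),
]
```

With these example rules, an analyst requesting `revenue` on row `emea-001` for `growth_rate` matches both the row rule and the column rule, and the lineage records only `row:analyst-emea` because it is evaluated first. That makes rule ordering part of what providers would need to specify.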
The purpose of this demonstration is two-fold:
Demonstrate how providers can provision data and describe permissions granted to tools and user roles
Demonstrate how data access management and data lineage provide both technical means to restrict and technical means to audit data access
For discussion
@HeatherAck @LeylaJavadova