microsoft / dbt-fabric

MIT License
79 stars 27 forks source link

dbt cannot work with tables with a security policy. #198

Open alexanderfreller opened 4 months ago

alexanderfreller commented 4 months ago

I would like to manage row-level security with dbt. The goal is to manage it in a way that is scalable and can be integrated with the git review process, even with many users. Unfortunately dbt cannot operate on tables that have a security policy attached.

Therefore the main question is: What is the recommended way to handle row-level security in Fabric with dbt?

Here is my attempt:

This is based on the dimension_employee example data available in Fabric and a single model: test_employee.sql:

with source as (
    select * from {{ source('dbo', 'dim_employee') }}
)

select * from source

Security policies can be created by using dbt post-hooks. This the the full dbt_project.yml:

name: "dbt_assets"
version: "1.0.0"
config-version: 2
profile: dev_afreller

models:
  dbt_assets:
    test_employee:
      +materialized: table
      +post-hook: 
        - |
          CREATE FUNCTION dbo.employee_securitypredicate(@PreferredName AS nvarchar(50))
              RETURNS TABLE
          WITH SCHEMABINDING
          AS
              RETURN SELECT 1 AS tvf_securitypredicate_result
          WHERE @PreferredName = 'Hudson' AND USER_NAME() = 'some_user';
        - |
          CREATE SECURITY POLICY PreferredNameFilter
          ADD FILTER PREDICATE dbo.employee_securitypredicate(PreferredName)
          ON dbo.test_employee
          WITH (STATE = ON);
        - |
          GRANT SELECT ON dbo.test_employee(PreferredName) to "some_user"

This successfully creates the function, security policy and grant when it is first executed. However when it is executed the second time this happens:

06:21:33  Running with dbt=1.7.15
06:21:34  Registered adapter: fabric=1.7.4
06:21:34  Found 1 model, 1 source, 0 exposures, 0 metrics, 428 macros, 0 groups, 0 semantic models
06:21:34  
06:21:37  Concurrency: 16 threads (target='out')
06:21:37  
06:21:37  1 of 1 START sql table model dbo.test_employee ................................. [RUN]
06:21:38  1 of 1 ERROR creating sql table model dbo.test_employee ........................ [ERROR in 1.68s]
06:21:38  
06:21:38  Finished running 1 table model in 0 hours 0 minutes and 4.83 seconds (4.83s).
06:21:38  
06:21:38  Completed with 1 error and 0 warnings:
06:21:38  
06:21:38    Database Error in model test_employee (models/test_employee.sql)
  ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Object 'dbo.test_employee' cannot be renamed because the object participates in enforced dependencies. (15336) (SQLMoreResults)")
06:21:38  
06:21:38  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

This happens because tables with enforced security polices cannot be renamed. It seems there is no way to drop a security policy before dbt tries to rename the table.

Even with pre-hooks it is not possible since they are only executed after renaming the tables. This was the unsuccessful attempt to drop the security policies:

      +pre-hook:
        - |
          DROP SECURITY POLICY IF EXISTS PreferredNameFilter
        - |
          DROP FUNCTION IF EXISTS dbo.employee_securitypredicate

Workaround:

One possible workaround would be to define a dbt operation that first drops all grants and then removes all security policies and functions. For example:

{% macro revoke_all_permissions() %}
{% set sql %}
    REVOKE SELECT ON dbo.test_employee FROM "some_user";
    DROP SECURITY POLICY IF EXISTS PreferredNameFilter;
    DROP FUNCTION IF EXISTS dbo.employee_securitypredicate;
{% endset %}

{% do run_query(sql) %}
{% endmacro %}

This works but has some significant drawbacks:

  1. The whole datawarehouse becomes unavailable to users while the new models are created. This could potentially be a significant amount of time for larger data warehouses.
  2. Great care needs to be taken when revoking the grants to not leak any information. If, for some reason, any grants remain when droping the security polices, then information would be leaked.
  3. Creating and dropping of grants and security polices happens in two different places. This makes it more difficult to review and easier to make mistakes and leak information.

This workaround could be improved by not hard-coding the objects to be dropped but instead loading the from the system tables. For example, loading the list of grants from the system table and dropping them all. This would make the solution a bit more robust but would not eliminate the drawbacks stated above.

Do you have any ideas to overcome these issues? What is the best way to handle row-level security in dbt-fabric?

prdpsvs commented 2 months ago

@alexanderfreller , your issue is addressed in V 1.8.7 release. The way table is renamed changed in v.1.8.7. You can use pre-hooks to drop the objects and use post hooks to create those objects and required permissions. I am attaching a screenshot that shows the difference between v.1.7.4 and v.1.8.7.

Please note that this change is introduced only in v.1.8.7.

image

alexanderfreller commented 2 months ago

Thank you for the update, I can indeed now use pre-hooks to drop any security policies attached to a table and use post-hooks to recreate the security policies. Unfortunately when I do that there is a short time were the old table still exists but the security policy is already deleted. Therefore this approach leaks data, if a user queries the table after the security policy is deleted and before the table is deleted (or renamed).

My config looks roughly like this:

models:
 dbt_assets:
   qlik_views:
     +schema: qlik_views
     +materialized: table
     qlik_views__company:
         +pre-hook:
         - |
           {% do drop_security_policies_of_current_relation() %}
         +post-hook:
         - |
           {% do setup_row_level_security([
             {"principal": "my_entra_id_group", "condition": "@my_column= 'some value'"},
           ]) %}

drop_security_policies_of_current_relation is a macro that gets all the security polices that apply to the current relation and drops them. setup_row_level_security is a macro that creates a function that filters the table rows based on a user or group and sets it as a filter predictate on the current table.

Therefore the questions are:

  1. Why is there a period of time where the security policy does not apply? My understanding was that pre- and post-hooks and creation of the new table happen inside of the same transaction.

  2. Can this be changed such that there is no data leak? Otherwise this feature cannot be used to handle security relevant tasks.

prdpsvs commented 2 months ago

@alexanderfreller, I spoke to DBT team, and I was informed that pre-hooks, model and post-hooks are executed in different transaction context. And by default, auto commit is turned on in Fabric and will anyway run each statement in separate transaction context.

However, you can run pre-hook, model and post hook macros wrapped up with BEGIN TRAN and COMMIT statements. You can run an explicit multi statement transaction and auto commit will be turned off during this transaction and will turn on after a commit/rollback statement is executed.

Would this work?

Ideally it should be possible for the adapter to wrap each model as an explicit transaction, and I am trying to figure out that option with DBT core team as I adding my comment. Hoping to provide an answer for you.

alexanderfreller commented 1 month ago

I tried to manually start the transaction in the pre-hook and commit it in the post hook as follows:

models:
  dbt_assets:
    qlik_views:
      +schema: qlik_views
      +materialized: table
      qlik_views__company:
          +pre-hook:
          - BEGIN TRANSACTION
          - |
            {% do drop_security_policies_of_current_relation() %}

          +post-hook:
          - |
            {% do setup_row_level_security([

              {"principal": "my_entry_group", "condition": "@some_column = 'somevalue'"},

            ]) %}
          - COMMIT

Is that what you meant?

This failed with the following error:

('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Transaction failed because this DDL statement is not allowed inside a snapshot isolation transaction. 
Since metadata is not versioned, a metadata change can lead to inconsistency if mixed within snapshot isolation. (3964) (SQLExecDirectW)')

From the debug log I can see that the error happens when CREATE SECURITY POLICY ... is executed inside the setup_row_level_security.

I tried running ALTER DATABASE {{this.database}} SET ALLOW_SNAPSHOT_ISOLATION OFF as a pre-hook as well but that also does not work: ('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]ALLOW_SNAPSHOT_ISOLATION is not supported for ALTER DATABASE. (15869) (SQLExecDirectW)')

Do you know how to get around this issue? Did you get a reply from the DBT team?