trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Delta connector allows circumventing security controls #22935

Open · schaffino opened this issue 3 months ago

schaffino commented 3 months ago

I'm deploying Trino with the Hive and Delta Lake connectors into what I plan to be a shared read-write cluster for several use cases. I'm using access control to grant users read and write access to specific catalogs, schemas and tables, and to give certain accounts privileged access to certain schemas in isolation. In this setup, the nodes can reach all the underlying storage locations, and access control is what keeps users away from data that isn't meant for them.

Hive has the concept of external tables, where a user can register a table at an arbitrary location. That is a security issue in a setup like mine. This appears to have been considered in Trino, because the following configuration properties prevent Trino users from registering tables (or partitions) at arbitrary storage locations, so data only ever resides in the storage defined on the schema:

hive.non-managed-table-creates-enabled=false
hive.non-managed-table-writes-enabled=false
hive.allow-register-partition-procedure=false

With those settings enabled, we can still let users create tables without worrying that they will write data to storage where it doesn't belong, or read data from storage they shouldn't have access to.

For Delta Lake, this seems to have been considered initially, with the following setting:

delta.register-table-procedure.enabled=false

But the Delta Lake catalog still accepts an explicit storage location in a table definition, with no way to restrict it, for example:

CREATE TABLE delta_lake.some_schema.my_sneaky_table (my_sensitive_data)
WITH (
    location = 'abfs://very_public@allaccess.dfs.core.windows.net/nothing_to_see_here'
)
AS SELECT 'sensitive data';

I think the Delta Lake catalog should have an option to disallow an explicit location in table definitions, for parity with Hive, or at least an option that checks that the location is within the schema's default location. Otherwise, the only way to stop a malicious user from writing to an insecure storage location, or reading from a storage location they shouldn't have access to, is to remove table creation permissions in the Delta Lake catalog entirely.
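To make the second suggestion (checking that the location is within the schema's default location) more concrete, here is a minimal sketch of such a containment check. It is not Trino code: the class `LocationCheck` and the method `isWithinSchemaLocation` are hypothetical, it uses plain `java.net.URI` rather than Trino's own filesystem abstractions, and it assumes the schema location is already known and that paths are compared case-sensitively. It only accepts a table location strictly below the schema location on the same scheme and authority, and it rejects `..` tricks (including percent-encoded ones) and sibling-prefix matches such as `/some_schema_evil`.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch, not a Trino API: a location containment check.
final class LocationCheck {
    private LocationCheck() {}

    // Returns true only if tableLocation points strictly below schemaLocation
    // on the same scheme and authority (container@account for abfs).
    static boolean isWithinSchemaLocation(String schemaLocation, String tableLocation) {
        URI schema;
        URI table;
        try {
            // normalize() removes "." segments and resolvable "x/.." pairs
            schema = URI.create(schemaLocation).normalize();
            table = URI.create(tableLocation).normalize();
        } catch (IllegalArgumentException e) {
            return false; // unparsable location: deny by default
        }
        if (!schema.isAbsolute() || !table.isAbsolute() || schema.isOpaque() || table.isOpaque()) {
            return false;
        }
        // Scheme is case-insensitive; authority must match exactly.
        if (!schema.getScheme().equalsIgnoreCase(table.getScheme())
                || !Objects.equals(schema.getAuthority(), table.getAuthority())) {
            return false;
        }
        // getPath() is percent-decoded, so an encoded "%2e%2e" shows up as ".." here.
        String schemaPath = withTrailingSlash(schema.getPath());
        String tablePath = withTrailingSlash(table.getPath());
        if (tablePath.contains("/../")) {
            return false; // leftover ".." that normalize() could not resolve
        }
        // The trailing slash stops "/some_schema" from matching "/some_schema_evil".
        return tablePath.startsWith(schemaPath) && tablePath.length() > schemaPath.length();
    }

    private static String withTrailingSlash(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    public static void main(String[] args) {
        String schemaLocation = "abfs://warehouse@secure.dfs.core.windows.net/some_schema";
        // true: a table directory under the schema location
        System.out.println(isWithinSchemaLocation(schemaLocation,
                "abfs://warehouse@secure.dfs.core.windows.net/some_schema/my_table"));
        // false: the location from the CREATE TABLE example above
        System.out.println(isWithinSchemaLocation(schemaLocation,
                "abfs://very_public@allaccess.dfs.core.windows.net/nothing_to_see_here"));
    }
}
```

A string-prefix check like this is simple to reason about, but a real implementation would also need to account for anything the storage layer resolves differently from the URI text (for example, case-insensitive paths or mount points).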

wendigo commented 3 months ago

cc @ebyhr @findinpath