I'm deploying Trino with the Hive and Delta Lake connectors into what I plan to be a shared read/write cluster serving several use cases. I'm using access control to grant users read and write on specific catalogs, schemas, and tables, and allowing privileged access to certain schemas only for certain accounts. In this setup the nodes have access to all of the underlying storage locations, and access control is intended to prevent users from accessing data that isn't appropriate for them. Hive has a concept of external tables, where a user can register a table at an arbitrary location. That is a security issue for this kind of setup, and it seems to have been thought about in Trino, as the following options are configurable and prevent Trino users from registering tables at arbitrary storage locations, so data only ever resides in the storage defined on the schema.
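For reference, I believe these are the Hive connector settings in question (I'm going from memory, so treat the exact property names as my assumption):

```properties
# Assumed settings: disallow creating non-managed (external) tables, and
# disallow writes to them, so table data can only live under the schema's
# defined storage location
hive.non-managed-table-creates-enabled=false
hive.non-managed-table-writes-enabled=false
```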
If we enable the above settings, we can still allow users to create tables without worrying about data being written out to storage where it doesn't belong, or read from storage they shouldn't have access to.
In the case of Delta Lake, it seems this was initially thought about, with the following setting:

```properties
delta.register-table-procedure.enabled=false
```
But the Delta catalog still allows specifying an explicit storage location in a CREATE TABLE definition, with no mitigation, e.g.:
```sql
CREATE TABLE delta_lake.some_schema.my_sneaky_table
WITH (
    location = 'abfs://very_public@allaccess.dfs.core.windows.net/nothing_to_see_here'
)
AS SELECT 1 a, 2 b, 3 c;
```
I think there should be an option to disallow explicitly specifying `location` in table definitions in the Delta catalog, to provide parity with Hive, or at least an option to check that the location falls within the schema's default location. Otherwise, if you don't want a malicious user to either write to a storage location that is not secure, or read from a storage location they shouldn't have access to, the only option is to remove table create permissions in the Delta catalog.
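For completeness, the workaround I'm currently considering is that last option, denying table creation in the Delta catalog via file-based access control. A minimal sketch of the rules file, assuming I've understood correctly that CREATE TABLE requires schema ownership under the file-based system (the catalog name `delta_lake` is just my example):

```json
{
  "schemas": [
    {
      "catalog": "delta_lake",
      "owner": false
    }
  ],
  "tables": [
    {
      "catalog": "delta_lake",
      "privileges": ["SELECT", "INSERT", "DELETE", "UPDATE"]
    }
  ]
}
```

This keeps read/write access to existing tables while blocking new table definitions entirely, which is much coarser than the per-location check I'm asking for.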