prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Fix Issue with Column Masking #18937

Open sriramkotilil opened 1 year ago

sriramkotilil commented 1 year ago

We have identified an issue in Presto where columns masked using AWS Lake Formation were still visible when the table was queried through Presto.

This issue occurs when we connect to Amazon S3 using AWS Glue as the Hive metastore and hive-hadoop2 as the connector, and one or more columns in a table have been restricted for a user with AWS Lake Formation.

Issue: When we query a table in which one or more columns are masked using AWS Lake Formation, the masked column's name is removed from the results. However, the masked column's data appears under the wrong column, and that column's original data is missing.

For example, we have a table 'Persons' with 5 columns (personid, lastname, firstname, address, city).

Original Data:

select * from "Persons";
 personid | lastname  | firstname | address | city
----------+-----------+-----------+---------+------
        1 | Doe       | John      | abc     | TVM
        2 | Smith     | Rachel    | def     | EKM
        3 | Hernandez | Manuel    | ghi     | TSR
        4 | Emmanuel  | Aria      | jkl     | CLT
(4 rows)

Using AWS Lake Formation, we restrict (mask) access to the 'address' column for a particular user. When that user runs the query below to fetch all columns, we expect the result to omit the address column and return the following data:

Expected Result:

select * from "Persons";
 personid | lastname  | firstname | city
----------+-----------+-----------+------
        3 | Hernandez | Manuel    | TSR
        2 | Smith     | Rachel    | EKM
        4 | Emmanuel  | Aria      | CLT
        1 | Doe       | John      | TVM
(4 rows)

In the actual result, however, the city column contains the data of the address column:

Actual Results:

select * from "Persons";
 personid | lastname  | firstname | city
----------+-----------+-----------+------
        1 | Doe       | John      | abc
        3 | Hernandez | Manuel    | ghi
        2 | Smith     | Rachel    | def
        4 | Emmanuel  | Aria      | jkl
(4 rows)
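The thread does not establish a root cause, but the symptom above matches a positional column mapping. The following is a minimal sketch of one plausible mechanism, under two assumptions: the table definition Presto receives from Glue omits the restricted 'address' column, while the data files in S3 still contain all five fields; and the reader maps the i-th column in the table definition to the i-th field in each file row (as a delimited text file is typically read). The function and variable names are hypothetical and are not Presto APIs.

```python
# Hypothetical illustration only: not Presto code.
# Assumption: the files in S3 still hold all five fields per row...
FILE_COLUMNS = ["personid", "lastname", "firstname", "address", "city"]

ROWS = [
    [1, "Doe", "John", "abc", "TVM"],
    [2, "Smith", "Rachel", "def", "EKM"],
    [3, "Hernandez", "Manuel", "ghi", "TSR"],
    [4, "Emmanuel", "Aria", "jkl", "CLT"],
]

# ...while the table definition seen by the query engine omits "address".
VISIBLE_COLUMNS = [c for c in FILE_COLUMNS if c != "address"]


def read_by_position(rows, visible_columns):
    """Pair the i-th visible column with the i-th field of each row.

    zip() stops after the 4 visible columns, so "city" is paired with
    field index 3, which holds the address value.
    """
    return [dict(zip(visible_columns, row)) for row in rows]


def read_by_name(rows, file_columns, visible_columns):
    """Resolve each visible column to its index in the full file schema first."""
    indexes = [file_columns.index(c) for c in visible_columns]
    return [{c: row[i] for c, i in zip(visible_columns, indexes)} for row in rows]


positional = read_by_position(ROWS, VISIBLE_COLUMNS)
by_name = read_by_name(ROWS, FILE_COLUMNS, VISIBLE_COLUMNS)

print([r["city"] for r in positional])  # ['abc', 'def', 'ghi', 'jkl']
print([r["city"] for r in by_name])     # ['TVM', 'EKM', 'TSR', 'CLT']
```

The positional reader reproduces the "Actual Results" (address values shown as city), while resolving columns by name yields the "Expected Result". Even then, the restricted data is only hidden, not enforced, which is consistent with Presto not integrating with Lake Formation policies.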

prestored commented 1 year ago

@sriramkotilil Could you share the DDL you used, along with your Glue catalog configuration?

imjalpreet commented 1 year ago

Hi @sriramkotilil, Presto is currently not integrated with AWS Lake Formation, so it does not respect any policies defined in AWS Lake Formation. We are planning to add this support to Presto and will soon create an issue with all the details of the integration. I will link that issue here once it is created.