prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Table structure in Presto select query is not proper when columns are masked using AWS Lake Formation policy #18740

Open tbinoy opened 1 year ago

tbinoy commented 1 year ago

In the AWS table below, the column "address" is masked by an AWS Lake Formation policy. A select query from Presto then hides the "address" column, but its values appear under the next column, "city". Before masking:

presto:isl-lhservice-db> select * from "/Persons";
 personid | lastname  | firstname | address | city
----------+-----------+-----------+---------+------
        1 | Doe       | John      | abc     | TVM
        2 | Smith     | Rachel    | def     | EKM
        3 | Hernandez | Manuel    | ghi     | TSR
        4 | Emmanuel  | Aria      | jkl     | CLT

(4 rows)

After masking the column "address", the "city" column shows the values of the "address" column:

presto:isl-lhservice-db> select * from "/Persons";
 personid | lastname  | firstname | city
----------+-----------+-----------+------
        2 | Smith     | Rachel    | def
        3 | Hernandez | Manuel    | ghi
        4 | Emmanuel  | Aria      | jkl
        1 | Doe       | John      | abc

(4 rows)

Note: the AWS documentation at https://docs.aws.amazon.com/lake-formation/latest/dg/limitations.html says: "Although Lake Formation makes available metadata about column permissions to integrated services, the actual filtering of columns in query responses is the responsibility of the integrated service." Presto therefore needs to handle the Lake Formation policies.

imjalpreet commented 1 year ago

Hi @tbinoy, Presto doesn't fully support Lake Formation out of the box yet. There are changes required to add complete support.

However, the AWS Glue APIs are internally integrated with AWS Lake Formation (database-, table-, and column-level permissions only), so when Presto calls them they return only the columns that the IAM user/role is permitted to access under the LF policies. Since Presto is not yet completely integrated with Lake Formation, the required checks do not happen when the data is read from S3, and the data is returned to the client.

By default, Presto reads columns by order, which is why you are seeing the values of the address column in the city column. You can work around this by making Presto access columns by name rather than by order (if that is possible for the table format you are using), but the ideal solution is for Lake Formation support to be implemented in Presto.
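
To make the column shift concrete, here is a minimal Python sketch of the two mapping strategies. It is an illustration only, not Presto code: the names FILE_COLUMNS, VISIBLE_COLUMNS, read_by_position and read_by_name are hypothetical, and the data is the first two rows of the table from the report. It assumes the data file still stores all five columns, while the catalog returns only the four that the policy allows.

```python
# Hypothetical simulation; none of these names come from Presto itself.

# Columns physically stored in the data file, in file order.
FILE_COLUMNS = ["personid", "lastname", "firstname", "address", "city"]
FILE_ROWS = [
    (1, "Doe", "John", "abc", "TVM"),
    (2, "Smith", "Rachel", "def", "EKM"),
]

# Schema as returned by the catalog once "address" is masked.
VISIBLE_COLUMNS = ["personid", "lastname", "firstname", "city"]


def read_by_position(visible, rows):
    """Pair the i-th visible column with the i-th stored field."""
    # zip stops at the shorter sequence, so the last stored field is dropped
    # and "city" ends up paired with the stored "address" value.
    return [dict(zip(visible, row)) for row in rows]


def read_by_name(visible, file_columns, rows):
    """Look up each visible column by its name in the file schema."""
    index = {name: i for i, name in enumerate(file_columns)}
    return [{name: row[index[name]] for name in visible} for row in rows]


by_position = read_by_position(VISIBLE_COLUMNS, FILE_ROWS)
by_name = read_by_name(VISIBLE_COLUMNS, FILE_COLUMNS, FILE_ROWS)

print(by_position[0]["city"])  # abc (the masked address value leaks)
print(by_name[0]["city"])      # TVM (the real city value)
```

Positional mapping reproduces the behaviour in the report, while name-based mapping returns the correct city and never touches the masked column. For Parquet or ORC tables, the Hive connector properties hive.parquet.use-column-names and hive.orc.use-column-names are likely the settings to look at, but check them against the documentation for your Presto version and table format.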

"Although Lake Formation makes available metadata about column permissions to integrated services, the actual filtering of columns in query responses is the responsibility of the integrated service. Presto needs to handle the Lake Formation policies."

That's right, this support needs to be added in Presto.