We are on Spark 2.4 and using this Ranger Spark plugin. We noticed that after enabling the plugin, partition pruning no longer works on Spark tables: the entire table is scanned even though the query filters on a partition column. With the plugin disabled, only the requested partition is scanned, as expected. In the plans below, the scan with the plugin enabled uses a CatalogFileIndex with empty PartitionFilters, while the scan without it uses a PrunedInMemoryFileIndex with the partition filters pushed down.
Can you please help fix this issue?
Below is the test case:
withUser("bob") {
  // spark.sql("CREATE TABLE IF NOT EXISTS DEFAULT.SRV_VIEW (NAME STRING, VALUE STRING, BATCH_ID STRING) USING PARQUET PARTITIONED BY (BATCH_ID)")
  val df1 = spark.sql("select * from default.srv_view where batch_id=123")
  df1.explain()
}
With RowFilter and Masking enabled:
== Physical Plan ==
(1) Filter (isnotnull(batch_id#2) && (cast(batch_id#2 as int) = 123))
+- (1) FileScan parquet default.srv_view[NAME#0,VALUE#1,BATCH_ID#2] Batched: true, Format: Parquet, Location: CatalogFileIndex[file:/Users/prashanth_sabbidi/IdeaProjects/IDF/SP-RATINGS-ACL-SparkRanger/spark-..., PartitionCount: 0, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<NAME:string,VALUE:string>
Without RowFilter and Masking:
== Physical Plan == *(1) FileScan parquet default.srv_view[NAME#0,VALUE#1,BATCH_ID#2] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(BATCH_ID#2), (cast(BATCH_ID#2 as int) = 123)], PushedFilters: [], ReadSchema: struct<NAME:string,VALUE:string>
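For anyone reproducing this, here is a small sketch of how the two plans can be distinguished programmatically instead of by reading `explain()` output: in Spark 2.4 the parquet scan node is a `FileSourceScanExec`, and pruning has been applied when its `partitionFilters` is non-empty. This assumes the `default.srv_view` table from the test case above exists and that a `spark` session is available.

import org.apache.spark.sql.execution.FileSourceScanExec

// Build the query and grab the executed physical plan.
val plan = spark.sql("select * from default.srv_view where batch_id=123")
  .queryExecution.executedPlan

// With the Ranger plugin enabled we observe partitionFilters == Nil
// (full scan); with it disabled, the partition predicates appear here.
val pruned = plan.collectFirst {
  case scan: FileSourceScanExec => scan.partitionFilters.nonEmpty
}.getOrElse(false)

println(s"partition pruning applied: $pruned")

With the plugin enabled this prints `false` for us, matching the empty `PartitionFilters: []` in the first plan; without it, it prints `true`.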