yaooqinn / spark-ranger

Already merged into apache/incubator-kyuubi. ACL Management for Apache Spark SQL with Apache Ranger.
https://yaooqinn.github.io/spark-ranger/
Apache License 2.0

Partition pruning not working with Ranger Spark Plugin #22

Closed spreddy closed 4 years ago

spreddy commented 4 years ago

We are on Spark 2.4 and use this Ranger Spark plugin. After enabling the plugin, partition pruning no longer works as expected on Spark tables: the entire table is scanned even though the query filters on the partition column. Once we disable the Ranger plugin, pruning works as expected and only the given partition is scanned.

Can you please help fix this issue?

Below is the test case:

withUser("bob") {
  // Table definition, for reference:
  // spark.sql("CREATE TABLE IF NOT EXISTS DEFAULT.SRV_VIEW (NAME STRING, VALUE STRING, BATCH_ID STRING) USING PARQUET PARTITIONED BY (BATCH_ID)")
  val df1 = spark.sql("select * from default.srv_view where batch_id=123")
  df1.explain()
}

With RowFilter and Masking enabled:

== Physical Plan ==
(1) Filter (isnotnull(batch_id#2) && (cast(batch_id#2 as int) = 123))
+- (1) FileScan parquet default.srv_view[NAME#0,VALUE#1,BATCH_ID#2] Batched: true, Format: Parquet, Location: CatalogFileIndex[file:/Users/prashanth_sabbidi/IdeaProjects/IDF/SP-RATINGS-ACL-SparkRanger/spark-..., PartitionCount: 0, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<NAME:string,VALUE:string>

Without RowFilter and Masking:

ext.injectOptimizerRule(RangerSparkAuthorizerExtension)
// ext.injectOptimizerRule(RangerSparkRowFilterExtension)
// ext.injectOptimizerRule(RangerSparkMaskingExtension)
// ext.injectPlannerStrategy(RangerSparkPlanOmitStrategy)

== Physical Plan ==
*(1) FileScan parquet default.srv_view[NAME#0,VALUE#1,BATCH_ID#2] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(BATCH_ID#2), (cast(BATCH_ID#2 as int) = 123)], PushedFilters: [], ReadSchema: struct<NAME:string,VALUE:string>
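
The difference between the two plans comes down to the PartitionFilters field of the FileScan node: it is empty with the row-filter and masking rules enabled, and holds the batch_id predicate without them. To check this in a test rather than by reading the output, a minimal sketch could look like the following. PlanCheck and its methods are hypothetical helpers, not part of spark-ranger or Spark, and the sketch assumes the plan text has the same layout as the Spark 2.4 output shown above.

```scala
// Hypothetical helper, not part of spark-ranger or Spark.
// Assumes the plan text follows the Spark 2.4 FileScan layout quoted in this issue,
// where "PartitionFilters: [...]" is immediately followed by ", PushedFilters:".
object PlanCheck {
  private val PartitionFiltersField =
    """PartitionFilters: \[(.*?)\], PushedFilters:""".r

  /** Raw contents of the PartitionFilters list, or None if the field is absent. */
  def partitionFilters(plan: String): Option[String] =
    PartitionFiltersField.findFirstMatchIn(plan).map(_.group(1).trim)

  /** True when the scan carries at least one partition filter. */
  def isPruned(plan: String): Boolean =
    partitionFilters(plan).exists(_.nonEmpty)
}
```

In a Spark session the plan text could be obtained with df1.queryExecution.executedPlan.toString and passed to PlanCheck.isPruned. With the plugin's row-filter and masking rules enabled, the plan above would yield false; with them commented out, true.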

spreddy commented 4 years ago

Just realized it's a duplicate of issue #17.

spreddy commented 4 years ago

Duplicate of issue #17

yaooqinn commented 4 years ago

Thanks. Feel free to ask the Apache Submarine team for help later.