yaooqinn / spark-authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo has been contributed to Apache Kyuubi | The project has migrated to Apache Kyuubi
https://yaooqinn.github.io/spark-authorizer/
Apache License 2.0

Please suggest: where do we need to create the --proxy-user? Is it required to be created in the Ranger UI only, or under Hadoop as well? #26

Open RamSinha opened 5 years ago

RamSinha commented 5 years ago

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Configurations
  2. Environments
  3. Operations
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

RamSinha commented 5 years ago

So I just created a user on the Hadoop side, and a user with the same name was created in Ranger. But the policies defined in Ranger aren't being applied. I am confused about how the proxy user is recognized by Ranger; what is the logical mapping between the two users?

yaooqinn commented 5 years ago

You can use either the proxy user or the login user. If you specify --proxy-user UserA, the runtime sparkUser will be UserA; otherwise it will be the user part of the spark.yarn.principal configuration. If you are using another authentication method, just pay attention to the value of SparkContext.sparkUser.
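
For example, a quick way to check which user the authorizer will see is to inspect it from the spark-shell (a minimal sketch; the printed name is only illustrative):

  // Effective runtime user that policy checks are performed against
  scala> spark.sparkContext.sparkUser
  res0: String = UserA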

RamSinha commented 5 years ago

Thanks for the reply. In our case we are running Spark on AWS EMR, and the spark.yarn.principal configuration is not set anywhere.

scala> spark.conf.get("spark.yarn.principal")
java.util.NoSuchElementException: spark.yarn.principal
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1992)
  at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
  ... 54 elided
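
As a side note, the two-argument form of spark.conf.get probes such settings without throwing (a minimal sketch; the fallback string is arbitrary):

  scala> spark.conf.get("spark.yarn.principal", "<unset>")
  res1: String = <unset>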

But this setup is not enforcing the Ranger policies for that user. We have enabled all the XML settings as mentioned in the blog, and we are using the command below to start the shell:

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://127.0.0.1:10000/default" \
  --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dlog4j.configuration=file:/home/hadoop/optimus-log4j.properties" \
  --jars /home/hadoop/spark-authorizer-2.2.0.jar \
  --proxy-user ram
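
For reference, for --proxy-user to be honored at the Hadoop level, the launching superuser normally also needs impersonation rights in core-site.xml. A hedged sketch, assuming the EMR default superuser hadoop; the names and wildcards are placeholders, not taken from this thread:

  <!-- core-site.xml: allow user "hadoop" to impersonate other users -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>

Ranger itself then only needs a user with the same name (created in the UI or synced via usersync) so its policies can match.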

Any pointers would be really helpful.

yaooqinn commented 5 years ago

Please follow this doc to set things up: https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html

RamSinha commented 5 years ago

Thanks for the pointer. BTW, we are already following the installation guidelines above. Just one update: we are on the versions below.

Spark 2.4
Hive 2.3.4
Ranger 0.7.1

Would it cause any problems?

yaooqinn commented 5 years ago

For Spark 2.4: see https://github.com/yaooqinn/spark-authorizer/pull/14.
For Hive 2.3.4: ensure the built-in Hive metastore client of Spark has no incompatibility issues with it.
For Ranger 0.7.1: you may need to fix incompatibility issues in the ranger-hive-plugin module.
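
For the Hive point, a quick way to see which metastore client version Spark is built against is to read the corresponding conf (a minimal sketch; 1.2.1 is the stock Spark 2.x default and may differ on EMR):

  scala> spark.conf.get("spark.sql.hive.metastore.version")
  res2: String = 1.2.1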

RamSinha commented 5 years ago

Thanks for the pointers, I am looking at the ranger-hive-plugin module now. One question: the setup document mentioned above does not mention installing the ranger-hive-plugin. Does that mean we don't need to install the ranger-hive-plugin module separately?

yaooqinn commented 5 years ago

Just follow section 2 of the doc, Applying Plugin to Apache Spark.
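
For reference, once the jar is on the driver classpath, older versions of the project README enable the rule from the spark-shell roughly like this (a hedged sketch; verify the import against the version you build):

  scala> import org.apache.spark.sql.catalyst.optimizer.Authorizer
  scala> spark.experimental.extraOptimizations ++= Seq(Authorizer)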

RamSinha commented 5 years ago

Still no luck. I built a new EMR cluster with Spark 2.3 and followed all the instructions. Though I don't see any errors, the policies still aren't being applied. Also, in the Ranger UI under the Audit section I don't see anything.

Also: when I tried installing the ranger-hive-plugin on a different EMR cluster using the Ranger+Installation+Guide link, policies are being enforced on Hive queries (started from the Hive CLI).

RamSinha commented 5 years ago

When I try to build locally I get the warning below.

Authorizable.scala:57: fruitless type test: a value of type org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot also be a org.apache.spark.sql.hive.HiveExternalCatalog
[WARNING]   case _: HiveExternalCatalog =>
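
That warning is consistent with Spark 2.4 wrapping the external catalog in ExternalCatalogWithListener, so a direct type test against HiveExternalCatalog can never match; this is what the PR linked above deals with. A sketch of the unwrapping idea, not the project's exact code:

  import org.apache.spark.sql.catalyst.catalog.{ExternalCatalog, ExternalCatalogWithListener}

  // Spark 2.4 wraps the SharedState catalog; unwrap it before type-matching.
  val catalog: ExternalCatalog = spark.sharedState.externalCatalog match {
    case wrapper: ExternalCatalogWithListener => wrapper.unwrapped
    case other => other
  }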

Also, when I tried to run the same code from the spark-shell I get the error below.

error: not found: type HiveExternalCatalog
       method.invoke(externalCatalog).asInstanceOf[HiveExternalCatalog]

It seems the spark-shell cannot access the package-private class HiveExternalCatalog.
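
A common workaround for package-private classes in the REPL is :paste -raw, which compiles the pasted source into a real package (a hedged sketch; CatalogProbe is a hypothetical helper, not project code):

  scala> :paste -raw
  // Entering paste mode (ctrl-D to finish)

  package org.apache.spark.sql.hive

  // Hypothetical helper compiled inside org.apache.spark.sql.hive,
  // where the private[spark] HiveExternalCatalog is visible.
  object CatalogProbe {
    def isHiveCatalog(catalog: AnyRef): Boolean =
      catalog.isInstanceOf[HiveExternalCatalog]
  }

After pasting, org.apache.spark.sql.hive.CatalogProbe.isHiveCatalog(spark.sharedState.externalCatalog) can be called from the normal prompt.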