RamSinha opened this issue 5 years ago
So I just created a user on the Hadoop side, and a user with the same name is created in Ranger. But the policies enforced in Ranger aren't being applied. I am kind of confused about how the proxy user is recognized by Ranger. What's the logical mapping between the two users?
You can use either the proxy user or the login user. If you specify --proxy-user UserA, the runtime sparkUser will be UserA; otherwise it will use the user part of the spark.yarn.principal configuration. If you are using another authentication method, just pay attention to the value of SparkContext.sparkUser.
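A quick way to check which identity Spark is actually running as, and therefore which name Ranger will try to match policies against, is to read it off the SparkContext in the shell (output illustrative):

    // In spark-shell, started with --proxy-user ram:
    scala> spark.sparkContext.sparkUser
    res0: String = ram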
Thanks for the reply.
In our case we are running Spark on AWS EMR, and the spark.yarn.principal configuration is not set anywhere:
    scala> spark.conf.get("spark.yarn.principal")
    java.util.NoSuchElementException: spark.yarn.principal
      at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
      at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1992)
      at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
      ... 54 elided
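(Note that spark.conf.get throws for unset keys; probing through the SparkConf confirms the key is simply absent rather than mis-set:)

    scala> spark.sparkContext.getConf.getOption("spark.yarn.principal")
    res1: Option[String] = None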
But this setup is not enforcing the Ranger policies for this user. We have enabled all the XML settings as mentioned in the blog, and are using the command below to start the shell:
    spark-shell \
      --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://127.0.0.1:10000/default" \
      --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dlog4j.configuration=file:/home/hadoop/optimus-log4j.properties" \
      --jars /home/hadoop/spark-authorizer-2.2.0.jar \
      --proxy-user ram
Any pointers would be really helpful.
Please follow this doc to set it up: https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html
Thanks for the pointer. BTW, we are already following the installation guidelines above.
Just one update: we are on the versions below.

Spark 2.4
Hive 2.3.4
Ranger 0.7.1

Would this cause any problems?
For Spark 2.4: https://github.com/yaooqinn/spark-authorizer/pull/14
For Hive 2.3.4: ensure the built-in Hive metastore client of Spark has no incompatibility issues with it (one way to pin the client version is sketched below).
For Ranger 0.7.1: you may need to fix incompatibility issues in the ranger-hive-plugin module.
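For the metastore-client point, a minimal sketch for spark-defaults.conf using Spark's standard metastore-client settings (whether this is needed, and which Hive versions are accepted, varies by Spark release and EMR build; check the Spark docs for your version):

    # Pin the Hive metastore client version Spark uses (sketch, not verified on EMR).
    spark.sql.hive.metastore.version   2.3.4
    # Fetch matching client jars from Maven instead of using the built-in 1.2.1 client.
    spark.sql.hive.metastore.jars      maven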
Thanks for the pointers, I am looking at the ranger-hive-plugin module now. One question: the setup document mentioned above makes no mention of installing the ranger-hive-plugin. Does that mean we don't need to install the ranger-hive-plugin module separately?
Just follow section 2 of the doc, "Applying Plugin to Apache Spark".
Still no luck. I built a new EMR cluster for Spark 2.3 and followed all the instructions. Though I don't see any errors, the policies still aren't being applied. Also, in the Ranger UI under the Audit section I don't see anything.
Also: when I tried installing the ranger-hive-plugin on a different EMR cluster using the Ranger+Installation+Guide link, policies are enforced on Hive queries (started from the Hive CLI).
When I try to build locally, I get the warning below:
    [WARNING] Authorizable.scala:57: fruitless type test: a value of type org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot also be a org.apache.spark.sql.hive.HiveExternalCatalog
    [WARNING]   case _: HiveExternalCatalog =>
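If I read the warning correctly, on Spark 2.4 SharedState.externalCatalog is statically typed as ExternalCatalogWithListener, a wrapper around the real catalog, so a direct pattern match on HiveExternalCatalog can never succeed; the wrapper has to be unwrapped first. A rough sketch of what I mean (my reading, not the project's actual fix in the PR above):

    import org.apache.spark.sql.catalyst.catalog.{ExternalCatalog, ExternalCatalogWithListener}

    // Unwrap the listener wrapper introduced in Spark 2.4 before testing
    // the concrete catalog type; on Spark <= 2.3 the catalog is unwrapped.
    def underlying(catalog: ExternalCatalog): ExternalCatalog = catalog match {
      case wrapper: ExternalCatalogWithListener => wrapper.unwrapped // Spark 2.4+
      case other                                => other
    }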
Also, when I tried to run that same catalog-matching code from the spark-shell, I get the error below:
    error: not found: type HiveExternalCatalog
           method.invoke(externalCatalog).asInstanceOf[HiveExternalCatalog]
It seems that the spark-shell is not able to access the package-private class HiveExternalCatalog.
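One possible workaround, a sketch I have not verified on EMR: the Scala REPL's :paste -raw mode compiles pasted source as-is, including its package declaration, so a small probe object can be placed in org.apache.spark.sql.hive, where the package-private class is visible (CatalogProbe is a hypothetical name, not part of the plugin):

    scala> :paste -raw
    // Entering paste mode (ctrl-D to finish)

    package org.apache.spark.sql.hive

    import org.apache.spark.sql.SparkSession

    // Hypothetical probe: compiled into the same package as
    // HiveExternalCatalog, so the package-private class resolves here.
    // Assumes Spark 2.3, where sharedState.externalCatalog is the catalog
    // itself; on Spark 2.4 it would first need unwrapping as sketched above.
    object CatalogProbe {
      def isHiveCatalog(spark: SparkSession): Boolean =
        spark.sharedState.externalCatalog.isInstanceOf[HiveExternalCatalog]
    }

    // Exiting paste mode, now interpreting.

    scala> org.apache.spark.sql.hive.CatalogProbe.isHiveCatalog(spark)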