yaooqinn / spark-authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo has been contributed to Apache Kyuubi | The project has migrated to Apache Kyuubi
https://yaooqinn.github.io/spark-authorizer/
Apache License 2.0

java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SharedState.externalCatalog()Lorg/apache/spark/sql/catalyst/catalog/ExternalCatalog; #3

Open NithK45 opened 6 years ago

NithK45 commented 6 years ago

Describe the bug
I am running standalone Spark 2.3 on an EC2 instance. Hive is standalone on the same instance, the ranger-hive-plugin is set up, and its policies work fine over a Hive connection.

I carefully followed your instructions to set up Ranger for spark-sql. The only thing I did not do was modify ExperimentalMethods.scala, presuming it is not required for testing.

Also, I built the spark-authorizer jar using mvn clean package -Pspark-2.3.

To Reproduce Steps to reproduce the behavior:

  1. Copied the ranger*.xml conf files to $SPARK_HOME/conf, and copied the ranger-hive jars to $SPARK_HOME/jars along with the spark-authorizer jar. Gave full permissions to all xml and jar files. Modified the conf xml files as advised.
  2. Environment is standalone Spark, Hive, and Ranger, all on EC2.
  3. Tried running a show databases command in spark-shell.
  4. See the error screenshot (the NoSuchMethodError in the title).
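The copy steps in step 1 can be sketched as below. This is a hedged sketch only: $SPARK_HOME and every file name here are placeholders created on the fly so the commands can run end to end; the real Ranger conf files and plugin jar come from your own setup.

```shell
# Hedged sketch of the copy steps above; all paths and file names are
# placeholders created here only so the commands can run end to end.
set -e
work=$(mktemp -d) && cd "$work"
SPARK_HOME="$work/spark"                      # stand-in for the real Spark install
mkdir -p "$SPARK_HOME/conf" "$SPARK_HOME/jars"

# Placeholder Ranger conf files and plugin jar (use the real artifacts instead)
touch ranger-hive-security.xml ranger-hive-audit.xml spark-authorizer.jar

cp ranger-hive-*.xml "$SPARK_HOME/conf/"      # Ranger confs into conf/
cp spark-authorizer.jar "$SPARK_HOME/jars/"   # plugin jar into jars/
chmod a+r "$SPARK_HOME"/conf/*.xml "$SPARK_HOME"/jars/*.jar

ls "$SPARK_HOME/conf" "$SPARK_HOME/jars"
```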

Anything I missed?

My final goal is for spark-sql to be used from different SQL clients such as SQuirreL SQL Client, Cassandra, etc., with the Hive policies enforced when they query the data. All clients will connect to spark-sql using a string that looks like jdbc:hive2://hostname:10015/databasename;ssl=true;sslTrustStore=/pathtofile.jks;trustStorePassword=abcd
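For what it's worth, the client side of that connection can be sketched with beeline. Every value in the URL below is a placeholder taken from the example string above, not a working endpoint, and the beeline invocation is a standard HiveServer2-client usage, not something specific to this plugin.

```shell
# Hedged sketch: connect a SQL client to the Spark Thrift Server over JDBC.
# All URL values are placeholders copied from the example string above.
JDBC_URL='jdbc:hive2://hostname:10015/databasename;ssl=true;sslTrustStore=/pathtofile.jks;trustStorePassword=abcd'
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$JDBC_URL" -e 'show databases;' \
    || echo "connection failed (expected outside the real cluster)"
else
  echo "beeline not found; would connect with: $JDBC_URL"
fi
```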

yaooqinn commented 6 years ago

Sorry, the master branch is not stable yet; most of the cases I verified are against Spark 2.1.2. I would love to have it fixed as soon as my vacation ends. For now, you may switch to branch 2.3 or use the package I deployed for 2.3; more details are in the branch 2.3 README.

yaooqinn commented 6 years ago

@NithK45 I have tested it with spark-shell on YARN using mvn clean package -Pspark-2.1, and it works fine with Spark 2.1.2/2.2.1/2.3.0. I don't have a standalone env for testing, but I guess you are using cluster mode, and standalone doesn't seem to have a module like YARN's distributed cache for deploying jars, so they may need to be manually copied to all worker nodes.

NithK45 commented 6 years ago

@yaooqinn Thanks for testing that. I am using an EC2 large single node with standalone Spark 2.3 and Hive 2.4; the data is in S3. We don't have YARN or Hadoop set up. I copied your jar into the spark_home/jars/ directory and restarted the Spark and Hive services.

I also modified your code to comment out the validation that causes the above error, then rebuilt and tested. It did not throw the error this time, but no Ranger policies were enforced from spark-shell.

I would also like to know whether you have tested this by connecting to spark-sql through a SQL client (like SQuirreL SQL Client or SQL Developer) with a JDBC connection string.

yaooqinn commented 6 years ago

For SQL clients, I use Kyuubi, which is a multi-tenant JDBC/ODBC server powered by Spark SQL.

jacibreiro commented 5 years ago

Hi!

Same error here... NoSuchMethodError :-(. Is there any solution? Thanks!

yaooqinn commented 5 years ago

@jacibreiro please try v2.1.1 and follow the doc https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html

jacibreiro commented 5 years ago

@yaooqinn, thanks for your quick answer. I'm using v2.1.1 (built from the master branch)... I'm also using Hive 2.3.2, Spark 2.4, and Ranger 1.2... Maybe the problem is the Ranger version? Have you tested Ranger versions higher than 0.5?

jacibreiro commented 5 years ago

With the pyspark shell I don't get the NoSuchMethodError, but it still doesn't work... I have followed all the steps in the manual, but it seems the plugin is not connecting to Ranger (maybe because of the version issue I mentioned in the previous post). I think it is not connecting because I can't see the policy cache. With Hive there is a script to enable the plugin, but here I don't know when the communication between Spark and Ranger starts... Maybe there is an extra step that is not in the documentation? By the way, this is my ranger-hive-security.xml:

<configuration>
  <property>
    <name>ranger.plugin.hive.policy.rest.url</name>
    <value>http://ranger-admin:6080</value>
  </property>
  <property>
    <name>ranger.plugin.hive.service.name</name>
    <value>cl1_hive</value>
  </property>
  <property>
    <name>ranger.plugin.hive.policy.cache.dir</name>
    <value>/tmp/cl1_hive/policycache</value>
  </property>
  <property>
    <name>ranger.plugin.hive.policy.pollIntervalMs</name>
    <value>5000</value>
  </property>
  <property>
    <name>ranger.plugin.hive.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
  </property>
</configuration>

Do you see something wrong?

Thanks!

yaooqinn commented 5 years ago

@jacibreiro you are right. Higher versions of Ranger are built with higher Hive client jars than Spark's (1.2.1). We may have it fixed in https://issues.apache.org/jira/browse/RANGER-2128 later.

jacibreiro commented 5 years ago

@yaooqinn I have used an old Ranger version (0.5.3) but it still doesn't work... I see nothing in either the policy cache dir or the audit plugins sheet (in Ranger), so it seems to be something related to the communication between Spark and Ranger, because it is not able to load the policies. I have followed all the steps described in https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html . Looking at my ranger-hive-security.xml (previous post), do you see anything missing?

yaooqinn commented 5 years ago

@jacibreiro could you please give more detail about "still doesn't work..."?

jacibreiro commented 5 years ago

sure @yaooqinn, I mean that I have followed every single step in the manual.

yaooqinn commented 5 years ago

Maybe you should first check that the Ranger Admin is reachable, and let's start with the spark-sql script to see if anything goes wrong.
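One way to sanity-check reachability is a quick probe of the Ranger Admin REST endpoint. A minimal sketch, assuming the ranger-admin:6080 URL from the config posted earlier and the public v2 service-definition path of the Ranger Admin API:

```shell
# Hedged sketch: probe the Ranger Admin REST endpoint before debugging the plugin.
# ranger-admin:6080 comes from the config posted earlier; adjust to your host.
RANGER_URL=${RANGER_URL:-http://ranger-admin:6080}
if curl -sf --max-time 5 "$RANGER_URL/service/public/v2/api/servicedef" -o /dev/null; then
  echo "Ranger Admin reachable at $RANGER_URL"
else
  echo "Ranger Admin NOT reachable at $RANGER_URL"
fi
```

If this fails, the plugin cannot download policies either, which would explain the empty policy cache dir.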

alcpinto commented 5 years ago

@yaooqinn

I am getting the same error described in this thread.

[screenshot of the error]

Installation details:

[screenshot of installation details]

Without spark-authorizer I can connect without any problem.

I followed your steps but maybe I am missing something... Do you have any advice?

Thanks!

yaooqinn commented 5 years ago

Hi @alcpinto

Spark 2.4 is not supported yet. I proposed pull request #14 to support 2.4.

You can build that branch with mvn clean package -Pspark-2.4 and try again.

Neilxzn commented 5 years ago

Hi @yaooqinn I tried it on Spark 2.4 too, but it doesn't work. I use the superuser hadoop to access any database, and it always throws a permission error. On Spark 2.2 it works!

Neilxzn commented 5 years ago

hi @alcpinto, did you solve the problem with Spark 2.4? How did you solve it?

alcpinto commented 5 years ago

Hi Yaooqinn,

Unfortunately it is not working for me. I needed to shift to another project and I might return to this subject in two weeks' time.

However, I couldn't figure out how to fix it...

Abilio Pinto


chaogefeng commented 5 years ago

It works in my testing with Spark 2.4.3, Hive 2.3.3, and Ranger 1.1.0. After merging the patch in, this error no longer occurs.