yaooqinn / spark-ranger

Now merged into apache/incubator-kyuubi. ACL Management for Apache Spark SQL with Apache Ranger.
https://yaooqinn.github.io/spark-ranger/
Apache License 2.0

Exception in thread "HiveServer2-Handler-Pool: Thread-76" java.lang.ExceptionInInitializerError #23

Open · DurgaChill opened this issue 4 years ago

DurgaChill commented 4 years ago

Hi,

I have installed spark-2.3.0-bin-hadoop2.7, Hive 3.1.2, and Ranger 2.0.0 locally. I had established a connection between the Spark Thrift Server and the Hive metastore before enabling the spark-ranger plugin. I then followed the steps in the README, along with the known issues for Ranger 2.0.0. But after starting the Thrift Server, when I run the Spark beeline shell, I see this exception in beeline:

2020-04-09 11:17:46 ERROR HiveConnection:593 - Error opening session
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:192)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:142)
    at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:207)
    at org.apache.hive.beeline.Commands.connect(Commands.java:1149)
    at org.apache.hive.beeline.Commands.connect(Commands.java:1070)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:52)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:970)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)
Error: Could not establish connection to jdbc:hive2://localhost:10000: null (state=08S01,code=0)
0: jdbc:hive2://localhost:10000 (closed)>


After digging into the thriftserver logs, I found this exception:

enableTagEnricherWithLocalRefresher: false, disableTrieLookupPrefilter: false, optimizeTrieForRetrieval: false, cacheAuditResult: false }
2020-04-09 11:17:46 INFO AuditProviderFactory:493 - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
Exception in thread "HiveServer2-Handler-Pool: Thread-76" java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkAuthorizerExtension.apply(RangerSparkAuthorizerExtension.scala:62)
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkAuthorizerExtension.apply(RangerSparkAuthorizerExtension.scala:36)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:68)
    at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:202)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:351)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:246)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
    at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
    at org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:223)
    at org.apache.ranger.authorization.spark.authorizer.RangerSparkPlugin.init(RangerSparkPlugin.scala:39)
    at org.apache.ranger.authorization.spark.authorizer.RangerSparkPlugin$Builder.getOrCreate(RangerSparkPlugin.scala:71)
    at org.apache.ranger.authorization.spark.authorizer.RangerSparkAuthorizer$.<init>(RangerSparkAuthorizer.scala:43)
    at org.apache.ranger.authorization.spark.authorizer.RangerSparkAuthorizer$.<clinit>(RangerSparkAuthorizer.scala)
    ... 34 more
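For what it is worth, the Caused by: NullPointerException inside ConcurrentHashMap.put during RangerBasePlugin.init usually means a required plugin property resolved to null, for example when ranger-spark-security.xml is missing from the classpath or incomplete. A minimal ranger-spark-security.xml for this setup might look like the sketch below; the property names follow the spark-ranger README, while the service name, URL, and cache dir are placeholders that must match your Ranger admin:

<configuration>
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <!-- placeholder: must match the service defined in Ranger admin -->
    <value>sparkService</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.rest.url</name>
    <!-- placeholder: your Ranger admin URL -->
    <value>http://localhost:6080</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.cache.dir</name>
    <!-- must exist and be writable by the Thrift Server user -->
    <value>/tmp/policycache</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.pollIntervalMs</name>
    <value>5000</value>
  </property>
  <property>
    <name>ranger.plugin.spark.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
  </property>
</configuration>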

Any thoughts on this? Any help would be much appreciated. Thanks in advance.

Note: I have installed the Ranger Hive plugin (Ranger and HiveServer2) before with the same setup, and it works perfectly fine. Now I want Ranger to interact with the Spark Thrift Server backed by the same Hive metastore.

yaooqinn commented 4 years ago

Does the Spark community support Hive 3.1 as of Spark 2.3?

DurgaChill commented 4 years ago

I could interact with the Hive metastore via the Spark Thrift Server (spark beeline shell) with this setup before adding the spark-ranger plugin, so I assumed that Spark 2.3 can connect to Hive metastore v3.1.2.
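For reference, Spark 2.3 ships with a built-in Hive 1.2.1 client, and talking to a newer external metastore is normally configured with the two settings below. This is only a sketch: the jar path is a placeholder, and Spark 2.3 only documents metastore versions up to the 2.3 line, so a 3.1.x metastore is not an officially supported target.

spark.sql.hive.metastore.version 2.3.2
spark.sql.hive.metastore.jars    /path/to/hive-2.3.x/lib/*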

I replaced the spark-ranger jar with the original-spark-ranger jar, and now I am facing another issue:

Exception in thread "HiveServer2-Handler-Pool: Thread-99" java.lang.NoClassDefFoundError: org/apache/ranger/plugin/policyengine/RangerAccessResultProcessor

Any thoughts on this?

One more question: does spark-authorizer work in this use case, i.e. connecting to an external Hive metastore?

Apologies for multiple questions. Thanks in advance :)

yaooqinn commented 4 years ago

spark-authorizer is limited to Ranger 0.5.3 and Spark with its built-in Hive (1.2.1.spark2).

The ExceptionInInitializerError is most likely related to another NoClassDefFoundError.
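One quick way to check that theory is to probe, from a spark-shell launched with the same jars and conf as the Thrift Server, whether the Ranger classes named in the traces actually resolve. A minimal sketch; the class names are taken from the stack traces in this thread:

// Paste into a spark-shell started with the same classpath as the Thrift Server.
Seq(
  "org.apache.ranger.plugin.policyengine.RangerAccessResultProcessor",
  "org.apache.ranger.authorization.spark.authorizer.RangerSparkAuthorizer"
).foreach { name =>
  try {
    Class.forName(name) // triggers class loading, and static init if any
    println(s"OK      $name")
  } catch {
    // catches ClassNotFoundException, NoClassDefFoundError, ExceptionInInitializerError
    case e: Throwable => println(s"FAIL    $name ($e)")
  }
}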

Maybe you can try the Apache Submarine spark-security plugin, which you can find in the README. This project is about to be archived, and its fixes lag far behind that one.

If you have any questions, sending them to the Apache Submarine JIRA or the user mailing list would be a better choice.

MobinRanjbar commented 4 years ago

You do not need to replace spark-ranger.jar; leave it in place. Did you copy spark-ranger.jar to all machines? By the way, that is not your root exception. Please provide the full ERROR-level exception from the thriftserver logs.

DurgaChill commented 4 years ago

Hi,

Thanks for your reply.

When I build this project, I see two jar files: spark-ranger.jar and original-spark-ranger.jar. I have tried both of these jars.

Attaching my thriftserver logs; kindly let me know if there is anything in them that can help (this log is for original-spark-ranger.jar; my first comment was for spark-ranger.jar, and you can find the entire exception there). Let me know if there is anything else you need from my end.

Also, I started the Hive metastore before starting the Thrift Server. Thanks in advance.

Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-1.fc31.x86_64/jre/bin/java -cp /home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/conf/:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal

2020-04-09 12:23:02 WARN Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.0.0.128 instead (on interface wlp4s0)
2020-04-09 12:23:02 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2020-04-09 12:23:02 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-04-09 12:23:02 INFO HiveThriftServer2:2608 - Started daemon with process name: 31689@localhost.localdomain
2020-04-09 12:23:02 INFO SignalUtils:54 - Registered signal handler for TERM
2020-04-09 12:23:02 INFO SignalUtils:54 - Registered signal handler for HUP
2020-04-09 12:23:02 INFO SignalUtils:54 - Registered signal handler for INT
2020-04-09 12:23:02 INFO HiveThriftServer2:54 - Starting SparkContext
2020-04-09 12:23:02 INFO SparkContext:54 - Running Spark version 2.3.0
2020-04-09 12:23:02 INFO SparkContext:54 - Submitted application: Thrift JDBC/ODBC Server
2020-04-09 12:23:02 INFO SecurityManager:54 - Changing view acls to: DURGACHILLAKURU
2020-04-09 12:23:02 INFO SecurityManager:54 - Changing modify acls to: DURGACHILLAKURU
2020-04-09 12:23:02 INFO SecurityManager:54 - Changing view acls groups to:
2020-04-09 12:23:02 INFO SecurityManager:54 - Changing modify acls groups to:
2020-04-09 12:23:02 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(DURGACHILLAKURU); groups with view permissions: Set(); users with modify permissions: Set(DURGACHILLAKURU); groups with modify permissions: Set()
2020-04-09 12:23:03 INFO Utils:54 - Successfully started service 'sparkDriver' on port 35149.
2020-04-09 12:23:03 INFO SparkEnv:54 - Registering MapOutputTracker
2020-04-09 12:23:03 INFO SparkEnv:54 - Registering BlockManagerMaster
2020-04-09 12:23:03 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2020-04-09 12:23:03 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2020-04-09 12:23:03 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-e927c857-a52e-4ba8-a896-cf010b14e1a5
2020-04-09 12:23:03 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2020-04-09 12:23:03 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2020-04-09 12:23:03 INFO log:192 - Logging initialized @2389ms
2020-04-09 12:23:03 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2020-04-09 12:23:03 INFO Server:414 - Started @2544ms
2020-04-09 12:23:03 INFO AbstractConnector:278 - Started ServerConnector@55a8dc49{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2020-04-09 12:23:03 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23c650a3{/jobs,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1b1637e1{/jobs/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@18151a14{/jobs/job,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@169da7f2{/jobs/job/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c1e23ff{/stages,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ceb4bd2{/stages/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60297f36{/stages/stage,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@df4b72{/stages/stage/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2ba45490{/stages/pool,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37ff4054{/stages/pool/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@894858{/storage,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7af707e0{/storage/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@737edcfa{/storage/rdd,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3ecedf21{/storage/rdd/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@74cf8b28{/environment,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36c54a56{/environment/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3359c978{/executors,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ba63fe5{/executors/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@73386d72{/executors/threadDump,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f330eb9{/executors/threadDump/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@125c082e{/static,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1929425f{/,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@718607eb{/api,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@676f0a60{/jobs/job/kill,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d10455d{/stages/stage/kill,null,AVAILABLE,@Spark}
2020-04-09 12:23:03 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.0.0.128:4040
2020-04-09 12:23:04 INFO Executor:54 - Starting executor ID driver on host localhost
2020-04-09 12:23:04 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35203.
2020-04-09 12:23:04 INFO NettyBlockTransferService:54 - Server created on 10.0.0.128:35203
2020-04-09 12:23:04 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2020-04-09 12:23:04 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 10.0.0.128, 35203, None)
2020-04-09 12:23:04 INFO BlockManagerMasterEndpoint:54 - Registering block manager 10.0.0.128:35203 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.128, 35203, None)
2020-04-09 12:23:04 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.0.0.128, 35203, None)
2020-04-09 12:23:04 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 10.0.0.128, 35203, None)
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b2a30d{/metrics/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/spark-warehouse/').
2020-04-09 12:23:04 INFO SharedState:54 - Warehouse path is 'file:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/spark-warehouse/'.
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72fe8a4f{/SQL,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@43effd89{/SQL/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e9804b9{/SQL/execution,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@543e593{/SQL/execution/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4f186450{/static/sql,null,AVAILABLE,@Spark}
2020-04-09 12:23:04 INFO HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2020-04-09 12:23:05 INFO metastore:376 - Trying to connect to metastore with URI thrift://10.0.0.128:9083
2020-04-09 12:23:05 INFO metastore:472 - Connected to metastore.
2020-04-09 12:23:06 INFO SessionState:641 - Created local directory: /tmp/c23c7553-418a-4843-a53a-711bd9e66004_resources
2020-04-09 12:23:06 INFO SessionState:641 - Created HDFS directory: /tmp/hive/DURGACHILLAKURU/c23c7553-418a-4843-a53a-711bd9e66004
2020-04-09 12:23:06 INFO SessionState:641 - Created local directory: /tmp/DURGACHILLAKURU/c23c7553-418a-4843-a53a-711bd9e66004
2020-04-09 12:23:06 INFO SessionState:641 - Created HDFS directory: /tmp/hive/DURGACHILLAKURU/c23c7553-418a-4843-a53a-711bd9e66004/_tmp_space.db
2020-04-09 12:23:06 INFO HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is file:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/spark-warehouse/
2020-04-09 12:23:06 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2020-04-09 12:23:06 INFO HiveUtils:54 - Initializing execution hive, version 1.2.1
2020-04-09 12:23:06 INFO HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2020-04-09 12:23:06 INFO ObjectStore:289 - ObjectStore, initialize called
2020-04-09 12:23:07 INFO Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2020-04-09 12:23:07 INFO Persistence:77 - Property datanucleus.cache.level2 unknown - will be ignored
2020-04-09 12:23:09 INFO ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2020-04-09 12:23:10 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2020-04-09 12:23:10 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2020-04-09 12:23:11 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2020-04-09 12:23:11 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2020-04-09 12:23:11 INFO MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
2020-04-09 12:23:11 INFO ObjectStore:272 - Initialized ObjectStore
2020-04-09 12:23:11 WARN ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2020-04-09 12:23:11 WARN ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
2020-04-09 12:23:11 INFO HiveMetaStore:663 - Added admin role in metastore
2020-04-09 12:23:11 INFO HiveMetaStore:672 - Added public role in metastore
2020-04-09 12:23:11 INFO HiveMetaStore:712 - No user is added in admin role, since config is empty
2020-04-09 12:23:11 INFO HiveMetaStore:746 - 0: get_all_databases
2020-04-09 12:23:11 INFO audit:371 - ugi=DURGACHILLAKURU ip=unknown-ip-addr cmd=get_all_databases
2020-04-09 12:23:11 INFO HiveMetaStore:746 - 0: get_functions: db=default pat=
2020-04-09 12:23:11 INFO audit:371 - ugi=DURGACHILLAKURU ip=unknown-ip-addr cmd=get_functions: db=default pat=
2020-04-09 12:23:11 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
2020-04-09 12:23:11 INFO SessionState:641 - Created local directory: /tmp/6da0f004-8c65-430c-9877-5a84c7968396_resources
2020-04-09 12:23:11 INFO SessionState:641 - Created HDFS directory: /tmp/hive/DURGACHILLAKURU/6da0f004-8c65-430c-9877-5a84c7968396
2020-04-09 12:23:11 INFO SessionState:641 - Created local directory: /tmp/DURGACHILLAKURU/6da0f004-8c65-430c-9877-5a84c7968396
2020-04-09 12:23:11 INFO SessionState:641 - Created HDFS directory: /tmp/hive/DURGACHILLAKURU/6da0f004-8c65-430c-9877-5a84c7968396/_tmp_space.db
2020-04-09 12:23:11 INFO HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is file:/home/DURGACHILLAKURU/spark-2.3.0-bin-hadoop2.7/spark-warehouse/
2020-04-09 12:23:11 INFO SessionManager:133 - Operation log root directory is created: /tmp/DURGACHILLAKURU/operation_logs
2020-04-09 12:23:11 INFO SessionManager:88 - HiveServer2: Background operation thread pool size: 100
2020-04-09 12:23:11 INFO SessionManager:90 - HiveServer2: Background operation thread wait queue size: 100
2020-04-09 12:23:11 INFO SessionManager:93 - HiveServer2: Background operation thread keepalive time: 10 seconds
2020-04-09 12:23:11 INFO AbstractService:89 - Service:OperationManager is inited.
2020-04-09 12:23:11 INFO AbstractService:89 - Service:SessionManager is inited.
2020-04-09 12:23:11 INFO AbstractService:101 - Service: CLIService is inited.
2020-04-09 12:23:11 INFO AbstractService:89 - Service:ThriftBinaryCLIService is inited.
2020-04-09 12:23:11 INFO AbstractService:101 - Service: HiveServer2 is inited.
2020-04-09 12:23:11 INFO AbstractService:104 - Service:OperationManager is started.
2020-04-09 12:23:11 INFO AbstractService:104 - Service:SessionManager is started.
2020-04-09 12:23:11 INFO AbstractService:104 - Service:CLIService is started.
2020-04-09 12:23:11 INFO ObjectStore:289 - ObjectStore, initialize called
2020-04-09 12:23:11 INFO Query:77 - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2020-04-09 12:23:11 INFO MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
2020-04-09 12:23:11 INFO ObjectStore:272 - Initialized ObjectStore
2020-04-09 12:23:11 INFO HiveMetaStore:746 - 0: get_databases: default
2020-04-09 12:23:11 INFO audit:371 - ugi=DURGACHILLAKURU ip=unknown-ip-addr cmd=get_databases: default
2020-04-09 12:23:11 INFO HiveMetaStore:746 - 0: Shutting down the object store...
2020-04-09 12:23:11 INFO audit:371 - ugi=DURGACHILLAKURU ip=unknown-ip-addr cmd=Shutting down the object store...
2020-04-09 12:23:11 INFO HiveMetaStore:746 - 0: Metastore shutdown complete.
2020-04-09 12:23:11 INFO audit:371 - ugi=DURGACHILLAKURU ip=unknown-ip-addr cmd=Metastore shutdown complete.
2020-04-09 12:23:11 INFO AbstractService:104 - Service:ThriftBinaryCLIService is started.
2020-04-09 12:23:11 INFO AbstractService:104 - Service:HiveServer2 is started.
2020-04-09 12:23:11 INFO HiveThriftServer2:54 - HiveThriftServer2 started
2020-04-09 12:23:11 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@459d2ee6{/sqlserver,null,AVAILABLE,@Spark}
2020-04-09 12:23:11 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5342eec5{/sqlserver/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:11 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4eac65db{/sqlserver/session,null,AVAILABLE,@Spark}
2020-04-09 12:23:11 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@dde6f87{/sqlserver/session/json,null,AVAILABLE,@Spark}
2020-04-09 12:23:12 INFO ThriftCLIService:98 - Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
2020-04-09 12:31:56 INFO ThriftCLIService:243 - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
2020-04-09 12:31:56 INFO SessionState:641 - Created local directory: /tmp/a557491c-f155-4da0-b92f-32ee32d55b88_resources
2020-04-09 12:31:56 INFO SessionState:641 - Created HDFS directory: /tmp/hive/APP/a557491c-f155-4da0-b92f-32ee32d55b88
2020-04-09 12:31:56 INFO SessionState:641 - Created local directory: /tmp/DURGACHILLAKURU/a557491c-f155-4da0-b92f-32ee32d55b88
2020-04-09 12:31:56 INFO SessionState:641 - Created HDFS directory: /tmp/hive/APP/a557491c-f155-4da0-b92f-32ee32d55b88/_tmp_space.db
2020-04-09 12:31:56 INFO HiveSessionImpl:318 - Operation log session directory is created: /tmp/DURGACHILLAKURU/operation_logs/a557491c-f155-4da0-b92f-32ee32d55b88
Exception in thread "HiveServer2-Handler-Pool: Thread-76" java.lang.NoClassDefFoundError: org/apache/ranger/plugin/policyengine/RangerAccessResultProcessor
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkRowFilterExtension$.apply(RangerSparkRowFilterExtension.scala:37)
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkRowFilterExtension$.apply(RangerSparkRowFilterExtension.scala:37)
    at org.apache.spark.sql.SparkSessionExtensions$$anonfun$buildOptimizerRules$1.apply(SparkSessionExtensions.scala:124)
    at org.apache.spark.sql.SparkSessionExtensions$$anonfun$buildOptimizerRules$1.apply(SparkSessionExtensions.scala:124)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.SparkSessionExtensions.buildOptimizerRules(SparkSessionExtensions.scala:124)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.customOperatorOptimizationRules(BaseSessionStateBuilder.scala:225)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$2.extendedOperatorOptimizationRules(BaseSessionStateBuilder.scala:214)
    at org.apache.spark.sql.catalyst.optimizer.Optimizer.batches(Optimizer.scala:92)
    at org.apache.spark.sql.execution.SparkOptimizer.batches(SparkOptimizer.scala:32)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:68)
    at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:202)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:351)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:246)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.ranger.plugin.policyengine.RangerAccessResultProcessor
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 40 more
2020-04-09 12:36:44 INFO ThriftCLIService:243 - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
2020-04-09 12:36:44 INFO SessionState:641 - Created local directory: /tmp/615b6100-804c-4507-8378-765a59075b38_resources
2020-04-09 12:36:44 INFO SessionState:641 - Created HDFS directory: /tmp/hive/APP/615b6100-804c-4507-8378-765a59075b38
2020-04-09 12:36:44 INFO SessionState:641 - Created local directory: /tmp/DURGACHILLAKURU/615b6100-804c-4507-8378-765a59075b38
2020-04-09 12:36:44 INFO SessionState:641 - Created HDFS directory: /tmp/hive/APP/615b6100-804c-4507-8378-765a59075b38/_tmp_space.db
2020-04-09 12:36:44 INFO HiveSessionImpl:318 - Operation log session directory is created: /tmp/DURGACHILLAKURU/operation_logs/615b6100-804c-4507-8378-765a59075b38
Exception in thread "HiveServer2-Handler-Pool: Thread-84" java.lang.NoClassDefFoundError: org/apache/ranger/plugin/policyengine/RangerAccessResultProcessor
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkRowFilterExtension$.apply(RangerSparkRowFilterExtension.scala:37)
    at org.apache.spark.sql.catalyst.optimizer.RangerSparkRowFilterExtension$.apply(RangerSparkRowFilterExtension.scala:37)
    at org.apache.spark.sql.SparkSessionExtensions$$anonfun$buildOptimizerRules$1.apply(SparkSessionExtensions.scala:124)
    at org.apache.spark.sql.SparkSessionExtensions$$anonfun$buildOptimizerRules$1.apply(SparkSessionExtensions.scala:124)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.SparkSessionExtensions.buildOptimizerRules(SparkSessionExtensions.scala:124)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.customOperatorOptimizationRules(BaseSessionStateBuilder.scala:225)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$2.extendedOperatorOptimizationRules(BaseSessionStateBuilder.scala:214)
    at org.apache.spark.sql.catalyst.optimizer.Optimizer.batches(Optimizer.scala:92)
    at org.apache.spark.sql.execution.SparkOptimizer.batches(SparkOptimizer.scala:32)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3248)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.openSession(SparkSQLSessionManager.scala:68)
    at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:202)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:351)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:246)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1253)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1238)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-04-09 12:36:44 INFO ThriftCLIService:243 - Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
Caused by: java.lang.ClassNotFoundException: org.apache.ranger.plugin.policyengine.RangerAccessResultProcessor
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 40 more

MobinRanjbar commented 4 years ago

Did you try spark-ranger.jar? You do not need 'original-spark-ranger.jar'; it is just 188K, whereas spark-ranger.jar should be around 16MB. Copy it to all machines, see what happens, and attach your logs if available.
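As a rough sanity check, and assuming the shaded artifact is named spark-ranger-1.0-SNAPSHOT.jar as in this repo's build output, the deployed jar should be the large shaded one and should bundle the Ranger classes the stack traces complain about:

ls -lh $SPARK_HOME/jars/spark-ranger*.jar
jar tf $SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar | grep RangerAccessResultProcessor

If the grep prints nothing, the jar on the classpath is the thin original- artifact rather than the shaded one.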

DurgaChill commented 4 years ago

Hi,

Yes, I did try with spark-ranger.jar.

I have attached the thriftserver logs for your reference.

Steps I have done so far:
- Installed spark-2.3.1-bin-hadoop2.7.
- Installed hadoop-2.7.7.
- Installed Hive-2.1.1 (also tried with Hive-3.1.2).
- Ran the script HADOOP_HOME/start-dfs.sh; Hadoop runs on localhost:50070 (verified).
- Ran the script SPARK_HOME/start-master.sh (verified).
- Started the Hive metastore with HIVE_HOME/bin/hive --service metastore (underlying DB: Derby).
- Ran the script SPARK_HOME/start-thriftserver.sh and checked the logs; the thriftserver has started.

Before starting Spark:

I have built spark-ranger, placed the jar under SPARK_HOME/jars, and set the parameter spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension in the spark-defaults.conf file. I have placed both XML files with the appropriate parameters under the SPARK_HOME/conf directory. I have also addressed the add-on issues mentioned on the spark-ranger GitHub page. thriftserver_logs.txt

Started the Ranger admin service.

Now, when I try to access beeline (Spark), the exception below pops up.

I have attached the thriftserver log file for your reference. Kindly let me know if there is anything else I need to provide.

MobinRanjbar commented 4 years ago

Well, you have not mentioned whether you have set up a multinode or single-node cluster. If multinode, copy spark-ranger.jar to all nodes.

In addition, you have put "spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension" in spark-defaults.conf, which is invalid. The correct form is as below:

spark.sql.extensions org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension

Try add "spark.driver.extraClassPath YOUR_SPARK_HOME/jars/spark-ranger-1.0-SNAPSHOT.jar" in spark-default.conf.

By the way, your ranger-spark-security.xml is not readable (there may be a permission problem), based on this error in the logs:

"ERROR RangerConfiguration:103 - addResourceIfReadable(ranger-spark-security.xml): couldn't find resource file location"

It will then fall back to the DB audit store as the default and continue. This is not actually related to the error, but I had to mention it so it gets fixed.
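If the file is simply unreadable or not on the classpath, something along these lines usually fixes it (assuming the two plugin XML files are named ranger-spark-security.xml and ranger-spark-audit.xml as in the README, and SPARK_HOME points at the Spark install):

cp ranger-spark-security.xml ranger-spark-audit.xml $SPARK_HOME/conf/
chmod 644 $SPARK_HOME/conf/ranger-spark-security.xml $SPARK_HOME/conf/ranger-spark-audit.xml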

DurgaChill commented 4 years ago

Hi,

I am running Spark on a single node. Thanks for providing solutions to those errors. I was able to start the Thrift Server, but when I connect with beeline using a username and password recognized by Ranger, I still end up getting the error shown in the log file. thriftserverForUser.txt

While trying to rectify this, I ran the thriftserver with sudo, in which case I get a different error stating "user [root] does not have [use] privilege on default". Attached is the thriftserver log file from the run as root, for your reference. thriftserverForRoot.txt

If my understanding is right, can only root be used as the user to connect Ranger and Spark? Or is there another way to solve this?

DurgaChill commented 4 years ago

Hi,

I would also like to mention that I have installed Spark in my home directory, whereas Ranger is under /usr/local/.

Previously, when I integrated Ranger with HiveServer2, I had Hive installed in my home directory, but the Hive Ranger plugin was in /usr/local, so there wasn't any issue accessing credential files. In this case, however, the plugin setup adds a jar file and some configuration inside the Spark directory itself. I guess that is the reason for this.

Can you please let me know if you have any inputs on resolving this? Thanks in advance.

MobinRanjbar commented 4 years ago

Hi,

When you enable security, it is not recommended to have all services installed under $HOME. For example, install Spark in /usr/local and chown it to the spark user; other users can still interact with it. There is no need to run the Thrift Server with sudo. Set the proper owner for each service.

When I checked your logs again, I found that you have a permission problem. Two errors appear in your logs:

The first error is related to the policycache folder's permissions. The second is related to the access permissions of the spark-ranger.jar file, which is not readable. Check both and tell me the result.
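For reference, a fix along these lines, assuming Spark is installed under /usr/local/spark and runs as the spark user, and that ranger.plugin.spark.policy.cache.dir points at /tmp/policycache (all paths and the owner here are placeholders for your actual layout):

sudo chown -R spark:spark /usr/local/spark
sudo chmod 644 /usr/local/spark/jars/spark-ranger-1.0-SNAPSHOT.jar
sudo mkdir -p /tmp/policycache
sudo chown spark:spark /tmp/policycache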