hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0

Zookeeper Exception #148

Open sa255304 opened 7 years ago

sa255304 commented 7 years ago

17/06/22 22:37:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x2049787b0x0, quorum=localhost:2181, baseZNode=/hbase
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15cd088bbce0008, negotiated timeout = 90000
Traceback (most recent call last):
  File "/home/orienit/work/hbase_con.py", line 28, in <module>
    df.count()
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 380, in count
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o26.count.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- HashAggregate(keys=[], functions=[partial_count(1)], output=[count#13L])
   +- Scan HBaseRelation(Map(catalog -> {"table":{"namespace":"my_ns","name":"test"},"rowkey":"row1","columns":{"col0":{"cf":"rowkey","col":"row1","type":"string"},"col1":{"cf":"cf","col":"a","type":"string"}}}),None) [] ReadSchema: struct<>

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:272)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Thu Jun 22 22:38:17 IST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68512: row 'my_ns:test,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ubuntu,16201,1498147258670, seqNum=0

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)

Below are the code and the spark-submit command:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("hbase-connection")
sc = SparkContext(conf=conf)

sqlContext = SQLContext(sc)

# ''.join(string.split()) in order to write a multi-line JSON string here.

catalog = ''.join("""{
    "table":{"namespace":"my_ns", "name":"test"},
    "rowkey":"row1",
    "columns":{
        "col0":{"cf":"rowkey", "col":"row1", "type":"string"},
        "col1":{"cf":"cf", "col":"a", "type":"string"}
    }
}""".split())

# Reading

df = sqlContext.read.options(catalog=catalog).format("org.apache.spark.sql.execution.datasources.hbase").load()
df.count()

spark-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /home/orienit/work/hbase-1.1.2/conf/hbase-site.xml hbase_con.py
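Note the first log line above: quorum=localhost:2181. That usually means the HBase client never read hbase-site.xml and fell back to its built-in default. A minimal diagnostic sketch to confirm what the driver actually sees (sc._jvm is py4j's private handle to the driver JVM, so treat this as a debugging hack; it assumes the HBase client jars resolved by --packages are on the driver classpath):

# HBaseConfiguration.create() loads hbase-site.xml from the classpath if present.
# Printing "localhost" here means the file was not found and the defaults apply.
hconf = sc._jvm.org.apache.hadoop.hbase.HBaseConfiguration.create()
print(hconf.get("hbase.zookeeper.quorum"))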

weiqingy commented 7 years ago

You may want to check whether you can access the HBase table in your cluster even without SHC. So far, from the information here, I am not sure whether the issue is from SHC or something else.
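For example, a quick sanity check from the HBase shell on the same machine (table name taken from the catalog above):

$ hbase shell
hbase> scan 'my_ns:test', {LIMIT => 1}

If that scan also hangs or times out, the problem is in the HBase/ZooKeeper setup rather than in SHC.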

ayanguha commented 7 years ago

Looks like you do not have hbase-site.xml on the classpath. You should try to use --master yarn-cluster, so that the --files option takes effect (see the sketch after the quoted log line below). Otherwise, you can add hbase-site.xml via the --jars option.

17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
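For example, a sketch of the submit command above in cluster mode (Spark 2.x spells the older yarn-cluster as --master yarn --deploy-mode cluster):

spark-submit --master yarn --deploy-mode cluster \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  --files /home/orienit/work/hbase-1.1.2/conf/hbase-site.xml \
  hbase_con.py

In cluster mode the driver runs inside a YARN container, so a file shipped with --files lands in the driver's working directory and is picked up from there. In yarn-client or local mode, an alternative is to put the config directory itself on the driver classpath, e.g. --driver-class-path /home/orienit/work/hbase-1.1.2/conf.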

sa255304 commented 7 years ago

This setup is working in Spark local mode, but in yarn-client and yarn-cluster it is showing Hadoop authentication issues.

Do I need to set up the below configuration on the whole cluster or on the client machine only?

export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar
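Side note: SPARK_CLASSPATH is deprecated in Spark 2.x; a sketch of the equivalent using the same jars goes through spark.driver.extraClassPath (the executors normally get these jars through --packages or --jars anyway):

spark-submit \
  --conf spark.driver.extraClassPath=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar \
  hbase_con.py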

weiqingy commented 7 years ago

@sa255304 On the client machine only.