sa255304 opened this issue 7 years ago (status: Open)
You may want to check whether you can access the HBase table in your cluster even without SHC. From the information provided so far, I am not sure whether the issue is in SHC or somewhere else.
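For instance (a quick check of this kind, using the table from the catalog later in this thread; not part of the original report), the HBase shell on the submitting machine can confirm the table is reachable without SHC at all:

```bash
hbase shell
# inside the shell: confirm the table exists and fetch a couple of rows
exists 'my_ns:test'
scan 'my_ns:test', {LIMIT => 2}
```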
It looks like you do not have hbase-site.xml on the classpath. You should try using --master yarn-cluster, so that the --files option takes effect. Otherwise, you can add hbase-site.xml with the --jars option.
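To make that concrete (a sketch only; /path/to/hbase-site.xml and your_app.py are placeholders, not from the original thread):

```bash
# Option 1: run in yarn-cluster mode so that --files ships hbase-site.xml to the containers
spark-submit --master yarn-cluster --files /path/to/hbase-site.xml your_app.py

# Option 2: pass hbase-site.xml with --jars, as suggested above
spark-submit --master yarn --jars /path/to/hbase-site.xml your_app.py
```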
This setup works in Spark local mode, but in yarn-client and yarn-cluster mode it shows Hadoop authentication issues.
Do I need to set up the configuration below on the whole cluster, or only on the client machine?
```bash
export SPARK_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar
```
@sa255304 On the client machine only.
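For reference, a hedged sketch of a per-job alternative (not from this thread, and assuming the same HDP jar locations exist on the driver and worker nodes): the same jar list can be passed at submit time from the client instead of through the environment variable.

```bash
# Hypothetical per-job alternative to the export above, run from the client machine only.
# Assumes /usr/hdp/current/hbase-client/lib exists locally on the driver and worker nodes.
HBASE_JARS=/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar

# Other options (e.g. --packages from the command later in the thread) would still be needed.
spark-submit \
  --driver-class-path "$HBASE_JARS" \
  --conf spark.executor.extraClassPath="$HBASE_JARS" \
  hbase_con.py
```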
```
17/06/22 22:37:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x2049787b0x0, quorum=localhost:2181, baseZNode=/hbase
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
17/06/22 22:37:28 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x15cd088bbce0008, negotiated timeout = 90000
Traceback (most recent call last):
  File "/home/orienit/work/hbase_con.py", line 28, in <module>
    df.count()
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 380, in count
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/home/orienit/work/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o26.count.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- HashAggregate(keys=[], functions=[partial_count(1)], output=[count#13L])
   +- Scan HBaseRelation(Map(catalog -> {"table":{"namespace":"my_ns","name":"test"},"rowkey":"row1","columns":{"col0":{"cf":"rowkey","col":"row1","type":"string"},"col1":{"cf":"cf","col":"a","type":"string"}}}),None) [] ReadSchema: struct<>
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Thu Jun 22 22:38:17 IST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68512: row 'my_ns:test,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ubuntu,16201,1498147258670, seqNum=0
```
Below are the code and the spark-submit command:
```python
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("hbase-connection")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# ''.join(string.split()) in order to write a multi-line JSON string here.
catalog = ''.join("""{
  "table":{"namespace":"my_ns", "name":"test"},
  "rowkey":"row1",
  "columns":{
    "col0":{"cf":"rowkey", "col":"row1", "type":"string"},
    "col1":{"cf":"cf", "col":"a", "type":"string"}
  }
}""".split())

# Reading
df = sqlContext.read.options(catalog=catalog).format("org.apache.spark.sql.execution.datasources.hbase").load()
df.count()
```
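As a side note, the same catalog can be built without the whitespace-stripping trick by serializing a Python dict with json.dumps; this is just a sketch of an equivalent formulation, not what the reporter ran:

```python
import json

# Equivalent catalog, built as a dict and serialized to the JSON string SHC expects.
catalog = json.dumps({
    "table": {"namespace": "my_ns", "name": "test"},
    "rowkey": "row1",
    "columns": {
        "col0": {"cf": "rowkey", "col": "row1", "type": "string"},
        "col1": {"cf": "cf", "col": "a", "type": "string"}
    }
})

df = sqlContext.read.options(catalog=catalog) \
    .format("org.apache.spark.sql.execution.datasources.hbase").load()
```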
```bash
spark-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  --files /home/orienit/work/hbase-1.1.2/conf/hbase-site.xml \
  hbase_con.py
```
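Per the suggestion earlier in the thread, the same command in yarn-cluster mode, so that the hbase-site.xml passed with --files is actually shipped to the containers (a sketch; paths and versions unchanged from the report):

```bash
spark-submit --master yarn-cluster \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  --files /home/orienit/work/hbase-1.1.2/conf/hbase-site.xml \
  hbase_con.py
```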