jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

[BUG] The spark session cannot open managed hive tables (ACID table) #741

Closed GaryLiuTelus closed 11 months ago

GaryLiuTelus commented 2 years ago

Describe the bug When I use the following code to open a Hive managed table, it keeps running indefinitely without ever returning results.

# Build a HiveWarehouseSession (Hive Warehouse Connector) on top of the
# existing Spark session, then print the schema of a managed (ACID) table.
from pyspark_llap import HiveWarehouseSession

HiveAcid = HiveWarehouseSession.session(spark).build()
HiveAcid.table("xxxsx.test_table_acid").printSchema()

When I checked the log, it said:

21/12/02 17:09:32 WARN HiveConnection: Failed to connect to qccr-hadoop-m010.oss.ads:10501
21/12/02 17:09:32 INFO HS2ActivePassiveHARegistryClient: Returning cached registry client for namespace: hs2ActivePassiveHA-sasl
21/12/02 17:09:32 INFO ZooKeeperHiveClientHelper: Found HS2 Active Host: qccr-hadoop-m010.oss.ads Port: 10501 Identity: 37876f0a-65d6-4e2b-af3d-167025bfdf38 Mode: http:/cliservice
21/12/02 17:09:32 INFO Utils: Selected HiveServer2 instance with uri: jdbc:hive2://qccr-hadoop-m010.oss.ads:10501/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=hs2ActivePassiveHA;auth=delegationToken
21/12/02 17:09:32 WARN HiveConnection: Could not open client transport with JDBC Uri: jdbc:hive2://qccr-hadoop-m010.oss.ads:10501/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=hs2ActivePassiveHA;auth=delegationToken: Could not establish connection to jdbc:hive2://qccr-hadoop-m010.oss.ads:10501/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=hs2ActivePassiveHA;auth=delegationToken: HTTP Response code: 401 Retrying 0 of 1
21/12/02 17:09:32 WARN HiveConnection: Delegation token with key: [hive] cannot be found.
21/12/02 17:09:32 INFO HiveConnection: Connected to qccr-hadoop-m010.oss.ads:10501
21/12/02 17:09:32 ERROR HiveConnection: Error opening session

But the JDBC URL should actually be jdbc:hive2://qccr-hadoop-m001.oss.ads:2181,qccr-hadoop-m002.oss.ads:2181......, which I have specified in the session_configs:

"session_configs": {
    "driverMemory": "4G",
    "executorMemory": "4G",
    "executorCores": 2,
    "jars":["hdfs://xxxx/jars/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar"],
    "pyFiles":["hdfs://xxxx/jars/pyspark_hwc-1.0.0.3.1.5.0-152.zip"],
    "conf": {
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/python_env/keras_py3.6_env/bin/python",
        "spark.yarn.executorEnv.PYSPARK_PYTHON": "/python_env/keras_py3.6_env/bin/python", 
        "spark.sql.hive.hiveserver2.jdbc.url.principal":"_HOST@HADOOP",
        "spark.sql.hive.hiveserver2.jdbc.url":"jdbc:hive2://qccr-hadoop-m001.oss.ads:2181,qccr-hadoop-m002.oss.ads:2181,qccr-hadoop-m003.oss.ads:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=hs2ActivePassiveHA",
        "spark.hadoop.hive.llap.daemon.service.hosts":"@llap0",
        "spark.security.credentials.hiveserver2.enabled":"true"
}
}
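(For reference, the same settings can also be applied per-notebook with sparkmagic's %%configure magic instead of config.json. The cell below is a sketch reusing a subset of the values above; %%configure -f drops and recreates the Livy session so the new settings take effect.)

%%configure -f
{
    "jars": ["hdfs://xxxx/jars/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar"],
    "pyFiles": ["hdfs://xxxx/jars/pyspark_hwc-1.0.0.3.1.5.0-152.zip"],
    "conf": {
        "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://qccr-hadoop-m001.oss.ads:2181,qccr-hadoop-m002.oss.ads:2181,qccr-hadoop-m003.oss.ads:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=hs2ActivePassiveHA",
        "spark.security.credentials.hiveserver2.enabled": "true"
    }
}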

But based on the log, the session still tries to use jdbc:hive2://qccr-hadoop-m010.oss.ads:10501/ instead of jdbc:hive2://qccr-hadoop-m001.oss.ads:2181,qccr-hadoop-m002.oss.ads:2181......, and so gets a 401 error. It looks like the configurations are not properly passed to the session.
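One way to check whether these settings actually reached the session (a diagnostic sketch, not something from the original report) is to read the effective conf from inside the notebook:

# Print the JDBC URL the running Spark session actually sees; if it does
# not match the value in session_configs, the config never arrived.
print(spark.conf.get("spark.sql.hive.hiveserver2.jdbc.url", "<not set>"))
print(spark.conf.get("spark.security.credentials.hiveserver2.enabled", "<not set>"))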

But it works fine with non-managed tables (plain Spark can read non-transactional Hive tables directly through the metastore; only ACID tables have to go through HWC), like:

spark.table("xxxx.test_non_managed_orc").printSchema()

And the same settings work well in Zeppelin's Livy interpreter, which opens ACID tables without issue.


GaryLiuTelus commented 2 years ago

This has been resolved. The problem was caused by pointing the HWC jar path at HDFS; after changing it back to a local path on the edge node, it worked. My original configuration was:

"session_configs":{
"jars":["hdfs://xxxx/xxx/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar"]
}

My configuration is now:

"session_configs": {
    "jars": ["file:///yyy/yy/hive-warehouse-connector-assembly-1.0.0.3.1.5.0-152.jar"]
}
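After restarting the Livy session with the file:/// path, a quick smoke test (a sketch reusing the session setup from the report; any database the user can read works) confirms HWC can reach HiveServer2:

# Rebuild the HWC session and run a trivial query; if the connection
# settings took effect, this returns promptly instead of hanging.
from pyspark_llap import HiveWarehouseSession

hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()

One plausible explanation for why the local path matters (an assumption, not confirmed in this thread): with spark.security.credentials.hiveserver2.enabled set to true, the HiveServer2 delegation token is obtained at submit time, and a jar referenced with file:/// on the edge node is on the classpath at that point, while an hdfs:// jar is only distributed after the application launches; that would match the "Delegation token with key: [hive] cannot be found" warning in the log.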