Closed: raviranak closed this issue 1 year ago.
Hi @raviranak, what would you do if this were vanilla Spark? You can try the same thing here. I don't think the vanilla Spark conf directory contains hive-site.xml either. Are you using pyspark installed via pip, or a binary Spark install? Have you tried putting hive-site.xml into the conf dir?
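If the conf-dir approach works for you, a minimal hive-site.xml pointing the metastore at MySQL might look like the sketch below. The URL, user name, and empty password are placeholders taken from the config in this thread; adjust them for your setup.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- JDBC connection for the Hive metastore backing database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>test</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value></value>
  </property>
</configuration>
```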
I have figured out a way to configure the Hive metastore with MySQL, but I'm getting an error.
```python
default_spark_conf = {
    "spark.jars.packages": "mysql:mysql-connector-java:8.0.32",
    "spark.jars": "/home/ray/.ivy2/jars/com.mysql_mysql-connector-j-8.0.32.jar",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "test",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalogImplementation": "hive",
}

spark = raydp.init_spark(
    app_name="Darwin_SPARK",
    num_executors=1,
    executor_cores=1,
    executor_memory="4G",
    enable_hive=True,
    configs=default_spark_conf,
)
```
I get the error when trying to create a table:

```python
df = spark.createDataFrame(
    [(1, "Smith"), (2, "Rose"), (3, "Williams")],
    ("id", "name"),
)
df.write.mode("overwrite").saveAsTable("employees12")
```
Stack trace:

```
2023-05-22 10:45:09,515 WARN HiveMetaStore [Thread-5]: Retrying creating default database after error: Unexpected exception caught.
javax.jdo.JDOFatalInternalException: Unexpected exception caught.
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1203) ~[javax.jdo-3.2.0-m3.jar:?]
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:814) ~[javax.jdo-3.2.0-m3.jar:?]
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:702) ~[javax.jdo-3.2.0-m3.jar:?]
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:139) ~[hadoop-client-api-3.3.2.jar:?]
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431) ~[hive-metastore-2.3.9.jar:2.3.9]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:162) ~[hive-metastore-2.3.9.jar:2.3.9]
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70) ~[hive-exec-2.3.9-core.jar:2.3.9]
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.datanucleus.util.NucleusLogger
    at org.datanucleus.plugin.PluginRegistryFactory.newPluginRegistry(PluginRegistryFactory.java:58)
    at org.datanucleus.plugin.PluginManager.<init>(PluginManager.java:60)
    at org.datanucleus.plugin.PluginManager.createPluginManager(PluginManager.java:430)
    at org.datanucleus.AbstractNucleusContext.<init>(AbstractNucleusContext.java:85)
    at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:167)
    at org.datanucleus.PersistenceNucleusContextImpl.<init>(PersistenceNucleusContextImpl.java:156)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:415)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:304)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:213)
    ... 80 more
```
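The root cause above is a `NoClassDefFoundError` on `org.datanucleus.util.NucleusLogger`, which typically suggests (this is an assumption, not confirmed in the thread) that the `datanucleus-*` jars shipped with Spark are not visible on the classpath of the process initializing the metastore. A small sketch to check whether they are present in a Spark jars directory (the `/opt/spark/jars` path is illustrative):

```python
import os

def find_datanucleus_jars(jars_dir):
    """Return the datanucleus-* jars found in a Spark jars directory, sorted by name."""
    if not os.path.isdir(jars_dir):
        return []
    return sorted(
        name for name in os.listdir(jars_dir)
        if name.startswith("datanucleus-") and name.endswith(".jar")
    )

# Point this at $SPARK_HOME/jars; an empty list means the DataNucleus
# jars that Hive support needs are missing from that directory.
print(find_datanucleus_jars("/opt/spark/jars"))
```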
Could you please help, @kira-lin?
I did the same with a plain SparkSession and it's working:
```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Spark Examples")
    .config("spark.hadoop.javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "com.mysql.cj.jdbc.Driver")
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "test")
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "")
    .config("spark.sql.catalogImplementation", "hive")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.jars", "/home/ray/.ivy2/jars/com.mysql_mysql-connector-j-8.0.32.jar")
    .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.32")
    .enableHiveSupport()
    .getOrCreate()
)
```
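One visible difference between the two snippets in this thread is the driver class: the RayDP config uses `com.mysql.jdbc.Driver` while the working SparkSession uses `com.mysql.cj.jdbc.Driver`. Whether that matters here is unverified, but a quick way to surface such mismatches is to diff the two conf dicts (the dicts below are abbreviated from the snippets above):

```python
def diff_confs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Abbreviated versions of the two configurations from this thread.
raydp_conf = {
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "spark.sql.catalogImplementation": "hive",
}
session_conf = {
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
    "spark.sql.catalogImplementation": "hive",
}
print(diff_confs(raydp_conf, session_conf))
```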
This seems to be a bug; could you please look into it?
I wanted to configure Spark to use MySQL as the metastore instead of the default Derby-backed Hive metastore.
I am not able to find hive-site.xml in the Spark installation that RayDP uses.
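For a pip-installed pyspark there is usually no `conf` directory until you create one. A sketch (assuming `pyspark` is importable; the fallback path is hypothetical) to print where `conf/hive-site.xml` would go:

```python
import os

def spark_conf_dir(spark_home):
    """Return the conf directory path under a given SPARK_HOME."""
    return os.path.join(spark_home, "conf")

# For a pip install, SPARK_HOME is effectively the pyspark package directory.
try:
    import pyspark
    home = os.path.dirname(pyspark.__file__)
except ImportError:
    home = os.environ.get("SPARK_HOME", "/opt/spark")  # fallback path is illustrative
print(spark_conf_dir(home))  # create this directory and place hive-site.xml in it
```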