saldroubi closed this issue 4 years ago
Yes, it should be possible with this option:

    --cluster-name CLUSTER_NAME
        Cluster name to export the metastore to a specific cluster.
        Cluster will be started.
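For reference, a sketch of the invocation (the cluster name `export` and the profile `ws-databricks` match values that appear later in this thread; substitute your own):

```shell
# Hedged example: export the metastore via an existing cluster.
# "export" and "ws-databricks" are illustrative values; replace them
# with the cluster name and CLI profile from your own workspace.
python3 ./export_db.py --azure --metastore --profile ws-databricks --cluster-name export
```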
Have you tried this option?
No, this did not fix it, sorry. I see that the cluster started, but then I get an error.
Is this because it is Spark 3.0?
ERROR: AttributeError: databaseName (resultType: error)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594                 # but this will not be used in normal cases
-> 1595                 idx = self.__fields__.index(item)
   1596                 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598             raise AttributeError(item)
   1599         except ValueError:
-> 1600             raise AttributeError(item)
   1601
   1602     def __setattr__(self, key, value):

AttributeError: databaseName

Traceback (most recent call last):
  File "./export_db.py", line 151, in <module>
    main()
  File "./export_db.py", line 137, in main
    hive_c.export_hive_metastore(cluster_name=args.cluster_name)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
    all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
    raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error
I just confirmed that it does NOT work with runtime version 7.0 (includes Apache Spark 3.0.0, Scala 2.12), but it works with runtime 6.5 (includes Apache Spark 2.4.5, Scala 2.11).
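The failure is consistent with a known behavior change: in Spark 2.x, `SHOW DATABASES` returns a column named `databaseName`, while Spark 3.0 renamed it to `namespace`, so the tool's `x.databaseName` access raises `AttributeError` on runtime 7.0. A minimal sketch of a version-agnostic workaround using positional indexing (plain tuples stand in for pyspark `Row` objects so the sketch runs without a cluster):

```python
# Sketch of a version-agnostic way to read SHOW DATABASES output.
# Spark 2.x names the result column "databaseName"; Spark 3.0 renamed
# it to "namespace". The result has a single column in both versions,
# so indexing by position sidesteps the rename entirely.

def list_databases(rows):
    """Return database names from SHOW DATABASES rows, regardless of
    whether the column is 'databaseName' (Spark 2.x) or 'namespace' (3.x)."""
    return [row[0] for row in rows]

# Tuples stand in for the Row objects that
# spark.sql("show databases").collect() would return:
rows = [("default",), ("sales",)]
print(list_databases(rows))  # → ['default', 'sales']
```

On a real cluster, the same idea would be `[x[0] for x in spark.sql("show databases").collect()]`, which works on both runtimes because the result set has a single column either way.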
I am closing this issue and opening another that describes the problem more clearly and concisely.
I noticed that when doing an export with --metastore, it creates a cluster with the name stored in data/azure_cluster.json. However, I would like it to use an existing cluster in my workspace. Is this possible?
I changed the cluster name and version to a cluster I created, as shown below, but I am getting an error. Here is the azure_cluster.json file, followed by the error.
{
  "num_workers": 1,
  "cluster_name": "export",
  "spark_version": "7.0.x-scala2.12",
  "spark_conf": {},
  "node_type_id": "Standard_F4s_v2",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "autotermination_minutes": 30,
  "init_scripts": []
}
######################## ERROR #######################################
python3 ./export_db.py --azure --metastore --debug --profile ws-databricks
https://adb-5463815377663355.15.azuredatabricks.net
dapi6fe373fdb9f18c9613840ebefdccde43
Export the metastore configs at 2020-09-15 17:16:17.354008
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/list
Starting export with id 0915-220257-ruins233
post: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/start
Error: Cluster 0915-220257-ruins233 is in unexpected state Running.
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/2.0/clusters/get
Cluster creation time: 0:00:00.662221
Creating remote Spark Session
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/contexts/create
post: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/execute
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
Get: https://adb-5463815377663355.15.azuredatabricks.net/api/1.2/commands/status
ERROR: AttributeError: databaseName (resultType: error)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1594                 # but this will not be used in normal cases
-> 1595                 idx = self.__fields__.index(item)
   1596                 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

in <listcomp>(.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
   1598             raise AttributeError(item)
   1599         except ValueError:
-> 1600             raise AttributeError(item)
   1601
   1602     def __setattr__(self, key, value):

AttributeError: databaseName
Traceback (most recent call last):
  File "./export_db.py", line 151, in <module>
    main()
  File "./export_db.py", line 137, in main
    hive_c.export_hive_metastore(cluster_name=args.cluster_name)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
    all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
    raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error