mrchristine / db-migration

Databricks Migration Tools

Add support for spark 3.0 #29

Closed saldroubi closed 4 years ago

saldroubi commented 4 years ago

When exporting with a specific cluster via the --cluster-name argument, the export fails with the error shown below if that cluster runs Spark 3.0. This was tested on Azure Databricks. The cluster used runs Databricks Runtime 7.0 (includes Apache Spark 3.0.0, Scala 2.12).

python3 ./export_db.py --azure --metastore --cluster-name export --profile ws-databricks

ERROR: AttributeError: databaseName

resultType: error
summary: <span class="ansi-red-fg">AttributeError: databaseName
cause:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
 1594 # but this will not be used in normal cases
-> 1595 idx = self.__fields__.index(item)
 1596 return self[idx]

ValueError: 'databaseName' is not in list

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
 in 
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

 in (.0)
----> 1 all_dbs = [x.databaseName for x in spark.sql("show databases").collect()]; print(len(all_dbs))

/databricks/spark/python/pyspark/sql/types.py in __getattr__(self, item)
 1598 raise AttributeError(item)
 1599 except ValueError:
-> 1600 raise AttributeError(item)
 1601 
 1602 def __setattr__(self, key, value):

AttributeError: databaseName

Traceback (most recent call last):
  File "./export_db.py", line 151, in <module>
    main()
  File "./export_db.py", line 137, in main
    hive_c.export_hive_metastore(cluster_name=args.cluster_name)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 200, in export_hive_metastore
    all_dbs = self.log_all_databases(cid, ec_id, metastore_dir)
  File "/Users/saldroubi/Dropbox/git/db-migration/dbclient/HiveClient.py", line 21, in log_all_databases
    raise ValueError("Cannot identify number of databases due to the above error")
ValueError: Cannot identify number of databases due to the above error

mrchristine commented 4 years ago

This would be a feature request that I can add. For the time being, can you use Spark 2 to export?

mrchristine commented 4 years ago

Support added for DBR 7 release.
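
For reference, the AttributeError in the report comes from the result schema of "show databases": Spark 2.x names the column databaseName, while Spark 3.0 names it namespace, so x.databaseName no longer resolves. The following is a minimal sketch of one way to read the names on either version by taking the first column by position. It is not necessarily how the tool was changed. The helper name database_names, the namedtuple stand-ins for pyspark.sql.Row (which is a tuple subclass), and the "sales" database are assumptions made so the sketch runs without a cluster.

```python
from collections import namedtuple

# Hypothetical stand-ins for pyspark.sql.Row, which is a tuple subclass.
# They let the sketch run locally without a Spark cluster.
Spark2Row = namedtuple("Spark2Row", ["databaseName"])  # Spark 2.x column name
Spark3Row = namedtuple("Spark3Row", ["namespace"])     # Spark 3.0 column name


def database_names(rows):
    """Return the first column of each row, ignoring the column's name."""
    return [row[0] for row in rows]


# Illustrative data only; "sales" is a made-up database name.
spark2_rows = [Spark2Row("default"), Spark2Row("sales")]
spark3_rows = [Spark3Row("default"), Spark3Row("sales")]

print(database_names(spark2_rows))
print(database_names(spark3_rows))
```

On a cluster, the same helper would be called as database_names(spark.sql("show databases").collect()). Indexing by position avoids branching on the Spark version, at the cost of relying on the name staying in the first column.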