Closed gobiviswanath closed 3 years ago
Here is a workspace with a mixed set of DDLs covering a variety of tables: managed, external, CSV, ORC, Delta, Parquet, and Hive tables.
Source: https://adb-5985677774820244.4.azuredatabricks.net/?o=5985677774820244#setting/clusters
Destination: https://adb-8237512505927139.19.azuredatabricks.net/?o=8237512505927139#setting/clusters
One more thing to add: this was tested with DBR 6.4 (Python 3.7.4) in the source workspace and DBR 6.4 in the destination workspace.
I agree that the managed tables are converted to external tables, but this is better than the tables not being migrated at all. We can add a note in the README describing this limitation.
Thanks for the commit and tests. The main challenge with this PR is the additional API request per table to fetch the LOCATION data. This is not needed, since we already have all the information locally, and an extra API request per table is very costly for a migration.
I have a tested version working now that I will commit.
It checks whether the database path is custom and the table DDL is missing the LOCATION attribute; if LOCATION is missing, it appends the clause to the DDL before importing.
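The check-and-append step described above could be sketched as follows. This is a minimal illustration, not the actual migrate-script code; the function name, the way the database path is passed in, and the path-joining convention are all assumptions.

```python
import re

def ensure_location(ddl: str, db_path: str, table_name: str) -> str:
    """Append a LOCATION clause to a CREATE TABLE DDL if it is missing.

    Hypothetical helper illustrating the approach; the real migrate
    script's structure and naming differ.
    """
    # Case-insensitive check for an existing LOCATION clause in the DDL.
    if re.search(r"\bLOCATION\b", ddl, flags=re.IGNORECASE):
        return ddl  # DDL already carries an explicit location
    # Otherwise derive the table path from the (custom) database path
    # and append it so the import at the destination does not fail.
    return f"{ddl.rstrip()}\nLOCATION '{db_path.rstrip('/')}/{table_name}'"
```

A DDL that already contains a LOCATION clause passes through unchanged, so applying the helper twice is safe.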
Hi Miklos, thank you for the constructive feedback. I cleaned up the code and updated the commit. Please review and let me know if it still needs improvement.
Problem:
Table exports whose DDL definition has no LOCATION or path fail during import at the destination. I added a fix to handle this.
Here is a workspace with a mixed set of DDLs covering a variety of tables: managed, external, CSV, ORC, Delta, Parquet, and Hive tables.
https://adb-5985677774820244.4.azuredatabricks.net/?o=5985677774820244#setting/clusters
If the tables are exported from this workspace with the current migrate script:
1) The import fails at the destination with an empty state. I add Charles Bernard (charles.bernard@databricks.com) as a witness; we had to make this change to migrate Prudential at APAC as a PoC, and we migrated with this script.
2) With the patch this is fixed.
If you are testing this, please import the Spark config of the migrating cluster (API_Metastore_Work_Leave_Me_Alone) from the source to the destination workspace.
Resolves issue: bug! delta and external tables in metastore import fails #54
Thanks again in advance!