Closed gobiviswanath closed 3 years ago
@gobiviswanath thanks for reporting this.
It looks like the DDL you provided is not for the table that's causing the issue. Please go into logs/metastore/{db_name}/table_name
and provide the DDL for the table that's throwing the error.
For the second issue in question, please review the exported metastore entries and we can look into the fix if needed. Have you tried the --metastore-unicode
option for the metastore export to ensure that control characters are captured? I'll need to know the export command used here.
It looks like a bug w/ Azure using Delta CTAS statements when we export the DDL. I couldn't reproduce this on AWS. Could you open an eng ticket internally and have them take a look?
Well, this does not look like a bug in the DDL export itself. If we create a Delta table without specifying a location and let the table inherit its location from the database path,
%sql show create table
will omit the location. This seems like expected behaviour.
When we use this
CREATE TABLE `test_delta_externalDB`.`customers_adlspath` ( `c_custkey` BIGINT, `c_name` STRING, `c_address` STRING, `c_nationkey` BIGINT, `c_phone` STRING, `c_acctbal` DECIMAL(12,2), `c_comment` STRING, `c_mktsegment` STRING) USING DELTA
to create the table at the destination, it fails. This is mainly because
a) the migrated databases failed to inherit the path / location from the source workspace (issue #56)
b) we do not update the path / db path checks in the create statements we generate by describing the table at the source. Delta tables explicitly require a location for shallow creation of tables.
Closing this as it's related to #56. Please open a new issue if you run into problems with the latest changes.
This is still an issue with Delta tables. I have contributed a fix and attached the file here; could you please review and commit it?
The fix describes the table to get its location and appends that location to the end of DDL commands that are missing it for Delta tables. I specifically added a check for Delta because other table types still need testing.
It looks like I do not have access to push, so I attached it here.
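For reference, the idea behind the fix can be sketched roughly as below. This is not the attached patch, only a minimal illustration. The function names `needs_location` and `add_location` are hypothetical, and it assumes the table location has already been read from the source workspace (for example by describing the table). It also assumes the exported DDL is a plain CREATE TABLE statement, not a CTAS with a trailing AS SELECT, since appending a clause at the end would not be valid there.

```python
import re


def needs_location(ddl: str) -> bool:
    """Return True for a Delta CREATE TABLE statement with no LOCATION clause."""
    is_delta = re.search(r"\bUSING\s+DELTA\b", ddl, re.IGNORECASE) is not None
    has_location = re.search(r"\bLOCATION\s+'", ddl, re.IGNORECASE) is not None
    return is_delta and not has_location


def add_location(ddl: str, location: str) -> str:
    """Append LOCATION '<location>' to a Delta DDL that lacks one.

    Non-Delta statements and statements that already carry a LOCATION
    are returned unchanged, mirroring the Delta-only check in the fix.
    """
    if not needs_location(ddl):
        return ddl
    # Escape single quotes so the path stays a valid SQL string literal.
    escaped = location.replace("'", "\\'")
    return f"{ddl.rstrip().rstrip(';')} LOCATION '{escaped}'"


if __name__ == "__main__":
    # Shortened version of the exported DDL; the path is a made-up placeholder.
    exported = (
        "CREATE TABLE `test_delta_externalDB`.`customers_adlspath` "
        "(`c_custkey` BIGINT, `c_name` STRING) USING DELTA"
    )
    print(add_location(exported, "abfss://container@account.dfs.core.windows.net/db/customers_adlspath"))
```

Restricting the change to Delta keeps the behaviour for other table formats exactly as it is today until those are tested.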
It's unclear why we need your patch, since #56 was not addressed because there's an issue with Azure serializing the function. This fix doesn't address that issue.
I'd also recommend you look at forking a GitHub repo and how to submit pull requests.
The fix is needed because delta import at destination fails after export:
Get: https://adb-4425252124055355.15.azuredatabricks.net/api/1.2/commands/status
ERROR:
org.apache.spark.sql.AnalysisException: Cannot create table ('deltadb3.orders_dbfspath'). The associated location ('abfss://gobidatagen@adlsgen2passthroughtest.dfs.core.windows.net/deltaDB3/orders_dbfspath') is not empty.;
{'resultType': 'error', 'summary': 'org.apache.spark.sql.AnalysisException: Cannot create table ('deltadb3.orders_dbfspath'). The associated location ('abfss://gobidatagen@adlsgen2passthroughtest.dfs.core.windows.net/deltaDB3/orders_dbfspath') is not empty.;', 'cause': '---------------------------------------------------------------------------\nPy4JJavaError Traceback (most recent call last)\n/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)\n 62 try:\n---> 63 return f(*a, **kw)\n 64 except py4j.protocol.Py4JJavaError as e:\n\n/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)\n 327 "An error occurred while calling {0}{1}{2}.\n".\n--> 328 format(target_id, ".", name), value)\n 329 else:\n\nPy4JJavaError: An error occurred while calling o211.sql.\n: org.apache.spark.sql.AnalysisException: Cannot create table (\'deltadb3.orders_dbfspath\'). The associated location (\'abfss://gobidatagen@adlsgen2passthroughtest.dfs.core.windows.net/deltaDB3/orders_dbfspath\') is not empty
We used this tool to migrate Delta tables. There are issues with the Delta table import:
1) There is a bug in the migration tool: it omits the location path from the create statement, so table creation fails during the metastore migration. The exact details are highlighted in the screenshots.
The migration tool generates the Delta import command in the format below, but it fails
Error:
ERROR:
The correct command should be
2) There are also issues with some external tables after migration whose files have the .seq.gz extension; these need to be fixed as well. They return an empty dataset.