jeppe742 / DeltaLakeReader

Read Delta tables without any Spark
Apache License 2.0

SchemaString Key Error #58

Open crisnaX opened 6 months ago

crisnaX commented 6 months ago

While trying to load a Delta table, I get a `KeyError` for `schemaString` in the `apply_partial_logs` function. The cause is an imperfect checkpoint file: the metadata in one of the checkpoint files has no `schemaString`. I need to skip that file and still load the Delta table, so the reader needs some error handling for this case. [Screenshot: IMG_20240309_031808]
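Before skipping a file, it helps to know which one is affected. The sketch below assumes the table's `_delta_log` directory is available locally. It only inspects the newline-delimited JSON commit files, because Parquet checkpoint files would need a Parquet reader. `find_metadata_without_schema` is a hypothetical helper, not part of DeltaLakeReader.

```python
import json
from pathlib import Path


def find_metadata_without_schema(delta_log_dir):
    """Return (file name, line number) pairs for metaData actions lacking schemaString.

    Hypothetical diagnostic helper, not part of DeltaLakeReader. Only the JSON
    commit files are scanned; Parquet checkpoint files are not inspected.
    """
    offenders = []
    # Commit files are named by zero-padded version, so sorting gives commit order.
    for commit_file in sorted(Path(delta_log_dir).glob("*.json")):
        with commit_file.open() as fh:
            for line_no, line in enumerate(fh, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                action = json.loads(line)
                metadata = action.get("metaData")
                # The Delta protocol requires schemaString on every metaData action.
                if metadata is not None and "schemaString" not in metadata:
                    offenders.append((commit_file.name, line_no))
    return offenders
```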

jeppe742 commented 6 months ago

Hey @crisnaX, I'm a bit surprised you managed to get this error at all. According to the Delta protocol, `schemaString` is a required field. What did you use to create the Delta table?

crisnaX commented 6 months ago

We are using Databricks streaming tables, and PySpark creates the Delta tables. For negative testing, we send schema changes or other non-conforming files to check how well our data quality rules handle them. In those cases, the Delta log records do not contain `schemaString`. When I use DeltaLakeReader from VS Code, I get this error at that particular log file. To handle it, I have raised a pull request.

crisnaX commented 6 months ago

By just adding a try/except to the partial-log handling, I was able to resolve the issue. You are correct that, according to the Delta protocol, `schemaString` is a required field. But in exceptional cases, such as a schema mismatch, it is missing.
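A try/except of the kind described above might look like the following. This is a minimal sketch, not the actual patch from the pull request or DeltaLakeReader's `apply_partial_logs`: `replay_actions` is a hypothetical function that replays JSON log lines, skips a `metaData` action whose `schemaString` is missing, and keeps the schema from the most recent valid `metaData` action.

```python
import json
import logging

logger = logging.getLogger(__name__)


def replay_actions(log_lines):
    """Replay newline-delimited Delta log actions into (schema, active files).

    Hypothetical sketch, not DeltaLakeReader's apply_partial_logs. A metaData
    action without schemaString is logged and skipped, so the schema from the
    most recent valid metaData action is kept.
    """
    schema = None
    files = set()
    for line in log_lines:
        if not line.strip():
            continue  # tolerate blank lines
        action = json.loads(line)
        if "metaData" in action:
            try:
                schema = json.loads(action["metaData"]["schemaString"])
            except KeyError:
                # Only a missing schemaString is tolerated; other errors still raise.
                logger.warning("Skipping metaData action without schemaString")
        elif "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return schema, files
```

Catching only `KeyError` keeps other problems, such as malformed JSON, visible. The trade-off is that files added after the skipped action are read with the previous schema, which may not match them.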

crisnaX commented 6 months ago

Any update, @jeppe742?