jeppe742 / DeltaLakeReader

Read Delta tables without any Spark
Apache License 2.0

DeltaTable function returns all versions of delta table #12

Closed: nitzmali closed this issue 3 years ago

nitzmali commented 3 years ago

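from deltalake import DeltaTable
from s3fs import S3FileSystem  # assumption: fs is an s3fs filesystem, since the table lives on S3

fs = S3FileSystem()
df = DeltaTable("path_to_delta_table", file_system=fs).to_pandas()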

In the latest release (0.2.2), I have been trying to read a Delta table from S3 in which only a few rows are updated. When I read the full Delta table, the DataFrame contains both the initial value and the updated value. I only need the latest snapshot, i.e. the state after the most recent update, not every update ever made. Am I missing something? To validate, I read the same table through a Spark context, and it returns only the latest snapshot. Any help?

[Screenshot: delta_table_issue]

For reference, I have attached a screenshot of the read through DeltaTable and the read through Spark; the resulting DataFrames have two rows and one row, respectively.
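Why a reader can return "all versions": in Delta Lake, an update typically rewrites the affected data file, so the transaction log (_delta_log) records a "remove" action for the old Parquet file and an "add" action for the new one. The latest snapshot is obtained by replaying the JSON commits in version order and keeping only files that were added and not later removed. A reader that collects every "add" but ignores "remove" would return both the old and the updated rows, which matches the symptom above. The sketch below illustrates that log replay only; it is not the actual fix in #13. It assumes a local table, ignores checkpoint Parquet files, and uses simplified actions containing only "path". The helper names active_files and read_local_log are hypothetical.

```python
import json
from pathlib import Path


def active_files(commits):
    """Replay Delta log commits (version -> newline-delimited JSON text)
    and return the set of data file paths in the latest snapshot."""
    live = set()
    # Commits must be applied in version order: a later "remove" cancels an earlier "add".
    for version in sorted(commits):
        for line in commits[version].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


def read_local_log(table_path):
    """Hypothetical helper: load JSON commit files from a local table's _delta_log.
    Assumes no checkpoint files are needed (i.e. all JSON commits are still present)."""
    log_dir = Path(table_path) / "_delta_log"
    # File names are zero-padded version numbers, e.g. 00000000000000000001.json
    return {int(p.stem): p.read_text() for p in log_dir.glob("*.json")}


# Simplified example: version 1 rewrites part-000 as part-001 (an update).
commits = {
    0: '{"add": {"path": "part-000.parquet"}}',
    1: '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
}
print(active_files(commits))  # {'part-001.parquet'}
```

Only part-001.parquet survives the replay, so reading just that file yields the single updated row, as Spark does.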

jeppe742 commented 3 years ago

Hey @nitzmali, thanks for spotting this. I had a small mistake in the code, which should be fixed in #13. I also published the fix in version 0.2.3 if you want to test it out.

nitzmali commented 3 years ago

Thanks a lot @jeppe742 for the quick response. It works perfectly now. Cheers.