jeppe742 / DeltaLakeReader

Read Delta tables without any Spark
Apache License 2.0
47 stars, 14 forks

DeltaTable is returning the correct columns, but with null data #52

Open · m-glisson opened this issue 11 months ago

m-glisson commented 11 months ago

I have a dataset that I unfortunately can't share; however, it's hosted on S3.

I can load the data using:

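from deltalake import DeltaTable  # DeltaLakeReader's DeltaTable class
from s3fs import S3FileSystem

fs = S3FileSystem()  # assumption: an s3fs filesystem with credentials from the environment
delta_table_path = 's3://my/delta/path'
df = DeltaTable(delta_table_path, file_system=fs).to_pandas()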

This returns the correct column names and seemingly the correct row count; however, every value in the DataFrame is null. That is not the case in the actual data: Spark reads the same table correctly, and we use it to generate output tables.

Sorry for the vague report; I'm just looking for some advice, or to find out whether this is a known issue.
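To make a report like this more concrete without sharing any data, it can help to state exactly which columns come back empty. The following is a minimal pandas sketch; `null_report` and `all_null_columns` are hypothetical helpers written for illustration, not part of DeltaLakeReader.

```python
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of null values per column (hypothetical helper)."""
    nulls = df.isna().sum()
    # Guard against an empty frame so we never divide by zero.
    fraction = nulls / len(df) if len(df) else 0.0
    return pd.DataFrame({"null_count": nulls, "null_fraction": fraction})


def all_null_columns(df: pd.DataFrame) -> list:
    """Names of columns in which every value is null (hypothetical helper)."""
    return [col for col in df.columns if df[col].isna().all()]
```

Run against the output of `DeltaTable(...).to_pandas()`: if `all_null_columns(df)` lists every column while `len(df)` matches the row count Spark reports, the reader is finding the rows but not the values, which narrows down where to look.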

jeppe742 commented 11 months ago

Hey @m-glisson, I'm not aware of this issue. I hope you understand that it is almost impossible for me to troubleshoot without more information. Would you be able to provide one of the JSON files from the _delta_log, or at least the schema of the table?
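If the full commit file can't be shared, the schema alone can be pulled out of it. Per the Delta transaction protocol, each `_delta_log/*.json` commit is newline-delimited JSON with one action per line, and the table schema is stored as a JSON-encoded string in the `metaData` action (always present in the first commit, `00000000000000000000.json`, and in any commit that changes the schema). The sketch below assumes that layout; `read_metadata` is a hypothetical helper, not a DeltaLakeReader function. The `configuration` map may be worth sharing too, since table properties such as `delta.columnMapping.mode` change how column names in the Parquet files map to the table schema.

```python
import json
from typing import Any


def read_metadata(commit_text: str) -> dict[str, Any]:
    """Extract schema fields and table properties from one Delta commit file.

    Hypothetical helper: assumes the newline-delimited JSON layout of
    _delta_log commit files, where the schema is a JSON string inside
    the `metaData` action.
    """
    for line in commit_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        action = json.loads(line)
        if "metaData" in action:
            meta = action["metaData"]
            schema = json.loads(meta["schemaString"])  # schema is itself JSON
            return {
                # Nested types (struct, array, map) come back as dicts.
                "fields": [(f["name"], f["type"]) for f in schema["fields"]],
                "partitionColumns": meta.get("partitionColumns", []),
                "configuration": meta.get("configuration", {}),
            }
    raise ValueError("no metaData action in this commit file")
```

For a local copy of the log, `read_metadata(open("_delta_log/00000000000000000000.json").read())` returns column names and types only, so it can be posted without exposing any rows.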