ssec-jhu / bluephos

automated computational tool streamlining the development and analysis of blue phosphorescent materials
BSD 3-Clause "New" or "Revised" License

Handle dtypes for all-null columns in parquet write #64

Open amitschang opened 2 months ago

amitschang commented 2 months ago

Maybe the filtering will make this moot, but in some cases at least the "structure" column can be all NULL (None) in the output, and it is then interpreted as int by default when written out to parquet. Pandas can read this back OK thanks to its own schema hints, but general parquet tools may fail when combining the schemas of multiple files in such a case (duckdb, for instance, which can be "fixed" with the union_by_name option).

We may want to do something about this so that the written-out schema is consistent.

xiangchenjhu commented 2 months ago

We can explicitly set the column type with astype('str') to resolve this. I will test it and push a PR for it later.