Maybe the filtering will make this moot, but in at least some cases the "structure" column can be all NULL (`None`) in the output, and it is then interpreted as `int` by default when writing out to parquet. Pandas can read this back OK thanks to its own schema hints, but general parquet tools may fail to combine the schemas of multiple files in such a case (duckdb, for instance, though there it can be "fixed" with the `union_by_name` option).
We may want to do something about this so that the written-out schema is consistent.