Even though parquet files carry explicit per-column dtype metadata, pandas overrides it for nullable integer columns and reads them as floats. Downstream, this causes overflow errors when numpy tries to recast the epoch timestamps into datetimes.
More info: https://pandas.pydata.org/docs/user_guide/integer_na.html#nullable-integer-data-type
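For illustration, a minimal sketch of the dtype behavior described above, using plain pandas rather than an actual parquet read. The timestamp values are made up, and this is not necessarily how the fix itself is implemented:

```python
import pandas as pd

# Hypothetical epoch-second timestamps with one missing value.
raw = [1672531200, None, 1672534800]

# Default inference: the missing value forces the column to float64 (NaN).
as_float = pd.Series(raw)

# Requesting the nullable extension dtype keeps integers and stores pd.NA.
as_nullable = pd.Series(raw, dtype="Int64")

# A float column holding whole numbers can be cast back to nullable integers.
restored = as_float.astype("Int64")

print(as_float.dtype)     # float64
print(as_nullable.dtype)  # Int64
print(restored.dtype)     # Int64
```

In pandas 2.0 and later, `pd.read_parquet` also accepts `dtype_backend="numpy_nullable"`, which may avoid the float conversion at read time.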
WIP.
A PR to fix: https://github.com/transitmatters/mbta-performance/issues/4
Tests to come.