Currently, observations whose date-time data can't be easily converted into a datetime object (one that Pandas can work with) get dropped. I have not put any time into figuring out why they fail to convert. Check how many observations get dropped, and figure out whether we can change the datetime parsing to preserve more data.
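One possible way to measure the loss is to parse with coercion instead of dropping, then count the rows that had a value but came back as NaT. This is only a sketch: the function name `parse_datetimes` and the sample values are made up for illustration and are not taken from the pipeline.

```python
from __future__ import annotations

import pandas as pd


def parse_datetimes(raw: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Parse a column of raw date-time values without dropping rows.

    Returns the parsed column plus the raw values that failed to parse,
    so the failures can be inspected for recoverable formats.
    """
    # errors="coerce" turns unparseable values into NaT instead of raising.
    parsed = pd.to_datetime(raw, errors="coerce")
    # A failure is a row that had a value but could not be parsed;
    # rows that were already missing are not counted as failures.
    failed_mask = parsed.isna() & raw.notna()
    return parsed, raw[failed_mask]


# Hypothetical sample data, for illustration only.
raw = pd.Series(
    ["2021-03-04 10:15:00", "2021-03-05 11:00:00", "not a date", None]
)
parsed, failed = parse_datetimes(raw)
n_failed = len(failed)
print(f"{n_failed} of {raw.notna().sum()} non-null values failed to parse")
print(failed.value_counts())
```

Looking at `failed.value_counts()` on the real data should show whether the failures cluster around a few alternative formats that could be handled with an explicit `format=` argument or a second parsing pass.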
IMPORTANT: The dtypes have to be handled properly or parquet will kill the pipeline. Using the parquet format is non-negotiable for this project (unless and until we have to use something like MongoDB).
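A minimal sketch of what handling the dtypes could look like: cast every column to an explicit type before writing, so the parquet writer never has to guess at an object column. The column names, the `SCHEMA` mapping, and the `enforce_dtypes` helper are all hypothetical and would need to match the real data.

```python
import pandas as pd

# Hypothetical target schema; the real column names and types will differ.
SCHEMA = {"obs_id": "int64", "station": "string", "value": "float64"}
DATETIME_COLS = ["observed_at"]


def enforce_dtypes(df, schema, datetime_cols):
    """Return a copy of df with explicit dtypes on every listed column."""
    out = df.copy()
    for col in datetime_cols:
        # Unparseable values become NaT rather than raising.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # astype raises if a value cannot be converted, which surfaces
    # bad data here instead of at write time.
    return out.astype(schema)


# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {
        "obs_id": ["1", "2"],
        "station": ["A", "B"],
        "value": ["1.5", "2"],
        "observed_at": ["2021-03-04", "bad"],
    }
)
clean = enforce_dtypes(df, SCHEMA, DATETIME_COLS)
print(clean.dtypes)

# Writing requires a parquet engine such as pyarrow or fastparquet:
# clean.to_parquet("observations.parquet", index=False)
```

Failing in `astype` rather than inside the parquet writer tends to give a clearer error about which column is the problem.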