mclevey / podlm

Probabilistic Opinion Dynamics with Language Models
MIT License

Improve processing for date time data #9

Open mclevey opened 10 months ago

mclevey commented 10 months ago

Currently, observations whose date/time values can't be converted into a datetime object that pandas can work with get dropped. I have not put any time into figuring out why they fail to convert. Check how many observations get dropped, and figure out whether we can change the datetime parsing to preserve more data.
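One possible way to approach the counting and the parsing is sketched below. It is not the project's current code: the function names, the sample values, and the list of candidate formats are all hypothetical, and the real formats would have to come from inspecting the values that actually fail. The idea is to try several explicit formats in turn with `pd.to_datetime(..., errors="coerce")`, then report how many non-null values still could not be parsed.

```python
import pandas as pd

# Hypothetical candidate formats; replace with whatever the failing values look like.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%d/%m/%Y %H:%M",
]


def parse_with_fallbacks(raw: pd.Series, formats=CANDIDATE_FORMATS) -> pd.Series:
    """Try each format in turn, keeping the first successful parse per value."""
    cleaned = raw.str.strip()  # stray whitespace is a common cause of failures
    parsed = pd.Series(pd.NaT, index=raw.index, dtype="datetime64[ns]")
    for fmt in formats:
        missing = parsed.isna() & cleaned.notna()
        if not missing.any():
            break
        # errors="coerce" turns values that do not match into NaT instead of raising.
        attempt = pd.to_datetime(cleaned[missing], format=fmt, errors="coerce")
        parsed.loc[missing] = attempt.astype("datetime64[ns]")
    return parsed


def summarize_parsing(raw: pd.Series, parsed: pd.Series) -> dict:
    """Count parsed and unparsed values and keep a few unparsed examples."""
    unparsed = raw[parsed.isna() & raw.notna()]
    return {
        "total": len(raw),
        "parsed": int(parsed.notna().sum()),
        "unparsed": len(unparsed),
        "examples": unparsed.unique()[:5].tolist(),
    }


# Made-up sample values, for illustration only.
raw = pd.Series([
    "2024-01-05 10:00:00",
    "2024-01-05T11:30:00",
    "05/01/2024 12:15",
    "not a date",
    None,
])
parsed = parse_with_fallbacks(raw)
report = summarize_parsing(raw, parsed)
print(report)
```

The `examples` entry is meant to make it easy to see which formats are still missing from the list, so the list can grow from evidence rather than guesses.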

IMPORTANT: The dtypes have to be handled properly or Parquet will kill the pipeline. Using the Parquet format is non-negotiable for this project (except if / when we have to use something like MongoDB).
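A minimal sketch of a dtype guard that could run just before writing, assuming the datetime column is called `created_at` (a hypothetical name) and that the helper `require_datetime_column` is something we would add ourselves. It only checks the dtype; it does not write anything. The motivation is that a column left as `object` (for example, a mix of `Timestamp` values and leftover strings) is the kind of thing that tends to make the Parquet conversion fail, so it seems better to fail early with a clear message.

```python
import pandas as pd


def require_datetime_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Raise early if `column` is not a real datetime64 dtype."""
    if not pd.api.types.is_datetime64_any_dtype(df[column]):
        raise TypeError(
            f"{column!r} has dtype {df[column].dtype}; expected a datetime64 dtype"
        )
    return df


# Made-up frames, for illustration only.
good = pd.DataFrame({"created_at": pd.to_datetime(["2024-01-05", "2024-01-06"])})
bad = pd.DataFrame({"created_at": [pd.Timestamp("2024-01-05"), "not a date"]})

require_datetime_column(good, "created_at")  # passes silently

try:
    require_datetime_column(bad, "created_at")
except TypeError as err:
    print(f"caught before writing: {err}")
```

In the pipeline, the call would sit directly before `df.to_parquet(...)`, so a bad column stops the run with a readable error instead of a failure inside the writer.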