wesm / feather

Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Apache License 2.0

Feather doesn't preserve datetime.date type when reading a dataframe #359

Closed: palpen closed this issue 5 years ago

palpen commented 5 years ago

My goal is to keep only the date portion of the datetime values in the 'date' column. Here's an example:

import pandas as pd
import feather

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', 
                 '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', 
                 '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592']}
df = pd.DataFrame(data, columns=['date'])

df['date'] = pd.to_datetime(df['date']).dt.date
print(type(df['date'].iloc[1]))
# <class 'datetime.date'>
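For context, a small pandas-only sketch of what `.dt.date` produces compared with a related option. `.dt.date` fills an object-dtype column with Python `datetime.date` objects, because pandas' default NumPy-backed dtypes have no date-only type. `.dt.normalize()` is shown only as an alternative that drops the time of day while keeping a datetime64 column (each value becomes a midnight Timestamp); it is not something the reporter tried.

```python
import datetime
import pandas as pd

stamps = pd.to_datetime(pd.Series(['2014-05-01 18:47:05.069722',
                                   '2014-05-02 18:47:05.178768']))

# .dt.date returns Python datetime.date objects stored in an object column.
as_dates = stamps.dt.date
print(as_dates.dtype)          # object
print(type(as_dates.iloc[0]))  # <class 'datetime.date'>

# .dt.normalize() zeroes the time of day but keeps a datetime64 dtype,
# so the values are still Timestamps, just at midnight.
normalized = stamps.dt.normalize()
print(normalized.iloc[0])      # 2014-05-01 00:00:00
```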

However, when I save this data frame with Feather and then load it again, the 'date' column turns into the pandas Timestamp type.

df.to_feather('df')
df = feather.read_dataframe('df')
print(type(df['date'].iloc[1]))
# <class 'pandas._libs.tslibs.timestamps.Timestamp'>

This gives me more information than I want: each Timestamp carries a time-of-day component (00:00:00), which I don't need.
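Until the reader's behaviour changes, one possible workaround is to convert the column back after loading. The sketch below is pandas-only: `restore_dates` is a hypothetical helper, and instead of reading a real Feather file it builds a stand-in frame of midnight Timestamps like the one reported above. It assumes the stored values were pure dates, so discarding the time of day loses nothing.

```python
import datetime
import pandas as pd

def restore_dates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: turn a datetime64 column back into datetime.date objects.

    Assumes every value's time of day is midnight (the values started out
    as dates), so nothing meaningful is dropped.
    """
    out = df.copy()  # leave the caller's frame untouched
    out[column] = pd.to_datetime(out[column]).dt.date
    return out

# Stand-in for what feather.read_dataframe returned in the report:
# midnight Timestamps where datetime.date objects used to be.
read_back = pd.DataFrame({'date': pd.to_datetime(['2014-05-01', '2014-05-02'])})

restored = restore_dates(read_back, 'date')
print(type(restored['date'].iloc[1]))  # <class 'datetime.date'>
```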

Is this expected behaviour?

wesm commented 5 years ago

This does appear to be the current behavior, though I'm not sure it's deliberate. I opened https://issues.apache.org/jira/browse/ARROW-3899 in Apache Arrow about possibly changing it. Let's continue the discussion there.