hadley / precis

Succinctly Summarise Data Frames

datetime variables supported? #1

Open randomgambit opened 7 years ago

randomgambit commented 7 years ago

Hi @hadley,

I am glad you plan on powering up the base summary function. Following up on https://github.com/hadley/vctrs/issues/17, do you plan on adding support for datetime variables?

Thanks

hadley commented 7 years ago

Yes, but I don't have any great ideas about what to display.

randomgambit commented 7 years ago

@hadley

hehe, when in doubt, always have a look at Pandas!

In particular, the describe method might be of interest to you: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html

Here is an example:

import pandas as pd

data = pd.DataFrame({'string': ['hadley', 'wi', 'ckam', 'hadley'],
                     'date': pd.to_datetime(['2015-02-05 15:30', '2015-02-05 15:30',
                                             '2015-02-05 15:30', '2015-02-05 17:30']),
                     'numeric': [1., 2, 3, 4]})

data
Out[9]: 
                 date  numeric  string
0 2015-02-05 15:30:00      1.0  hadley
1 2015-02-05 15:30:00      2.0      wi
2 2015-02-05 15:30:00      3.0    ckam
3 2015-02-05 17:30:00      4.0  hadley

data.describe(include = 'all')
Out[11]: 
                       date   numeric  string
count                     4  4.000000       4
unique                    2       NaN       3
top     2015-02-05 15:30:00       NaN  hadley
freq                      3       NaN       2
first   2015-02-05 15:30:00       NaN     NaN
last    2015-02-05 17:30:00       NaN     NaN
mean                    NaN  2.500000     NaN
std                     NaN  1.290994     NaN
min                     NaN  1.000000     NaN
25%                     NaN  1.750000     NaN
50%                     NaN  2.500000     NaN
75%                     NaN  3.250000     NaN
max                     NaN  4.000000     NaN
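For the datetime column, describe reports categorical-style rows (count, unique, top, freq) plus first and last. In more recent pandas releases describe treats datetime columns as numeric instead, so these exact rows may not appear. Below is a minimal sketch that computes the same datetime rows explicitly, so the idea does not depend on a particular pandas version. The helper name summarise_datetime is hypothetical, not part of pandas or precis.

```python
import pandas as pd


def summarise_datetime(s: pd.Series) -> pd.Series:
    """Hypothetical helper: compute the datetime rows shown in the
    describe(include='all') output above (count, unique, top, freq, first, last)."""
    non_null = s.dropna()
    # value_counts sorts by frequency, most common value first
    counts = non_null.value_counts()
    return pd.Series({
        'count': non_null.size,
        'unique': non_null.nunique(),
        'top': counts.index[0] if not counts.empty else pd.NaT,
        'freq': int(counts.iloc[0]) if not counts.empty else 0,
        'first': non_null.min(),  # earliest timestamp
        'last': non_null.max(),   # latest timestamp
    })


dates = pd.Series(pd.to_datetime(['2015-02-05 15:30', '2015-02-05 15:30',
                                  '2015-02-05 15:30', '2015-02-05 17:30']))
print(summarise_datetime(dates))
```

On the example data this gives a count of 4, 2 unique values, 2015-02-05 15:30 as the most frequent value (3 times), and a range from 15:30 to 17:30, matching the date column of the describe output.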

bearloga commented 7 years ago

Unordered suggestions of varying usefulness: