Open randomgambit opened 7 years ago
Yes, but I don't have any great ideas about what to display.
@hadley
Hehe, when in doubt, always have a look at pandas!
In particular, the describe method might be of interest to you: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html
An example is below:
import pandas as pd
data = pd.DataFrame({'string': ['hadley', 'wi', 'ckam', 'hadley'],
                     'date': pd.to_datetime(['2015-02-05 15:30', '2015-02-05 15:30', '2015-02-05 15:30', '2015-02-05 17:30']),
                     'numeric': [1., 2, 3, 4]})
data
Out[9]:
                 date  numeric  string
0 2015-02-05 15:30:00      1.0  hadley
1 2015-02-05 15:30:00      2.0      wi
2 2015-02-05 15:30:00      3.0    ckam
3 2015-02-05 17:30:00      4.0  hadley
data.describe(include = 'all')
Out[11]:
                       date   numeric  string
count                     4  4.000000       4
unique                    2       NaN       3
top     2015-02-05 15:30:00       NaN  hadley
freq                      3       NaN       2
first   2015-02-05 15:30:00       NaN     NaN
last    2015-02-05 17:30:00       NaN     NaN
mean                    NaN  2.500000     NaN
std                     NaN  1.290994     NaN
min                     NaN  1.000000     NaN
25%                     NaN  1.750000     NaN
50%                     NaN  2.500000     NaN
75%                     NaN  3.250000     NaN
max                     NaN  4.000000     NaN
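To make the "what to display" question a bit more concrete, here is a rough sketch of the same idea done by hand: choose the statistics per column based on its dtype instead of filling one wide table with NaN. The helper name summarize_column and the particular statistics chosen are my own assumptions for illustration, not anything pandas or the package under discussion provides. It assumes non-empty columns.

```python
import pandas as pd

def summarize_column(col: pd.Series) -> dict:
    """Hypothetical helper: pick summary statistics based on the column dtype."""
    out = {"count": int(col.count()), "unique": int(col.nunique())}
    if pd.api.types.is_numeric_dtype(col):
        # Numeric columns: location and range
        out.update(mean=col.mean(), min=col.min(), max=col.max())
    elif pd.api.types.is_datetime64_any_dtype(col):
        # Datetime columns: earliest and latest timestamp
        out.update(first=col.min(), last=col.max())
    else:
        # Everything else (e.g. strings): most frequent value and its count
        counts = col.value_counts()
        out.update(top=counts.index[0], freq=int(counts.iloc[0]))
    return out

data = pd.DataFrame({
    "string": ["hadley", "wi", "ckam", "hadley"],
    "date": pd.to_datetime(["2015-02-05 15:30", "2015-02-05 15:30",
                            "2015-02-05 15:30", "2015-02-05 17:30"]),
    "numeric": [1.0, 2, 3, 4],
})

# One small dict per column, containing only the statistics that apply
summary = {name: summarize_column(data[name]) for name in data.columns}
for name, stats in summary.items():
    print(name, stats)
```

Returning one small record per column avoids the NaN padding seen in the describe output above, at the cost of no longer being a single rectangular table.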
Unordered suggestions of varying usefulness:
Hi @hadley,
I am glad you plan on powering up the base summary function. Following up on https://github.com/hadley/vctrs/issues/17, do you plan on adding support for datetime variables?
Thanks