Open jreback opened 8 years ago
xref #12806
cc @BastiaanBergman
While merging #12803 I realized that we didn't actually have to do this in Cython; it is instead a trivial map operation.
In [7]: s.dt.weekday.map(dict(enumerate(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])))
Out[7]:
0 Tuesday
1 Wednesday
2 Thursday
3 Friday
4 Saturday
5 Sunday
6 Monday
7 Tuesday
8 Wednesday
9 Thursday
dtype: object
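For anyone who wants to reproduce the map approach above, here is a minimal self-contained sketch. The sample Series is an assumption (ten consecutive days starting on Tuesday, 2016-04-26), chosen only so the result lines up with the output shown; it is not the data from the original session.

```python
import pandas as pd

# Assumed sample data: ten consecutive days starting Tuesday, 2016-04-26.
s = pd.Series(pd.date_range("2016-04-26", periods=10))

day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# dict(enumerate(...)) builds {0: "Monday", ..., 6: "Sunday"}, which matches
# the Monday=0 convention used by .dt.weekday.
names = s.dt.weekday.map(dict(enumerate(day_names)))
print(names.tolist())
```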
And if you categorize, it's even easier (and way more efficient):
In [18]: cats
Out[18]: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
In [19]: s.dt.weekday.astype('category',ordered=True).cat.rename_categories(cats)
Out[19]:
0 Tuesday
1 Wednesday
2 Thursday
3 Friday
4 Saturday
5 Sunday
6 Monday
7 Tuesday
8 Wednesday
9 Thursday
dtype: category
Categories (7, object): [Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday]
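The call above passes ordered=True directly to astype, which, as far as I know, recent pandas releases no longer accept. A rough equivalent goes through CategoricalDtype. The sketch below assumes the same ten-day sample Series as before (hypothetical data). It also declares all seven codes up front, so the rename still works when some weekdays are absent from the data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cats = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Assumed sample data: ten consecutive days starting Tuesday, 2016-04-26.
s = pd.Series(pd.date_range("2016-04-26", periods=10))

# Declare codes 0..6 explicitly so the category set does not depend on
# which weekdays happen to occur in the data.
weekday_dtype = CategoricalDtype(categories=list(range(7)), ordered=True)

# Convert the integer weekday codes, then swap the labels for day names.
named = s.dt.weekday.astype(weekday_dtype).cat.rename_categories(cats)
print(named)
```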
I don't know what the speed implications are for big dataframes. In any case, implementing alongside the existing Cython code wasn't exactly un-trivial.
@BastiaanBergman no, what I mean is that THIS impl is trivial. Of course the cython is not :<
I would think they shouldn't be ordered (because it's cyclic). An order would probably only enable .max() and .min(), right?
Well, it also allows comparisons, e.g.
In [4]: os = s.dt.weekday.astype('category',ordered=True).cat.rename_categories(cats)
In [5]: os
Out[5]:
0 Tuesday
1 Wednesday
2 Thursday
3 Friday
4 Saturday
5 Sunday
6 Monday
7 Tuesday
8 Wednesday
9 Thursday
dtype: category
Categories (7, object): [Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday]
In [9]: os
Out[9]:
0 Tuesday
1 Wednesday
2 Thursday
3 Friday
4 Saturday
5 Sunday
6 Monday
7 Tuesday
8 Wednesday
9 Thursday
dtype: category
Categories (7, object): [Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday]
In [10]: os.min()
Out[10]: 'Monday'
In [11]: os<'Wednesday'
Out[11]:
0 True
1 False
2 False
3 False
4 False
5 False
6 True
7 True
8 False
9 False
dtype: bool
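To make the ordered-versus-unordered trade-off concrete, here is a small sketch that builds both variants from the same weekday codes. It uses pd.Categorical.from_codes rather than the astype call from the thread, and the sample dates are an assumption.

```python
import pandas as pd

cats = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Assumed sample data: ten consecutive days starting Tuesday, 2016-04-26.
s = pd.Series(pd.date_range("2016-04-26", periods=10))
codes = s.dt.weekday.to_numpy()

ordered = pd.Series(pd.Categorical.from_codes(codes, categories=cats, ordered=True))
unordered = pd.Series(pd.Categorical.from_codes(codes, categories=cats, ordered=False))

# Ordering enables min/max and inequality comparisons against a category.
earliest = ordered.min()
before_wed = ordered < "Wednesday"

# Without an order, inequality comparisons are rejected.
try:
    unordered < "Wednesday"
    unordered_comparable = True
except TypeError:
    unordered_comparable = False

print(earliest, before_wed.tolist(), unordered_comparable)
```

Note that under this order Sunday compares greater than Monday even though Monday follows Sunday in the calendar, which is the cyclic concern raised above.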
I'd like to give this a try. Can I work on this?
Go for it @sivakar12! Some of the files you may want to edit are in this recent PR https://github.com/pandas-dev/pandas/pull/18164/files
I found that Categorical is not defined in the Cython code, so I focused on the DatetimeIndex class. I tried calling astype and returning a CategoricalIndex from the _field_accessor method there. Neither works, and I always end up getting dtype: object. What am I missing?
After the index is created, you can either use the map function or astype with predefined categories as described in these comments: https://github.com/pandas-dev/pandas/issues/12993#issuecomment-214751982 or https://github.com/pandas-dev/pandas/issues/12993#issuecomment-214751314
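As a user-level illustration of that suggestion, the sketch below converts a DatetimeIndex into an ordered CategoricalIndex after it has been created, and checks that wrapping the result in a Series keeps the category dtype. This only uses public API and assumed sample dates; it does not reproduce what _field_accessor or DatetimeProperties do internally.

```python
import pandas as pd

cats = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Assumed sample data: ten consecutive days starting Tuesday, 2016-04-26.
idx = pd.date_range("2016-04-26", periods=10)

# idx.weekday holds integer codes (Monday=0), which from_codes can use
# directly as positions into the predefined category list.
cat_idx = pd.CategoricalIndex(
    pd.Categorical.from_codes(idx.weekday.to_numpy(), categories=cats, ordered=True)
)

# Wrapping the categorical index in a Series preserves the category dtype.
as_series = pd.Series(cat_idx)
print(cat_idx)
print(as_series.dtype)
```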
I made the DatetimeIndex class return a CategoricalIndex when the weekday_name property is accessed. But s.dt.weekday_name goes through a DatetimeProperties object, which seems to convert the result back to object type.
The code in the comments applies map or astype to an instance of DatetimeProperties, not to DatetimeIndex, and that works fine.
I can't figure out what's going on inside DatetimeProperties.
Feel free to open a pull request (you can mark it as a work in progress) with your initial changes. It will be easier for us to review and help debug the issue.
Not wild about making DatetimeArray have a dependency on Categorical (which in turn has a dependency on Index)
this would be an indirect dependency and is for user convenience
#12803 added .dt.weekday_name. I think it's appropriate to return this (and .weekday) as ordered categoricals.