pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

API: revisit adding datetime-like ops in Series #7207

Closed by jreback 10 years ago

jreback commented 10 years ago

Related: #7146, #7206

maybe provide a to_index() function and/or a class-like accessor

.values_to_index (too long though)

shoyer commented 10 years ago

@jreback yes, the idea is to hide the (surprising) DatetimeIndex under the hood. (which, IMHO, is still a bit of a hack -- it would be nice to have the non-index functionality separated out in a DatetimeArray object like Categorical)

jorisvandenbossche commented 10 years ago

+1 on returning Series. That is the point of these methods, I think: they are attributes/methods that act on the Series, so they should return a Series. That it passes through a DatetimeIndex under the hood is an implementation detail.
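To make the "return a Series, hide the DatetimeIndex" idea concrete, here is a minimal sketch. The helper name `datetime_component` is hypothetical and is not taken from the thread or from pandas; it only illustrates routing through a DatetimeIndex internally while handing back a Series that keeps the original index and name.

```python
import numpy as np
import pandas as pd


def datetime_component(series: pd.Series, name: str) -> pd.Series:
    """Hypothetical helper: compute a datetime component (e.g. "year")
    through a DatetimeIndex, but return a Series aligned with the input."""
    # The DatetimeIndex is purely an implementation detail; it never escapes.
    idx = pd.DatetimeIndex(series.values)
    # np.asarray drops any index-like wrapper so no realignment happens.
    data = np.asarray(getattr(idx, name))
    return pd.Series(data, index=series.index, name=series.name)


s = pd.Series(
    pd.to_datetime(["2014-05-22", "2014-12-31"]),
    index=["a", "b"],
    name="when",
)
years = datetime_component(s, "year")
days = datetime_component(s, "day")
```

The caller only ever sees Series objects carrying the original labels. Later pandas releases expose this kind of behaviour through the `Series.dt` accessor (for example `s.dt.year`).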

jreback commented 10 years ago

@shoyer I think DatetimeArray could be made to work, but might not be necessary.

If you have a sketch of it, pls post.

shoyer commented 10 years ago

@jreback Here is the general idea: DatetimeArray would wrap an np.datetime64 array, but include all the necessary numpy fixes (e.g., NaT, comparison) and the non-index-specific properties/methods of DatetimeIndex (e.g., all the datetime components like day, week, month, etc.). This would leave DatetimeIndex as an Index + DatetimeArray, adding only the index-specific methods.

As I mentioned above, you could even return a DatetimeArray from series.values (although that might be surprising, given how everyone is used to values being an ndarray).

I think that it would be a nice idea in theory... but again it's not my top priority.
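A rough sketch of the DatetimeArray proposal above, using NumPy only. The class name `DatetimeArraySketch`, its methods, and the choice of -1 as the marker for NaT positions are all assumptions made for illustration; nothing here is an actual pandas class or the design that was eventually adopted.

```python
import numpy as np


class DatetimeArraySketch:
    """Hypothetical, minimal stand-in for the proposed DatetimeArray:
    wraps a datetime64 ndarray and adds NaT-aware component properties."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype="datetime64[ns]")

    def isnull(self):
        # Boolean mask of missing (NaT) entries.
        return np.isnat(self._data)

    @property
    def year(self):
        # Years are stored as offsets from 1970; NaT positions become -1
        # (an arbitrary marker chosen for this sketch).
        years = self._data.astype("datetime64[Y]").astype("int64") + 1970
        return np.where(self.isnull(), -1, years)

    @property
    def month(self):
        # Months are stored as offsets from 1970-01; modulo 12 gives 0..11.
        months = self._data.astype("datetime64[M]").astype("int64") % 12 + 1
        return np.where(self.isnull(), -1, months)


arr = DatetimeArraySketch(["2014-05-22", "NaT", "1969-12-31"])
```

In this layering, a DatetimeIndex would combine Index behaviour with such an array and add only the index-specific methods on top.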

jankatins commented 10 years ago

I would suggest going the other way by adding a CategoricalNamespace (or CategoricalSeriesAPI or whatever it is named) object which would wrap Categorical. If category support ever changes (-> numpy implementation), this wouldn't need an API change.

jreback commented 10 years ago

@JanSchulz Categorical IS already a namespace. No need for further indirection

jankatins commented 10 years ago

@jreback: I disagree: it has the same problem as an index, lots of methods which are not API (basically, the API exposed as Series.cat is reorder_levels, levels, and remove_unused_levels). Should we discuss that in #7768?
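The narrower surface argued for here could be sketched as a thin wrapper that forwards only a whitelist of names. Both `ToyCategorical` and `CategoricalNamespace` below are hypothetical stand-ins written for this illustration; they are not the pandas classes, and only the three public names are taken from the comment above.

```python
class ToyCategorical:
    """Stand-in for Categorical (not the pandas class): integer codes
    pointing into a list of levels, plus a non-API helper method."""

    def __init__(self, codes, levels):
        self.codes = list(codes)
        self.levels = list(levels)

    def reorder_levels(self, new_levels):
        # Same levels, new order; codes are remapped to keep the values.
        if sorted(new_levels) != sorted(self.levels):
            raise ValueError("new_levels must be a reordering of levels")
        old = self.levels
        self.codes = [new_levels.index(old[c]) for c in self.codes]
        self.levels = list(new_levels)

    def remove_unused_levels(self):
        # Drop levels that no code refers to and renumber the codes.
        seen = set(self.codes)
        used = [lvl for i, lvl in enumerate(self.levels) if i in seen]
        self.codes = [used.index(self.levels[c]) for c in self.codes]
        self.levels = used

    def describe(self):
        # Not part of the narrow API; should stay hidden behind the wrapper.
        return {lvl: self.codes.count(i) for i, lvl in enumerate(self.levels)}


class CategoricalNamespace:
    """Hypothetical accessor exposing only the whitelisted names."""

    _public = ("levels", "reorder_levels", "remove_unused_levels")

    def __init__(self, categorical):
        self._categorical = categorical

    def __getattr__(self, name):
        # Called only when normal lookup fails; forward whitelisted names.
        if name in self._public:
            return getattr(self._categorical, name)
        raise AttributeError(name)

    def __dir__(self):
        return list(self._public)


cat = ToyCategorical([0, 2, 0], ["a", "b", "c"])
ns = CategoricalNamespace(cat)
ns.remove_unused_levels()
```

Because callers only touch the wrapper, the wrapped object could be swapped for a different implementation without changing the user-facing names, which is the point made about a possible numpy-based implementation.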

jreback commented 10 years ago

@JanSchulz let's leave the Categorical stuff to #7768

jreback commented 10 years ago

Pls have a look at the updated #7953. I removed a lot of the clutter from the previous impl (which did a form of redirection) and added docs.