wesm / pandas2

Design documents and code for the pandas 2.0 effort.
https://pandas-dev.github.io/pandas2/

API: .to_numpy() #69

Open jreback opened 7 years ago

jreback commented 7 years ago

xref https://github.com/pandas-dev/pandas/issues/14052

currently we have an implicit numpy conversion when we access .values of a 1D object (Series). This mostly returns a numpy array, though we do return numpy-like objects for several dtypes.
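A minimal sketch of the 1D behaviour described above, not the original example from this comment. It assumes a pandas version with categorical dtype support, and the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# A plain integer Series: .values hands back a numpy.ndarray.
s_int = pd.Series([1, 2, 3])
int_values = s_int.values

# A categorical Series: .values hands back a pandas Categorical,
# which is array-like but is not a numpy.ndarray.
s_cat = pd.Series(pd.Categorical(["a", "b", "a"]))
cat_values = s_cat.values

print(type(int_values).__name__)
print(type(cat_values).__name__)
```

Code that assumes .values is always an ndarray therefore has to special-case such dtypes.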

This also has implications when we have a 2D object (DataFrame). There we use a type that can safely hold all of the data.
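As a rough illustration of the 2D case (again a sketch with made-up frames, not the example originally posted here), .values on a DataFrame picks a single dtype wide enough for every column:

```python
import numpy as np
import pandas as pd

# int64 + float64 columns: the common dtype is float64.
df_num = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
num_values = df_num.values

# int64 + string columns: nothing narrower than object holds both.
df_mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
mixed_values = df_mixed.values

print(num_values.dtype)
print(mixed_values.dtype)
```

The second case is the heavyweight one: every element is boxed into a Python object and the data is copied.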

so generally this is ok for 2D in that you preserve as much as possible (though of course you must copy / return a heavyweight object array at times).

So we need some thought on how this API should look, and we need to validate the cases.

I would propose .to_numpy() (a function, so we can potentially pass options). It won't break the current API (which I think we can preserve / provide back-compat for), without making libpandas jump through hoops to support the 'old' stuff.
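For a sense of what an explicit conversion with options could look like, here is a sketch that assumes pandas 0.24 or later, where Series.to_numpy exists and accepts dtype and copy. Whether that matches the design intended in this thread is an assumption:

```python
import numpy as np
import pandas as pd

s_cat = pd.Series(pd.Categorical(["a", "b", "a"]))
s_int = pd.Series([1, 2, 3])

# Explicit conversion: always a numpy.ndarray, even for categorical data.
cat_array = s_cat.to_numpy()

# Options are passed as arguments: request a dtype and force a copy.
float_array = s_int.to_numpy(dtype="float64", copy=True)

# Because a copy was requested, writing to the result leaves s_int alone.
float_array[0] = 99.0
```

The point of a method over an attribute is exactly these keyword arguments: the caller states the dtype and copy semantics instead of inheriting whatever .values happens to return.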

wesm commented 7 years ago

I agree with this -- it would be helpful to start migrating away from the .values API toward something more explicit to ease the burden. We might even want to introduce a logging layer into pandas 1.0 to alert users to use of "non-future proof" APIs

jreback commented 7 years ago

right, I suppose we could instrument things, maybe via an option

pandas.options.future.logging='warn'|'raise'|'ignore'

and namespace it a bit under .future.* for other such options if we need them.
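A standalone sketch of how such a switch might behave. Nothing here is a real pandas API: FutureOptions, future, and flag_legacy_api are hypothetical names, and only the three modes 'warn', 'raise', and 'ignore' come from the comment above:

```python
import warnings

VALID_MODES = ("warn", "raise", "ignore")


class FutureOptions:
    """Hypothetical stand-in for a pandas.options.future namespace."""

    def __init__(self):
        self._logging = "ignore"

    @property
    def logging(self):
        return self._logging

    @logging.setter
    def logging(self, mode):
        # Reject anything other than the three modes from the proposal.
        if mode not in VALID_MODES:
            raise ValueError(f"logging must be one of {VALID_MODES}, got {mode!r}")
        self._logging = mode


future = FutureOptions()


def flag_legacy_api(name, replacement):
    """Report use of a non-future-proof API according to future.logging."""
    message = f"{name} is not future-proof; consider {replacement} instead"
    if future.logging == "raise":
        raise RuntimeError(message)
    if future.logging == "warn":
        warnings.warn(message, FutureWarning, stacklevel=2)
    # "ignore": stay silent.


# Usage: a legacy accessor would call this before doing its work.
future.logging = "warn"
flag_legacy_api(".values", ".to_numpy()")
```

Keeping the mode on a separate namespace object leaves room for further .future.* settings without touching existing options.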