wesm / pandas2

Design documents and code for the pandas 2.0 effort.
https://pandas-dev.github.io/pandas2/

Supporting current panel use cases, interactions with xarray #62

Open: wesm opened this issue 7 years ago

wesm commented 7 years ago

The most common use case for panels I've seen is as an aligning container for DataFrames -- you can insert a DataFrame "item" just as you would normally insert a column. This can alleviate some awkwardness when working with multi-indexed data.
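Without Panel, that aligning-container behaviour can be approximated with a plain dict of DataFrames plus an explicit alignment step. The following is a minimal sketch under that assumption; `align_frames` is a hypothetical helper, not a pandas function, and it aligns every item to the union of all row labels and all column labels.

```python
import numpy as np
import pandas as pd


def align_frames(frames):
    """Hypothetical helper: reindex every DataFrame in the dict `frames` to the
    union of all row labels and all column labels, roughly what a Panel did
    for its items."""
    index = None
    columns = None
    for df in frames.values():
        # Accumulate the union of row labels and of column labels.
        index = df.index if index is None else index.union(df.index)
        columns = df.columns if columns is None else columns.union(df.columns)
    # reindex fills positions an item has no data for with NaN.
    return {name: df.reindex(index=index, columns=columns)
            for name, df in frames.items()}


df = pd.DataFrame(np.random.rand(3, 4), columns=list('abcd'))
frames = {'first': df, 'second': df[list('ab')]}
# "Insert" a new item the way you would add a column, then re-align.
frames['third'] = df[list('cd')] * 2
aligned = align_frames(frames)
```

Each entry of `aligned` ends up with rows 0 to 2 and columns a to d, with NaN wherever an item had no data (for example columns c and d of 'second'), which is the same alignment the xarray output later in this thread shows.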

A couple of questions about panels:

In either case, we'd be eliminating a bunch of thinly supported code.

shoyer commented 7 years ago

I agree: keeping Panel around as a simple data container could make sense. I have also found it useful as an intermediate data structure for easier data alignment, though I can't think of particular use cases off the top of my head.

CC @MaximilianR

max-sixty commented 7 years ago

I don't have a strong view.

xarray is pretty good for aligning! So I predominantly use that:

In [5]: df = pd.DataFrame(np.random.rand(3,4), columns=list('abcd'))

In [6]: df
Out[6]:
          a         b         c         d
0  0.164063  0.014835  0.529693  0.268561
1  0.076066  0.598840  0.887823  0.566114
2  0.599438  0.021646  0.775174  0.959695

In [7]: xr.Dataset({'first': df, 'second': df[list('ab')]})
Out[7]:
<xarray.Dataset>
Dimensions:  (dim_0: 3, dim_1: 4)
Coordinates:
  * dim_0    (dim_0) int64 0 1 2
  * dim_1    (dim_1) object 'a' 'b' 'c' 'd'
Data variables:
    second   (dim_0, dim_1) float64 0.1641 0.01483 nan nan 0.07607 0.5988 ...
    first    (dim_0, dim_1) float64 0.1641 0.01483 0.5297 0.2686 0.07607 ...

And pandas' stack/unstack is pretty good for swapping axes.
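A minimal pandas-only sketch of that axis swapping, assuming the items are first collected side by side with `pd.concat` so that the item name becomes the outer column level (this layout is an illustrative choice, not something prescribed in the thread):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=list('abcd'))

# Put the items side by side; the dict keys become the outer column level,
# giving columns (first, a) ... (first, d), (second, a), (second, b).
wide = pd.concat({'first': df, 'second': df[list('ab')]}, axis=1)

# stack moves the item level from the columns into the row index; the
# remaining column labels are aligned, so 'second' gets NaN in c and d.
long = wide.stack(level=0)

# unstack moves the item level back out to the columns, now as the inner
# level: columns (a, first), (a, second), (b, first), ...
back = long.unstack(level=1)
```

`long` has one row per (row label, item) pair and columns a to d, which is the same alignment the xarray Dataset produced above.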

What's the use case that would need this functionality in pandas itself?

We should consider the API that will replace the current to_panel and to_frame workflows.

@jreback has built a good .to_xarray, and we've built some decent (not yet perfect) coercion for passing xarray and pandas objects into each other's constructors.
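One pandas-only shape a replacement for the to_frame / to_panel round trip could take is sketched here, assuming the items are held in a plain dict of DataFrames. `frames_to_long` and `long_to_frames` are hypothetical helper names, not pandas API, and this is only an illustration of the idea, not a proposal from the thread.

```python
import numpy as np
import pandas as pd


def frames_to_long(frames):
    """Hypothetical stand-in for Panel.to_frame: one row per
    (row label, column label) pair and one column per item."""
    # df.stack() turns each item into a Series indexed by (row, column);
    # concat with a dict lines the items up on the union of those labels.
    return pd.concat({name: df.stack() for name, df in frames.items()}, axis=1)


def long_to_frames(long):
    """Hypothetical inverse (the old to_panel direction): rebuild one
    DataFrame per item by unstacking the column-label level."""
    return {name: long[name].unstack() for name in long.columns}


df = pd.DataFrame(np.random.rand(3, 4), columns=list('abcd'))
long = frames_to_long({'first': df, 'second': df[list('ab')]})
frames = long_to_frames(long)
```

Because the items share one row index in `long`, the rebuilt 'second' frame comes back with all four columns (c and d as NaN) rather than only a and b; that is Panel-style alignment, which a real API might or might not want.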

jreback commented 7 years ago

This is merged: https://github.com/pandas-dev/pandas/pull/15601

So we can think about this (at some point).