pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Deprecation of Panel ? #13563

Closed jorisvandenbossche closed 7 years ago

jorisvandenbossche commented 8 years ago

This is a topic that has come up recently (https://github.com/pydata/pandas/issues/10000, https://github.com/pydata/pandas/issues/8906, pandas-dev mailing list discussion). Let's use this issue to track the discussion about it.

Deprecating Panels would be a rather large change, so:

cc @pydata/pandas @MaximilianR

sinhrks commented 8 years ago

I'm +1 on moving to xarray, but a GitHub search shows the deprecation will not be easy... As far as I know, among popular packages, pydata/data-reader and quantopian/zipline use Panel.

CC @davidastephens @ehebert

max-sixty commented 8 years ago

No change this end - we are still using xarray heavily, and it's working beautifully. We've also improved the integration of xarray & pandas, so that should ease the path to deprecation.

wesm commented 8 years ago

I'm +1 on deprecating Panels; @jreback moved mountains to create a consistent internal object model from 1 to N dimensions, but there is still a feeling of second-class citizenry when it comes to working with data over 2 dimensions. I think we would be better served in the long run by really optimizing for the 1 and 2-dimensional use cases (similar to what the R community has done, though the API surface area of dplyr, data.table, and built-in data frames is quite a bit smaller than pandas -- primarily lacking in the level of indexing complexity).

I maintain that we should plan for a pandas 0.X.Y long-term support (LTS) release branch that becomes bugfix-only so that we can start investing in renovations. I'm interested in feedback from the other core devs on how realistic you feel this is.

I've long worried about the amount of baggage we are carrying forward -- there are many organizations with large codebases that have made their peace with pandas's rough edges (data type issues, view / copying semantics, etc.), and it doesn't make sense to abandon them. On the flip side, it would be a shame to be held back from undertaking a more aggressive cleanup and retool of the internals to introduce better performance, extensibility, missing data / data type issues, etc. I regret that 6 months have passed since I brought up this grand scheme and I haven't been able to carve out the time to make a dent, beyond demo'ing a proof-of-concept of integer NAs. Also, I would feel much better about working on this on a long-lived branch (similar to what happened with IPython) under some kind of feature freeze.

Anyway, some of these comments are beyond the scope of this issue. I don't think we should deprecate Panel unless we're collectively on board with the idea of cleaning up pandas internals over the next 12-24 months (which is as much of a code organization problem as anything -- particularly quarantining unit tests that we are contemplating "breaking").

den-run-ai commented 6 years ago

There are plenty of examples using panel in SO:

https://stackoverflow.com/questions/tagged/panel+pandas

One in particular, which I'm not sure how to port and for which I do not want to depend on xarray, is this one:

https://stackoverflow.com/a/23088780/2230844

jaypeedevlin commented 6 years ago

I noticed today that none of the docs for the panel class/methods seem to have notification around the fact that it's deprecated.

There's the 'deprecate panel' in the 0.22.0 'what's new', but it seems likely that people may not see that if they're searching for panel or following direct links to the docs.

I can see this example of a deprecation note in a docstring, which subjectively doesn't seem to draw a lot of attention to itself. Is there a convention for these that's a little bit more 'attention-grabbing'? Once I know of the best way, I'm happy to submit a PR.

Edit: Actually, just found https://github.com/pandas-dev/pandas/commit/1d32264c62c8c43f0e728328c4abfc452d98609d which seems to indicate exactly what to do in this instance.

jorisvandenbossche commented 6 years ago

There is a deprecation in the user guide, and a warning when you actually use it, but you are certainly correct we could add a notice in all docstrings as well to give this more visibility.

Typically a .. deprecated:: sphinx directive is the way to go to add such deprecations.

PR very welcome!
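
As a rough illustration of the two mechanisms mentioned above (a `.. deprecated::` note in the docstring plus a warning at call time), here is a minimal sketch. The function name `legacy_total`, the version number, and the message text are all hypothetical and are not taken from the pandas codebase; only the Sphinx directive and the standard-library `warnings` module are real.

```python
import warnings


def legacy_total(values):
    """Return the sum of ``values``.

    .. deprecated:: 1.2.0
        Hypothetical notice for illustration: use ``sum`` directly instead.
    """
    # Emit a runtime warning so callers see the deprecation even if they
    # never open the rendered documentation.
    warnings.warn(
        "legacy_total is deprecated; use sum() instead.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    return sum(values)
```

Sphinx renders the directive as a highlighted note in the API docs, while the `FutureWarning` covers users who only ever run the code.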

joseortiz3 commented 5 years ago

I'll be the first to protest deprecation of panels, specifically the need to rewrite legacy code. I have plenty of legacy code for finance for which conversion to multi-index is very painful, code which now spews panel warnings despite working flawlessly. Of course, I write any new code only using multi-index dataframes (which have a significantly higher learning curve, which I am happy that I overcame).

Regarding the feeling that "3 or more dimensions feels like second-class usage": there is a deep asymmetry even between the dimensions of a 2D pandas object. Columns and rows are explicitly treated differently in pandas, with rows being second-class to columns in a highly non-intuitive way, disobeying the mathematical symmetries of matrices. Food for thought. Then again, often the dimensions of real-life data are inherently asymmetric, since time is a very special type of dimension.

wesm commented 5 years ago

@joseortiz3 the problem has less to do with whether there are users of the code and more to do with whether there is sufficient bandwidth to maintain the code. If there isn't a motivated developer base to support a component of an open source software project, it doesn't seem reasonable that maintainers of the rest of the project should be burdened by it.

The general thinking (and @jreback and others can comment) is that having > 2 dimensional data structures has made many parts of the codebase significantly more difficult to develop and maintain. This has a high long term cost. Given pandas's funding situation (or lack thereof) I don't see how it is tenable.

jreback commented 5 years ago

> The general thinking (and @jreback and others can comment) is that having > 2 dimensional data structures has made many parts of the codebase significantly more difficult to develop and maintain. This has a high long term cost. Given pandas's funding situation (or lack thereof) I don't see how it is tenable

This is exactly right. Furthermore, pandas has quite a number of pull requests coming in daily and many open issues (2600+). We have a limited number of core devs (12), so there is a natural limit to how large the (already huge) scope of pandas can be. Panel is not nearly as mature as other aspects of pandas and would be better served by separate motivated maintainers. Note that there is already quite an overlap with the xarray package in use cases.

joseortiz3 commented 5 years ago

Totally reasonable, of course. Would it be so difficult to write a "panel wrapper" that has a panel-like interface to what is actually a multi-index dataframe? It wouldn't need to implement all of the methods of panel; it would just allow 90% of legacy code to be rewritten via a simple from ____ import PanelMultiIndex as Panel. If I had time and/or money! Some day.
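
To make the wrapper idea concrete, here is a minimal sketch of what such a class might look like. Everything about it is an assumption for illustration: the class name `PanelLike`, the level names `item` and `major`, and the small subset of methods are hypothetical, and it does not reproduce the real Panel API. It relies only on `pd.concat` with a dict of DataFrames and on `DataFrame.xs`.

```python
import pandas as pd


class PanelLike:
    """Hypothetical panel-style view over a MultiIndex DataFrame (not a pandas API)."""

    def __init__(self, frames):
        # frames: dict mapping an item name to a DataFrame whose rows are the
        # major axis and whose columns are the minor axis.
        # Concatenating a dict stacks the frames and uses the dict keys as the
        # outer level of a row MultiIndex.
        data = pd.concat(frames)
        data.index = data.index.set_names(["item", "major"])
        self._data = data

    @property
    def items(self):
        # Unique labels of the outer ("item") level.
        return self._data.index.get_level_values("item").unique()

    def __getitem__(self, item):
        # panel[item] returns the 2-D frame stored under that item.
        return self._data.xs(item, level="item")

    def major_xs(self, key):
        # Cross-section at one major-axis label; rows of the result are items.
        return self._data.xs(key, level="major")

    def to_frame(self):
        # Expose the underlying MultiIndex DataFrame for new-style code.
        return self._data


# Example data (made up for illustration).
frames = {
    "open": pd.DataFrame({"AAPL": [1.0, 2.0], "MSFT": [3.0, 4.0]}, index=["d1", "d2"]),
    "close": pd.DataFrame({"AAPL": [1.5, 2.5], "MSFT": [3.5, 4.5]}, index=["d1", "d2"]),
}
panel = PanelLike(frames)
close_frame = panel["close"]   # 2-D frame for one item
day_one = panel.major_xs("d1")  # one row per item for a single major label
```

A shim like this only covers selection; arithmetic, reindexing, and I/O would each need their own forwarding methods, which is where most of the porting effort would likely go.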