wesm / pandas2

Design documents and code for the pandas 2.0 effort.
https://pandas-dev.github.io/pandas2/

Is this project advancing? #79

Open sursu opened 4 years ago

sursu commented 4 years ago

Just a question:

I see that the latest commit in this repository was more than 2 years ago. Is this project meant to replace pandas, and if so, is it advancing? This discussion on Reddit offers few answers.

The ideas proposed seem really appealing to me. Especially the "judicious and responsible use of modern C++".

If not, I suggest adding a note to redirect potential enthusiasts to similar projects where there is a need for contributions.

toobaz commented 4 years ago

If not, I suggest adding a note to redirect potential enthusiasts to similar projects where there is a need for contributions.

pandas is definitely one ;-)

datapythonista commented 4 years ago

I think the content of this repo is somehow superseded by the pandas roadmap: https://pandas.pydata.org/pandas-docs/stable/development/roadmap.html

But there are good ideas here and the documentation is still valid, as a direction for the development of pandas.

But no work specific to a pandas 2 project is happening in parallel to pandas and Apache Arrow development.

sursu commented 4 years ago

My understanding is that these proposed changes will gradually be implemented in pandas, and there won't be a pandas2.

While I see that rewriting the Block Manager is in the roadmap, I don't see the item "Building 'libpandas' in C++11/14 for lowest level implementation tier" there.

I am curious to know whether there was a decision to stick with Cython.

If I am wrong and C++ is being embraced, I imagine that some implementations will have to coexist: methods in C++ and methods in Cython. Are there already examples of that?

jorisvandenbossche commented 4 years ago

A big part of the ideas that are listed in this repo (certainly the page on the "data structure changes") evolved into Wes starting Arrow. So for now that is where the C++ work is being done; there are no short-term plans to do that in pandas itself (but we might start using Arrow).

For the BlockManager rewrite, there is currently no concrete decision whatsoever (so also no decision to stick with Cython), except that it could be beneficial. That's a roadmap item that needs to be discussed and detailed further.

We should probably update the README of this repo to reflect this status better.

I imagine that some implementations will have to coexist: methods in C++ and methods in Cython

pyarrow is an example of that.

wesm commented 4 years ago

@sursu indeed one of my primary motivations in developing the Apache Arrow project (which has more or less been my primary focus since sometime in 2015) is to develop next-generation data frame internals, and to do so in a way that doesn't create another large codebase owned by a small Python-only core development team. We're developing Arrow with the help of a much larger core community.

pandas has millions of users, so advancing the goals from the "pandas2" discussion without disrupting existing users will take years of work. There is also the very important question of who will pay for the work.