bivald closed this 3 years ago
hi Niklas -- well "pandas2" likely won't be called "pandas2", but the goal is to have copy-on-write data frames and lazy copying in the future, so you can create a "shallow" copy of a DataFrame that does not perform any memory copying until mutation occurs.
Note I've just created an organization to raise money to work on these problems (see https://ursalabs.org/tech/) -- I don't know how long it will take to see things built, but at the current rate it's likely to take some years.
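To make the "no memory copying until mutation occurs" idea concrete, here is a minimal pure-Python sketch of copy-on-write. The `CowFrame` class and its methods are hypothetical names made up for illustration; this is not pandas code and not how pandas would necessarily implement it.

```python
class CowFrame:
    """Hypothetical toy container: copies share column data until one is written to."""

    def __init__(self, columns, _shared=False):
        self._columns = columns          # dict: column name -> list of values
        self._owns_data = not _shared    # False while the data may be shared

    def shallow_copy(self):
        # No data is copied here; both objects now point at the same dict.
        self._owns_data = False
        return CowFrame(self._columns, _shared=True)

    def get_value(self, column, row):
        return self._columns[column][row]

    def set_value(self, column, row, value):
        # Copy lazily, on the first write after sharing.
        # (A real implementation would track references so that the last
        # remaining owner does not copy needlessly; this toy does not.)
        if not self._owns_data:
            self._columns = {name: list(vals) for name, vals in self._columns.items()}
            self._owns_data = True
        self._columns[column][row] = value


original = CowFrame({"a": [1, 2, 3]})
view = original.shallow_copy()
shares_before = view._columns is original._columns   # same underlying data

view.set_value("a", 0, 99)                            # triggers the deferred copy
shares_after = view._columns is original._columns     # data is now separate
```

The write to `view` leaves `original` untouched, which is the property that would protect shared in-memory data from accidental mutation.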
Closing this due to age; feel free to reopen it if you want to.
Hi,
I'm not sure of the proper way to give feedback on the design phase of pandas 2.x, so feel free to move this elsewhere. I know that immutable dataframes are out of scope for pandas 1.x (https://github.com/pandas-dev/pandas/issues/16567), but I would love to see this feature in pandas 2.x.
The background is that, with low-latency computation frameworks such as Dask Distributed (https://github.com/dask/distributed/), more people (myself included) are starting to view Dask+Pandas as a real-time query engine that rivals and surpasses most traditional databases in several use cases. However, the pandas data that Dask holds is mutable: a helper function can accidentally alter the data and corrupt your in-memory storage.
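A small sketch of the kind of accident I mean. The function and column names are made up for illustration, and the example uses plain pandas without Dask to keep it short:

```python
import pandas as pd


def add_total(df):
    # Looks harmless, but this assignment adds a column to the caller's
    # DataFrame in place, because `df` is the same object that was passed in.
    df["total"] = df["a"] + df["b"]
    return df


def add_total_safe(df):
    # DataFrame.assign returns a new object and leaves the input untouched.
    return df.assign(total=df["a"] + df["b"])


shared = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

safe_result = add_total_safe(shared)
columns_after_safe = list(shared.columns)    # shared data is unchanged

add_total(shared)
columns_after_unsafe = list(shared.columns)  # shared data now has an extra column
```

Nothing stops a helper from being written like `add_total`, which is why an opt-in immutable mode would help.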
More background on the use case on https://stackoverflow.com/questions/50017443/read-only-pandas-dataset-in-dask-distributed
Right now the options are basically:
But I would love for pandas 2.x to have an option for immutability. I know it's not a simple task, though.
Regards, Niklas