pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.22k stars 17.78k forks

Option for making pandas dataframe completely immutable #16567

Closed: mitar closed this issue 7 years ago

mitar commented 7 years ago

There currently seems to be no option similar to NumPy's setflags (write=False) for making a pandas DataFrame completely immutable. We are considering a design in which we rely on immutability to know that objects are safe to cache. We can still build such a system without it, but it would be great if we could enforce immutability to catch errors.
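
For reference, the NumPy behavior being asked for can be sketched as follows. This is a minimal illustration of ndarray.setflags on a plain array; the variable names are made up for the example, and nothing here applies to a DataFrame as a whole.

```python
import numpy as np

arr = np.arange(3)
arr.setflags(write=False)  # mark the array's buffer as read-only

try:
    arr[0] = 99  # any write to a read-only array raises ValueError
    write_blocked = False
except ValueError:
    write_blocked = True
```

The request in this issue is for an equivalent switch that covers a whole DataFrame, including its index and columns.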

jreback commented 7 years ago

This is not in scope for pandas 1.x; it could be done in a subclass.

Virtually all operations return new objects. Simply don't use the inplace flags or in-place indexing, and you have de facto immutability.
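
A small sketch of that convention, assuming a toy DataFrame invented for illustration: methods such as sort_values and assign return new objects and leave the original untouched as long as inplace is not used.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Both calls return new DataFrames; df itself is not modified.
sorted_df = df.sort_values("a")
with_b = df.assign(b=df["a"] * 2)
```

This is a discipline rather than an enforcement mechanism: nothing stops a later caller from writing df["b"] = ... or passing inplace=True.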

jreback commented 7 years ago

You can also use pandas.util.hash_pandas_object to compute hashes of the data.
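
As a rough sketch of how such hashes could be used to detect a change (the DataFrame and variable names are illustrative only): hash_pandas_object returns one uint64 hash per row, so comparing the result before and after an operation shows which rows were modified.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# One hash per row, covering the values and (with index=True) the index.
before = pd.util.hash_pandas_object(df, index=True)

df.loc[0, "a"] = 100  # an in-place modification of the first row

after = pd.util.hash_pandas_object(df, index=True)
changed = not before.equals(after)
```

Note that this only reports a change after the fact; it does not prevent one.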

mitar commented 7 years ago

Oh. :-( As a subclass it is pretty tricky, because you have to make sure you shadow every pandas method that can potentially change internals. Hashing is also not enough: it can tell you that something changed, but it cannot prevent the change.

mitar commented 7 years ago

Is there a way to tell pandas to create a pandas object using a subclass?

jreback commented 7 years ago

http://pandas-docs.github.io/pandas-docs-travis/internals.html
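
Assuming the link refers to pandas' documented subclassing hooks, a minimal sketch looks like this. TrackedFrame is a hypothetical name; the _constructor property is the hook pandas uses to decide which class to build for results that have the same dimensions as the original.

```python
import pandas as pd

class TrackedFrame(pd.DataFrame):
    """Hypothetical subclass used only to show the _constructor hook."""

    @property
    def _constructor(self):
        # Results of DataFrame-shaped operations are built with this class.
        return TrackedFrame

tf = TrackedFrame({"a": [1, 2, 3]})
filtered = tf[tf["a"] > 1]  # boolean filtering goes through _constructor
```

This only keeps results in the subclass. Making the subclass immutable would still require overriding every mutating method, which is the difficulty raised above.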

fkromer commented 4 years ago

@mitar @jreback @TomAugspurger I know of this third-party package addressing this issue: static-frame. Do you know of other packages as well?

TomAugspurger commented 4 years ago

Nope.

fkromer commented 4 years ago

Is there also no tooling support (e.g. a pylint extension) to ensure that people don't introduce bugs by mistake into functions that take a dataframe as input, manipulate it, and output the manipulated dataframe?

fkromer commented 4 years ago

For functions, a workaround could be a decorator that performs dynamic analysis. It would have to check which args and kwargs are DataFrames or Series. For a single DataFrame df_in, for example, it would compute the hash of df_in before and after the wrapped function is called (df_in_hash_before = pd.util.hash_pandas_object(df_in, index=True), df_in_hash_after = pd.util.hash_pandas_object(df_in, index=True)) and fail if the hashes differ (pd.testing.assert_series_equal(df_in_hash_before, df_in_hash_after)).
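
The decorator described above could be sketched roughly as follows. The name assert_inputs_unchanged and the two example functions are hypothetical; only pd.util.hash_pandas_object and pd.testing.assert_series_equal are existing pandas functions.

```python
import functools

import pandas as pd


def assert_inputs_unchanged(func):
    """Hypothetical decorator: fail if func mutates a DataFrame/Series argument."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Track every positional or keyword argument that is a pandas object.
        tracked = [
            arg
            for arg in (*args, *kwargs.values())
            if isinstance(arg, (pd.DataFrame, pd.Series))
        ]
        hashes_before = [
            pd.util.hash_pandas_object(obj, index=True) for obj in tracked
        ]
        result = func(*args, **kwargs)
        for obj, hash_before in zip(tracked, hashes_before):
            hash_after = pd.util.hash_pandas_object(obj, index=True)
            # Raises AssertionError if any row hash (or the shape) changed.
            pd.testing.assert_series_equal(hash_before, hash_after)
        return result

    return wrapper


@assert_inputs_unchanged
def add_total(df):
    # Pure: returns a new DataFrame.
    return df.assign(total=df["a"] + df["b"])


@assert_inputs_unchanged
def add_total_in_place(df):
    # Impure: adds a column to the caller's DataFrame.
    df["total"] = df["a"] + df["b"]
    return df


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = add_total(df)

try:
    add_total_in_place(df)
    mutation_caught = False
except AssertionError:
    mutation_caught = True
```

One limitation of this sketch: the row hashes cover values and the index but not column labels, so a pure rename of a column would go unnoticed. It also detects the mutation only after it has happened.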

mitar commented 4 years ago

Yes, I gave up on this. I find it really sad because pandas is almost there. Many methods have an inplace argument, and it would be great if you could just force it to be False and prevent any other modifications, so that (when enabled) you get a copy every time by design.

fkromer commented 4 years ago

Getting this into pandas would be too optimistic, I guess. I'm thinking about implementing the decorator and publishing it in a package, pytest-pandas. This would allow adding dynamic analysis of "mutability conformity" during tests.