pyjanitor-devs / pandas_flavor

The easy way to write your own flavor of Pandas
https://zsailer.github.io/software/pandas-flavor/
MIT License

How can I distinguish between inplace and copy operations? #20

Open DOH-Manada opened 3 years ago

DOH-Manada commented 3 years ago

Problem description: It is hard to tell when a registered method modifies the dataframe in place and when it returns a copy, and the behavior seems inconsistent. Does pandas_flavor always copy the original dataframe, or can a registered method modify it in place? Is there a way to avoid the copy in order to save memory?

import numpy
import pandas
import pandas_flavor

# Does not replace original dataframe
@pandas_flavor.register_dataframe_method
def drop_empty_rows(dataframe):
    return dataframe.dropna(axis=0, how='all')

# Should replace original dataframe
@pandas_flavor.register_dataframe_method
def drop_empty_rows(dataframe):
    dataframe = dataframe.dropna(axis=0, how='all')
    return dataframe

# Should replace original dataframe
@pandas_flavor.register_dataframe_method
def drop_empty_rows(dataframe):
    dataframe_processed = dataframe.copy()
    dataframe_processed = dataframe_processed.dropna(axis=0, how='all')
    return dataframe_processed

If I call the first function on a dataframe, it returns the dataframe with dropped rows but does not change the original dataframe.

dict_rows = {}
dict_rows['A'] = [20, numpy.nan, 40, 10, 50]
dict_rows['B'] = [50, numpy.nan, 10, 40, 50]
dict_rows['C'] = [30, numpy.nan, 50, 40, 50]

dataframe = pandas.DataFrame(dict_rows)

This function returns the reduced dataframe, but doesn't affect the original dataframe.

>>> dataframe.drop_empty_rows()
      A     B     C
0  20.0  50.0  30.0
2  40.0  10.0  50.0
3  10.0  40.0  40.0
4  50.0  50.0  50.0

samukweku commented 1 year ago

@DOH-Manada pandas-flavor doesn't copy. It just provides a convenient interface for registering methods; the real work happens in pandas, so whether you get a copy depends on the pandas calls inside your function. You can also enable pandas Copy-on-Write, which should take care of the copy-or-not decisions and improve memory performance.
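
To make the distinction concrete, here is a minimal sketch using plain pandas functions. The names `drop_empty_rows_rebind` and `drop_empty_rows_inplace` are hypothetical and not part of pandas_flavor. Since a registered method receives the calling dataframe as its first argument, the same rules should apply once these functions are decorated with `register_dataframe_method`. Assigning the result of `dropna` to the parameter only rebinds a local name, while `inplace=True` mutates the object that was passed in.

```python
import numpy as np
import pandas as pd


def drop_empty_rows_rebind(dataframe):
    # Assigning to the parameter only rebinds the local name;
    # the caller's DataFrame is left untouched.
    dataframe = dataframe.dropna(axis=0, how="all")
    return dataframe


def drop_empty_rows_inplace(dataframe):
    # inplace=True mutates the object that was passed in.
    dataframe.dropna(axis=0, how="all", inplace=True)
    return dataframe


df = pd.DataFrame({
    "A": [20, np.nan, 40, 10, 50],
    "B": [50, np.nan, 10, 40, 50],
    "C": [30, np.nan, 50, 40, 50],
})

reduced = drop_empty_rows_rebind(df)
rows_after_rebind = len(df)  # the original still has every row

same_object = drop_empty_rows_inplace(df) is df
rows_after_inplace = len(df)  # the all-NaN row is now gone from df itself
```

If the goal is only to avoid unnecessary copies, pandas 2.x lets you switch on Copy-on-Write with `pd.options.mode.copy_on_write = True`; in that mode pandas defers copying until data is actually modified.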