ttricco / sarracen

A Python library for smoothed particle hydrodynamics (SPH) analysis and visualization.
https://sarracen.readthedocs.io
GNU General Public License v3.0

loc[] and iloc[] do not update special columns #71

Closed ttricco closed 8 months ago

ttricco commented 8 months ago

Slicing and filtering operations on a SarracenDataFrame should return copies whose special columns have been updated to match the columns the copy actually contains.

For example, if the original dataframe has a mass column, and the sliced dataframe does not, then mcol should be updated.

This works for drop() and [] (aka __getitem__()). The dataframe copies will have their special columns updated.

This does not work for .loc[] or .iloc[]. The copy of the dataframe will have the special columns of the original dataframe, even if those columns have been filtered out. That is, for example, mcol might still be set even though the mass column is no longer present in the copy.
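To illustrate the symptom, here is a minimal sketch using plain pandas. `ParticleFrame` is a hypothetical stand-in, not the actual SarracenDataFrame implementation, and it assumes the special column is stored as an attribute listed in `_metadata`. pandas copies such attributes onto the result of `.loc[]` and `.iloc[]` without checking them against the remaining columns.

```python
import pandas as pd


class ParticleFrame(pd.DataFrame):
    """Hypothetical stand-in for a DataFrame subclass with a special column."""

    # pandas copies the attributes named here onto derived objects.
    _metadata = ["mcol"]

    def __init__(self, *args, mcol=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.mcol = mcol  # name of the mass column, or None

    @property
    def _constructor(self):
        # Make pandas operations return ParticleFrame instead of DataFrame.
        return ParticleFrame


pf = ParticleFrame({"x": [0.0, 1.0], "m": [1.0, 1.0]}, mcol="m")

# Both selections drop the "m" column.
by_label = pf.loc[:, ["x"]]
by_position = pf.iloc[:, [0]]

for sub in (by_label, by_position):
    # A special column is stale if it names a column that no longer exists.
    stale = sub.mcol is not None and sub.mcol not in sub.columns
    print(list(sub.columns), sub.mcol, "stale" if stale else "ok")
```

In this sketch the copied attribute still names the mass column even though that column is gone, which is the behaviour described above.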

The complication is that .loc and .iloc are attributes (properties) that return _LocIndexer and _iLocIndexer objects. It is the __getitem__() methods of these objects that are actually called to do the slicing. In other words, .loc[] and .iloc[] are really two steps: the attribute access .loc or .iloc, followed by []. Updating the special columns (or any private attributes) of the copy might therefore require overriding the __getitem__() method of these indexer objects, since they are what return the dataframe copy. I don't like the idea of subclassing private classes, and it seems deep enough in the weeds to cause unintended complications.
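The two steps can be seen directly in plain pandas. This is only a sketch on an ordinary DataFrame with made-up column names, not Sarracen code:

```python
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "m": [1.0, 1.0, 1.0]})

# Step 1: attribute access returns an indexer object bound to df.
loc_indexer = df.loc
iloc_indexer = df.iloc
print(type(loc_indexer).__name__)   # _LocIndexer
print(type(iloc_indexer).__name__)  # _iLocIndexer

# Step 2: the indexer's __getitem__() performs the selection and
# returns the new dataframe.
subset = loc_indexer[:, ["x"]]

# Same result as the usual one-line spelling.
print(subset.equals(df.loc[:, ["x"]]))  # True
```

Because the new dataframe is built inside the indexer's __getitem__(), overriding __getitem__() on the dataframe itself has no effect on .loc[] or .iloc[].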