pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.77k stars 17.97k forks source link

BUG: `Styler.map_index()` breaks command-query separation by mutating AND returning the Styler #59360

Open wjandrea opened 3 months ago

wjandrea commented 3 months ago

I noticed this by messing up the callback signature, but when I fixed the problem, the error kept happening.

Example setup:

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
st = df.style

I meant to write this (makes the index values red):

st.applymap_index(lambda _: 'color: #ff0000', axis='index')

But accidentally wrote this (returning a list):

st.applymap_index(lambda _: ['color: #000000'], axis='index')

This raises ValueError: too many values to unpack (expected 2)

After fixing the callback and re-running the Jupyter cell, the issue kept appearing, which didn't make any sense to me.

I found the problem by looking at the source code:

https://github.com/pandas-dev/pandas/blob/7acd629fea2a32d1ace93ceab2b62d5f5f9b2d47/pandas/io/formats/style.py#L2001-L2015

This modifies the Styler object's _todo attribute and returns the Styler, which is confusing and breaks the command-query separation that Python code normally follows (e.g. list.sort() doesn't return the sorted list; use sorted(list) instead).

I believe you could fix this by making a copy of the Styler object, modifying its _todo, and returning the copy.


mroeschke commented 3 months ago

cc @attack68

attack68 commented 3 months ago

Method chaining and returning self has been the standard design for Styler for quite a bit more than 5 years. It doesn't just affect this method but many others. Also, adding the method call to a todo object stored on the Styler for lazy evaluation is by far the most efficient means of performing this function.

I appreciate your perspective but do not believe that everyone shares it, particularly since your reported error here is based on the re-run of an edited Jupyter cell, and the method remains in memory. This error would not appear if you ran this in a script, for example, or cleared and re-ran all the Jupyter cells. Therefore this isn't a bug, but is in fact working as expected and as currently documented.

Copying a Styler may be very inefficient depending on what other parameters have been configured. Also the code for copying a styler is quite complex, so returning a copy is not a preferable solution from a code maintenance perspective, or from a user perspective if they use many functions in chain.

But the main reason this isn't a suitable change is that it would fundamentally change and break the design structure for all of the users of pandas who have already implemented programs based on its current design.

Many thanks for your feedback and post to the board.

wjandrea commented 3 months ago

@attack68

this isn't a bug, but is in fact working as expected and as currently documented.

Where is it documented that Styler methods modify self? I looked at Style in the API reference and Table Visualization in the user guide, especially § Limitations — and .map_index() of course.


Secondarily, some clarifications:

This error would not appear if you ran this in a script, for example, or cleared and re-ran all the Jupyter cells.

Yes, I realize that, but lots of people use Pandas interactively and re-running everything isn't always reasonable.

adding the method call to a todo object stored on the Styler for lazy evaluation is by far the most efficient means of performing this function.

Absolutely, I never said anything against that.

the method remains in memory.

It remains in memory because the Styler is holding a reference to it, which I'm saying it shouldn't, ideally.

attack68 commented 3 months ago

It remains in memory because the Styler is holding a reference to it, which I'm saying it shouldn't, ideally.

For arguments sake, suppose I agree with you. This is first issue of its kind in probably 7 years. It came about becuase your code had a manually imposed bug which persisted in memory of a specific IDE. If your proposal was implemented, even with appropriate breaking changes release notices, I guarantee there would be far more issues and users posting about their Styler workflows breaking 'unexpectedly'. Even if this is theoretically sensible, it is not practical for existing users.

wjandrea commented 3 months ago

@attack68 I'm not looking for a debate, I just want to know where it's documented.