utterances-bot commented 1 year ago

The When of Lambda | The When of Python Blog

Lambdas are a Python language feature we should provide clear guidance on. Used well, they simplify code and are perfectly readable; used poorly, they make code opaque and bug-prone. Although their use should be restrained, an accommodation should be made for Pandas and sorting, with caveats to ensure usage doesn't compromise readability. And a => syntax could be nice.

https://when-of-python.github.io/blog/the-when-of-lambda.html

nathanjmcdougall commented 1 year ago

itemgetter is cute, but I see it like using map and filter instead of list comprehensions: i.e. not a good idea.
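For comparison, here is a minimal sketch of the same sort written both ways. The `people` data is made up for illustration; `operator.itemgetter(1)` builds a key function equivalent to `lambda person: person[1]`.

```python
from operator import itemgetter

# Made-up sample data: (name, age) tuples
people = [("Ana", 34), ("Ben", 27), ("Cy", 41)]

# Sort by the second element (age) with a lambda...
by_age_lambda = sorted(people, key=lambda person: person[1])

# ...and with itemgetter, which returns a callable doing the same indexing
by_age_itemgetter = sorted(people, key=itemgetter(1))

print(by_age_lambda)  # [('Ben', 27), ('Ana', 34), ('Cy', 41)]
```

Both produce the same ordering; the choice between them is purely about which one reads more clearly to you.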

With Pandas, I don't generally like to put column/row indexing inside the lambda passed to apply. For example, I'd refactor this:

df.apply(lambda row: row['w'] * row['h'], axis='columns')

As this:

area = lambda w, h: w * h
df[['w', 'h']].apply(lambda row: area(*row), axis='columns')

Note that area no longer needs to be a lambda: since it just takes w and h, it could be an ordinary def.
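A runnable sketch of this refactor, with a made-up frame and area written as a plain def:

```python
import pandas as pd

# Made-up frame with width and height columns
df = pd.DataFrame({"w": [1.0, 2.0, 3.0], "h": [4.0, 5.0, 6.0]})

# With the column selection moved outside the lambda, area is an ordinary function
def area(w, h):
    return w * h

# Each row of df[['w', 'h']] is a Series; *row unpacks its values positionally as (w, h)
areas = df[["w", "h"]].apply(lambda row: area(*row), axis="columns")
print(areas.tolist())  # [4.0, 10.0, 18.0]
```

One thing to watch: *row relies on the column order in df[['w', 'h']] matching the parameter order of area.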

Now a more complex example:

df.apply(lambda row: volume(width=row['w'], height=row['h'], depth=_DEFAULT_DEPTH), axis='columns')

Compare with this (the benefits become especially apparent when your function has 5 arguments rather than 2, and when your column names are longer than one character):

df[['w', 'h']].apply(lambda row: volume(*row, depth=_DEFAULT_DEPTH), axis='columns')

Ultimately, I'd prefer if Pandas offered a pd.DataFrame.row_apply function, which would let us simplify this further to:

df[['w', 'h']].row_apply(volume, depth=_DEFAULT_DEPTH)

That would let us avoid lambdas entirely, and it seems like such a common use case. Another solution, which I think would be better still, would be to allow the column names to be passed as keyword arguments, similar to how seaborn does things:

df.row_apply2(volume, width='w', height='h', depth=_DEFAULT_DEPTH)

Such a function could even be overloaded to accept either string column names or a series-like of actual values (again, this is how seaborn does things).
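The seaborn-style idea can also be sketched as a standalone function. This is a hypothetical row_apply2 (not a pandas API) with one assumed resolution rule: a string that names a column is replaced by that column, a non-string list-like of matching length is used as-is, and anything else is passed through unchanged to every call. volume is again a made-up definition.

```python
import pandas as pd


def row_apply2(df, func, **kwargs):
    """Hypothetical seaborn-style row apply; not part of the pandas API."""
    per_row = {}    # arguments that vary row by row
    constants = {}  # arguments passed unchanged to every call
    for name, value in kwargs.items():
        if isinstance(value, str) and value in df.columns:
            per_row[name] = list(df[value])
        elif pd.api.types.is_list_like(value):
            values = list(value)
            if len(values) != len(df):
                raise ValueError(f"{name!r} has {len(values)} values, expected {len(df)}")
            per_row[name] = values
        else:
            constants[name] = value
    if not per_row:
        raise ValueError("at least one argument must be a column name or list-like")
    # zip(*per_row.values()) yields one tuple of argument values per row
    results = [
        func(**dict(zip(per_row, row_values)), **constants)
        for row_values in zip(*per_row.values())
    ]
    return pd.Series(results, index=df.index)


def volume(width, height, depth):
    # Assumed definition, for illustration only
    return width * height * depth


df = pd.DataFrame({"w": [1.0, 2.0], "h": [3.0, 4.0]})
from_columns = row_apply2(df, volume, width="w", height="h", depth=2.0)
mixed = row_apply2(df, volume, width="w", height=[10.0, 10.0], depth=1.0)
print(from_columns.tolist())  # [6.0, 16.0]
print(mixed.tolist())  # [10.0, 20.0]
```

Keyword arguments make the column-to-parameter mapping explicit, so, unlike *row, the call no longer depends on column order.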

grantps commented 1 year ago

Thanks Nathan - I had to read it a couple of times, but I ended up agreeing. As a general point, the more readable we can make our Pandas, the better. Opaque Pandas can hide so many bugs. As a community, we need good conventions to make our code safe for humans.

grantps commented 1 year ago

A couple of years ago Trey Hunner ran a poll on sorting with lambda vs two alternatives: https://twitter.com/treyhunner/status/1225869341490630657. Lambda does seem very popular for this use case.

grantps commented 1 year ago

And when I reached out to Trey directly he had the following to say about the Pandas use case: "I do think writing pandas without lambda would be quite cumbersome as it seems designed for lambda (just based on how often functions need to be passed around)".
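To make the "functions need to be passed around" point concrete: several pandas methods accept a callable, which is where lambdas tend to appear in method chains. A short sketch with a made-up frame, using DataFrame.assign (callable values are computed on the frame) and .loc (a callable returning a boolean mask):

```python
import pandas as pd

# Made-up frame with width and height columns
df = pd.DataFrame({"w": [1.0, 2.0, 3.0], "h": [4.0, 5.0, 6.0]})

result = (
    df.assign(area=lambda d: d["w"] * d["h"])  # the callable receives the frame being assigned to
      .loc[lambda d: d["area"] > 5]            # the callable returns a boolean mask for selection
)
print(result["area"].tolist())  # [10.0, 18.0]
```

Each lambda here receives the intermediate frame in the chain, which a plain column reference like df["area"] could not do, because df has no area column.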

shearichard commented 1 year ago

I agree that the arrow function syntax is significantly clearer than the Python approach. I think I would say this anyway, but I have to admit that I write quite a bit of JS, so that may influence me. Of course, we're stuck with what we have now... pending Python 4 😉.

I prefer itemgetter over lambdas (I'm willing to give Pandas a pass here), as it is, imho, harder to misinterpret when the hour is late and the eyes are tired.

Two thumbs up for meaningful variable names. Too often, once a lambda has been chosen, it seems to become a contest to see how few characters it can be expressed in.
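A small illustration of the difference a name makes. The Post records and titles are invented for the example; both key functions sort identically, but only one tells the reader what it sorts by:

```python
from collections import namedtuple
from datetime import date

# Invented records for illustration
Post = namedtuple("Post", ["title", "published"])
posts = [Post("Lambda", date(2022, 3, 1)), Post("Walrus", date(2021, 7, 9))]

# Terse: the reader has to work out what x[1] refers to
terse = sorted(posts, key=lambda x: x[1])

# Descriptive: the parameter and attribute names state the intent
descriptive = sorted(posts, key=lambda post: post.published)

print([post.title for post in descriptive])  # ['Walrus', 'Lambda']
```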
