python / cpython

The Python programming language
https://www.python.org
Other
63.35k stars 30.34k forks source link

Mention `all()` & `any()` are only efficient for existing iterables #119187

Closed nineteendo closed 5 months ago

nineteendo commented 5 months ago

Documentation

I think it would be useful to mention that a for loop is more efficient in a case like this:

-return any(key in m for m in self.maps)
+for mapping in self.maps:
+    if key in mapping:
+        return True
+return False

See also this rejected proposal: https://github.com/nineteendo/for-any-each

Linked PRs

AlexWaygood commented 5 months ago

It's not at all clear that this will remain true in the future. The fact that this is currently true is due to the current implementation of generators, and the current implementation details of any() and all().

The implementation details of generators have changed a lot in recent versions, and will likely continue to change in upcoming versions as we continue to optimise CPython. In Python 3.12, we added a new implementation of list/dict/set comprehensions, which means that they are now inlined (speeding them up greatly); we could very possibly do something like this with generator expressions in the future.

When it comes to any() and all(), meanwhile: people have floated, several times, reimplementing builtins.any() and builtins.all() in pure Python (e.g. https://github.com/faster-cpython/ideas/discussions/373). It's possible that this might actually be faster than the current implementation in C now (due to the overhead of crossing the Python/C boundary). Once the new JIT has been improved and has become the default build of CPython, it will be even more likely that reimplementing these functions in Python will be faster than the current implementations in C.

We shouldn't document details like this that have such high likelihood of changing in the near future, especially if it will encourage people to use more ugly antipatterns in their code.

nineteendo commented 5 months ago

I'll get back with a benchmark comparing a Python implementation.

nineteendo commented 5 months ago

Don't get me wrong, I would much prefer to use all() & any() over for loops. But the fact that this isn't documented at all (no pun intended) isn't ideal.

Also, with all() and any() not being keywords, I don't see a way for them to even match the performance of a for loop. Even if it's a tiny bit slower, it should be mentioned.

rhettinger commented 5 months ago

I concur with Alex. This is implementation specific and is out of line with what we would normally include in the documentation.

Also, people who care about extreme performance tend to run profilers to identify their hotspots, or they use PyPy or Cython. Python itself is not a squeeze-out-every-clock-cycle language, people choose it for its clarity and economy of expression.

Side note: This isn't specific to any and all. You could say the same about min, max, sum, or anything else that consumes an iterable. In general, code written in pure python (a generator expression or list comprehension) tends to be slower than "vectorized" code were the components are implemented in C (the itertools for example).

ericvsmith commented 5 months ago

FWIW, I concur with Raymond and Alex. We couldn't possibly document the tradeoffs for every single code construct. If it's important to you, profile it, and expect it to change over time.

pochmann3 commented 5 months ago

@AlexWaygood Even with all those optimizations, won't any(genexp) still have the overhead of transmitting values from the generator iterator to any? Can it actually get as fast as the "for loop" way?

AlexWaygood commented 5 months ago

@AlexWaygood Even with all those optimizations, won't any(genexp) still have the overhead of transmitting values from the generator iterator to any? Can it actually get as fast as the "for loop" way?

Maybe not exactly as fast, but I'm pretty sure we can get close enough that it wouldn't make a significant difference unless it's an extremely hot loop. As well as those optimisations mentioned above, Ken Jin's talk at PyCon last week also floated the idea of "true function inlining" for hot functions that could be done by an optimisation pass prior to the JIT pass over the code; this could also make a significant difference if any() and all() were rewritten in Python.

I really wouldn't assume any kind of stability when it comes to performance details in this area.