sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.4k stars 473 forks source link

Improve handling of finite iterables #24757

Open embray opened 6 years ago

embray commented 6 years ago

There is a general problem throughout much of Sage where code checks if some argument is a list or tuple (e.g. isinstance(x, (list, tuple))) where really it should be able to accept any argument which, depending on the context, implements either the Sequence protocol, or even just the Iterable protocol.

This especially becomes a problem on Python 3 where common built-ins like map() and filter() now return lazily-evaluated iterators instead of lists.

In many cases we've addressed this by replacing map/filter calls with list comprehensions and that works fine. But really also ought to make many of these interfaces more flexible in what types they accept.

For some cases simply replacing (list, tuple) with Sequence is sufficient (basically, this is any object that implements __iter__ as well as __len__ and __getitem__). Unfortunately, there are other cases where we don't need the full Sequence protocol, and in fact would like to take any iterable (like map).

The problem with the latter case is that in most cases what we really want is finite iterables. We don't want to write code that can be passed an infinite generate and hang. To some extent this is the user's responsibility but we should try as much as possible to prevent it. If an iterable implements __len__ then no problem, but map and filter do not (since they can wrap any iterable(s), finite or not).

An even bigger problem is that Python's interface does not allow us to easily introspect map or filter to see if the iterable's they're wrapping are finite. In some sense there's no general way to do this with certainty, if you have arbitrarily nested iterables (though I feel that that's a shortcoming in Python). But for most cases we should be able to check if a map or filter is finite by looking directly at the iterable(s) they were passed as arguments, so it might be nice to have a C-level helper for that.

Component: python3

Issue created by migration from https://trac.sagemath.org/ticket/24757

jdemeyer commented 6 years ago
comment:1

Why is the finiteness so important? For me, an iterable is just an iterable. Trying to figure out the length of arbitrary iterables is not possible, so why bother?

embray commented 6 years ago
comment:2

There's been lots of debate about this in the Python community in the past.

Of course finiteness is important. If you do something like list(infinite_generator) your code will hang forever, or until you blow up with a MemoryError. As I wrote, one can only do so much to protect users from that--it's not 100% possible in all cases--but I certainly prefer to as much as possible. I've definitely seen cases where just blindly assuming all iterables will be finite can blow up in one's face.

jdemeyer commented 6 years ago
comment:3

Replying to @embray:

Of course finiteness is important. If you do something like list(infinite_generator) your code will hang forever, or until you blow up with a MemoryError.

What's your point? If the user writes bad code, then bad things can happen. That's a basic axiom of programming. We don't "protect" the user from running

L = []
while True:
    L.append(0)

Of course, it can often be nice to check the input given by the user. That's what exceptions are for. But here we have the problem that we want to accept certain input where we cannot a priori determine whether it's good (finite) or bad (infinite). So there are two "solutions":

  1. Your solution: only accept input which can be proven good. This means breaking certain meaningful use cases.

  2. My solution: accept all input and assume that it's good. This puts the burden on the user not to do anything stupid.

embray commented 6 years ago
comment:4

The problem with iterators, and especially generators, is that their behavior can be extremely obscured. If you have an explicit while True: it's obvious to see, and no one would write it. But it's quite easy (and this comes up a lot especially in coroutine-based programming) where you (the user, but which I also mean a developer) can wind up with some infinite generator buried deep in other generators.

This discussion led in part to the creation of __length_hint__ though that's more intended just as an optimization for the interpreter to use; it's not, nor is it intended to be, a fix to this problem. It was just a small concession that's come out of those discussions.

Of course, given some arbitrary code it's impossible to know if it will always terminate, but there are definitely common cases where we can do better than nothing.

This means breaking certain meaningful use cases.

Like what?

Also, I'll note, there are lots of places in my Python 3 branch where for now I just accept arbitrary iterables where it's possible to. But this is a concession to the fact that there's often no better way to check finiteness. The point of this ticket was to talk about adding some simple checks for common cases. E.g. if it were possible with the map object to dig into it and introspect what iterables were passed to it there's a lot one might be able to say, and it's unfortunate that the API doesn't allow that. I believe it should.

jdemeyer commented 6 years ago
comment:5

Replying to @embray:

This means breaking certain meaningful use cases.

Like what?

Generators.

embray commented 6 years ago
comment:6

I'm not sure what you mean. I'm not proposing anything that would break using arbitrary generators. Though it's a shame Python doesn't provide a way to attach a length hint to generators--or at least something along the lines of "this is expected to terminate".

jdemeyer commented 6 years ago
comment:7

Replying to @embray:

I'm not proposing anything that would break using arbitrary generators.

OK, then what are you proposing? I don't understand anymore...

Maybe you should give a concrete example of some bad iterable that you want to catch.

embray commented 6 years ago
comment:8

I'm not proposing anything specific here. This ticket is intended mostly for brainstorming.

I think it does matter whether or not an iterable is finite, but the problem is much of Python doesn't seem to care and I'm wondering out loud if there's anything we can do to improve that situation for places where it matters. I see it as a quality issue...

jdemeyer commented 6 years ago
comment:9

Thanks for the clarification. Still, I don't really what could be done. Surely, the problem cannot be solved for all iterables because of the Halting Problem. And there are almost no cases where an iterable is known to be infinite. So either you know that it's finite or you don't know anything. I think that we should always assume the best and treat the unknown cases as finite. So you end up with zero information.

jdemeyer commented 6 years ago
comment:10

Related: #13556

embray commented 6 years ago
comment:11

Obviously we can't always say with absolute certainty that some calculation will terminate, but when implementing iterators, even generators, we can say "this should terminate".

embray commented 6 years ago
comment:12

Just to give an example, say iterables had some flag on them like __finite__. I'm not sure how you'd set it on generator expressions--perhaps a built-in you can wrap it in like finite(<generator expression>) which sets said flag (or use it as a decorator on generator functions).

Then, for example, a range object would certainly have __finite__ == True. Then, say, take map. By default map.__finite__ == False (this does not mean it is definitely infinite, it just means we don't know). If do something like m = map(func, range(10)) then m.__finite__ == True. Likewise for m = map(func, range(10), range(10)) and so on.

I get your point that if we said "okay, this function will reject any inputs that don't have __finite__ == True" then we would be excluding otherwise valid inputs that simply don't have that flag set. But if such a feature did work I think you'd still wind up accepting the vast majority of well-formed inputs. I'm especially motivated here by map and filter now that they return iterators instead of lists, as well as some other built-ins.

Anyways, this isn't necessarily a problem that Sage alone can solve.

jdemeyer commented 6 years ago
comment:13

Replying to @embray:

Just to give an example, say iterables had some flag on them like __finite__.

OK, now I understand what you mean. This one sentence explains everything. Still, that sounds like something for a PEP, not something that Sage can fix.

dimpase commented 6 years ago
comment:14

It seems to me that a wholesale replacement of these new (py3-only) iterators (e.g. filter()) by iterables (e.g. lists) is not a good idea, as it prevents potential performance optimisations.

Why not just wrapping them with list() or tuple() instead? It is much less work, and it does not remove the opportunity to optimise later on, if desired.

jdemeyer commented 6 years ago
comment:16

See also https://bugs.python.org/issue33939

embray commented 6 years ago
comment:17

Interesting. I'm curious, how did you stumble on that?

jdemeyer commented 6 years ago
comment:18

https://mail.python.org/pipermail/python-dev/2018-June/153978.html