sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.4k stars 474 forks source link

Register FreeModuleElement to the Sequence ABC #24815

Open embray opened 6 years ago

embray commented 6 years ago

For most intents and purposes instances of FreeModuleElement can be used equivalently to list or tuple, in that they all meet the requirements of the Sequence protocol (which on Python 3 also includes range).

This simplifies some code that treats Vector distinctly from list and tuple. To be clear, in this case we are explicitly treating the FreeModuleElement type, and not the abstract Vector type which does not meet these requirements, but that's okay since FreeModuleElement implements any concrete vector space elements.

By convention, this always imports collections.Sequence as SequenceABC so as to not confuse it with Sage's Sequence class.

See also: #26769 Add an issequence utility to check for list, tuple, and other compatible objects

Component: refactoring

Author: Erik Bray

Branch/Commit: u/embray/refactoring/free-module-sequence @ 9660547

Issue created by migration from https://trac.sagemath.org/ticket/24815

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Commit: fcda724

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a8d1c0eRegister FreeModuleElement to the Sequence ABC
a73178dSome simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC
da6b45aA slightly more involved replacement with SequenceABC
fcda724More misc PEP-8 cleanup in the affected modules
embray commented 6 years ago
comment:2

One small concern I have with this is that str/bytes are also considered Sequences by Python. I think this is okay though, because under this treatment it's really no different than if passed, say, a list of one character strings. Then, if such as list is not valid input, a string is no less invalid. It just means that if you intend a function to accept string arguments to be processed separately from list/tuple-like arguments, they should always be handled first. This is what I would probably normally do anyways.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from fcda724 to 6644dbd

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

dc25e5fSome simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC
16c2af9A slightly more involved replacement with SequenceABC
6644dbdMore misc PEP-8 cleanup in the affected modules
embray commented 6 years ago
comment:4

This still has some failing tests--I think mostly due to minor mistakes...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

6af1167A slightly more involved replacement with SequenceABC
876b4d4More misc PEP-8 cleanup in the affected modules
d0ec617Added a line here
9660547depending on what the base ring is, the arguments may be strings, which doesn't make sense for the code that follows this condition
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 6644dbd to 9660547

embray commented 6 years ago
comment:6

In fact, the main bug I had was due to the aforementioned concern about str, where there was one case where it didn't make sense to treat lists of strings equivalently to lists of lists. This was the only such case, however (a slightly tricky one since it was in the matrix constructor, which has a lot of tricky logic pertaining to flexibility in its arguments).

jdemeyer commented 6 years ago
comment:7

Can you please not make changes to src/sage/matrix because I'm dealing with that in #24742.

jdemeyer commented 6 years ago
comment:8

I'm not entirely convinced by using the Python ABC stuff. Of course we should register the Sage types in the ABC, but using the ABC is a different matter.

First of all, why Sequence? It excludes stuff that you typically want to include such as generators and the itertools stuff like zip(...).

Second, the ABC is horribly slow. It's about a factor 200 slower than checking isinstance(foo, (tuple, list)), a factor 40 slower than is_Vector() and a factor 25 slower than checking for iterability. I tested this in Cython, when given a FreeModuleElement as input.

So, why not check for iterability instead (and maybe exclude dicts)? That's what I plan to do in #24742.

embray commented 6 years ago
comment:10

I could incorporate #24742 if it even makes sense to, or just drop the changes there since it's not that important.

There are several problems with checking just for iterability. In most cases (if not all) that I updated here the code also uses indexing, so you really need that aspect of Sequence as well. The other problem is merely checking for iterability is that it can allow through infinite iterables (whereas Sequence--and perhaps this a misnomer on its part--must be finite). I discussed this problem more in #24757. As I also mentioned there, while on some level it's the user's responsibility not to pass in infinite generators, for example, where they don't belong, it's nice to prevent it where possible. zip, map, filter, etc. are all problematic here.

The performance issue is a real concern though. Just note that in addition to list, tuple, and Vector this is also adding support for range on Python 3 (and by extension any other custom type that would work just as well here, including possibly some other types in Sage).

I guess it's a question of context and what the relative overhead is. 200 x (eps < 1ns) is nothing if the overall method is on the order of 10ms, say, and what the likelihood is of using that call in a loop (or even its frequency in the test suite). For the matrix constructor most of all that's certainly worth looking at closely.

embray commented 6 years ago
comment:11

On the performance question, again, perhaps for the sake of Cython optimization one could first check list/tuple explicitly since those are going to be the most common anyways, and then fall back on the more generic check.

embray commented 6 years ago
comment:12

Replying to @embray:

On the performance question, again, perhaps for the sake of Cython optimization one could first check list/tuple explicitly since those are going to be the most common anyways, and then fall back on the more generic check.

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

jdemeyer commented 6 years ago
comment:13

Replying to @embray:

I could incorporate #24742

Better make this ticket independent from #24742 and just remove the changes to src/sage/matrix from this ticket.

embray commented 6 years ago
comment:14

I don't mind, but then #24742 should do something about this, or a follow-up to it. There are cases where it should accept vectors and ranges at the very least, but currently doesn't.

jdemeyer commented 6 years ago
comment:15

Replying to @embray:

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

But then the case where it's not an instance of the ABC will still be slow.

I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type). If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element). We could take inspiration from src/sage/structure/coerce_dict.pyx.

jdemeyer commented 6 years ago
comment:16

Replying to @embray:

I don't mind, but then #24742 should do something about this

It will.

embray commented 6 years ago
comment:17

I think though, that the case where it does the isinstance check would be relatively rare if the common cases are handled first. Before worrying extensively about it one would have to prove that there's an important common case that it actually substantially impacts.

embray commented 6 years ago
comment:19

I believe this issue can reasonably be addressed for Sage 8.4.

embray commented 6 years ago
comment:20

Replying to @jdemeyer:

Replying to @embray:

Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...).

But then the case where it's not an instance of the ABC will still be slow.

I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type). If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element). We could take inspiration from src/sage/structure/coerce_dict.pyx.

I'm thinking about this again, and wondering what exactly you mean. I think we really need some convenient isiterable and perhaps issequence functions, perhaps in sage.cpython. It doesn't necessarily have to work with the ABC framework. Perhaps we can just explicitly enumerate some specific types we care about for these cases (list, tuple, vector, (x)range, ...) which are going to be by far the most common. On the other hand, this is pretty inflexible, and that's where the ABC system is quite nice as it is very flexible, but that comes at a small cost.

There is also already caching built into ABCs, albeit implemented mostly in pure Python. Each ABC has a _abc_cache attribute, which is a WeakSet of all types known to implement that ABC (e.g. Sequence._abc_cache on Python 2 contains, by default, list, tuple, basestring, buffer, and xrange). There is also a Sequence._abc_negative_cache for negative lookups, also WeakSet. This gets invalidated any time a new ABC is registered.

Some interesting news is that Python 3.7 added a new C implementation of ABCMeta, which makes everything a bit faster. On Python 2.7:

python2.7 -m timeit -s 'from collections import Sequence; l=[]' 'isinstance(l, Sequence)'

and on 3.7:

python3.7 -m timeit -s 'from collections.abc import Sequence; l=[]' 'isinstance(l, Sequence)'
500000 loops, best of 5: 543 nsec per loop

So a little more than twice as fast?

Negative lookups are improved even more:

python2.7 -m timeit -s 'from collections import Sequence; l=0' 'isinstance(l, Sequence)'
100000 loops, best of 3: 2.17 usec per loop
python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, Sequence)'
500000 loops, best of 5: 534 nsec per loop

Of course, isinstance on a non-ABC is still much faster:

python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, tuple)'
2000000 loops, best of 5: 168 nsec per loop

but naturally, it slows down if the argument is a tuple, the more elements you add. This is probably significantly optimized in Cython too.

Going back to Python 2, I think nearly half the time is just spent in the lookup into the WeakSet cache:

python2.7 -m timeit -s 'from collections import Sequence; l=[]' 't = getattr(l, "__class__", None); t is not None and t in Sequence._abc_cache'
1000000 loops, best of 3: 0.663 usec per loop

(I can't make this test as easily on Python 3.7 since the _abc_cache is no longer exposed as Python attribute). So the C implementation from Python 3.7 is probably spending the majority of its time in this cache lookup, since it's doing away with some overhead from making a Python method call, the getattr call, etc.

I wonder if we couldn't do better. The use of a WeakSet makes sense for the case of classes that are created and later destroyed (more commonly these will be in the negative cache, since you'd be even less likely to register them to an ABC). That said, it's still pretty uncommon even for classes to be deleted. For builtin types, and for most classes in Sage, classes will never be deleted, and we could just cache them in a fixed-size array (if a new class is registered to the ABC it would have to be realloc'd, but this is rare).

Though I have doubts whether it's worth micro-optimizing this further. So I'd lean instead toward just backporting the C implementation of ABCMeta to Python 2, if anything at all.

embray commented 5 years ago
comment:22

Retargeting some of my tickets (somewhat optimistically for now).

embray commented 5 years ago
comment:23

Moving all my in-progress tickets to 8.8 milestone.

embray commented 5 years ago
comment:24

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

embray commented 4 years ago
comment:25

Ticket retargeted after milestone closed

mkoeppe commented 4 years ago
comment:26

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -3,3 +3,5 @@
 This simplifies some code that treats `Vector` distinctly from `list` and `tuple`.  To be clear, in this case we are explicitly treating the `FreeModuleElement` type, and not the abstract `Vector` type which does not meet these requirements, but that's okay since `FreeModuleElement` implements any concrete vector space elements.

 By convention, this always imports `collections.Sequence` as `SequenceABC` so as to not confuse it with Sage's `Sequence` class.
+
+See also: #26769 Add an `issequence` utility to check for list, tuple, and other compatible objects
embray commented 4 years ago
comment:28

The relation between this and #26769 is slightly orthogonal since the PySequence_Check function used in issequence does not, I believe, check if an object belongs to the Sequence ABC. Perhaps it should, though I think #26769 was born partly out of Jeroen's complaint here that checking isinstance(..., Sequence) is too slow. But I think that's also improved a lot in Python 3.

mkoeppe commented 3 years ago
comment:30

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

mkoeppe commented 3 years ago
comment:31

Setting a new milestone for this ticket based on a cursory review.