Open embray opened 6 years ago
Commit: fcda724
Branch pushed to git repo; I updated commit sha1. New commits:
a8d1c0e | Register FreeModuleElement to the Sequence ABC |
a73178d | Some simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC |
da6b45a | A slightly more involved replacement with SequenceABC |
fcda724 | More misc PEP-8 cleanup in the affected modules |
One small concern I have with this is that str
/bytes
are also considered Sequence
s by Python. I think this is okay though, because under this treatment it's really no different than if passed, say, a list of one character strings. Then, if such as list is not valid input, a string is no less invalid. It just means that if you intend a function to accept string arguments to be processed separately from list/tuple-like arguments, they should always be handled first. This is what I would probably normally do anyways.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
dc25e5f | Some simple examples where isinstance(..., (list, tuple, Vector)) could be replaced with SequenceABC |
16c2af9 | A slightly more involved replacement with SequenceABC |
6644dbd | More misc PEP-8 cleanup in the affected modules |
This still has some failing tests--I think mostly due to minor mistakes...
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
6af1167 | A slightly more involved replacement with SequenceABC |
876b4d4 | More misc PEP-8 cleanup in the affected modules |
d0ec617 | Added a line here |
9660547 | depending on what the base ring is, the arguments may be strings, which doesn't make sense for the code that follows this condition |
In fact, the main bug I had was due to the aforementioned concern about str
, where there was one case where it didn't make sense to treat lists of strings equivalently to lists of lists. This was the only such case, however (a slightly tricky one since it was in the matrix constructor, which has a lot of tricky logic pertaining to flexibility in its arguments).
Can you please not make changes to src/sage/matrix
because I'm dealing with that in #24742.
I'm not entirely convinced by using the Python ABC stuff. Of course we should register the Sage types in the ABC, but using the ABC is a different matter.
First of all, why Sequence
? It excludes stuff that you typically want to include such as generators and the itertools stuff like zip(...)
.
Second, the ABC is horribly slow. It's about a factor 200 slower than checking isinstance(foo, (tuple, list))
, a factor 40 slower than is_Vector()
and a factor 25 slower than checking for iterability. I tested this in Cython, when given a FreeModuleElement
as input.
So, why not check for iterability instead (and maybe exclude dicts)? That's what I plan to do in #24742.
I could incorporate #24742 if it even makes sense to, or just drop the changes there since it's not that important.
There are several problems with checking just for iterability. In most cases (if not all) that I updated here the code also uses indexing, so you really need that aspect of Sequence
as well. The other problem is merely checking for iterability is that it can allow through infinite iterables (whereas Sequence
--and perhaps this a misnomer on its part--must be finite). I discussed this problem more in #24757. As I also mentioned there, while on some level it's the user's responsibility not to pass in infinite generators, for example, where they don't belong, it's nice to prevent it where possible. zip, map, filter, etc. are all problematic here.
The performance issue is a real concern though. Just note that in addition to list
, tuple
, and Vector
this is also adding support for range
on Python 3 (and by extension any other custom type that would work just as well here, including possibly some other types in Sage).
I guess it's a question of context and what the relative overhead is. 200 x (eps < 1ns) is nothing if the overall method is on the order of 10ms, say, and what the likelihood is of using that call in a loop (or even its frequency in the test suite). For the matrix constructor most of all that's certainly worth looking at closely.
On the performance question, again, perhaps for the sake of Cython optimization one could first check list
/tuple
explicitly since those are going to be the most common anyways, and then fall back on the more generic check.
Replying to @embray:
On the performance question, again, perhaps for the sake of Cython optimization one could first check
list
/tuple
explicitly since those are going to be the most common anyways, and then fall back on the more generic check.
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic isinstance(...)
.
Replying to @embray:
I could incorporate #24742
Better make this ticket independent from #24742 and just remove the changes to src/sage/matrix
from this ticket.
I don't mind, but then #24742 should do something about this, or a follow-up to it. There are cases where it should accept vectors and ranges at the very least, but currently doesn't.
Replying to @embray:
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic
isinstance(...)
.
But then the case where it's not an instance of the ABC will still be slow.
I have sometimes been thinking about adding a fast Cython function caching calls to isinstance(..., some_type)
. If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage like isinstance(..., Element)
. We could take inspiration from src/sage/structure/coerce_dict.pyx
.
Replying to @embray:
I don't mind, but then #24742 should do something about this
It will.
I think though, that the case where it does the isinstance
check would be relatively rare if the common cases are handled first. Before worrying extensively about it one would have to prove that there's an important common case that it actually substantially impacts.
I believe this issue can reasonably be addressed for Sage 8.4.
Replying to @jdemeyer:
Replying to @embray:
Come to think of it, this would be an optimization worth building into Cython, if possible. Common ABC from Python's stdlib --> exact type checks for common built-in types that implement that ABC, followed by generic
isinstance(...)
.But then the case where it's not an instance of the ABC will still be slow.
I have sometimes been thinking about adding a fast Cython function caching calls to
isinstance(..., some_type)
. If implemented well, this could be very fast. It could be used for ABCs as well as for other common type checks in Sage likeisinstance(..., Element)
. We could take inspiration fromsrc/sage/structure/coerce_dict.pyx
.
I'm thinking about this again, and wondering what exactly you mean. I think we really need some convenient isiterable
and perhaps issequence
functions, perhaps in sage.cpython
. It doesn't necessarily have to work with the ABC framework. Perhaps we can just explicitly enumerate some specific types we care about for these cases (list, tuple, vector, (x)range, ...) which are going to be by far the most common. On the other hand, this is pretty inflexible, and that's where the ABC system is quite nice as it is very flexible, but that comes at a small cost.
There is also already caching built into ABCs, albeit implemented mostly in pure Python. Each ABC has a _abc_cache
attribute, which is a WeakSet
of all types known to implement that ABC (e.g. Sequence._abc_cache
on Python 2 contains, by default, list
, tuple
, basestring
, buffer
, and xrange
). There is also a Sequence._abc_negative_cache
for negative lookups, also WeakSet
. This gets invalidated any time a new ABC is registered.
Some interesting news is that Python 3.7 added a new C implementation of ABCMeta, which makes everything a bit faster. On Python 2.7:
python2.7 -m timeit -s 'from collections import Sequence; l=[]' 'isinstance(l, Sequence)'
and on 3.7:
python3.7 -m timeit -s 'from collections.abc import Sequence; l=[]' 'isinstance(l, Sequence)'
500000 loops, best of 5: 543 nsec per loop
So a little more than twice as fast?
Negative lookups are improved even more:
python2.7 -m timeit -s 'from collections import Sequence; l=0' 'isinstance(l, Sequence)'
100000 loops, best of 3: 2.17 usec per loop
python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, Sequence)'
500000 loops, best of 5: 534 nsec per loop
Of course, isinstance on a non-ABC is still much faster:
python3.7 -m timeit -s 'from collections.abc import Sequence; l=0' 'isinstance(l, tuple)'
2000000 loops, best of 5: 168 nsec per loop
but naturally, it slows down if the argument is a tuple, the more elements you add. This is probably significantly optimized in Cython too.
Going back to Python 2, I think nearly half the time is just spent in the lookup into the WeakSet
cache:
python2.7 -m timeit -s 'from collections import Sequence; l=[]' 't = getattr(l, "__class__", None); t is not None and t in Sequence._abc_cache'
1000000 loops, best of 3: 0.663 usec per loop
(I can't make this test as easily on Python 3.7 since the _abc_cache
is no longer exposed as Python attribute). So the C implementation from Python 3.7 is probably spending the majority of its time in this cache lookup, since it's doing away with some overhead from making a Python method call, the getattr
call, etc.
I wonder if we couldn't do better. The use of a WeakSet
makes sense for the case of classes that are created and later destroyed (more commonly these will be in the negative cache, since you'd be even less likely to register them to an ABC). That said, it's still pretty uncommon even for classes to be deleted. For builtin types, and for most classes in Sage, classes will never be deleted, and we could just cache them in a fixed-size array (if a new class is registered to the ABC it would have to be realloc'd, but this is rare).
Though I have doubts whether it's worth micro-optimizing this further. So I'd lean instead toward just backporting the C implementation of ABCMeta to Python 2, if anything at all.
Retargeting some of my tickets (somewhat optimistically for now).
Moving all my in-progress tickets to 8.8 milestone.
Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).
Ticket retargeted after milestone closed
Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.
Description changed:
---
+++
@@ -3,3 +3,5 @@
This simplifies some code that treats `Vector` distinctly from `list` and `tuple`. To be clear, in this case we are explicitly treating the `FreeModuleElement` type, and not the abstract `Vector` type which does not meet these requirements, but that's okay since `FreeModuleElement` implements any concrete vector space elements.
By convention, this always imports `collections.Sequence` as `SequenceABC` so as to not confuse it with Sage's `Sequence` class.
+
+See also: #26769 Add an `issequence` utility to check for list, tuple, and other compatible objects
The relation between this and #26769 is slightly orthogonal since the PySequence_Check
function used in issequence
does not, I believe, check if an object belongs to the Sequence
ABC. Perhaps it should, though I think #26769 was born partly out of Jeroen's complaint here that checking isinstance(..., Sequence)
is too slow. But I think that's also improved a lot in Python 3.
Setting new milestone based on a cursory review of ticket status, priority, and last modification date.
Setting a new milestone for this ticket based on a cursory review.
For most intents and purposes instances of
FreeModuleElement
can be used equivalently tolist
ortuple
, in that they all meet the requirements of the Sequence protocol (which on Python 3 also includesrange
).This simplifies some code that treats
Vector
distinctly fromlist
andtuple
. To be clear, in this case we are explicitly treating theFreeModuleElement
type, and not the abstractVector
type which does not meet these requirements, but that's okay sinceFreeModuleElement
implements any concrete vector space elements.By convention, this always imports
collections.Sequence
asSequenceABC
so as to not confuse it with Sage'sSequence
class.See also: #26769 Add an
issequence
utility to check for list, tuple, and other compatible objectsComponent: refactoring
Author: Erik Bray
Branch/Commit: u/embray/refactoring/free-module-sequence @
9660547
Issue created by migration from https://trac.sagemath.org/ticket/24815