Closed darijgr closed 9 years ago
Description changed:
---
+++
@@ -35,7 +35,7 @@
 sage: T = StandardTableau([[1,2],[3]])
-sage: T[0][1] = 2
+sage: T[0][1] = 5
 sage: isinstance(T, StandardTableau)
 True
This is definitely an issue. My first thought is to convert all of the sublists (rows) of the tableaux to tuples and, upon calling _repr_, convert them back to lists. I'll think more about this today.
Any results? I'm also in favor of tuples. Do you think this will cause a speed regression? It seems to me that we are not making any use of the deep mutability of tableaux, so I wouldn't expect that to happen.
Sorry, I let this one drop off my radar. Here's the block that we should look at IMO (lines 315-318 in tableau.py):
# CombinatorialObject verifies that t is a list
# We must verify t is a list of lists
if not all(isinstance(row, list) for row in t):
    raise ValueError("A tableau must be a list of lists.")
The first part of the comment is bogus: CombinatorialObject verifies that t acts like a list:
sage: CombinatorialObject((1,2,3))
[1, 2, 3]
and from its doc:
- ``l`` -- a list or any object that can be convert to a list by ``list``
The part we need to look at is whether we should just do something like t = map(tuple, t) for the tableau input and let that error out.
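As a rough illustration of the idea, here is a minimal pure-Python sketch of converting the rows to tuples and letting bad input error out. The helper name freeze_rows is hypothetical and is not part of Sage; the outer tuple() is needed in Python 3 because map() is lazy there.

```python
# Hypothetical helper, not Sage's actual constructor code.
def freeze_rows(t):
    """Return the rows of t as a tuple of tuples.

    Raises TypeError if some row is not iterable (e.g. a bare int).
    """
    return tuple(map(tuple, t))


rows = freeze_rows([[1, 2], [3]])
print(rows)

try:
    rows[0][1] = 5  # tuples reject item assignment
except TypeError as exc:
    print("mutation rejected:", exc)

try:
    freeze_rows([1, 2, 3])  # the "rows" are ints, so tuple() fails
except TypeError as exc:
    print("bad input rejected:", exc)
```

Note that this accepts any iterable of iterables, so it is more permissive than the isinstance(row, list) check quoted above.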
Overall, there will be a speed regression in the printing of tableaux. Here's what I would do for the _repr_():
def _repr_(self):
    return repr(map(list, self._list))
and here are some sample timings:
sage: L = [(1,2)]*5
sage: %timeit repr(L)
100000 loops, best of 3: 14.5 us per loop
sage: %timeit repr(map(list, L))
10000 loops, best of 3: 17.2 us per loop
sage: L = [(1,2)]*20
sage: %timeit repr(L)
10000 loops, best of 3: 48.7 us per loop
sage: %timeit repr(map(list, L))
10000 loops, best of 3: 62 us per loop
sage: L = [(1,2,3,4)]*100
sage: %timeit repr(L)
1000 loops, best of 3: 332 us per loop
sage: %timeit repr(map(list, L))
1000 loops, best of 3: 401 us per loop
So it's about a 10-20% slowdown. (FYI, repr(L) seems to be ever so slightly faster than L.__repr__().)
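For readers on Python 3: the snippet and timings above rely on Python 2, where map() returns a list. A minimal sketch of the same printing idea that also works on Python 3 could look as follows; tableau_repr is a hypothetical stand-alone function, not the method in tableau.py.

```python
def tableau_repr(rows):
    """Format tuple rows as a list of lists (hypothetical helper)."""
    # In Python 3, map() is lazy, so the rows are materialized explicitly.
    return repr([list(row) for row in rows])


rows = ((1, 2), (3,))
print(repr(rows))          # ((1, 2), (3,))
print(tableau_repr(rows))  # [[1, 2], [3]]
```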
So perhaps we should make them all CombinatorialObject instead of tuple...?
The other option would be to have __getitem__() and the like return tuples or copies of the lists. I'm worried this might also result in a slowdown in code that is called significantly more often.
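To make this alternative concrete, here is a small sketch assuming a wrapper that keeps list rows internally and hands out copies. The class CopyingRows is hypothetical and only illustrates the trade-off: every row access allocates a new list, which is where the slowdown would come from.

```python
class CopyingRows:
    """Hypothetical wrapper: list rows inside, copies handed out."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __getitem__(self, i):
        # A fresh copy on every access protects the internal row,
        # at the cost of one allocation per call.
        return list(self._rows[i])


t = CopyingRows([[1, 2], [3]])
t[0][1] = 5  # mutates a temporary copy only
print(t[0])  # [1, 2]
```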
For point 2, that shared data comment is about creating a tableau from another tableau. I use Family when I want immutable dictionaries, although I would still use a list-of-lists (type) approach to arbitrary-shaped tableaux.
t = map(tuple, t) sounds like a good idea to me. The immutability of the outer list is handled by CombinatorialObject, so we only need to care about the inner ones.
Is the _repr_ slowdown important? That is, is _repr_ used in any non-IO contexts such as hashing and caching? (I know it sometimes is; the question is whether it is here.)
Is _repr_ the only thing that gets slowed down if we replace the (inner) lists by tuples? I'd say it should be, because any method on tableaux that needs to mess around with rows as lists currently needs to clone them before doing so, and experiments tell me that t[:] is considerably faster when t is a tuple than when t is a list:
sage: g = tuple(range(15))
sage: %timeit g[:]
10000000 loops, best of 3: 73.4 ns per loop
sage: g = range(15)
sage: %timeit g[:]
1000000 loops, best of 3: 197 ns per loop
sage: g = tuple(range(3))
sage: %timeit g[:]
10000000 loops, best of 3: 73.4 ns per loop
sage: g = range(3)
sage: %timeit g[:]
10000000 loops, best of 3: 158 ns per loop
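One likely reason for the gap, sketched below: slicing a list with [:] allocates a new list, whereas CPython can return the very same object for a full slice of a tuple, since tuples are immutable. The identity result for tuples is a CPython implementation detail rather than a language guarantee. Also note that g = range(15) above is a Python 2 list; in Python 3 it would be a range object.

```python
list_row = [1, 2, 3]
tuple_row = (1, 2, 3)

list_copy = list_row[:]
tuple_copy = tuple_row[:]

# A full slice of a list is always a new object.
print(list_copy is list_row)  # False

# For tuples, CPython typically hands back the same object
# (implementation detail, not guaranteed by the language).
print(tuple_copy is tuple_row)

# Either way, the contents compare equal.
print(list_copy == list_row, tuple_copy == tuple_row)
```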
This might or might not be related, but do you have any idea where the slowdowns in #14711 comment:107 come from? (I'm aware that running all doctests in series is not the scientific way of assessing performance, but I'm still worried about tableaux getting slower...)
Currently the _repr_ is used to create the hash for CombinatorialObject. We might be better served storing CombinatorialObject as a tuple instead of a list and just using the default hash. (Also, you can think of CombinatorialObject as the SageObject equivalent of a Python tuple.)
In regards to #14711, and from what I understand of Simon's comment, it is about the creation of the parent object Tableaux() and is not a slowdown per se. More specifically, it's about the weakly referenced Tableaux() parent having to be recreated during Sage's startup, since the morphisms which hold a weak reference to it are being recreated. So once you hold a strong reference to Tableaux(), it won't be destroyed/recreated (unless, of course, you delete the strong reference as well). Moreover, although the relative value is high, the absolute value is still low, so I don't think it's affecting things much.
I haven't looked seriously at the code recently so this is just a random thought. A priori, the general plan for this kind of object is to move from CombinatorialObject to ClonableList or one of its friends. Maybe a ClonableList of tuples, or even a Clonable array of arrays of C ints.
Cheers, Nicolas
@Nicolas: Thank you. Can you remind me how ClonableList differs from CombinatorialObject? Does it allow mutation at clone time?
Currently tableaux can have non-integer entries, and this is both useful and used (e.g. for skew tableaux). So I'm not convinced of switching to C ints.
@tscrim: Apparently combinatorial objects hash like this:
if self._hash is None:
    self._hash = str(self._list).__hash__()
return self._hash
So the __repr__ is not used, but rather the list is converted to a string and the latter is hashed. I assume this won't take any longer with tuples? Or am I looking at the wrong hash function?
Thanks also for your comments on #14711, though they're still somewhat over my head. What morphisms hold a weak reference to Tableaux()?
Replying to @darijgr:
@Nicolas: Thank you. Can you remind me how ClonableList differs from CombinatorialObject? Does it allow mutation at clone time?
One difference is that ClonableList is cythonized.
Currently tableaux can have non-integer entries, and this is both useful and used (e.g. for skew tableaux). So I'm not convinced of switching to C ints.
I would not switch to C ints, since I believe the crystals of tableaux are filled with wrappers around ints. In either case, it is very conceivable to me that we would want non-C-int entries.
@tscrim: Apparently combinatorial objects hash like this:
if self._hash is None:
    self._hash = str(self._list).__hash__()
return self._hash
So the __repr__ is not used, but rather the list is converted to a string and the latter is hashed. I assume this won't take any longer with tuples? Or am I looking at the wrong hash function?
The __str__() ends up calling the __repr__(), and then the resulting string is hashed since we can't just hash lists. For tuples, we can just call hash(self._list).
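A short pure-Python sketch of the difference, assuming the data is stored either as a list of lists or as a tuple of tuples. The variable names are illustrative only and do not correspond to attributes in Sage.

```python
list_rows = [[1, 2], [3]]
tuple_rows = ((1, 2), (3,))

# Lists are unhashable, hence the detour through a string.
try:
    hash(list_rows)
except TypeError as exc:
    print("lists cannot be hashed:", exc)

hash_via_str = hash(str(list_rows))

# A tuple of tuples with hashable entries can be hashed directly,
# without building an intermediate string.
hash_direct = hash(tuple_rows)

# Equal tuples hash equally, so the value is usable as a dictionary key.
print(hash_direct == hash(((1, 2), (3,))))  # True
```

Direct hashing only works if every entry is itself hashable, which is worth keeping in mind since tableau entries need not be integers.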
Thanks also for your comments on #14711, though they're still somewhat over my head. What morphisms hold a weak reference to Tableaux()?
shrugs Actually, those parents aren't being recreated on startup; I misremembered / misread Simon's comment. It's about what happens when running all tests. There might be some dependency cycle (perhaps directly in the morphism, but maybe not) which creates a tableau (and hence Tableaux()) that becomes completely weakly referenced with #14711, and so it gets garbage collected. shrugs IDK, it would require some detailed searching and analysis.
Branch: public/15862
Commit: f300f91
Branch pushed to git repo; I updated commit sha1. New commits:
5228b88 | crystal fixups |
Stopgaps: #17997
Branch pushed to git repo; I updated commit sha1. New commits:
70c1fa4 | more changes |
Branch pushed to git repo; I updated commit sha1. New commits:
b084115 | Clean up a bit and fix remaining failing doctest. |
This is a bit ugly at the moment:
sage: t = Tableau([[1,2],[3]])
sage: list(t)
[(1, 2), (3,)]
sage: t.to_list()
[[1, 2], [3]]
The first calls the iterator, while the second returns an actual copy with the tuples converted to lists.
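The distinction can be reproduced with a small stand-in class. TinyTableau below is hypothetical and far simpler than Sage's Tableau; it only assumes tuple rows, an iterator over them, and a to_list method that converts each row.

```python
class TinyTableau:
    """Hypothetical stand-in for a tableau stored with tuple rows."""

    def __init__(self, rows):
        self._rows = tuple(map(tuple, rows))

    def __iter__(self):
        # Iteration exposes the stored rows, i.e. tuples.
        return iter(self._rows)

    def to_list(self):
        # Builds new lists, so callers get a mutable deep copy of the rows.
        return [list(row) for row in self._rows]


t = TinyTableau([[1, 2], [3]])
print(list(t))      # [(1, 2), (3,)]
print(t.to_list())  # [[1, 2], [3]]

copy = t.to_list()
copy[0][1] = 5      # changes the copy only
print(t.to_list())  # [[1, 2], [3]]
```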
Branch pushed to git repo; I updated commit sha1. New commits:
b0ee04a | Put ._list back in |
Branch pushed to git repo; I updated commit sha1. New commits:
1ff725b | additional tangential changes |
Do we have doctests for the issues mentioned in the OP?
We do now.
Branch pushed to git repo; I updated commit sha1. New commits:
8d6f606 | remove uses of to_list methods that were not actively using its listness |
Branch pushed to git repo; I updated commit sha1. New commits:
b72dde0 | speeding up SemistandardTableaux containment test (30% on a 4x5 tableau) |
Branch pushed to git repo; I updated commit sha1. New commits:
3ca3e47 | same without bug |
Branch pushed to git repo; I updated commit sha1. New commits:
c801501 | getting rid of an older bug too |
Branch pushed to git repo; I updated commit sha1. New commits:
5853cac | microoptimization (0%--10% in my use cases) on StandardTableaux containment |
Changed keywords from tableaux, sage-combinat, mutability to tableaux, sage-combinat, mutability, days64
Work Issues: see if skew tableaux have gotten slower
Just for the record: Part II of this project, i.e. changing the parent from CombinatorialObject to ClonableList, can be found at public/TransitionClonable.
Branch pushed to git repo; I updated commit sha1. New commits:
d45f2cc | optimize `__init__` of semistandard tableaux |
Branch pushed to git repo; I updated commit sha1. New commits:
328b3d7 | speeding up is_semistandard on skew tableaux by about 10x |
Branch pushed to git repo; I updated commit sha1. New commits:
c9734a3 | ridding tableau.py of flatten, making things faster again |
Branch pushed to git repo; I updated commit sha1. New commits:
b68a79b | a few more optimizations |
Tableaux in Sage used to be implemented as lists of lists. The outer list was wrapped in a CombinatorialObject, which made it immutable (at least without accessing underscored attributes). The inner lists, however, could be easily mutated; for example:
This kind of mutability was likely not intended. I, personally, have only ever triggered it by accident.
The present branch replaces the inner lists in the implementation of tableaux and skew tableaux by tuples. As a consequence, tableaux become completely (rather than just shallowly) immutable (unless their entries themselves are mutable, which can be blamed on the user). They are still printed as lists of lists, but this is just a _repr_ issue.
The branch also makes some optimizations and corrections.
Old description:
Tableaux in Sage are mutable objects, at least indirectly:
This in itself is probably not a bug, although not the kind of behavior I like either (what exactly is sped up by mutability of tableaux?). But there are things which probably are bugs given this behavior:
sage/combinat/tableau.py says:
But we are not immutable. This comment line is supposed to provide justification for initializing the tableau as a CombinatorialObject, but the docstring of CombinatorialObject says that "CombinatorialObjects are shallowly immutable, and the intention is that they are semantically immutable". The latter is not satisfied for tableaux.
If we want tableaux to be mutable, why wrap them inside such a class? If we want them to be immutable, wouldn't it be right to encode them as CombinatorialObjects of CombinatorialObjects? Or is the speed cost for this too steep? And, finally, what is it that CombinatorialObject does that tuple does not?
And, on a related note, does Sage provide a class for immutable dictionaries? (I'm still hell-bent on implementing arbitrary-shaped tableaux.)
CC: @anneschilling @tscrim @nthiery @stumpc5 @AndrewAtLarge @zabrocki @sagetrac-sage-combinat @hivert @sagetrac-jpswanson
Component: combinatorics
Keywords: tableaux, sage-combinat, mutability, days64
Stopgaps: #17997
Author: Josh Swanson, Jan Keitel, Darij Grinberg
Branch/Commit: 430003d
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/15862