Mutability of tableaux part I: lists of tuples instead of lists of lists

darijgr commented 10 years ago

Tableaux in Sage used to be implemented as lists of lists. The outer list was wrapped in a CombinatorialObject, which made it immutable (at least without accessing underscored attributes). The inner lists, however, could be easily mutated; for example:

sage: T = Tableau([[1,2],[2]])
sage: t0 = T[0]
sage: t0[1] = 3 
sage: T
[[1, 3], [2]]

This kind of mutability was likely not intended. I, personally, have only ever triggered it by accident.

The present branch replaces the inner lists in the implementation of tableaux and skew tableaux by tuples. As a consequence, tableaux become completely (rather than just shallowly) immutable (unless their entries themselves are mutable, which can be blamed on the user). They are still printed as lists of lists, but this is just a _repr_ issue.

The branch also makes some optimizations and corrections.

Old description:

Tableaux in Sage are mutable objects, at least indirectly:

sage: T = Tableau([[1,2],[2]])
sage: t0 = T[0]
sage: t0[1] = 3 
sage: T
[[1, 3], [2]]

This in itself is probably not a bug, although not the kind of behavior I like either (what exactly is sped up by mutability of tableaux?). But there are things which probably are bugs given this behavior:

Tableaux are hashed by reference:

sage: T = Tableau([[1,2],[2]])
sage: hash(T)
-7723024261707595164
sage: T[0][1] = 4
sage: hash(T)
-7723024261707595164

Line 311 of sage/combinat/tableau.py says:

            # Since we are (suppose to be) immutable, we can share the underlying data

But we are not immutable. This comment line is supposed to provide justification for initializing the tableau as a CombinatorialObject, but the docstring of CombinatorialObject says that "CombinatorialObjects are shallowly immutable, and the intention is that they are semantically immutable". The latter is not satisfied for tableaux.

If we want tableaux to be mutable, why wrap them inside such a class? If we want them to be immutable, wouldn't it be right to encode them as CombinatorialObjects of CombinatorialObjects? Or is the speed cost for this too steep? And, finally, what is it that CombinatorialObject does that tuple does not?

And, on a related note, does Sage provide a class for immutable dictionaries? (I'm still hell-bent on implementing arbitrary-shaped tableaux.)

Mutating tableaux poisons type judgments (or what passes for type judgments in Sage):

sage: T = StandardTableau([[1,2],[3]])
sage: T[0][1] = 5
sage: isinstance(T, StandardTableau)
True

There is some action at a distance (although fortunately rare):

sage: T = SkewTableau([[1,2],[3]])
sage: S = T.to_tableau()  # Tableau(S) doesn't work, wondering if it should?
sage: S
[[1, 2], [3]]
sage: T
[[1, 2], [3]]
sage: T[0][1] = 5
sage: S
[[1, 5], [3]]
sage: T
[[1, 5], [3]]

How would I define Loday's Hopf algebra of tableaux if tableaux are mutable?

CC: @anneschilling @tscrim @nthiery @stumpc5 @AndrewAtLarge @zabrocki @sagetrac-sage-combinat @hivert @sagetrac-jpswanson

Component: combinatorics

Keywords: tableaux, sage-combinat, mutability, days64

Stopgaps: #17997

Author: Josh Swanson, Jan Keitel, Darij Grinberg

Branch/Commit: 430003d

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/15862

darijgr commented 10 years ago

Description changed:

--- 
+++ 
@@ -35,7 +35,7 @@

sage: T = StandardTableau([[1,2],[3]]) -sage: T[0][1] = 2 +sage: T[0][1] = 5 sage: isinstance(T, StandardTableau) True

tscrim commented 10 years ago

comment:2

This is definitely an issue. My first thought is to convert all of the sublists (rows) of the tableaux as tuples and upon calling _repr_, convert them back to lists. I'll think more about this today.

darijgr commented 10 years ago

comment:3

Any results? I'm also in favor in tuples. Do you think there will be a speed regression to this? It seems to me that we are not making any use of the deep mutability of tableaux, so I wouldn't expect that to happen.

tscrim commented 10 years ago

comment:4

Sorry, I let this one drop off my radar. Here's the block that we should look at IMO (lines 315-318 in tableau.py)

# CombinatorialObject verifies that t is a list
# We must verify t is a list of lists
if not all(isinstance(row, list) for row in t):
    raise ValueError("A tableau must be a list of lists.")

The first part of the comment is bogus, CombinatorialObject verifies that t acts like a list:

sage: CombinatorialObject((1,2,3))
[1, 2, 3]

and from it's doc:

- ``l`` -- a list or any object that can be convert to a list by ``list``

The part we need to look it is should we just do something like t = map(tuple, t) for the tableau input and let that error out?

Overall, there will be speed regression in the printing of tableaux. Here's what I would do for the _repr_():

def _repr_(self):
    return repr(map(list, self._list))

and here are some sample timings:

sage: L = [(1,2)]*5
sage: %timeit repr(L)
100000 loops, best of 3: 14.5 us per loop
sage: %timeit repr(map(list, L))
10000 loops, best of 3: 17.2 us per loop
sage: L = [(1,2)]*20
sage: %timeit repr(L)
10000 loops, best of 3: 48.7 us per loop
sage: %timeit repr(map(list, L))
10000 loops, best of 3: 62 us per loop
sage: L = [(1,2,3,4)]*100
sage: %timeit repr(L)
1000 loops, best of 3: 332 us per loop
sage: %timeit repr(map(list, L))
1000 loops, best of 3: 401 us per loop

So it's about 10-20% slowdown. (FYI repr(L) seems to be ever so slightly faster than L.__repr__().) So perhaps we should make them all CombinatorialObject instead of tuple...?

The other option to this would be having __getitem__() and the like return tuples or copies of the lists. I'm worried this might also result in a slowdown in code that is called significantly more often.

For point 2., that shared data comment is about creating a tableau from another tableau. I use Family when I want immutable dictionaries, although I would still use a list of list (type) approach to arbitrary shaped tableaux.

darijgr commented 10 years ago

comment:5

t = map(tuple, t) sounds like a good idea to me. The immutability of the outer list is handled by CombinatorialObject, so we only need to care about the inner ones.

Is _repr_ slowdown important? That is, is _repr_ used in any non-IO contexts such as hashing and caching? (I know it sometimes is; the question whether it is here.)

Is _repr_ the only thing that gets slowed down if we replace the (inner) lists by tuples? I'd say it should be, because any method on tableaux that needs to mess around with rows as lists currently needs to clone them before doing so, and experiments tell me that t[:] is considerably faster when t is a tuple than when t is a list:

sage: g = tuple(range(15))
sage: %timeit g[:]
10000000 loops, best of 3: 73.4 ns per loop
sage: g = range(15)
sage: %timeit g[:]
1000000 loops, best of 3: 197 ns per loop
sage: g = tuple(range(3))
sage: %timeit g[:]
10000000 loops, best of 3: 73.4 ns per loop
sage: g = range(3)
sage: %timeit g[:]
10000000 loops, best of 3: 158 ns per loop

This might and not might be related, but do you have any idea where the slowdowns in #14711 comment:107 come from? (I'm aware that running all doctests in series is not the scientific way of assessing performance, but I'm still worried about tableaux getting slower...)

tscrim commented 10 years ago

comment:6

Currently the _repr_ is used to create the hash for CombinatorialObject. We might be better served storing CombinatorialObject as a tuple instead of a list and just using the default hash. (Also you can think of CombinatorialObject as the SageObject equivalent to a python tuple.)

In regards to #14711 and from what I understand of Simon's comment, it is about the creation of the parent object Tableaux() and is not a slowdown per-say. More specifically, it's about the weakly referenced Tableaux() parent having to be recreated during Sage's startup since the morphisms which hold a weak reference to it are being recreated. So once you do hold a strong reference to Tableaux(), it won't be destroyed/recreated (of course unless you delete the strong reference as well). Moreover, although the relative value is high, the absolute value is still low so I don't think it's affecting things much.

nthiery commented 10 years ago

comment:7

I haven't looked seriously at the code recently so this is just a random thought. A priori, the general plan for this kind of object is to move from CombinatorialObject to ClonableList or one of its friends. Maybe a ClonableList of tuples, or even a Clonable array of arrays of C ints.

Cheers, Nicolas

darijgr commented 10 years ago

comment:8

@Nicolas: Thank you. Can you remind me how ClonableList differs from CombinatorialObject? Does it allow mutation at clone time?

Currently tableaux can have non-integer entries, and this is both useful and used (e.g. for skew tableaux). So I'm not convinced of switching to C ints.

@tscrim: Apparently combinatorial objects hash like this:

        if self._hash is None:
            self._hash = str(self._list).__hash__()
        return self._hash

So the __repr__ is not used, but rather the list is taken into a string and the latter is hashed. I assume this won't take any longer with tuples? Or am I looking at the wrong hash function?

Thanks also for your comments on #14711, though they're still somewhat over my head. What morphisms hold a weak reference to Tableaux()?

tscrim commented 10 years ago

comment:9

Replying to @darijgr:

@Nicolas: Thank you. Can you remind me how ClonableList differs from CombinatorialObject? Does it allow mutation at clone time?

One is that ClonableList is cythonized

Currently tableaux can have non-integer entries, and this is both useful and used (e.g. for skew tableaux). So I'm not convinced of switching to C ints.

I would not switch to C ints since we I believe the crystals of tableaux are filled with wrappers around ints. In either case, it is very conceivable to me that we would want non C ints as entries.

@tscrim: Apparently combinatorial objects hash like this:
        if self._hash is None:
            self._hash = str(self._list).__hash__()
        return self._hash
So the __repr__ is not used, but rather the list is taken into a string and the latter is hashed. I assume this won't take any longer with tuples? Or am I looking at the wrong hash function?

The __str__() ends up calling the __repr__(), and then the resulting string is hashed since we can't just hash lists. For tuples, we can just call hash(self._list).

Thanks also for your comments on #14711, though they're still somewhat over my head. What morphisms hold a weak reference to Tableaux()?

shrugs Actually those parents aren't being recreated on startup, I misremembered / misread Simon's comment. It's about when running all tests. There might be some dependency cycle (perhaps directly in the morphism, but maybe not) which creates a tableau (and hence Tableaux()) that becomes completely weakly referenced with #14711 and so it gets garbage collected. shrugs IDK, it would require some detailed searching and analysis.

a1031e5c-5e2c-4c90-8ba4-91be67e304df commented 9 years ago

Branch: public/15862

a1031e5c-5e2c-4c90-8ba4-91be67e304df commented 9 years ago

Commit: f300f91

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5228b88 crystal fixups

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from f300f91 to 5228b88

633146c6-d9c3-4bcd-9486-77c47b650840 commented 9 years ago

Stopgaps: #17997

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5228b88 to 70c1fa4

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

70c1fa4 more changes

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`060642f`	`infinite crystal stopgap, see #17997`
`9f6ab7d`	`Fixed minor merge conflict`
`014e04b`	`All combinat doctests now pass, except for one TestSuite in ribbon_tableau.py`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 70c1fa4 to 014e04b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b084115 Clean up a bit and fix remaining failing doctest.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 014e04b to b084115

a1031e5c-5e2c-4c90-8ba4-91be67e304df commented 9 years ago

comment:19

This is a bit ugly at the moment:

sage: t = Tableau([[1,2],[3]])
sage: list(t)
[(1, 2), (3,)]
sage: t.to_list()
[[1, 2], [3]]

The first calls the iterator, while the second returns an actual copy with the tuples converted to lists.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b084115 to b0ee04a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b0ee04a Put ._list back in

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b0ee04a to 1ff725b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1ff725b additional tangential changes

darijgr commented 9 years ago

comment:23

Do we have doctests for the issues mentioned in the OP?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 1ff725b to 486a8cd

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`c5846a5`	`15862: Add mutability doctest`
`486a8cd`	`Merge branch 'public/15862' of trac.sagemath.org:sage into t/15862/public/15862`

a1031e5c-5e2c-4c90-8ba4-91be67e304df commented 9 years ago

comment:25

We do now.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8d6f606 remove uses of to_list methods that were not actively using its listness

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 486a8cd to 8d6f606

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 8d6f606 to b72dde0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b72dde0 speeding up SemistandardTableaux containment test (30% on a 4x5 tableau)

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from b72dde0 to 3ca3e47

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

3ca3e47 same without bug

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

c801501 getting rid of an older bug too

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 3ca3e47 to c801501

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5853cac microoptimization (0%--10% in my use cases) on StandardTableaux containment

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from c801501 to 5853cac

darijgr commented 9 years ago

Changed keywords from tableaux, sage-combinat, mutability to tableaux, sage-combinat, mutability, days64

darijgr commented 9 years ago

Work Issues: see if skew tableaux have gotten slower

a1031e5c-5e2c-4c90-8ba4-91be67e304df commented 9 years ago

comment:32

Just for the record: Part II of this project, i.e. changing the parent from CombinatorialObject to ClonableList, can be found at public/TransitionClonable.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d45f2cc optimize `__init__` of semistandard tableaux

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 5853cac to d45f2cc

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

328b3d7 speeding up is_semistandard on skew tableaux by about 10x

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d45f2cc to 328b3d7

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

c9734a3 ridding tableau.py of flatten, making things faster again

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 328b3d7 to c9734a3

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from c9734a3 to b68a79b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b68a79b a few more optimizations

sagemath / sage

Mutability of tableaux part I: lists of tuples instead of lists of lists #15862