Open nbruin opened 9 years ago
Branch: u/nbruin/algebra_memoryleak
Indeed, with the branch attached things are fine (and run a lot faster too, because there's not the overhead of creating all those categories every time).
Incidentally, running the command with
A = FiniteDimensionalAlgebra(GF(p), [Matrix([[1, 0], [0, 1]]), Matrix([[0, 1], [0, 0]])])
goes horribly wrong again, because of this little pearl in matrix_space.py:
def change_ring(self, R):
    try:
        return self.__change_ring[R]
    except AttributeError:
        self.__change_ring = {}
    except KeyError:
        pass
    M = MatrixSpace(R, self.__nrows, self.__ncols, self.__is_sparse)
    self.__change_ring[R] = M
    return M
which conveniently undoes all the effort of the rest of the coercion framework to avoid strong references. Note that simply changing the dict to weakly keyed won't help: the matrix spaces will hold a reference to the base ring anyway. Having it weak on both key and value would work, but now we're just replicating the UniqueRepresentation caching. Is that necessary?
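To make the reference chains concrete, here is a minimal plain-Python sketch of both failure modes (Ring and Space are hypothetical stand-ins, not Sage classes):

```python
import gc
import weakref

class Ring:
    """Stand-in for a base ring (hypothetical, not a Sage class)."""

class Space:
    """Stand-in for a matrix space; like MatrixSpace, it keeps a
    strong reference to its base ring."""
    def __init__(self, ring):
        self.ring = ring

# 1. A strong dict (like the old __change_ring cache) pins the ring forever.
strong_cache = {}
r = Ring()
strong_cache[r] = Space(r)
dead = weakref.ref(r)
del r
gc.collect()
assert dead() is not None   # still alive: the cache key holds it

# 2. A weak-keyed dict does not help either: the value (the space)
# strongly references the key (the ring), so the entry never dies.
weak_keyed = weakref.WeakKeyDictionary()
r = Ring()
weak_keyed[r] = Space(r)
dead = weakref.ref(r)
del r
gc.collect()
assert dead() is not None   # value -> key reference keeps it alive
```

Making both sides weak would indeed work, but at that point the cache is just duplicating what the UniqueRepresentation machinery already provides.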
New commits:
d486b71 | trac 17360: try to avoid referencing base ring in category. |
Branch pushed to git repo; I updated commit sha1. New commits:
2c942ad | trac 17360: remove custom MatrixSpace.change_ring cache, because it leaks and doesn't seem to improve performance otherwise anyway. |
OK! Some initial timing, with the old code, gives me:
sage: M=matrix(ZZ,2,2,[1,0,0,1])
sage: k=GF(3)
sage: %timeit M.change_ring(k)
10000 loops, best of 3: 111 µs per loop
and after removing the custom cache:
sage: %timeit M.change_ring(k)
10000 loops, best of 3: 112 µs per loop
so I think there is no difference (we did improve UniqueRepresentation a lot), and removing the cache makes the example work fine.
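For comparison, here is a simplified model of the UniqueRepresentation-style caching that change_ring now relies on (the class below is a made-up sketch, not Sage's actual implementation): a WeakValueDictionary returns the identical object for identical constructor arguments, without pinning anything in memory.

```python
import gc
import weakref

class UniqueRep:
    """Simplified model of Sage's UniqueRepresentation caching
    (hypothetical sketch, not the real implementation)."""

    # Weak-valued constructor cache: identical arguments yield the
    # identical instance, but entries vanish once the instance is unused.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, *args):
        key = (cls, args)
        obj = cls._cache.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj.args = args
            cls._cache[key] = obj
        return obj

a = UniqueRep(2, 3)
b = UniqueRep(2, 3)
assert a is b                # cached: same object for same arguments

ref = weakref.ref(a)
del a, b
gc.collect()
assert ref() is None         # and collectible once unreferenced
```

This is why removing the redundant strong cache costs essentially nothing: the lookup still hits a cache, just a weakly held one.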
There are plenty of other parent inits that set a base-ring referencing category by default, by the way. Any ideas on how to conveniently detect these?
Do you also want to fix this for other algebras (except Weyl and Clifford, I'll do that in #17096 since I'm editing that part of those files anyways)?
Replying to @tscrim:
Do you also want to fix this for other algebras
I would like it to get fixed (because the pervasive leaking has repeatedly prevented me from using Sage on problems of interesting scale); I'm not particularly keen on doing it myself, since I don't have a good idea of how to do that efficiently.
(except Weyl and Clifford, I'll do that in #17096 since I'm editing that part of those files anyways)?
Thanks!
OK, now I'm confused. If I do
import gc
from collections import Counter
gc.collect()
pre = {id(a) for a in gc.get_objects()}
for p in prime_range(20000):
    C = VectorSpaces(GF(p))
gc.collect()
post = Counter(str(type(a)) for a in gc.get_objects() if id(a) not in pre)
[p for p in post.iteritems() if p[1] > 2000]
I get no hits (i.e., the category VectorSpaces seems to be collectible), but if I do the same with
C=Algebras(GF(p))
I do end up with a lot of junk. What is it that makes Algebras not collectible? [There's also the performance component: making all these different VectorSpaces categories takes time too, so equipping a vector space object by default with a generic category that doesn't need to be created for every base ring is a good idea anyway. But it would be good to understand why VectorSpaces aren't immortal the way Algebras are.]
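The same object-diff technique works in plain Python, independently of Sage. Here is a self-contained sketch (the helper name, repeat count, and threshold are made up for illustration):

```python
import gc
from collections import Counter

def new_objects_after(fn, repeat=100, threshold=50):
    """Run fn() `repeat` times and report type names whose surviving
    object count exceeds `threshold` -- a crude leak detector
    (hypothetical helper, modelled on the session above)."""
    gc.collect()
    pre = {id(a) for a in gc.get_objects()}
    for _ in range(repeat):
        fn()
    gc.collect()
    post = Counter(type(a).__name__
                   for a in gc.get_objects() if id(a) not in pre)
    return {t: n for t, n in post.items() if n > threshold}

class Leaky:
    _cache = []              # module-level strong cache: a classic leak
    def __init__(self):
        Leaky._cache.append(self)

class Fine:
    pass                     # nothing retains instances of this class

assert 'Leaky' in new_objects_after(lambda: Leaky())
assert 'Fine' not in new_objects_after(lambda: Fine())
```

Note the id-based diff is heuristic: freed addresses can be reused, which is why the threshold is well below the repeat count.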
One obvious difference is that Algebras inherits from CategoryWithAxiom_over_base_ring. It may be that the axiom stuff ruins mortality? It definitely ruins performance:
sage: %time for p in prime_range(20000): C=Algebras(GF(p))
CPU times: user 4.17 s, sys: 71 ms, total: 4.24 s
Wall time: 4.23 s
versus (in a fresh session, because the above creates the vector spaces anyway):
sage: %time for p in prime_range(20000): C=VectorSpaces(GF(p))
CPU times: user 660 ms, sys: 14 ms, total: 674 ms
Wall time: 662 ms
Both are noticeable, so both should be avoided if possible, but the Algebras one is just VERY slow. It's not using that much memory, so I have trouble attributing the cost to memory-allocation overhead alone.
Algebras is a subclass of CategoryWithAxiom_over_base_ring, whereas Modules (and VectorSpaces) is a subclass of Category_module. I'm guessing that any subclass of CategoryWithAxiom gets nailed into memory because there is a link back to the list of axioms or the call to the strongly cached axiom_of_nested_class.
Should we, or did someone, open a ticket to fix that (general) nasty memory leak caused by WithAxiom?
I'm thinking we should change this ticket to take care of the CategoryWithAxiom leak.
I can get rid of the memory leak by removing the caching on Category._with_axiom_as_tuple and Category._with_axiom (although it takes a few passes of gc.collect() to get rid of the finite fields). However, I believe this could lead to a slowdown in instantiating categories, and we have the workaround of the memory leak of Algebras(Fields()).
So perhaps what we can do is have the cached part be at the class level (which should cover the heavy lifting and result in a large speedup in [Nils]'s creation, because the construction won't be done for every new Algebras category), and have the caching return a class. Thoughts?
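A minimal plain-Python sketch of that idea (all names are made up; this is not the Sage API): cache derived *classes* at the class level, so the expensive construction runs once per (class, axiom) pair, while instances remain collectible because the cache never holds them.

```python
import gc
import weakref

class Category:
    """Toy category (hypothetical; not Sage's Category class)."""

    # Class-level cache of derived *classes*, keyed by (class, axiom).
    # Only classes are stored, never instances, so instances stay mortal.
    _axiom_class_cache = {}

    def __init__(self, base):
        self.base = base

    @classmethod
    def _class_with_axiom(cls, axiom):
        key = (cls, axiom)
        if key not in Category._axiom_class_cache:
            # Expensive construction happens once per (class, axiom) pair.
            Category._axiom_class_cache[key] = type(
                f"{cls.__name__}With{axiom}", (cls,), {"axiom": axiom})
        return Category._axiom_class_cache[key]

    def with_axiom(self, axiom):
        # Return a fresh instance of the cached class.
        return type(self)._class_with_axiom(axiom)(self.base)

c1 = Category("GF(3)").with_axiom("Finite")
c2 = Category("GF(5)").with_axiom("Finite")
assert type(c1) is type(c2)      # the class was built only once

ref = weakref.ref(c1)
del c1
gc.collect()
assert ref() is None             # the instance is still collectible
```

The design point is that the immortal thing (the class) carries no base ring, while everything that does reference a base ring (the instance) stays garbage-collectible.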
The following code shows a memory leak:
(the content of "post" shows that all finite fields and a lot of categories are still in memory). The cause is simple: the category is still a parametrized one by default. The fix, therefore, is simple.
CC: @simon-king-jena @nthiery @jpflori
Component: memleak
Branch/Commit: u/nbruin/algebra_memoryleak @ 2c942ad
Issue created by migration from https://trac.sagemath.org/ticket/17360