Open dimpase opened 9 years ago
Description changed:
---
+++
@@ -1 +1,28 @@
-In #18897 one leak is fixed, but there are more left, see comments 27 and later.
+In #18897 one leak is fixed, but there are more left, see comments 27 and later:
+
+```
+sage: L = [1,1,0,1,0,1,2,1,1,0,0,1,2,0,0,1,1,0,2,0,0,0,0,0,1]
+sage: def test(L, dim):
+....: import gc
+....: from collections import Counter
+....: gc.collect()
+....: pre={id(c) for c in gc.get_objects()}
+....: m = matrix(dim, L)
+....: for p in range(2,102):
+....: m.change_ring(GF(nth_prime(p))).eigenvalues()
+....: gc.collect()
+....: post=Counter(type(o) for o in gc.get_objects() if id(o) not in pre)
+....: return [(k,v) for (k,v) in post.iteritems() if v>10]
+....:
+sage: test(L, 5)
+[(<class 'sage.rings.algebraic_closure_finite_field.AlgebraicClosureFiniteField_pseudo_conway_with_category'>,
+ 100),
+...
+ (<class 'sage.rings.homset.RingHomset_quo_ring_with_category'>, 100),
+...
+ (<type 'sage.rings.finite_rings.integer_mod.NativeIntStruct'>, 100),
+ (<type 'sage.rings.finite_rings.integer_mod.Int_to_IntegerMod'>, 200),
+...
+ (<class 'sage.rings.homset.RingHomset_generic_with_category'>, 100),
+...]
+```
Dependencies: #18897
This is probably because the algebraic_closure()
method of finite fields is cached (see #14990), creating circular references between finite fields and their algebraic closures. The following simpler code exhibits a similar memory leak:
import gc
from collections import Counter
gc.collect()
pre = {id(c) for c in gc.get_objects()}
for p in prime_range(100):
GF(p).algebraic_closure()
gc.collect()
post = Counter(type(o) for o in gc.get_objects() if id(o) not in pre)
print([(k,v) for (k,v) in post.iteritems() if v>10])
Replying to @pjbruin:
This is probably because the
algebraic_closure()
method of finite fields is cached (see #14990), creating circular references between finite fields and their algebraic closures.
Circular references shouldn't be problematic, unless one of the objects involved has a __del__
method. However:
sage: K = GF(5)
sage: hasattr(K, '__del__')
False
sage: hasattr(K.algebraic_closure(), '__del__')
False
And indeed:
sage: while 1:
....: print get_memory_usage()
....: for p in range(2, 102):
....: A = GF(nth_prime(p)).algebraic_closure()
....:
<constant amount of memory>
Oops, that's a bad test. Of course it had a constant memory consumption even if memory could not be freed. Sorry.
sage: import gc
sage: _ = gc.collect()
sage: while 1:
....: print gc.collect()
....: for p in range(2, 102):
....: A = GF(nth_prime(p)).algebraic_closure()
....:
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
That's a leak.
Replying to @pjbruin:
This is probably because the
algebraic_closure()
method of finite fields is cached (see #14990), creating circular references between finite fields and their algebraic closures. The following simpler code exhibits a similar memory leak:import gc from collections import Counter gc.collect() pre = {id(c) for c in gc.get_objects()} for p in prime_range(100): GF(p).algebraic_closure() gc.collect() post = Counter(type(o) for o in gc.get_objects() if id(o) not in pre) print([(k,v) for (k,v) in post.iteritems() if v>10])
Like noticed in #18897, when this code is runned the first time, there are many stuff left. But the second time, the list is empty.
And the leak indeed only occurs if we store the algebraic closure. In a freshly started session (the previous result has also been in a freshly started session):
sage: _ = gc.collect()
sage: while 1:
....: print gc.collect()
....: for p in range(2, 102):
....: A = GF(nth_prime(p))
....:
0
8396
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
7969
Like noticed in #18897, when this code is runned the first time, there are many stuff left. But the second time, the list is empty.
...ok, but that's the same thing for the bug presented in the description of the ticket. Sorry.
Replying to @seblabbe:
Like noticed in #18897, when this code is runned the first time, there are many stuff left. But the second time, the list is empty.
That's just because there is only one base ring. Do the same while iterating over GF(p)
, and you'll see how things accumulate.
At #18897, a binary tree was duely deallocated, however it was forgotten to dereference the root node. But here, we have objects that could be deallocated, but aren't. It is a totally different kind of leak.
Attachment: test.png
Reference chain to finite field after creating its algebraic closure
attachment: test.png results from the following code:
sage: import objgraph, gc
sage: K = GF(31)
sage: n = id(K)
sage: A = K.algebraic_closure()
sage: del A,K
sage: gc.collect()
0
sage: L = [c for c in gc.get_objects() if id(c)==n]
sage: objgraph.show_backrefs(L[0],filename="/home/king/Sage/work/memleak/test.png")
Graph written to /tmp/objgraph-r5RhSM.dot (21 nodes)
Image generated as /home/king/Sage/work/memleak/test.png
Apparently (since all other references are weak or circular), the references that prevent deallocation come from sage.rings.finite_rings.integer_mod.NativeIntStruct
.
It seems that there is a NativeIntStruct created for each finite field. If the NativeIntStruct is small enough, then it has a (multiplication?) table, which holds references to all elements of the finite field. And they have, of course, references to their parent. But what is referencing the NativeIntStruct?
It seems that the reference somehow comes from constructing the algebraic closure, as we have
sage: K = GF(31)
sage: del K
sage: L = [c for c in gc.get_objects() if isinstance(c, sage.rings.finite_rings.integer_mod.NativeIntStruct) and len(c._get_table())==31]
sage: len(L)
0
So, the NativeIntStruct can be garbage collected when we do not construct the algebraic closure.
How can one find a reference chain from the algebraic closure to the NativeIntStruct?
Attachment: test2.png
Another reference chain, towards NativeIntStruct
attachment: test2.png shows the result of
sage: K = GF(31)
sage: A = K.algebraic_closure()
sage: del A,K
sage: L = [c for c in gc.get_objects() if isinstance(c, sage.rings.finite_rings.integer_mod.NativeIntStruct) and len(c._get_table())==31]
sage: filter = lambda x: (x is not L) and (not isinstance(x, sage.rings.finite_rings.integer_mod.IntegerMod_int))
sage: objgraph.show_backrefs(L[0],filter=filter,filename="/home/king/Sage/work/memleak/test2.png")
Graph written to /tmp/objgraph-4qP1R3.dot (16 nodes)
Image generated as /home/king/Sage/work/memleak/test2.png
The picture somehow looks familiar: There is a coerce map involved, namely !Int_to_IntegerMod. Coerce maps are supposed to have a weak reference to the domain and a strong reference to the codomain. Since the domain of !Int_to_IntegerMod presumably is the ring of integers and can't be deallocated anyway, a strong reference to the codomain means trouble...
I was also playing with objgraph...
def test(L, dim):
import objgraph
import gc
from collections import Counter
gc.collect()
pre={id(c) for c in gc.get_objects()}
m = matrix(dim, L)
for p in range(2,102):
m.change_ring(GF(nth_prime(p))).eigenvalues()
gc.collect()
O = gc.get_objects()
post=Counter(type(o) for o in O if id(o) not in pre)
T = [k for (k,v) in post.iteritems() if v==100]
D = dict((type(o),o) for o in O if type(o) in T)
for i,v in enumerate(sorted(D.values())):
print v
objgraph.show_backrefs(v,filename="test_{}.png".format(i))
print "--"
I get :
sage: L = [1,1,0,1,0,1,2,1,1,0,0,1,2,0,0,1,1,0,2,0,0,0,0,0,1]
sage: test(L, 5)
Principal ideal (547) of Integer Ring
Graph written to /tmp/objgraph-e803N8.dot (20 nodes)
Image generated as test_0.png
--
Set of Homomorphisms from Integer Ring to Finite Field of size 181
Graph written to /tmp/objgraph-I_uL5V.dot (12 nodes)
Image generated as test_1.png
--
<class 'sage.rings.finite_rings.conway_polynomials.PseudoConwayLattice'>
Graph written to /tmp/objgraph-oMuwMY.dot (13 nodes)
Image generated as test_2.png
--
Algebraic closure of Finite Field of size 181
Graph written to /tmp/objgraph-M_9gmt.dot (14 nodes)
Image generated as test_3.png
--
Finite Field of size 547
Graph written to /tmp/objgraph-ZostIO.dot (25 nodes)
Image generated as test_4.png
--
Set of Homomorphisms from Finite Field of size 181 to Univariate Polynomial Ring in x over Finite Field of size 181
Graph written to /tmp/objgraph-diDGKn.dot (12 nodes)
Image generated as test_5.png
--
<sage.rings.finite_rings.integer_mod.NativeIntStruct object at 0x7f366d1e3590>
Graph written to /tmp/objgraph-JxmEbJ.dot (30 nodes)
Image generated as test_6.png
--
I will attach test_3.png for the algebraic closure right now.
Ref graph for Algebraic closure of Finite Field of size 181
Attachment: test_3.png
Let me try to recall why the current "weak-referencing coerce maps" are made as they are, to understand why we have a strong reference chain to Int_to_IntegerMod
.
What I do not understand: Why is that more than a strong reference CYCLE (which would not prevent garbage collection) from the codomain to the map and back?
Aha. The coercion model caches the coerce maps, too. So, it isn't the codomain only that holds a cache.
The coercion model is of course a permanent object. It references a TripleDict
to store the coercion maps. The coercion map references the codomain. Hence, if the domain is strongly referenced from somewhere, then the coerce map and thus the codomain can not be garbage collected.
Can we perhaps make it so that the cache in the coercion model only keeps a WEAK reference to the coerce map? I worry about performance, though, since getting the referenced object from a weak reference is a bit costly.
Do we need to worry about premature collection of coerce map? If I understand correctly, the cache in the coercion model is mainly for performance, as the map is cached as an attribute of the codomain anyway. And the codomain of any map is a parent. Hence, the cache of the coercion model actually is redundant.
So, perhaps a better idea is to completely get rid of the coercion model cache, as the coercion model can use the codomain's cache. I'll try that.
No, the coercion model's cache is needed. It is relevant for pushouts. There, we have no codomain, as it first needs to be constructed.
Perhaps my diagnosis was wrong: It is not relevant for pushouts, but we want to cache the ABSENCE of a coercion from a parent to, say, int
. So, if the codomain happens to be a parent then we can use its cache; otherwise there is no problem to use the coercion model's cache since int
and friends will never be garbage collected anyway.
So, better not use weak references to coercion maps...
It seems that #14058 is relevant -- and perhaps it fixes our problem (except that it has no branch).
I am afraid that #14058 (which now has a branch) does not suffice to fix the problem: GF(31) still can't be collected after creating its algebraic closure. However, the coercion model is not mentioned any longer. Instead, the reference goes via a weak value dictionary, which seems to be the polynomial ring cache.
My guess:
The question is: How is there a strong reference to P? Perhaps there is a strong reference from GF(31) to P? Maybe via caching the algebraic closure, which references an element of P? This would be enough to keep P alive.
I tried to find a backref chain as follows:
sage: import objgraph, gc, __main__
sage: K = GF(31)
sage: A = K.algebraic_closure()
sage: nK = id(K)
sage: del A,K
sage: L = [c for c in gc.get_objects() if id(c) == nK]
sage: objgraph.find_backref_chain(L[0], lambda x: x in __main__.__dict__.values(), extra_ignore=(id(L),))
but it crashes with a segfault. Any idea why it fails?
And this one
sage: import objgraph, gc
sage: cm = sage.structure.element.get_coercion_model()
sage: K = GF(31)
sage: A = K.algebraic_closure()
sage: nK = id(K)
sage: del A,K
sage: gc.collect()
0
sage: L = [c for c in gc.get_objects() if id(c) == nK]
sage: objgraph.find_backref_chain(L[0], lambda x: x is cm, extra_ignore=(id(L),))
doesn't finish after several minutes.
I get pretty good plots from
objgraph.show_backrefs(L,filename='plot.png',max_depth=5)
It seems a global link exists here:
sage: [a for a in sage.rings.polynomial.polynomial_ring_constructor.__dict__['_cache'].keys() if id(a[0]) == id(L[0])]
[(Finite Field of size 31, ('x',), False, None),
(Finite Field of size 31, ('x',), False, 'FLINT')]
Replying to @nbruin:
I get pretty good plots from
objgraph.show_backrefs(L,filename='plot.png',max_depth=5)
Ahm, L?? That's a list.
It seems a global link exists here:
sage: [a for a in sage.rings.polynomial.polynomial_ring_constructor.__dict__['_cache'].keys() if id(a[0]) == id(L[0])] [(Finite Field of size 31, ('x',), False, None), (Finite Field of size 31, ('x',), False, 'FLINT')]
Of course. As long as P=GF(31)['x']
lives, its base ring will live, too. The question is why the polynomial ring can't be collected. So, perhaps it would be better to try and find a chain for the polynomial ring instead.
Attachment: test_P_14058.png
Backref graph for a polynomial ring over finite field
attachment: test_P_14058.png is interesting. It seems that the reference chain to the polynomial ring is via TRACEBACKS! The question then arises: Why are the tracebacks not garbage collected?
Replying to @simon-king-jena:
Of course. As long as
P=GF(31)['x']
lives, its base ring will live, too. The question is why the polynomial ring can't be collected. So, perhaps it would be better to try and find a chain for the polynomial ring instead.
OK, the finite field caches under algebraic_closure
in its dict a CachedMethodCaller
that references a PseudoConwayLattice
object that in its __dict__
has a ring that is the polynomial ring. That's a reference chain from the finite field to the polynomial ring, preventing the polynomial ring from being deallocated.
Oops. I see that the chain goes via objgraph. So, I guess the new attachment is worthless.
Replying to @nbruin:
OK, the finite field caches under
algebraic_closure
in its dict aCachedMethodCaller
that references aPseudoConwayLattice
object that in its__dict__
has a ring that is the polynomial ring. That's a reference chain from the finite field to the polynomial ring, preventing the polynomial ring from being deallocated.
And, by comment:19, it prevents the base ring from collection, because it is not simply a cyclic reference, but a strong reference from a weak value dictionary to one of its keys.
Why does the CachedMethodCaller reference a PseudoConwayLattice? Shouldn't it "only" reference the return value?
Replying to @simon-king-jena:
Why does the CachedMethodCaller reference a PseudoConwayLattice? Shouldn't it "only" reference the return value?
I see. The algebraic closure itself references the PseudoConwayLattice.
Hm. At some point we should use a weak reference. Maybe right here.
The lattice that is stored in the algebraic closure is either passed as an argument to the init method, or is constructed during init:
def __init__(self, base_ring, name, category=None, lattice=None, use_database=True):
if not (is_FiniteField(base_ring) and base_ring.is_prime_field()):
raise NotImplementedError('algebraic closures of finite fields are only implemented for prime fields')
from sage.rings.finite_rings.conway_polynomials import PseudoConwayLattice
p = base_ring.characteristic()
if lattice is None:
lattice = PseudoConwayLattice(p, use_database)
elif not isinstance(lattice, PseudoConwayLattice) or lattice.p != p:
raise TypeError('lattice must be a pseudo-Conway lattice with characteristic %s' % p)
self._pseudo_conway_lattice = lattice
AlgebraicClosureFiniteField_generic.__init__(self, base_ring, name, category)
That's a dilemma. If "lattice" is not passed as an argument, it is no problem to weakly reference it, as it can be reconstructed, should it be garbage collected. But otherwise? Hm.
I think I know a potential solution.
The problem, by the above analysis: The polynomial ring cache is a weak value dictionary. Generally, a strong reference chain from key to value will prevent garbage collection of the key-value pair.
I suggest to remove the "global" polynomial ring cache. Instead, I suggest that the polynomial ring constructor uses a weak value dictionary that is stored as an attribute of the base ring (e.g., in self.__cached_methods
, which is available for all parents).
The weakly referenced values are polynomial rings. The keys are the list of variable names and information on term order and implementation---so, strong references to them shouldn't be problematic.
In that model, a strong reference chain from the base ring to the polynomial ring would NOT prevent garbage collection, since in the worst case it is a reference cycle (base ring <-> polynomial ring).
Hooray! With that change, I get
sage: import objgraph, gc
sage: K = GF(31)
sage: A = K.algebraic_closure()
sage: n = id(K)
sage: del A,K
sage: gc.collect()
186
sage: L = [c for c in gc.get_objects() if id(c) == n]
sage: L
[]
Last 10 new commits:
6d34077 | Simplify code for deallocation of binary trees |
80ce876 | Further simplification |
ac68d9b | Remove strong references to parents used in binary operations in the coercion model. |
e153d08 | #14058: Add doctest |
b9141ee | Merge branch 'ticket/14058' into develop |
90ed181 | Trivial fix for a coercion doctest |
64b572c | refcount libsingular rings used in plural |
9793dbc | Make one test more stable |
219fbf4 | Merge branch 't/14058/weakly_reference_binary_operation_codomains' into t/18905/memleak_for_integer_mod_rel_14058 |
3ada19d | Replace the global polynomial ring cache by a cache in the base ring |
Changed dependencies from #18897 to #18897, #14058
Author: Simon King
Commit: 3ada19d
To my slight surprise, replacing the global polynomial ring cache by a local cache did not only solve the issue tracked here, but it did not introduce a new problem: With the attached branch, all tests should pass.
I have reviewed most part of #14058, but I think someone should have a look at my additions (review patch) there and finish the review.
From the patchbot
sage -t --long src/sage/structure/coerce.pyx
**********************************************************************
File "src/sage/structure/coerce.pyx", line 1307, in sage.structure.coerce.CoercionModel_cache_maps.coercion_maps
Failed example:
print N2-N0
Expected:
0
Got:
-1
Is it what we should get?
Replying to @videlec:
From the patchbot ... Is it what we should get?
Note that the same error appears at #14058, even though the commits from here are not part of #14058, if I see that correctly.
Could someone verify if it is really the case that tests pass with "develop", but fail with the branch from here (merged in "develop" of course)? I currently do not have the bandwidth.
Replying to @simon-king-jena:
Replying to @videlec: Could someone verify if it is really the case that tests pass with "develop", but fail with the branch from here (merged in "develop" of course)? I currently do not have the bandwidth.
It does fail, see #14058 comment 62.
Replying to @videlec:
Replying to @simon-king-jena:
Replying to @videlec: Could someone verify if it is really the case that tests pass with "develop", but fail with the branch from here (merged in "develop" of course)? I currently do not have the bandwidth.
It does fail, see #14058 comment 62.
Please be clearer. It fails in what setting? Does it only fail with #14058? Does it also fail with develop? Does it also fail with the branch from here?
If it fails both here and at #14058, but not with develop, then I reckon both branches trigger a memory leak that was introduced elsewhere.
Replying to @simon-king-jena:
Please be clearer. It fails in what setting? Does it only fail with #14058? Does it also fail with develop? Does it also fail with the branch from here?
Sorry for the noise. I just notice that #14058 is a dependency for the ticket here. So, we should focus on #14058.
bump...
I think the description of the ticket should be updated because I don't see those instances with multiple of one hundred anymore on sage-6.9.rc0. (Exactly what leaks are fixed in this ticket?)
Is the following line needed?
+from sage.structure.parent import Parent
Other than that, I verified that the branches indeed fixes the problem mentioned at comment 33. When the two thing above are fixed, to me, it will be a positive review.
In #18897 one leak is fixed, but there are more left, see comments 27 and later:
CC: @jm58660
Component: memleak
Author: Simon King
Branch/Commit: public/18905 @
e952388
Issue created by migration from https://trac.sagemath.org/ticket/18905