Closed jpflori closed 12 years ago
I have slightly modified attachment: trac11521_triple_homset.patch: In the old version, I had created a TripleDict(50), but meanwhile I learnt that the parameter of the TripleDict should not be even and is best chosen to be a prime number. In the new version, it is prime...
Concerning the weak references: Why exactly is the experimental patch so slow? Is it because the access to the weakly referenced actions is so slow? Or is it because the actions are garbage collected even if their domain/codomain is still alive, so that the actions are uselessly created over and over again? I suspect it is the latter.
I guess you're right.
Here is a piece of code emphasizing the problem.
sage: K = GF(1<<60, 't')
sage: a = K.random_element()
sage: E = EllipticCurve([1, 0, 0, 0, a])
sage: P = E.random_point()
sage: 2*P
...
sage: ZZ._introspect_coerce()['_action_hash']
{(<type 'list'>, ...} # No Abelian group of points of the curve!!!
sage: get_coercion_model().get_cache()[1]
{(Integer Ring, Abelian group of points on Elliptic Curve defined by ..., <built-in function mul>): <weakref at ...; dead>, ...} # the "dead" are bad
Indeed, as all the dicts for the coercion caches have weakrefs to their values, the actions get garbage collected. That becomes kind of tricky...
As pointed out before, the problem is that if we keep strong references to the actions in the values of the dict, these actions themselves have strong refs to the domain and codomain and so prevent the whole garbage collection from occurring. Is it sensible to use weakrefs for [co]domains in Functors? Then garbage collection can occur in the cache dicts, but anyone actually using functors directly must be sure to keep strong references to their domain and codomain somewhere to avoid garbage collection...
Another question: why not use a TripleDict in the Parent class for the _action_hash dict rather than a WeakValueDict with three keys? That could somehow unify the approach taken here!
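The suspected behaviour - cached actions dying as soon as nobody holds them strongly, so they are rebuilt on every operation - can be sketched in plain Python. The `Action` class and cache below are hypothetical stand-ins, not Sage's actual coercion code:

```python
import weakref

class Action:
    """Hypothetical stand-in for a coercion action."""
    def __init__(self, domain):
        self.domain = domain

cache = weakref.WeakValueDictionary()
created = 0

def get_action(key, domain):
    """Return the cached action, recreating it if it was collected."""
    global created
    a = cache.get(key)
    if a is None:
        created += 1
        a = Action(domain)
        cache[key] = a
    return a

domain = object()               # the domain stays alive the whole time
for _ in range(3):
    get_action("mul", domain)   # but nothing holds the action strongly...

assert created == 3             # ...so it is rebuilt on every single lookup
```

Even though the domain survives, the weakly referenced action does not, which would explain the useless re-creation and hence the slow-down.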
Replying to @jpflori:
Indeed, as all the dicts for the coercion caches have weakrefs to their values, the actions get garbage collected. That becomes kind of tricky...
Yes, and note that we have two locations for storing the actions: in the coercion model and in the attribute
_action_hash
of parents. I found that having weak references in the coercion model is enough for fixing the leak - even if one has strong references in _action_hash
. That is something I don't fully understand. In the example of the ticket description, we have an action of ZZ
. ZZ
is not subject to garbage collection, hence, having a strong reference in ZZ._action_hash
should keep the action alive, and thus the elliptic curve (namely the different copies of E that are created in the loop).
Anyway, even in that case we would see the drastic slow-down.
As pointed out before, the problem is that if we keep strong references to the actions in the values of the dict, these actions themselves have strong refs to the domain and codomain and so prevent the whole garbage collection from occurring. Is it sensible to use weakrefs for [co]domains in Functors? Then garbage collection can occur in the cache dicts, but anyone actually using functors directly must be sure to keep strong references to their domain and codomain somewhere to avoid garbage collection...
Indeed that would be the consequence. I think that would not be acceptable: If the user keeps an action, then s/he can rightfully expect that domain and codomain are automatically kept alive.
Another question: why not use a TripleDict in the Parent class for the _action_hash dict rather than a WeakValueDict with three keys? That could somehow unify the approach taken here!
I was thinking of that, too. However, in addition to _action_hash
, the parent also has a _action_list
. And that might be a bigger problem.
Replying to @simon-king-jena:
Yes, and note that we have two locations for storing the actions: in the coercion model in the attribute
_action_hash
of parents. I found that having weak references in the coercion model is enough for fixing the leak - even if one has strong references in _action_hash
. That is something I don't fully understand.
As I posted above, in ZZ._action_hash equality is tested with "==" (rather than identity with "is"), so in the code of the ticket description, where only "one" curve is used, only one curve gets stored in _action_hash. If you try the code posted some comments above, where (really) different curves are used, you'll see that the leak is not fixed if you don't use a similar approach for _action_hash as for the coercion model.
In the example of the ticket description, we have an action of
ZZ
. ZZ
is not subject to garbage collection, hence, having a strong reference in ZZ._action_hash
should keep the action alive, and thus the elliptic curve (namely the different copies of E that are created in the loop). Anyway, even in that case we would see the drastic slow-down. Indeed that would be the consequence. I think that would not be acceptable: If the user keeps an action, then s/he can rightfully expect that domain and codomain are automatically kept alive.
The problem of weakref'ed domains and codomains in functors could be tackled by adding an optional parameter for the use of weakrefs.
By default, the behavior of functors would be as before, with strong refs (and the option set to False).
For the coercion model we would set the option to True and use weakrefs whenever possible.
Replying to @jpflori:
As I posted above, in ZZ._action_hash equality is tested with "==" (rather than identity with "is"), so in the code of the ticket description, where only "one" curve is used, only one curve gets stored in _action_hash.
Yes, but then I don't understand why there is no error: In some places, the coercion model tests for identity, and raises a big fat "coercion BUG" otherwise.
The problem of weakref'ed domains and codomains in functors could be tackled by adding an optional parameter for the use of weakrefs.
By default, the behavior of functors would be as before, with strong refs (and the option set to False).
For the coercion model we would set the option to True and use weakrefs whenever possible.
Again, I believe that it must not be up to the user: "Evidently" (for a human), the different copies of E in the while loop can be garbage collected. It is our job to hammer the "evidence" into Sage. We shouldn't simply say "let the user decide" whether he wants a leak or a potential disaster: I would call it a disaster if one has an action and suddenly its codomain is gone.
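That "disaster" can be made concrete with a minimal Python sketch. The `WeakAction` and `Codomain` classes are hypothetical illustrations, not Sage's classes: if an action holds only a weak reference to its codomain, the codomain can vanish out from under a live action.

```python
import weakref

class Codomain:
    """Hypothetical stand-in for a parent, e.g. an elliptic curve."""
    pass

class WeakAction:
    """Hypothetical action keeping only a weak reference to its codomain."""
    def __init__(self, codomain):
        self._codomain = weakref.ref(codomain)
    def codomain(self):
        c = self._codomain()
        if c is None:
            raise RuntimeError("codomain has been garbage collected")
        return c

C = Codomain()
act = WeakAction(C)
assert act.codomain() is C
del C                     # the user keeps the action but drops the codomain
try:
    act.codomain()
    gone = False
except RuntimeError:
    gone = True           # ...and suddenly the codomain is gone
assert gone
```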
Replying to @simon-king-jena:
Yes, but then I don't understand why there is no error: In some places, the coercion model tests for identity, and raises a big fat "coercion BUG" otherwise.
I'll have a look at it, I'm not completely up to what the coercion model actually does :) I guess the doc could also be extended on that :)
The problem of weakref'ed domains and codomains in functors could be tackled by adding an optional parameter for the use of weakrefs.
By default, the behavior of functors would be as before, with strong refs (and the option set to False).
For the coercion model we would set the option to True and use weakrefs whenever possible.
Again, I believe that it must not be up to the user: "Evidently" (for a human), the different copies of E in the while loop can be garbage collected. It is our job to hammer the "evidence" into Sage. We shouldn't simply say "let the user decide" whether he wants a leak or a potential disaster: I would call it a disaster if one has an action and suddenly its codomain is gone.
That would not really be up to the user if by default the functor does not use weakrefs and we only change this behavior for the coercion model by switching the option. It would be prominently documented that the option should not be set to True for general use, so normal use would not lead to problems. Of course, it all depends on what you mean by "up to the user". One could still intentionally set the option to True, and then cry that the codomain disappeared...
In verify_action in the coercion model there is a PY_TYPE_CHECK(action, IntegerMulAction) that might explain the absence of the big fat BUG.
Anyway, it does not make much sense to me that the _action_hash dict in the Parent class uses "==" rather than "is", especially since the TripleDicts of the coercion model do. What do you think? Do you have any good reason to justify such a choice that I might have missed?
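The difference between the two key semantics can be sketched in plain Python (a hypothetical `Curve` class that compares equal by j-invariant, not Sage's actual class): with "==" keying, equal-but-distinct copies collapse to one cache entry; with "is" (identity) keying, every copy gets its own entry.

```python
class Curve:
    """Hypothetical stand-in: curves are equal when their j-invariants are."""
    def __init__(self, j):
        self.j = j
    def __eq__(self, other):
        return isinstance(other, Curve) and self.j == other.j
    def __hash__(self):
        return hash(self.j)

E1, E2 = Curve(0), Curve(0)        # "the same" curve, constructed twice

by_equality = {E1: "action"}       # an ordinary dict compares keys with ==
assert E2 in by_equality           # so only ONE entry is ever stored

by_identity = {id(E1): "action"}   # keying by id() mimics "is"
assert id(E2) not in by_identity   # each copy gets its own entry
```

This is why the "one curve" example behaves so differently under the two caches: with identity keys, each loop iteration adds a fresh entry unless the values (or keys) are weakly referenced.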
Replying to @jpflori:
Anyway, it does not make much sense to me that the _action_hash dict in the Parent class uses "==" rather than "is", especially since the TripleDicts of the coercion model do.
The old TripleDict
didn't - I changed that only in #715. Perhaps it is debatable whether that change is fine. But:
What do you think? Do you have any good reason to justify such a choice that I might have missed?
See sage/structure/coerce.pyx:
if y_mor is not None:
    all.append("Coercion on right operand via")
    all.append(y_mor)
    if res is not None and res is not y_mor.codomain():
        raise RuntimeError, ("BUG in coercion model: codomains not equal!", x_mor, y_mor)
That is why I think one should test for identity, not equality.
On the other hand, we also have
# Make sure the domains are correct
if R_map.domain() is not R:
    if fix:
        connecting = R_map.domain().coerce_map_from(R)
        if connecting is not None:
            R_map = R_map * connecting
    if R_map.domain() is not R:
        raise RuntimeError, ("BUG in coercion model, left domain must be original parent", R, R_map)
if S_map is not None and S_map.domain() is not S:
    if fix:
        connecting = S_map.domain().coerce_map_from(S)
        if connecting is not None:
            S_map = S_map * connecting
    if S_map.domain() is not S:
        raise RuntimeError, ("BUG in coercion model, right domain must be original parent", S, S_map)
in the same file. These lines apparently cope with the fact that there are non-unique parents, and they suggest that ==
is the right thing to do in the coerce caches.
But I think this is a discussion that belongs to #715.
My suggestion at #715 is to use a different way of comparison in the case of actions respectively coerce maps. This complies with existing defaults in sage/structure/coerce.pyx
I agree to move the discussion for "==" and "is" to #715 (and also agree with your solution).
So, what about the original problem here and your thoughts about moving the weakref from the action itself to its domains and codomains?
If you really dislike the idea of an option switched off by default (and a priori only switched on in the coercion model and not outside it, unless one knows what one is doing), we could mimic what you want to do with the TripleDict and have the usual Functor plus a Functor_weakref version (or go the other way around and add an extra argument to the TripleDict constructor taking the equality operator as an argument :) rather than having two classes)?
Interestingly, the experimental patch for #715 that I have not submitted yet (need more tests) seems to be almost enough to fix the memory leak from the ticket description!
Namely:
sage: <the usual loop>
Traceback
...
KeyboardInterrupt:
sage: import gc
sage: gc.collect()
585
sage: from sage.schemes.generic.homset import SchemeHomsetModule_abelian_variety_coordinates_field
sage: LE = [x for x in gc.get_objects() if isinstance(x,SchemeHomsetModule_abelian_variety_coordinates_field)]
sage: len(LE)
2
And the experimental patch is not using weak references to the values of the TripleDict
- only to the keys! Perhaps this ticket will eventually be a duplicate of #715?
Anyway, I need more tests and will probably submit the patch to #715 later today.
If you used "==" for equality testing of actions in the coercion model (as in the parent caches) and left "j=j" in the "usual piece of code", this is not surprising, because for "==" all the curves are equal, so there will be no duplication nor memory leak.
To perform a better test, you should replace j=j by j=K.random_element() in the constructor of the elliptic curve (as I did a few comments up) to get really different curves (i.e. both for "is" and for "=="). I fear the memory leak is still there with this second example...
Experimental: Have two versions of TripleDict
, using "==" or "is" for comparison
Attachment: trac715_two_tripledicts.patch.gz
Sorry, I have just posted a patch to the wrong ticket. The new patch belongs to #715, not to here. Just ignore it.
Concerning "wrong ticket": Shouldn't we consider this ticket a duplicate of #715? After all, the examples from the two ticket descriptions are almost identical.
That's what I originally suggested :) See the first comments on this ticket.
When I found out about ticket #715, I copied the description from here to there (I must have introduced some typographical differences along the way) because I thought the problem belonged there and the original description of ticket #715 was non-existent.
Replying to @jpflori:
That's what I originally suggested :) See the first comments on this ticket.
When I found out about ticket #715, I copied the description from here to there (I must have introduced some typographical differences along the way) because I thought the problem belonged there and the original description of ticket #715 was non-existent.
I see. And I also found that I was originally responsible for not making it a duplicate.
The reason was that I found another leak: It is the strong cache for the homsets, and that is not addressed in #715.
So (question to the release manager), what shall we do? Mark this as a duplicate and open a different ticket for the cache of homsets? Or change the ticket description such that it is only about homsets, not about elliptic curves?
before tagging that ticket as a duplicate of #715, I'd like to check the leak is indeed fixed with the (upcoming) patch of #715.
Paul
I fear the work on #715 is far from being finished...
Description changed:
---
+++
@@ -1,17 +1,26 @@
-The following piece of code leaks memory.
-
- sage: K = GF(1<<55,'t')
- sage: a = K.random_element()
- sage: while 1:
- ....: E = EllipticCurve(j=a); P = E.random_point(); 2*P;
+Originally, this ticket was about the following memory leak when computing with elliptic curves:
-The problem seems to occur while resolving the action of ZZ on E.
+```
+sage: K = GF(1<<55,'t')
+sage: a = K.random_element()
+sage: while 1:
+....: E = EllipticCurve(j=a); P = E.random_point(); 2*P;
+```
-It does not happen if:
+This example is in fact solved by #715. However, while working on that ticket, another leak has been found, namely
-- one does not change the curve in the loop
+```
+sage: for p in prime_range(10^5):
+....: K = GF(p)
+....: a = K(0)
+....:
+sage: import gc
+sage: gc.collect()
+0
+```
-- does P+P instead of a multiplication
+So, I suggest to start with #715 and solve the second memory leak on top of it. It seems that a strong cache for homsets is to blame. I suggest to use the weak `TripleDict` instead, which were introduced in #715.
+
**Apply**
The original example of this ticket is in fact fixed with #715, but I think the other problem should be dealt with here, namely by using a weak cache for homsets.
Note that with the patch applied on top of #715, the memleak is fixed:
sage: for p in prime_range(10^5):
....: K = GF(p)
....: a = K(0)
....:
sage: import gc
sage: gc.collect()
8320
sage: LE = [x for x in gc.get_objects() if isinstance(x,type(K))]
sage: len(LE)
2
sage: LE
[Finite Field of size 2, Finite Field of size 99991]
Later today, I'll run doctests.
Note that the patch was written before the latest version of the patch from #715 was created. In particular, it uses TripleDict
, but should better use TripleDictById
(which didn't exist when I wrote the patch).
Not bad: make ptest
results in only one error with the patch and its dependencies applied on top of sage-5.0.prealpha0:
sage -t -force_lib "devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx"
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in ignored
Exception AttributeError: 'PolynomialRing_field_with_category' object has no attribute '_modulus' in ignored
**********************************************************************
File "/home/simon/SAGE/sage-5.0.prealpha0/devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx", line 418:
sage: len(ring_refcount_dict) == n
Expected:
True
Got:
False
**********************************************************************
1 items had failures:
1 of 18 in __main__.example_4
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/multi_polynomial_libsingular_6755.py
[3.6 s]
----------------------------------------------------------------------
The following tests failed:
sage -t -force_lib "devel/sage/sage/rings/polynomial/multi_polynomial_libsingular.pyx"
Total time for all tests: 3.6 seconds
It turns out that the failing test had in fact a wrong design. It was:
sage: import gc
sage: from sage.rings.polynomial.multi_polynomial_libsingular import MPolynomialRing_libsingular
sage: from sage.libs.singular.ring import ring_refcount_dict
sage: n = len(ring_refcount_dict)
sage: R = MPolynomialRing_libsingular(GF(547), 2, ('x', 'y'), TermOrder('degrevlex', 2))
sage: len(ring_refcount_dict) == n + 1
True
sage: Q = copy(R) # indirect doctest
sage: p = R.gen(0) ^2+R.gen(1)^2
sage: q = copy(p)
sage: del R
sage: del Q
sage: del p
sage: del q
sage: gc.collect() # random output
sage: len(ring_refcount_dict) == n
True
Hence, before n is defined, no garbage collection takes place. This is, of course, not correct: The ring_refcount_dict may contain references to a ring created in another doctest, that is only garbage collected in the line before len(ring_refcount_dict)==n
.
In other words: The test did not fail because there was a memory leak.
When I insert a garbage collection right before the definition of n, the test works, and in addition the warning about AttributeError
being ignored vanishes.
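The flaw can be reproduced in plain Python. The `Ring` class and `live` dict below are hypothetical stand-ins for the libsingular ring wrappers and `ring_refcount_dict`: an object caught in a reference cycle survives until a cyclic collection runs, so taking the baseline count before `gc.collect()` picks up dead-but-uncollected entries.

```python
import gc
import weakref

class Ring:
    """Hypothetical stand-in for a libsingular ring wrapper."""
    def __init__(self):
        self.cycle = self   # reference cycle: freed only by the cyclic GC

live = {}   # simulates ring_refcount_dict; entries vanish on collection

def register(r):
    wr = weakref.ref(r, lambda ref: live.pop(ref, None))
    live[wr] = True

r_old = Ring()
register(r_old)
del r_old               # the cycle keeps it pending until gc.collect()

n_bad = len(live)       # WRONG baseline: counts a dead-but-uncollected ring
gc.collect()
n_good = len(live)      # RIGHT baseline: collect first, then count

assert n_bad == n_good + 1
r_new = Ring()
register(r_new)
assert len(live) == n_good + 1
```

With the stale entry counted into the baseline, the later `len(...) == n` check fails even though nothing is leaking - exactly the situation in the doctest.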
The patch fixes the memory leak caused by a strong homset cache (which was not addressed by #715):
sage: for p in prime_range(10^3):
....: K = GF(p)
....: a = K(0)
....:
sage: import gc
sage: gc.collect()
3128
sage: LE = [x for x in gc.get_objects() if isinstance(x,type(K))]
sage: LE
[Finite Field of size 2, Finite Field of size 997]
The patch adds a new doctest demonstrating that it is fixed.
Needs review!
I'll review this patch as soon as #715 is finally closed.
I think the ticket title also needs to be changed to reflect the issue addressed, namely that when homsets are created, a strong ref is kept forever in sage.???.homset.
When applying the new patch from #715, all tests pass. But when also applying the patch from here, make ptest
results in
sage -t -force_lib devel/sage/sage/libs/singular/ring.pyx # 6 doctests failed
The failing tests concern ring_refcount_dict
- and it is perhaps not surprising that a patch enabling garbage collection of rings has an influence on refcount, such as:
sage: ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()
Exception raised:
Traceback (most recent call last):
File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/home/simon/SAGE/sage-5.0.prealpha0/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_8[9]>", line 1, in <module>
ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()###line 452:
sage: ring_ptr = set(ring_refcount_dict.keys()).difference(prev_rings).pop()
KeyError: 'pop from an empty set'
Anyway, I need to analyse what happens, and provide a new patch after catching some sleep.
Jean-Pierre, I hope you like the new ticket title!
Yes, better title!
I haven't had any time to look at these cache tickets in the last few days, but be sure I'll have a look soon.
Use the weak TripleDict
from #715 for the cache of homsets
Attachment: trac11521_triple_homset.patch.gz
The doctest problem was easy enough to solve: When starting the example with a garbage collection, things work. Actually, I am surprised that the doctest framework does not do a garbage collection at the beginning of each example, in order to bring the system into a defined state - I'll ask on sage-devel.
Anyway, since the problem is solved and the patch updated, stuff is now ready for review!
Apply trac11521_triple_homset.patch
I've looked at the patch and have nothing to say except that weakref is imported twice; I'll provide a reviewer patch for that.
Ha, maybe we should mention the caching system in the doc as well, I'm ready to do that in a reviewer patch as well.
However, I'll wait for #715 to be closed before positive reviewing this one, just in case the changes there have consequences here (which should not be the case).
Replying to @jpflori:
Ha, maybe we should mention the caching system in the doc as well, I'm ready to do that in a reviewer patch as well.
Oops, you are right! I thought I had added to the docs, but I just added a doctest, no elaborate explanation.
While finally looking at this ticket seriously and adding some doc, I noticed that if you run
sage: V = VectorSpace(QQ,3)
sage: H = Hom(V,V)
sage: H is Hom(V,V)
False
With this ticket's patches, but also WITHOUT them.
Is this a consequence of #9138?
What's strange is that V does not break the unique parent assumption:
sage: V is VectorSpace(QQ, 3)
True
Is that intended because there is a ? It does not look very right to me.
The same "problem" occurs playing for example with finite fields...
With more classical stuff, the cache seems to work as intended
sage: H = Hom(ZZ, QQ)
sage: H is Hom(ZZ, QQ)
True
This last point lets me think that we can discuss and fix the VectorSpace situation elsewhere, if this is indeed a problem.
My concern here is that it's hard to provide a "smart" example where the hom set is cached and where it can be garbage collected when its domain and codomain are.
I don't think deleting ZZ and QQ is a good idea :) (and they would be referenced elsewhere anyway)
My bad, in fact the "issue" seems to be that X.Hom(Y, category) succeeds but returns non-identical results.
Attachment: trac_11521-reviewer.patch.gz
Reviewer patch; added doc
Here is a reviewer patch adding a little doc.
I hope you will find it satisfactory.
If so, feel free to set the ticket to positive review; I personally feel happy with your code.
My above rant, if it should be addressed, should be elsewhere.
Description changed:
---
+++
@@ -6,7 +6,6 @@
sage: while 1:
....: E = EllipticCurve(j=a); P = E.random_point(); 2*P;
- This example is in fact solved by #715. However, while working on that ticket, another leak has been found, namely
@@ -18,10 +17,9 @@
sage: gc.collect()
0
TripleDict
instead, which were introduced in #715.Apply
-attachment: trac11521_triple_homset.patch + attachment: trac11521_triple_homset.patch + attachment: trac_11521-reviewer.patch
To reiterate what I just posted at #715, I've just successfully run doctests against 5.0.beta10 with the following patches applied (qseries):
trac715_one_triple_dict.patch
trac_715-reviewer.patch
trac_715-rebase_11599.patch
trac11521_triple_homset.patch
trac_11521-reviewer.patch
The reviewer patch from jpflori looks fine to me. Is this ready to go now?
Reviewer: Jean-Pierre Flori
The reviewer patch looks fine to me. Sorry that I did not react a few days ago. Since Jean-Pierre said I should set it to positive review, I am now doing so.
Description changed:
---
+++
@@ -19,7 +19,8 @@
So, I suggest to start with #715 and solve the second memory leak on top of it. It seems that a strong cache for homsets is to blame. I suggest to use the weak TripleDict
instead, which were introduced in #715.
-Apply +To be merged with #715. Apply
+* the patches from #715
Attachment: trac_11521_homset_weakcache_combined.patch.gz
Use the weak TripleDict from #715 for the cache of homsets. Includes the reviewer patch
I have created a combined patch, by folding the two patches (i.e., mine and Jean-Pierre's reviewer patch).
With #715 and this patch, all doctests pass on my machine. So, from my perspective, it is a positive review, but we should wait for Jean-Pierre's results on 32 bit.
For the patchbot:
Apply trac_11521_homset_weakcache_combined.patch
Description changed:
---
+++
@@ -22,5 +22,4 @@
**To be merged with #715**. Apply
* the patches from #715
-* [attachment: trac11521_triple_homset.patch](https://github.com/sagemath/sage-prod/files/10653173/trac11521_triple_homset.patch.gz)
-* [attachment: trac_11521-reviewer.patch](https://github.com/sagemath/sage-prod/files/10653174/trac_11521-reviewer.patch.gz)
+* [attachment: trac_11521_homset_weakcache_combined.patch](https://github.com/sagemath/sage-prod/files/10653175/trac_11521_homset_weakcache_combined.patch.gz)
I'm just confirming that applying this patch & that at #715 to 5.0-beta13 on a 32-bit linux machine, all tests pass.
Merged: sage-5.1.beta0
Unmerged because unmerging #12313 while keeping #11521 gives on OS X 10.6:
$ ./sage -t --verbose devel/sage/sage/misc/cachefunc.pyx
[...]
Trying:
P = QQ['a, b, c, d']; (a, b, c, d,) = P._first_ngens(4)###line 1038:_sage_ >>> P.<a,b,c,d> = QQ[]
Expecting nothing
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
Changed merged from sage-5.1.beta0 to none
Changed dependencies from #11900 #715 to #11900, #715, to be merged with #12313
Originally, this ticket was about the following memory leak when computing with elliptic curves:
This example is in fact solved by #715. However, while working on that ticket, another leak has been found, namely
So, I suggest to start with #715 and solve the second memory leak on top of it. It seems that a strong cache for homsets is to blame. I suggest to use the weak TripleDict instead, which was introduced in #715. To be merged with #715. Apply
Depends on #12969 Depends on #715
Dependencies: #12969; to be merged with #715
CC: @jpflori @nthiery
Component: coercion
Keywords: sd35
Author: Simon King, Nils Bruin
Reviewer: Jean-Pierre Flori, Nils Bruin, Simon King
Merged: sage-5.5.beta0
Issue created by migration from https://trac.sagemath.org/ticket/11521