Closed jpflori closed 10 years ago
Commit: 05fb569
New commits:
[changeset:05fb569] | Change SchemeMorphism back (to cope with a Cython bug), copying the new code from sage.categories.map.Map |
[changeset:8fd09d5] | Copying of PolynomialBaseringInjection and FormalCompositeMap |
[changeset:be37145] | Let SchemeMorphism inherit from Morphism, not from Element |
[changeset:0f38a2c] | Keep strong reference to codomain of weakened coerce maps Keep strong reference to domains of *registered* coercions |
[changeset:a53261d] | Keep a strong reference to the codomain of PrecomposedAction |
[changeset:1ff6f3f] | Add generic copy of maps. Fix copy of elements. Replace _(co)domain everywhere |
[changeset:61d818c] | Replace Map.(co)domain by constant functions, remove ._(co)domain |
[changeset:ebe82df] | Use a proper WeakValueDictionary for number fields |
[changeset:4685c73] | convert_map_from() should only store weak references Similar to coerce_map_from, the detected morphism should be stored only in a weak dictionary, not in a list. |
Replying to @nbruin:
I agree that there is a place for such strong connections, but I have severe reservations about declaring it [`.register_coercion()`] is the only way or even the default way to inform the system about coercions.
Well, I have mentioned `._coerce_map_from_(...)` in several previous posts, and if you look at my thematic tutorial on categories and coercion, you'll find that I consider this the default. And it only yields weak caching.
I have severe reservations about declaring that this code "will never be memory efficient" in sage.
I think that we want some particularly important coercions to be tied to the lifetime of the codomain, and thus we use `.register_coercion()`, and we want other coercions to be tied to the minimum of the lifetimes of domain and codomain, and thus we use `._coerce_map_from_()`. I don't think we have a problem here.
Consider:
Qx.<x> = QQ[]
K.<a> = NumberField(x^4-2)
L.<b> = NumberField(x^2-2, embedding=a^2)
This fits perfectly in the "unchanging universe" model. Also note that the coercion system does not need to let L keep K alive, since the construction parameters, which get kept alive for the life of L by `CachedRepresentation` or something analogous, refer to K already.
It isn't `CachedRepresentation`, but this doesn't matter.
Now consider
M.<b>=NumberField(x^2-2)
In the "unchanging universe" (and in sage as well) we have that M is distinct from L. However, I think it's unrealistic to expect that all "embeddings" etc. can be specified at construction time.
Again, nobody has claimed that everything needs to be declared at construction time. There are some particularly important coercions registered at construction time, namely the coerce embedding (if it exists then it is unique) and those installed by `.register_coercion()`. Everything else is dynamical, based on `_coerce_map_from_()`.
In my description from comment:84, note that the digraph is not totally static. It has static parts (corresponding to coerce embeddings and coercions fixed by `.register_coercion()`) and dynamic shortcuts (corresponding to `_coerce_map_from_`).
So I think, even though it's not possible currently in sage, that one should allow for
m1 = Hom(M,K)([a^2])
m2 = Hom(M,K)([-a^2])
M.register_embedding(m1)
I don't know if this is reasonable, but at least it is against what people originally wanted with the coerce embedding. If you declare the coerce embedding phi from a number field `K` to, say, `CC`, then you consider `K` as a subfield of `CC`. If you provide another number field `L`, which is isomorphic to `K`, with a different embedding psi into `CC`, then adding an element of `K` to an element of `L` is done by embedding both into `CC` and then adding complex numbers.
Since we think of K and L as different subfields of `CC` and not as abstract fields, we must consider K and L as different objects, and so the different embedding must play a role in the cache key for K and L. This is why they have to be provided at construction time.
It would be a totally different way of thinking if you tried to do the same with `CC.register_coercion(phi/psi)` or with `CC._coerce_map_from_(K/L)`. Namely, in both cases, you would not be able to add elements of K and L, because neither K nor L would know about the embedding. And in fact you would consider K and L as abstract fields, and you would in fact want `K is L` (at least if you fancy unique parents, which I do...). And then the axioms for coercion would strike: There can be at most one coercion from `K` (i.e., `L`) to `CC`. Hence, you could not simultaneously declare different embeddings of `K` into `CC` as coercions.
Since it pretty much seems to me that number theorists want to comfortably compute with different isomorphic subfields of `CC`, it would thus simply not be feasible to restrict oneself to `.register_coercion` and `_coerce_map_from_`: One needs coerce embeddings, and one needs that they are part of the defining data of a number field.
Note that the choice of `m1` or `m2` here leads to different relations between `M` and `K` and hence different universes. In other words, our concept of "globally unique" is not powerful enough to capture the full identity of objects, which would include the coercion relations with objects that haven't been "discovered" yet.
I would state it differently. In order to define `K` (a subfield of CC), there is no way around providing the embedding during creation. "Discovering" a coercion relation seems the wrong approach here.
And speaking about memory: The embedding of K into CC is stored as an attribute of K, not of CC. Hence, K keeps CC alive, but CC does not prevent K from garbage collection. So, I really don't understand where you see a problem.
In practice, we can usually work around that by for instance changing the names of generators and hence create artificially differently labelled objects but that's already not a possibility for creating different copies of `ZZ^n`, since there are no generator names to choose there.
Well, if one has obvious distinguishing data, such as an embedding, then there is nothing artificial when using them.
I think one has to accept the reality here: what we have is a collection of objects whose relations do change in time.
No. I don't see anything dynamic in your "embedded numberfield" examples. A subfield is a subfield is a subfield.
That's not the only thing coercion does. It may also find "common covering structures", which may lead to construction of new parents. Those definitely don't deserve to get nailed into memory. Yet, the code that creates these parents will look (to the coercion system) as a "user", so it would be using these strong-referencing coercion registration routines.
What you seem to mention here is the pushout construction. It mainly relies on "construction functors". I don't even know if it takes the coerce embeddings into account at all.
Anyway, the new parents created by pushouts indeed play the same role as parents created by the user. Let's try to be more concrete. Let P and Q be parents, you want to add an element p of P to an element q of Q, and the pushout construction finds a parent R such that both P and Q coerce into R, allowing to perform the addition in R, resulting in `r=R(p)+R(q)`.
Now, it could be that `R.register_coercion(P)` and `R.register_coercion(Q)` are both executed in `R.__init__` (but see the remark below). In the current code (also in my branch), this would imply a strong reference chain from R to both P and Q. Hence, even if you did `del p,q,P,Q`, P and Q could not be garbage collected.
But I don't think we should see this as problematic, for several reasons:
- Pushout constructions don't arise particularly often. Normally, either P coerces into Q or Q coerces into P, or both embed into the same parent anyway, and I have mentioned above: With a coerce embedding, the existence of R would not prevent P and Q from garbage collection (plus, it has nothing to do with pushout anyway...)
- Is `register_coercion` really used so often? I think `_coerce_map_from_` is more commonly used, and then the existence of R would not prevent P and Q from garbage collection.
I think it may well be a feature, not a bug, that one at some point can just be left with the shortcuts and that the intermediates have fallen out.
How would you guarantee that you kept in mind enough shortcuts to not change connectivity by throwing away intermediates?
The natural way of staying close to "discovering a permanent universe" is by never throwing away anything that has been discovered, and I think we agree that's a "memory leak".
No, we disagree.
It is a memory leak if a connected component (not taking into account shortcuts) of the coercion graph can not be garbage collected, even though there is no external strong reference (which may be a coerce embedding) to any vertex of the connected component.
Note that this was exactly the problem with the example from the ticket description! As I have pointed out in comment:18, it was not the case that the problem lay in `_coerce_from_list_`, because this was empty. In particular, it was not the case that `.register_coercion()` was to blame.
Instead, the memory leak came from short-cuts, i.e., from the stuff stored in `_coerce_from_hash`, which can also be seen in attachment: chain.png.
Hence, the quadratic field and CC did belong to different connected components of the coercion graph, but the shortcut kept Q alive.
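The reference chain described here can be reproduced outside of Sage. The following is a minimal sketch in plain Python, not Sage code: `Parent` and `Map` are toy stand-ins, and a `weakref.WeakKeyDictionary` is assumed to behave like the weak-keyed `_coerce_from_hash` as far as lifetimes are concerned. The key is only weakly referenced, but the cached map holds a strong reference to its domain, so the codomain keeps the domain alive.

```python
import gc
import weakref


class Parent:
    """Toy stand-in for a parent (hypothetical, not Sage's Parent)."""

    def __init__(self, name):
        self.name = name
        # Weak keys, strong values: a simplified model of the shortcut cache.
        self.coerce_from_hash = weakref.WeakKeyDictionary()


class Map:
    """Toy map that holds strong references to both domain and codomain."""

    def __init__(self, domain, codomain):
        self.domain = domain
        self.codomain = codomain


def domain_survives():
    CC = Parent("CC")
    Q = Parent("Q")
    CC.coerce_from_hash[Q] = Map(Q, CC)  # shortcut cached on the codomain
    probe = weakref.ref(Q)
    del Q                                # drop the only external reference
    gc.collect()
    # The chain CC -> cache value (map) -> Q is strong, so Q is still alive.
    return probe() is not None
```

In this toy model `domain_survives()` returns `True`: the weak key alone is not enough as long as the cached value refers back to the key.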
And there surely is a means available to add a coercion that doesn't tie the lifespan of two parents too closely: Implement `P._coerce_map_from_(Q)`, which can return a map or simply "True" (in the latter case, conversion is used to coerce Q into P). The result is cached in `P._coerce_from_hash`, but not in `P._coerce_from_list`.
You mean: implement `P._install_coerce_map_from_(m)`, which does: `_coerce_from_hash[Domain(m)]=m`. I think it is quite important to be able to manipulate the coercion graph without having to modify library code.
This might be a good idea. So, we would have `.register_coercion()` for permanent pathways, and `_install_coerce_map_from()` (with a similar semantics, i.e., you can either just provide the parent or a whole morphism) for impermanent shortcuts.
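Let me try to summarise what is (or may be) left to do:
- Provide missing docs for `SchemeMorphism`,
- Add a section explaining the current weak coercion model, to facilitate maintenance,
- Add `_install_coerce_map_from()`.
- Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
- Perhaps: Re-introduce a `cdef public` attribute `_codomain`, since this would allow faster access than calling `.codomain()`, and since the codomain will be strongly referenced anyway.
Anything I forgot?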
Changed work issues from Fix elliptic curves code to none
And I think I should do a further test: I will modify `Parent.__init__` so that it prints the type of self to a log file, and so I'll see how many parents are created with and without the patch. If we see a sudden change in the statistics for some types, then it might point us to code that implicitly relies on a permanent cache.
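A minimal sketch of such a test in plain Python, under stated assumptions: the toy `Parent` below is not Sage's class, an in-memory `Counter` replaces the log file mentioned above, and `creation_diff` is a hypothetical helper that compares the counts of two runs.

```python
from collections import Counter

# One counter per test run; filled as a side effect of Parent.__init__.
creations = Counter()


class Parent:
    """Toy parent that records the type of every instance created."""

    def __init__(self):
        cls = type(self)
        creations["%s.%s" % (cls.__module__, cls.__name__)] += 1


def creation_diff(master, ticket):
    """Compare two {class name: count} mappings.

    Returns (absolute, relative): absolute differences for every class,
    and relative differences in percent for classes that occur in master.
    """
    names = set(master) | set(ticket)
    absolute = {n: ticket.get(n, 0) - master.get(n, 0) for n in names}
    relative = {n: 100.0 * absolute[n] / master[n]
                for n in names if master.get(n, 0) > 0}
    return absolute, relative
```

Sorting `absolute` and `relative` by value would then give "top 10" and "bottom 10" lists per class.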
Replying to @simon-king-jena:
Good, I think we at least are in sufficient agreement for the practical implications of what we need.
Let me try to summarise what is (or may be) left to do:
- Add a section explaining the current weak coercion model, to facilitate maintenance,
- Add `_install_coerce_map_from()`.
To clarify this point (and it might be helpful to put something along these lines in the documentation), it seems to me there would be 4 ways to put coercions in place:
- `_coerce_map_from_`. Since it's programmatic, it seems it can be rediscovered easily when parents get garbage collected and recreated, so it seems appropriate that maps stemming from here do not lead to lifetime implications.
- (`.register_coercion`)
- (`register_embedding` does that, but can only accommodate one per domain)
I think the fourth point is desirable because the alternative, programmatic solutions via `_coerce_map_from_`, feel much more heavy-weight (subclassing a whole parent just to extend `_coerce_map_from_` may be appropriate for someone who is concerned with developing sage, but seems inappropriate to me for someone who is thinking about using sage to do a complicated computation).
- Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
I think yes: Due to cyclic references, parents will usually survive until the next GC, which may be quite a while after the last reference is lost. So the place where the map becomes liable to turn defunct may be quite distant from the place where the map is found to be defunct. People deserve a reminder about that.
Replying to @nbruin:
Replying to @simon-king-jena:
Let me try to summarise what is (or may be) left to do:
- Add a section explaining the current weak coercion model, to facilitate maintenance,
- Add `_install_coerce_map_from()`.
To clarify this point (and it might be helpful to put something along these lines in the documentation), ...
But where?
I think the fourth point is desirable because the alternative, programmatic solutions via `_coerce_map_from_`, feel much more heavy-weight (subclassing a whole parent just to extend `_coerce_map_from_` may be appropriate for someone who is concerned with developing sage, but seems inappropriate to me for someone who is thinking about using sage to do a complicated computation).
OK. But then, this method should be visible, hence, not starting with an underscore.
- Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
I think yes: Due to cyclic references, parents will usually survive until the next GC, which may be quite a while after the last reference is lost. So the place where the map becomes liable to turn defunct may be quite distant from the place where the map is found to be defunct. People deserve a reminder about that.
OK. Perhaps: If the map is weak but the domain reference is still available, then show the map as
"""WARNING: This %s map from %s to %s
may become defunct after the next garbage collection.
For usage outside of the coercion system, try to create a copy,
or apply the method `_make_strong_references()`""" % (self._repr_type(), self.domain(), self.codomain())
and if the domain is unavailable, then show the map as
"Defunct %s map" % self._repr_type()
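As a minimal sketch of how these two representations could be selected, here is a plain Python toy, not the code of the branch: `WeakenedMap` and `Parent` are hypothetical classes, the domain is assumed to be stored as a `weakref.ref`, and the wording of the warning is shortened.

```python
import gc
import weakref


class Parent:
    """Toy parent with a readable name (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class WeakenedMap:
    """Toy map: weak reference to the domain, strong one to the codomain."""

    def __init__(self, domain, codomain):
        self.domain = weakref.ref(domain)  # calling it returns the domain or None
        self._codomain = codomain

    def codomain(self):
        return self._codomain

    def _repr_type(self):
        return "Generic"

    def __repr__(self):
        D = self.domain()
        if D is None:
            # The domain has been garbage collected.
            return "Defunct %s map" % self._repr_type()
        return ("WARNING: This %s map from %s to %s\n"
                "may become defunct after the next garbage collection.\n"
                "For usage outside of the coercion system, try to create a copy"
                % (self._repr_type(), D, self._codomain))
```

The check is done at display time, so the same object prints the warning while its domain is alive and the "Defunct" line afterwards.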
Replying to @simon-king-jena:
To clarify this point (and it might be helpful to put something along these lines in the documentation), ...
But where?
The thematic tutorial `coercion_and_categories` would be a natural place, but would it be enough? Granted, the doc of `register_embedding`, `register_coercion` and `install_coercion` should refer to each other and elaborate on the different use cases and should also mention `_coerce_map_from_`.
With vanilla public/sage-git/master, I find that `sage -t --all` results in 1083022 calls to `Parent.__init__`, while with the branch from here it is called 1129534 times.
Hence, there is an increase in the number of parents being created. No surprise, since this ticket is about making parents garbage collectable in some situations.
Nevertheless, it might make sense to see whether some types of parents show a particularly strong increase, so that we can then decide whether we should have some stronger cache for these types.
I studied the differences in parent creation during `sage -t --all` in more detail.
Absolute differences
Here are the 10 classes that have the most additional creations in the ticket branch compared with the public/sage-git/master branch (the list shows the absolute number of additional creations and the name of the class):
(19873, 'sage.rings.homset.RingHomset_generic')
(16597, 'sage.categories.homset.Homset')
(2839, 'sage.rings.finite_rings.integer_mod_ring.IntegerModRing_generic')
(2270, 'sage.rings.homset.RingHomset_quo_ring')
(2137, 'sage.rings.finite_rings.homset.FiniteFieldHomset')
(1960, 'sage.rings.number_field.morphism.NumberFieldHomset')
(1279, 'sage.sets.positive_integers.PositiveIntegers')
(851, 'sage.combinat.tableau.Tableaux_all')
(831, 'sage.modules.free_module_homspace.FreeModuleHomspace')
(481, 'sage.rings.polynomial.polynomial_ring.PolynomialRing_dense_mod_p')
Here are the "bottom 10" classes. As you can see, there are parents for which we have considerably fewer creations with the ticket than without, which comes as a surprise to me:
(-57, 'sage.sets.family.LazyFamily')
(-134, 'sage.combinat.words.words.Words_all')
(-134, 'sage.combinat.permutation.StandardPermutations_all')
(-136, 'sage.combinat.permutation.Permutations_set')
(-142, 'sage.combinat.subset.Subsets_sk')
(-158, 'sage.sets.non_negative_integers.NonNegativeIntegers')
(-166, 'sage.combinat.cartesian_product.CartesianProduct_iters')
(-170, 'sage.combinat.integer_list.IntegerListsLex')
(-253, 'sage.combinat.skew_partition.SkewPartitions_rowlengths')
(-3838, 'sage.sets.set.Set_object_enumerated')
Relative differences
Here are the 10 classes that have the biggest relative increase in number of creations (ticket compared with master):
+14.67% sage.combinat.tableau.Tableaux_all
+3.79% sage.combinat.skew_tableau.SemistandardSkewTableaux_all
+1.00% sage.combinat.skew_tableau.SkewTableaux
+1.00% sage.combinat.partition_tuple.PartitionTuples_all
+0.91% sage.rings.homset.RingHomset_quo_ring
+0.75% sage.categories.examples.sets_cat.PrimeNumbers_Facade
+0.67% sage.combinat.crystals.affine.AffineCrystalFromClassicalAndPromotion
+0.66% sage.groups.matrix_gps.homset.MatrixGroupHomset
+0.64% sage.combinat.partition_tuple.PartitionTuples_level
+0.60% sage.structure.list_clone_demo.IncreasingIntArrays
Here are the 10 classes with the biggest relative decrease in the number of creations:
-0.25% sage.combinat.crystals.kirillov_reshetikhin.KR_type_A2_with_category
-0.25% sage.combinat.crystals.kirillov_reshetikhin.KR_type_A2
-0.25% sage.categories.examples.finite_monoids.IntegerModMonoid
-0.33% sage.sets.integer_range.IntegerRangeEmpty
-0.33% sage.combinat.affine_permutation.AffinePermutationGroupTypeG
-0.33% sage.combinat.affine_permutation.AffinePermutationGroupTypeC
-0.38% sage.combinat.crystals.infinity_crystals.InfinityCrystalOfTableauxTypeD
-0.39% sage.combinat.permutation.CyclicPermutations
-0.40% sage.combinat.vector_partition.VectorPartitions
-0.54% sage.combinat.composition_tableau.CompositionTableaux_all
Conclusion
Even though the absolute differences in the creation of various kinds of homsets seem to be dramatic, the relative differences suggest that there is no serious problem here. There are only four classes that show an increase of at least 1%. Three of them are related with tableaux, that's why I add Nicolas to the ticket: Perhaps we want to change the cache for tableaux?
Concerning a new method `install_coercion`: Wouldn't it be easier to provide `register_coercion` with an optional argument `permanent=True`, so that using the method with `permanent=False` would do what you suggested for `install_coercion`? I guess having two methods `install_coercion` and `register_coercion` could confuse the user.
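To make the proposed semantics concrete, here is a minimal sketch in plain Python. It is a toy model, not Sage's implementation: the `permanent` argument is the option proposed here, `_coerce_from_list`, `_coerce_from_hash` and `_registered_domains` borrow names used in this discussion, and `Map.make_weak` is a hypothetical helper.

```python
import gc
import weakref


class Map:
    """Toy map: strong codomain, domain strong until weakened."""

    def __init__(self, domain, codomain):
        self._codomain = codomain
        self._domain = domain        # strong reference by default
        self._domain_ref = None

    def domain(self):
        if self._domain_ref is not None:
            return self._domain_ref()
        return self._domain

    def make_weak(self):
        # Hypothetical helper: keep only a weak reference to the domain.
        self._domain_ref = weakref.ref(self._domain)
        self._domain = None


class Parent:
    def __init__(self, name):
        self.name = name
        self._coerce_from_list = []      # permanent pathways
        self._registered_domains = []    # strong refs to registered domains
        self._coerce_from_hash = weakref.WeakKeyDictionary()  # shortcuts

    def register_coercion(self, mor, permanent=True):
        D = mor.domain()
        if permanent:
            self._registered_domains.append(D)
            self._coerce_from_list.append(mor)
        else:
            mor.make_weak()              # impermanent shortcut only
        self._coerce_from_hash[D] = mor
```

With `permanent=True` the codomain keeps the domain alive; with `permanent=False` the entry disappears from the cache once the domain is collected.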
Concerning documentation: I just found that the underscore methods of `sage.structure.parent.Parent` are documented in the reference manual. Hence it should be no problem to add documentation of `_coerce_map_from_` directly in-place.
And I just notice that the documentation of the module `sage.structure.parent` starts with a "simple example of registering coercions", which I find rather obscure and which does things in a way that we would do differently today. E.g., it does not initialise the category, but overrides the method `category()`. And it calls `self._populate_coercion_lists_()`, which I have never seen in code created in the past few years.
Hence, I'll update this example.
Replying to @simon-king-jena:
And I just notice that the documentation of the module `sage.structure.parent` starts with a "simple example of registering coercions", which I find rather obscure and which does things in a way that we would do differently today. E.g., it does not initialise the category, but overrides the method `category()`. And it calls `self._populate_coercion_lists_()`, which I have never seen in code created in the past few years.
Hence, I'll update this example.
Hm. I am undecided.
Perhaps it would be better to focus here on fixing the memory leak (which, I think succeeded), only documenting with examples that it has worked.
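Hence, on this ticket, I would just
- provide missing docs for `SchemeMorphism`
- Perhaps: Re-introduce a cdef public attribute _codomain, since this would allow faster access than calling .codomain(), and since the codomain will be strongly referenced anyway.
- Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
Everything else should perhaps better be done on a follow-up ticket:
- Add documentation explaining the current weak coercion model, to facilitate maintenance,
- Add `permanent=True` option to `register_coercion()`.
What do you think?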
Branch pushed to git repo; I updated commit sha1. New commits:
[changeset:452d216] | Add docs to SchemeMorphism |
I think there is a further technical thing I could do in the next commit: I have implemented `__copy__` for some types of morphisms. But there already exist methods called `_extra_slots()` and `_update_slots()`, and I think in order to implement copying one should update these. This might (on a different ticket) also help to provide a default pickling for maps.
Branch pushed to git repo; I updated commit sha1. New commits:
[changeset:5168cfd] | Generic copy method for maps, using _update_slots Use a cdef _codomain, since the codomain is strongly refed anyway Add doctests |
Replying to @simon-king-jena:
Perhaps it would be better to focus here on fixing the memory leak (which, I think succeeded), only documenting with examples that it has worked.
Hence, on this ticket, I would just
- provide missing docs for `SchemeMorphism`
Done in the current commit.
- Perhaps: Re-introduce a cdef public attribute _codomain, since this would allow faster access than calling .codomain(), and since the codomain will be strongly referenced anyway.
Done in the current commit.
In addition, I changed the new generic `__copy__` method of maps so that it uses `_update_slots` and `_extra_slots`. This complies with how currently pickling is implemented by default. For several types of maps, I implemented copying accordingly.
- Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
Still todo.
Everything else should perhaps better be done on a follow-up ticket:
- Add documentation explaining the current weak coercion model, to facilitate maintenance,
- Add `permanent=True` option to `register_coercion()`.
Do you agree that this shall be on a different ticket?
Replying to @simon-king-jena:
Even though the absolute differences in the creation of various kinds of homsets seem to be dramatic, the relative differences suggest that there is no serious problem here. There are only four classes that show an increase of at least 1%. Three of them are related with tableaux, that's why I add Nicolas to the ticket: Perhaps we want to change the cache for tableaux?
Could you post a quick summary (say in the ticket description/title) of what the current patch does?
Thanks!
Description changed:
---
+++
@@ -6,3 +6,105 @@
....:
(This is with 5.10.rc0)
+
+**__Problem analysis__**
+
+The quadratic field is created with a coerce embedding into `CLF`. At the same
+time, this coerce embedding is stored in `CLF._coerce_from_hash`:
+
+    sage: phi = CLF.coerce_map_from(Q)
+    sage: phi is Q.coerce_embedding()
+    True
+    sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
+    True
+
+The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference to the key
+(Q, in this case). However, there still is a strong reference from
+CLF to the coerce map phi. And phi has a strong reference to its
+domain, thus, to Q. Hence, the existence of CLF prevents garbage collection of
+Q.
+
+And there is a second chain of strong references from CLF to Q: From CLF to
+phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
+
+**__Suggested solution__**
+
+We can not turn the reference from CLF to phi into a weak reference, because
+then even a strong reference to Q would not prevent phi from garbage
+collection. Hence, we need to break the above mentioned reference chains in
+two points. In the attached branch, maps generally keep a strong reference to
+the codomain (this is important in composite maps and actions), but those used
+in the coercion system (and only there!!) will only have a weak
+reference to the domain, and they set the cdef `._parent` attribute to `None`
+(hence, we also override `.parent()`, so that it reconstructs the homset if
+the weak reference to the domain is still valid).
+
+To preserve the `domain()/codomain()` interface, I have removed the method
+`domain()` and have replaced it by a cdef public attribute that will either
+hold a weak reference (which returns the domain when called, hence, the
+interface does not change) or a `ConstantFunction` (which should actually be
+faster to call than a method). Since accessing a cdef attribute is still
+faster, the cdef attribute `_codomain` is kept (since this will always be a
+strong reference), but `_domain` has been removed.
+
+This "weakening of references" is done for the coercions found by
+`discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this mainly
+happens for things done with `_coerce_map_from_()` and with composite
+maps. Similarly for `_convert_from_hash`.
+
+Weakening is not used on the maps that are explicitly registered by
+`.register_embedding()` and `.register_coercion()`. This is in order to
+preserve the connectivity of the coercion graph. The `register_*` methods
+are only used on selected maps, that are of particular importance for the
+backtrack search in `discover_coerce_map_from()`. These strong
+registrations do not propagate: Compositions of strongly registered
+coercions found by `discover_coerce_map_from()` will be weakened.
+
+Since weakened maps should not be used outside of the coercion system, their
+string representation shows a warning to replace them by a copy. The attached
+branch implements copying of maps in some additional cases.
+
+`SchemeMorphism` can not inherit from `Morphism`, because of a bug with
+multiple inheritance of a Python class from Cython extension classes. But once
+this bug is fixed, we surely want to make `SchemeMorphism` inherit from
+`Morphism`. This transition is prepared here.
+
+In any case, the commit messages should give a concise description of what has
+been done.
+
+**__Still TODO__**
+
+Let the string representation of weakened maps point the user to the need of
+creating a copy.
+
+**__TODO in future tickets__**
+
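+- Provide a documentation of the use of weak references in coercion, and of
+  ... `.register_coercion()` that weakens the coercion
+  map. It would hence have the same effect as returning a map by
+  `._coerce_map_from_()`, but of course `._coerce_map_from()` could not easily
+  be changed in an interactive session.
+- provide copying for *all* kinds of maps.
+
+**__Effects on the overall functioning of Sage__**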
+
+It is conceivable that some parts of Sage still suppose implicitly that stuff
+cached with `UniqueRepresentation` is permanently cached, even though the
+seemingly permanent cache was not more than a consequence of a memory leak in
+the coercion system. With the attached branch, garbage collection of parent
+structures will much more often become possible. Hence, code that relied on a
+fake-permanent cache would now need to create the same parent repeatedly.
+
+I (Simon) have tested how many additional parent creations occur with the
+attached branch when running `sage -t --all`. The findings are summarised in
+comment:107: The number of additional parent creations increased by not more
+than 1% for all but two parent classes (both related with tableaux). I also
+found that the time to run the tests did not significantly increase.
+
+Jean-Pierre has occasionally stated that some of his computations have been
+infeasible with the memory leak in the above example. I hope that his
+computations will now succeed.
Replying to @nthiery:
Could you post a quick summary (say in the ticket description/title) of what the current patch does?
Done. OK, the summary is actually not quick. Sorry.
Work Issues: String repr. of weakened maps; copying/pickling of maps
I think changing the string representation of weakened maps should be done here. And then, in a couple of tests, one needs to copy the map in order to get the test pass.
Therefore, I suggest to implement copying for all maps here as well, not on a different ticket. After all, it is not difficult: One just looks at the list of cdef attributes, and implements `_extra_slots` and `_update_slots` taking exactly these attributes into account. The only difficulty is to really catch all kinds of maps.
Note that in most cases `phi == loads(dumps(phi))` would return False, but this is since comparison of maps is often not implemented---and this is what I will certainly not attempt to implement here.
Description changed:
---
+++
@@ -75,8 +75,9 @@
**__Still TODO__**
-Let the string representation of weakened maps point the user to the need of
+- Let the string representation of weakened maps point the user to the need of
creating a copy.
+- Provide copying for *all* kinds of maps.
**__TODO in future tickets__**
@@ -87,7 +88,6 @@
map. It would hence have the same effect as returning a map by
`._coerce_map_from_()`, but of course `._coerce_map_from()` could not easily
be changed in an interactive session.
-- provide copying for *all* kinds of maps.
**__Effects on the overall functioning of Sage__**
I wonder: Would it make sense to implement a generic comparison for maps, based on the dictionary returned by `self._extra_slots({})`? Namely, these data are used for pickling and copying of maps, and thus it seems reasonable to me that two maps are equal if and only if the pickling data coincide.
What do you think? Worth trying? Better be done on a different ticket?
Replying to @simon-king-jena:
I wonder: Would it make sense to implement a generic comparison for maps, based on the dictionary returned by `self._extra_slots({})`? Namely, these data are used for pickling and copying of maps, and thus it seems reasonable to me that two maps are equal if and only if the pickling data coincide.
And the "weakened" copies of maps (with in the near future an easily distinguished string rep) would be equal to their counterparts? That may well be a desirable choice, but by no means uncontroversial. I don't think in coercion we ever depend on equality testing on maps, do we? I think it's better done on a separate ticket.
Another suggestion about how to get the strong references in for register_coercion. If the maps put in by register_coercion are used afterwards many times to derive other cached coercion maps from, it would perhaps be preferable to have them in a form that is readily usable for that, i.e., as "weakened" maps (it means the map can go straight into map compositions etc.). Otherwise we may well end up making copies repeatedly in the coercion framework.
We could get the strong connections in by, for instance, referencing the domains explicitly, say on an attribute `_domains_with_registered_coercions_to_here`. The coercion map itself could simply live in the normal cache, as a weakened map.
The same applies to `_register_embedding`, although perhaps the coercion discovery treats the store differently, and storing a strong reference to even a weakened map implies a strong reference to the codomain.
Other question, does `_parent=None` imply a weakened map? (I guess isinstance(_domain,weakref.ref) definitely does) or are there other reasons for the parent to be unset?
Replying to @nbruin:
And the "weakened" copies of maps (with in the near future an easily distinguished string rep) would be equal to their counterparts?
Of course. Equality does (and should) not depend on weak references, I think.
I don't think in coercion we ever depend on equality testing on maps, do we?
We don't. Otherwise, people would have had comparison implemented already.
I think it's better done on a separate ticket.
Agreed.
Concerning "near future": I am already testing a new commit that provides `_extra_slots` and `_update_slots`, add a test...
Use the copy functionality on most tests that expose coerce maps. Hence, I replace
sage: R.coerce_map_from(P)
...
by
sage: copy(R.coerce_map_from(P))
and also add a link to this ticket. Not everywhere, but in most places.
Another suggestion about how to get the strong references in for register_coercion. If the maps put in by register_coercion are used afterwards many times to derive other cached coercion maps from, it would perhaps be preferable to have them in a form that is readily usable for that, i.e., as "weakened" maps (it means the map can go straight into map compositions etc.). Otherwise we may well end up making copies repeatedly in the coercion framework.
Why do you think they would/should be copied inside of the coercion model? I thought we already had agreed that copying is needed when exposing a coerce map to the user (this is why I suggested that the string repr contains a warning!). But certainly not internally. This would be by far too slow.
Suppose you have two non-weakened maps phi and psi, and then do `chi = phi*psi` (a composite map). When you then weaken `chi`, neither `phi` nor `psi` would be changed. So, why copying?
We could get the strong connections in by, for instance, referencing the domains explicitly, say on an attribute
_domains_with_registered_coercions_to_here
.
This is already done in my current branch, and it is called _registered_domains
(simply a list).
The coercion map itself could simply live in the normal cache, as a weakened map.
No, because we need some container that only stores those maps that are considered in the backtracking algorithm. So, the current separate list _coerce_from_list
must be preserved.
The same applies to
_register_embedding
, although perhaps the coercion discovery treats the store differently, and storing a strong reference to even a weakened map implies a strong reference to the codomain.
All weakened maps still have a strong reference to the codomain. Only the reference to the domain will be weak. And `register_embedding` is still simply assigning the embedding to an attribute `_embedding` of the domain.
Other question, does `_parent=None` imply a weakened map? (I guess isinstance(_domain,weakref.ref) definitely does) or are there other reasons for the parent to be unset?
I guess one could test `self._parent is None`, rather than typetest stuff. This should actually be faster.
Concerning "reasons": The parent is unset, because otherwise we have a chain of references from the map to the domain, namely via the parent (i.e., the homset). Hence, having a weak reference from the map to the domain would be futile if there is a reference from the map to its parent. Note that alternatively one could have a weak reference from the homset to the domain. But I think we have agreed above that we don't want this as a default.
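To make the reference layout concrete, here is a minimal plain-Python sketch (not Sage code). `ToyParent` and `ToyMap` are hypothetical stand-ins, and a tuple plays the role of the homset; the only point is that a weakened map needs both the weak domain reference and the unset parent before the domain can be collected.

```python
import gc
import weakref


class ToyParent:
    """Hypothetical stand-in for a Sage parent; plain objects support weakrefs."""

    def __init__(self, name):
        self.name = name


class ToyMap:
    """Toy map: strong codomain, domain obtained by calling ``self.domain()``."""

    def __init__(self, domain, codomain):
        self._codomain = codomain            # always a strong reference
        self.domain = lambda: domain         # constant function: strong reference
        self._parent = (domain, codomain)    # stand-in for the homset, also strong

    def codomain(self):
        return self._codomain

    def make_weak(self):
        """Drop both strong paths to the domain: the callable and the parent."""
        self.domain = weakref.ref(self.domain())
        self._parent = None

    def is_weakened(self):
        # The test suggested in the thread: check the parent, not the type
        # of the domain reference.
        return self._parent is None


A, C = ToyParent("A"), ToyParent("C")
m = ToyMap(A, C)
m.make_weak()
weak_flag = m.is_weakened()
del A
gc.collect()
domain_after_gc = m.domain()       # the weak reference is dead by now
codomain_after_gc = m.codomain()   # still the same object as C
```

If `make_weak` only replaced the domain callable and left `_parent` in place, the tuple would still hold the domain, which is the "futile" situation described above.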
Replying to @simon-king-jena:
Suppose you have two non-weakened maps phi and psi, and then do chi = phi*psi (a composite map). When you then weaken chi, neither phi nor psi would be changed. So, why copying?
Well, if you'd do that then `chi` wouldn't really be a weakened map. Assuming maps act on the left, i.e. `chi.domain()=psi.domain()`, the resulting structure would have a strong reference to its domain, via psi.domain().
The converse, making a strong composite out of weakened maps, shouldn't be a problem at all (except that if people start looking at the components, they'd be able to get their hands on weakened maps).
I think the coercion system makes a lot of map compositions, and they usually would have to be weakened. That's why it might be worth ensuring that the maps stored internally are already weakened.
This is already done in my current branch, and it is called
_registered_domains
(simply a list).
And the maps inserted into _coerce_from_hash
are weakened or not? Conceptually it would be a little easier if all of them are. Perhaps enforcing such a rule (or at least change most code to comply) is too costly, though.
No, because we need some container that only stores those maps that are considered in the backtracking algorithm. So, the current separate list
_coerce_from_list
must be preserved.
Ah. I didn't realize that. You say that `_coerce_from_hash` is not considered by backtracking. Indeed, that changes things. In that case, `_coerce_from_list` could be a "weak set" (e.g., a `WeakValueDictionary` with trivial keys), since the maps are kept alive by their entries in `_coerce_from_hash`, where the key is kept alive by the `_registered_domains`. This would get rid of the garbage collection problem, if we ever want to have maps that help coercion discovery but don't have lifetime implications.
(this should go on a different ticket)
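A rough plain-Python illustration of the "weak set" idea, under simplifying assumptions: the strong store is an ordinary dict with string keys (the real `_coerce_from_hash` is keyed weakly by the domain), and `id(m)` serves as the trivial key. All names are hypothetical.

```python
import gc
import weakref


class ToyMap:
    """Hypothetical stand-in for a coerce map."""

    def __init__(self, label):
        self.label = label


coerce_from_hash = {}                              # strong store: keeps maps alive
coerce_from_list = weakref.WeakValueDictionary()   # weak view used for backtracking

m = ToyMap("psi")
coerce_from_hash["B"] = m
coerce_from_list[id(m)] = m     # trivial key, value held only weakly
del m
gc.collect()
labels_before = sorted(x.label for x in coerce_from_list.values())

# Once the strong entry disappears, the weak view forgets the map as well.
del coerce_from_hash["B"]
gc.collect()
labels_after = sorted(x.label for x in coerce_from_list.values())
```

The weak view imposes no lifetime of its own: a map stays visible to backtracking exactly as long as some strong store still holds it.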
Branch pushed to git repo; I updated commit sha1. New commits:
[changeset:364b985] | Add warning to string repr of weakened maps. Implement copying for *all* kinds of maps. |
Replying to @nbruin:
Replying to @simon-king-jena:
Suppose you have two non-weakened maps phi and psi, and then do chi = phi*psi (a composite map). When you then weaken chi, neither phi nor psi would be changed. So, why copying?
Well, if you'd do that then `chi` wouldn't really be a weakened map. Assuming maps act on the left, i.e. `chi.domain()=psi.domain()`, the resulting structure would have a strong reference to its domain, via psi.domain().
Correct. Would this be a problem? Let's see:
Let `psi: A -> B` and `phi: B -> C` be coerce maps, hence, `chi=phi*psi: A -> C` is a coerce map as well. Assume that we have done `B.register_coercion(psi)`, so that B prevents A from garbage collection. Assume that `phi` is only stored in `C._coerce_from_hash`, i.e., C would not prevent B from garbage collection. In other words, we assume that phi is weakened but psi isn't.
Let us assume first that we did not discover `chi` as a coercion yet. If we have a strong reference to C but no external reference to B, then B and A could of course be garbage collected.
Now, let us assume that we did discover that `chi` is a coercion and put it into `C._coerce_from_hash`, in the attempt of weakening it. C would have a strong reference to `chi`, which has a strong reference to its first map, `psi`, which has a strong reference to both its domain A and codomain B. Hence, C would prevent both A and B from garbage collection.
I think this indeed qualifies as a memory leak, according to the definition I gave in some post above.
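The reference chain can be reproduced in plain Python. This is only a sketch under assumptions: `weakref.WeakKeyDictionary` stands in for the weak-keyed `_coerce_from_hash`, and `ToyParent`, `ToyMap` and `ToyComposite` are hypothetical classes, not Sage's.

```python
import gc
import weakref


class ToyParent:
    """Hypothetical parent with a weak-keyed coercion cache."""

    def __init__(self, name):
        self.name = name
        # Stand-in for _coerce_from_hash: keys (domains) are held weakly.
        self.coerce_from_hash = weakref.WeakKeyDictionary()


class ToyMap:
    """Map with a strong codomain and a strong or weak domain reference."""

    def __init__(self, domain, codomain, weak):
        self._domain = weakref.ref(domain) if weak else (lambda: domain)
        self.codomain = codomain

    def domain(self):
        return self._domain()


class ToyComposite:
    """Composite 'second after first'; it keeps both factors alive."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        self._domain = weakref.ref(first.domain())  # the composite itself is weakened
        self.codomain = second.codomain


def survivors(psi_is_weak):
    """Return (A alive?, B alive?) once only C is referenced externally."""
    A, B, C = ToyParent("A"), ToyParent("B"), ToyParent("C")
    psi = ToyMap(A, B, weak=psi_is_weak)   # A -> B
    phi = ToyMap(B, C, weak=True)          # B -> C
    chi = ToyComposite(psi, phi)           # A -> C
    C.coerce_from_hash[A] = chi
    watch_A, watch_B = weakref.ref(A), weakref.ref(B)
    del A, B, psi, phi, chi
    gc.collect()
    # C is still alive at this point, so its cache is what decides the outcome.
    return watch_A() is not None, watch_B() is not None


leak_case = survivors(psi_is_weak=False)   # psi holds A strongly
weak_case = survivors(psi_is_weak=True)    # psi holds A only weakly
```

In the first case the cached composite reaches A through its first factor, so the weak key never dies and both A and B survive with C. In the second case nothing holds A strongly, the cache entry disappears with A, and B follows.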
Difficult. Can this be solved, even with copying? I have to think about it.
This is already done in my current branch, and it is called `_registered_domains` (simply a list).
And the maps inserted into `_coerce_from_hash` are weakened or not?
In my current branch, they are not weakened. Perhaps they should be. It would indeed be conceptually easier if a map is in the coercion system if and only if it is weakened. One could do it, since `_registered_domains` would keep the domain alive. Note, however, that this would not suffice for fixing the memory leak described above. We would still have that `chi` refers to `psi`, which strongly refers to its codomain B (the codomain is always strong), and then `B._registered_domains` strongly refers to `A`.
In this situation,
- `B.register_coercion(mor)`.
- `chi` was discovered but not registered.
- `phi` was not registered.
If C is alive, then we want that it does not prevent A or B from garbage collection. But if both A and C are alive, then `chi` must remain a valid map, hence, B must be prevented from garbage collection. It follows that if C is alive then either A and B get collected together, or they both stay alive.
I should catch some sleep now, perhaps I'll find a solution to the puzzle tomorrow.
Description changed:
---
+++
@@ -70,14 +70,12 @@
this bug is fixed, we surely want to make `SchemeMorphism` inherit from
`Morphism`. This transition is prepared here.
+Weakened maps should only be used in the coercion system: A weakened map can become invalid by garbage collection, and the coercion system has the job to remove a map from the coercion cache as soon as it becomes invalid.
+
+Maps outside of the coercion system should be safe against invalidation. Hence, when we take a coerce map, then we should better create a non-weakened copy. The branch also provides copying (and pickling) for *all* kinds of maps and morphisms (hopefully no map/morphism class went unnoticed).
+
In any case, the commit messages should give a concise description of what has
been done.
-
-**__Still TODO__**
-
-- Let the string representation of weakened maps point the user to the need of
-creating a copy.
-- Provide copying for *all* kinds of maps.
**__TODO in future tickets__**
Changed work issues from String repr. of weakened maps; copying/pickling of maps to none
With the new commit I have pushed today, all doctests should pass.
Let me elaborate a bit more on the memory leak from comment:124.
First of all, this leak is not introduced by my branch. Hence, it would probably be better to attempt a fix on a different ticket, as the changes introduced in my branch already are big enough.
Now for a deeper analysis of what happens. I want to argue that '''there is only one scenario in which this leak occurs. This scenario rarely occurs and can easily be avoided.'''
Let `phi: A -> B` and `psi: B -> C` be maps (sorry for changing the names compared with comment:124...), and define `chi = psi*phi: A -> C`. We assume that `phi` and `psi` are coerce maps, and thus `chi` is a coerce map as well, but initially Sage is not aware of `chi`.
`chi` could be registered (i.e., `C.register_coercion(chi)`), it could be that `C._coerce_map_from_(A)` provides a shortcut, or it could be that `chi` is discovered by the backtracking algorithm of the coercion system.
Registering chi
Of course, if `chi` is explicitly registered as a coercion, then C will (with the current code!!) keep A alive, and in order to not invalidate `chi`, B will be kept alive as well. I don't consider this a memory leak, since it is an explicit registration.
_coerce_map_from_
Typically, `C._coerce_map_from_(A)` just returns True, None or False, and not a map. If it returns True, then a direct conversion `chi'` from A to C is stored as coercion. Note that `chi'` is not a composite map. So, we would be in a totally different situation. Since `chi'` has no reference to B, to `phi` or to `psi`, there is no leak in this case.
Theoretically, `C._coerce_map_from_(A)` could return the composite map `chi`. This would be possible, and it would create a memory leak. Hence, we learn that `_coerce_map_from_` should better not return a composite map. I don't think we can automatically avoid a leak in this case.
Discovering `chi` by backtracking
We need to distinguish cases, since there are different ways of how the coercion system became aware of `phi` and `psi`.
The punchline is: '''If `chi` is discovered by backtracking then `psi` is stored in `C._coerce_from_list`.''' Without `psi` being on this list, backtracking won't find `chi`.
Hence, `C.register_coercion(psi)` has been done. With the current code, it means that C will keep B alive.
There remain only three cases:
- Assume that `phi` is a registered coercion. Hence, with the current code, B keeps A alive, and still C will keep B alive. Hence, C also keeps A alive. Adding `chi` to `C._coerce_from_hash[A]` won't change these lifetime dependencies. No leak.
- Assume that `phi` is a coerce embedding. Hence, A will keep B alive, and still C will keep B alive, but neither C nor B keep A alive. In particular, `phi` is weakened, so that there is no strong reference from `phi` to A. Adding `chi` to `C._coerce_from_hash[A]` will not change these lifetime dependencies, since the key A is only weakly referenced. No leak.
- Assume that `phi` has been discovered by backtracking or has been provided by `B._coerce_map_from_(A)` as a short-cut. In particular, it is weak and only has a weak reference to A. Then, still C keeps B alive, but B does not keep A alive, nor does A keep B alive, and C will also not keep A alive. If we put `C._coerce_from_hash[A]=chi`, then again C will not prevent A from garbage collection, since A is only weakly referenced in the `MonoDict`, and if there is no external reference to C, then a strong reference to A will not be enough to keep B alive. No leak.
Conclusion
A composite map can only arise in the coercion system, (1) if it is explicitly registered, or (2) if the second map of the composition is explicitly registered, or (3) if the composite map is returned by `_coerce_map_from_`.
I think case (1) does not constitute a leak. I have shown that there is no memory leak in case (2). Case (3) is a leak, but this case can easily be avoided by returning a "simple" map that is mathematically equivalent to the composite map.
PS: Since teaching will start next week for me, I will probably not be able to fix the leak from comment:124, even if you succeed in convincing me that it really is a leak. So, I guess it is safe to start reviewing the attached branch; I think it will not change in the next few days...
PPS: I just found that Sage's coercion system is clever enough to find a composite map if `phi` is registered as coerce embedding and `psi` is a short-cut:
sage: A = Aclass()
sage: B = Bclass()
sage: C = Cclass()
sage: phi = sage.categories.map.Map(A,B)
sage: A.register_embedding(phi)
sage: psi = C.coerce_map_from(B)
sage: print psi
Generic map:
From: <class '__main__.Bclass'>
To: <class '__main__.Cclass'>
WARNING: This map has apparently been used internally
in the coercion system. It may become defunct in the next
garbage collection. Please use a copy.
sage: print phi
Generic map:
From: <class '__main__.Aclass'>
To: <class '__main__.Bclass'>
WARNING: This map has apparently been used internally
in the coercion system. It may become defunct in the next
garbage collection. Please use a copy.
sage: C.coerce_map_from(A)
Composite map:
From: <class '__main__.Aclass'>
To: <class '__main__.Cclass'>
WARNING: This map has apparently been used internally
in the coercion system. It may become defunct in the next
garbage collection. Please use a copy.
Here, before discovering and caching the composite map, A keeps B alive because of the embedding, and C neither keeps A nor B alive. After caching the composite map, A still keeps B alive, and C still does not keep A alive, because it only occurs as weak key in a `MonoDict`.
But we have
sage: del psi, B, A
sage: import gc
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
1
So, why is B not garbage collected? To be investigated, I need to hurry now.
Replying to @simon-king-jena:
sage: del psi, B, A
sage: import gc
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
1
So, why is B not garbage collected? To be investigated, I need to hurry now.
Argh. Simply because I forgot to delete phi...
Let's try again, this time without leaving a reference to the maps.
sage: import gc
sage: class Aclass(Parent): pass
sage: class Bclass(Parent): pass
sage: class Cclass(Parent):
....:     def _coerce_map_from_(self, P):
....:         if isinstance(P, Bclass):
....:             return sage.categories.map.Map(P,self)
....:
sage: A = Aclass()
sage: B = Bclass()
sage: C = Cclass()
sage: A.register_embedding(sage.categories.map.Map(A,B))
sage: C.has_coerce_map_from(A)
True
sage: del A,B
sage: gc.collect()
862
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
0
So, no leak.
I think my analysis is now complete: A memory leak caused by composite coerce maps will only arise if `C._coerce_map_from_(A)` returns the composite map. I have also shown that there is no need to let `C._coerce_map_from_(A)` return a composite map.
Hence, I would argue that `C._coerce_map_from_(A)` returning a composite map is a misuse. Granted, it is far from obvious that it is a misuse. I feel tempted to investigate how often composite maps are actually returned by `_coerce_map_from_`.
Replying to @simon-king-jena:
Hence, I would argue that `C._coerce_map_from_(A)` returning a composite map is a misuse. Granted, it is far from obvious that it is a misuse. I feel tempted to investigate how often composite maps are actually returned by `_coerce_map_from_`.
For example, I found that during startup of Sage composite maps are returned by `Q._coerce_map_from_(P)` for the following values of P, Q:
Coercion Rational Field to Complex Field with 2 bits of precision
Coercion Rational Field to Complex Field with 53 bits of precision
Coercion <type 'int'> to Univariate Polynomial Ring in x over Integer Ring
Coercion Integer Ring to Complex Field with 2 bits of precision
Coercion <type 'int'> to Complex Field with 53 bits of precision
Coercion <type 'int'> to Real Interval Field with 53 bits of precision
Coercion <type 'int'> to Univariate Polynomial Ring in x over Rational Field
Coercion <type 'int'> to Real Interval Field with 64 bits of precision
Coercion Complex Lazy Field to Complex Double Field
Coercion <type 'int'> to Univariate Polynomial Ring in x over Algebraic Real Field
I guess in all of these cases the domain and the "middle parent" will be immortal anyway.
Replying to @simon-king-jena:
First of all, this leak is not introduced by my branch. Hence, it would probably be better to attempt a fix on a different ticket, as the changes introduced in my branch already are big enough.
Agreed.
Typically, `C._coerce_map_from_(A)` just returns True, None or False, and not a map. If it returns True, then a direct conversion `chi'` from A to C is stored as coercion.
Hm, I agree that the references are in that case out of reach of what we're considering here, but all this is saying is "it is okay to use conversion as a coercion from A to C". This conversion still has to be programmed/stored/discovered somewhere, so I'd expect that some conversion cache might be liable to hold a strong reference.
The punchline is: '''If `chi` is discovered by backtracking then `psi` is stored in `C._coerce_from_list`.''' Without `psi` being on this list, backtracking won't find `chi`.
Hence, `C.register_coercion(psi)` has been done. With the current code, it means that C will keep B alive.
Right, I was already expecting something along these lines when you explained the function of `coerce_from_list`: The backbone of the coercion framework presently requires lifetime specifications to be explicit and it seems this is not just a by-product of the implementation, it seems to be part of the spec. That's fine by itself. Whether having such implications is sufficient for sage in the future remains to be seen, but changing that is a redesign problem that would need to be carefully considered (just as it might be desirable to allow multiple embeddings to be registered).
As a consequence, in the present model, `register_coercion(...,strong=false)` would not be advisable.
Case (3) is a leak, but this case can easily be avoided by returning a "simple" map that is mathematically equivalent to the composite map.
This would be a necessary step to avoid the leak, and all the coercion system can do, but if programmed in a generic way, the references causing the leak would likely still be present internally.
Replying to @nbruin:
The punchline is: '''If `chi` is discovered by backtracking then `psi` is stored in `C._coerce_from_list`.''' Without `psi` being on this list, backtracking won't find `chi`.
Modulo the oversight I have corrected in my previous posts:
If a composed map `chi` is discovered by backtracking, then either the second map is registered as a coercion (hence, C keeps B alive) or the first map is the coerce embedding of A (hence, A keeps B alive). But, as I have shown, storing the composed map as coercion from A to C (in `C._coerce_from_hash[A]`) does not cause a leak, at least not with the attached branch.
Right, I was already expecting something along these lines when you explained the function of `coerce_from_list`: The backbone of the coercion framework presently requires lifetime specifications to be explicit and it seems this is not just a by-product of the implementation, it seems to be part of the spec. That's fine by itself. Whether having such implications is sufficient for sage in the future remains to be seen, but changing that is a redesign problem that would need to be carefully considered
Agreed. Currently, the coercion system operates on a virtual digraph, and I could actually imagine that this digraph could become an actual object with a fast graph backend. This might give more flexibility for our coercion system. But this would require a major rewrite.
(just as it might be desirable to allow multiple embeddings to be registered)
Why? What should be done with these embeddings?
Replying to @simon-king-jena:
Why? What should be done with these embeddings?
I don't have a direct application in mind (hence the might), but just for symmetry it seems appropriate.
One possible example would be someone working on some weak approximation problem, having a whole bunch of number fields K with specified embeddings in CC as well as Qp (for some p). In this application, Qp may well be just as immortal as CC is, so using register_coercion would not express the right life time implications: The K should get deleted while Qp remains, just as CC remains.
This would be accomplished by letting K have embeddings in both CC and Qp. I'm not claiming at this point that using coercion is the most appropriate tool to express the relations in this scenario.
There is one benefit one gets from having maps recognized as coercions: A lot of derived structures can now automatically get built with the appropriate maps between them via pushout constructions. If you just have some maps lying around, constructing the corresponding derived maps will be a lot of work.
That's my reason to really care about an expressive coercion system. My experience with Magma, which tends to have a much more restricted notion of coercion, has taught me that building these maps can be a lot of silly work. It would be great if one could "borrow" the coercion system for that every now and again (this is one of the reasons why I think some context manager that can put in "temporary" coercions would be great: inside the context manager one would request the derived map, store it, and then return the coercion system to its original state).
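As a sketch of what such a context manager could look like, here is a toy in plain Python. Nothing like this exists in Sage as far as this thread says; `ToyCoercionRegistry` and `temporary_coercion` are hypothetical names, and the registry is just a dict keyed by (domain, codomain).

```python
from contextlib import contextmanager


class ToyCoercionRegistry:
    """Hypothetical stand-in for a coercion system: (domain, codomain) -> function."""

    def __init__(self):
        self._maps = {}

    def register(self, domain, codomain, func):
        self._maps[(domain, codomain)] = func

    def coerce_map(self, domain, codomain):
        return self._maps.get((domain, codomain))


_MISSING = object()


@contextmanager
def temporary_coercion(registry, domain, codomain, func):
    """Install a coercion for the duration of a ``with`` block, then undo it."""
    key = (domain, codomain)
    previous = registry._maps.get(key, _MISSING)
    registry._maps[key] = func
    try:
        yield registry
    finally:
        # Restore the exact previous state, even if the block raised.
        if previous is _MISSING:
            del registry._maps[key]
        else:
            registry._maps[key] = previous


registry = ToyCoercionRegistry()
with temporary_coercion(registry, "K", "Qp", lambda x: ("in Qp", x)) as reg:
    derived = reg.coerce_map("K", "Qp")   # request and keep the map while it exists
    image = derived(3)
still_registered = registry.coerce_map("K", "Qp") is not None
```

A real version would also have to undo whatever derived maps the coercion system cached while the temporary edge was present, which this toy does not model.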
I'm not so sure that the coercion system was designed with embeddings as an alternative to registered coercions: If you register a map only as an embedding (and not also as a coercion on the codomain) you can end up with the coercion system yielding non-transitive results:
class pA(Parent): pass
class pB(Parent): pass
class pC(Parent): pass
A=pA()
B=pB()
C=pC()
BtoA=Hom(B,A)(lambda x: A(x))
AtoC=Hom(A,C)(lambda x: C(x))
A.register_coercion(BtoA)
A.register_embedding(AtoC)
C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!
(other combinations of using `register_coercion` and `register_embedding` do lead to the appropriate discovery)
So, if we want to view the coercion framework as a digraph and valid coercions as paths in this digraph, then an arbitrary combination of `register_embedding` and `register_coercion` may lead to invalid manipulations of the graph (i.e., leading to a state where the system fails to provide consistent (transitive) results).
This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.
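The non-transitivity can be mimicked with a toy model of the discovery, written in plain Python. The search order below is an assumption distilled from this thread (follow the embedding of the source, then backtrack over the coercions registered on the target); it is not Sage's actual implementation, and it does not guard against cycles.

```python
class ToyParent:
    """Hypothetical parent: incoming registered coercions, one outgoing embedding."""

    def __init__(self, name):
        self.name = name
        self.coerce_from_list = []   # domains with a registered coercion into self
        self.embedding = None        # codomain of the coerce embedding, if any

    def register_coercion(self, domain):
        self.coerce_from_list.append(domain)

    def register_embedding(self, codomain):
        self.embedding = codomain


def has_coercion(source, target):
    """Toy discovery: try the source's embedding, then the target's registered list."""
    if source is target:
        return True
    # Step 1: the embedding stored on the *source* is followed.
    if source.embedding is not None and has_coercion(source.embedding, target):
        return True
    # Step 2: backtrack over coercions registered on the *target*.
    return any(has_coercion(source, domain) for domain in target.coerce_from_list)


A, B, C = ToyParent("A"), ToyParent("B"), ToyParent("C")
A.register_coercion(B)    # edge B -> A, stored on A
A.register_embedding(C)   # edge A -> C, also stored on A

a_to_c = has_coercion(A, C)
b_to_a = has_coercion(B, A)
b_to_c = has_coercion(B, C)   # the path B -> A -> C exists, but is not found
```

Neither step ever inspects the intermediate vertex A when searching from B to C: B has no embedding and C has no registered coercions, so both edges are invisible from the two endpoints.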
Replying to @nbruin:
A.register_coercion(BtoA)
A.register_embedding(AtoC)
C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!
IF this is a bug then it should be dealt with on a new ticket. Note, however, that `C` can not know about the embedding of `A` into `C`, and not even about the mere existence of `A`. So, how could it possibly be aware of a coercion from `B` to `C` via `A`? Hence, I am not sure if this qualifies as a bug or as a misuse.
This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.
I don't quite follow this argument. Anyway, I should now do some more project related work, namely #12630.
Replying to @simon-king-jena:
IF this is a bug then it should be dealt with on a new ticket.
Agreed.
Note, however, that `C` can not know about the embedding of `A` into `C`, and not even about the mere existence of `A`.
Are you suggesting that `C.coerce_map_from(A)` already fails? If we follow the digraph model for coercion, then whether there's a path from A to C is a property of the graph, not something the vertices "know" about. Perhaps the following more symmetrically formulated code (which does the same thing anyway) is more convincing:
sage: G=get_coercion_model()
sage: G.discover_coercion(A,C)
(Generic morphism ..., None)
sage: G.discover_coercion(B,A) #currently unweakened with your patch!
(Generic morphism ..., None)
sage: G.discover_coercion(B,C)
None
So, how could it possibly be aware of a coercion from `B` to `C` via `A`? Hence, I am not sure if this qualifies as a bug or as a misuse.
Are you claiming that `register_embedding` is not supposed to add edges to the same graph as `register_coercion` does? I don't see how that would lead to a useful model.
This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.
I don't quite follow this argument.
As the documentation of `register_embedding` shows, it was originally considered to be a rather non-essential component; just some coercion map that gets "blessed" as a particularly canonical one, but doesn't actually have a very different effect (in fact, as we see, really a more limited effect), so people would just have used `register_coercion` or (even more flexible) `_coerce_map_from_`.
With fewer strong references, there is more reason to use `register_embedding`: it expresses that the domain should keep the codomain alive rather than the other way around; the kind of thing that wouldn't be expressed usefully before anyway.
Replying to @simon-king-jena:
Replying to @nbruin:
A.register_coercion(BtoA)
A.register_embedding(AtoC)
C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!
IF this is a bug then it should be dealt with on a new ticket.
This is now #15303.
Any info what exactly slowed down the tableaux classes?
The following quickly eats up memory:
(This is with 5.10.rc0)
Problem analysis
The quadratic field is created with a coerce embedding into `CLF`. At the same time, this coerce embedding is stored in `CLF._coerce_from_hash`:
The "coerce_from_hash" is a `MonoDict`, hence, has only a weak reference to the key (Q, in this case). However, there still is a strong reference from CLF to the coerce map phi. And phi has a strong reference to its domain, thus, to Q. Hence, the existence of CLF prevents garbage collection of Q.
And there is a second chain of strong references from CLF to Q: From CLF to phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
Suggested solution
We can not turn the reference from CLF to phi into a weak reference, because then even a strong reference to Q would not prevent phi from garbage collection. Hence, we need to break the above mentioned reference chains in two points. In the attached branch, maps generally keep a strong reference to the codomain (this is important in composite maps and actions), but those used in the coercion system (and only there!!) will only have a weak reference to the domain, and they set the cdef `._parent` attribute to `None` (hence, we also override `.parent()`, so that it reconstructs the homset if the weak reference to the domain is still valid).
To preserve the `domain()/codomain()` interface, I have removed the method `domain()` and have replaced it by a cdef public attribute that will either hold a weak reference (which returns the domain when called, hence, the interface does not change) or a `ConstantFunction` (which should actually be faster to call than a method). Since accessing a cdef attribute is still faster, the cdef attribute `_codomain` is kept (since this will always be a strong reference), but `_domain` has been removed.
This "weakening of references" is done for the coercions found by `discover_coerce_map_from()` stored into `_coerce_from_hash`. So, this mainly happens for things done with `_coerce_map_from_()` and with composite maps. Similarly for `_convert_from_hash`.
Weakening is not used on the maps that are explicitly registered by `.register_embedding()` and `.register_coercion()`. This is in order to preserve the connectivity of the coercion graph. The `register_*` methods are only used on selected maps, that are of particular importance for the backtrack search in `discover_coerce_map_from()`. These strong registrations do not propagate: Compositions of strongly registered coercions found by `discover_coerce_map_from()` will be weakened.
Since weakened maps should not be used outside of the coercion system, their string representation shows a warning to replace them by a copy. The attached branch implements copying of maps in some additional cases.
`SchemeMorphism` can not inherit from `Morphism`, because of a bug with multiple inheritance of a Python class from Cython extension classes. But once this bug is fixed, we surely want to make `SchemeMorphism` inherit from `Morphism`. This transition is prepared here.
Weakened maps should only be used in the coercion system: A weakened map can become invalid by garbage collection, and the coercion system has the job to remove a map from the coercion cache as soon as it becomes invalid.
Maps outside of the coercion system should be safe against invalidation. Hence, when we take a coerce map, then we should better create a non-weakened copy. The branch also provides copying (and pickling) for all kinds of maps and morphisms (hopefully no map/morphism class went unnoticed).
In any case, the commit messages should give a concise description of what has been done.
TODO in future tickets
`.register_coercion()` that weakens the coercion map. It would hence have the same effect as returning a map by `._coerce_map_from_()`, but of course `._coerce_map_from_()` could not easily be changed in an interactive session.
Effects on the overall functioning of Sage
It is conceivable that some parts of Sage still suppose implicitly that stuff cached with `UniqueRepresentation` is permanently cached, even though the seemingly permanent cache was not more than a consequence of a memory leak in the coercion system. With the attached branch, garbage collection of parent structures will much more often become possible. Hence, code that relied on a fake-permanent cache would now need to create the same parent repeatedly.
I (Simon) have tested how many additional parent creations occur with the attached branch when running `sage -t --all`. The findings are summarised in comment:107: The number of additional parent creations increased by not more than 1% for all but two parent classes (both related with tableaux). I also found that the time to run the tests did not significantly increase.
Jean-Pierre has occasionally stated that some of his computations have been infeasible with the memory leak in the above example. I hope that his computations will now succeed.
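The re-creation effect described for weakly cached parents can be shown with a small plain-Python toy. It assumes a weak-valued cache similar in spirit to the one behind `UniqueRepresentation`; `ToyParent` and `get_parent` are hypothetical names and the creation counter is only there for illustration.

```python
import gc
import weakref


class ToyParent:
    """Hypothetical parent that counts how often it is constructed."""

    creations = 0

    def __init__(self, key):
        self.key = key
        ToyParent.creations += 1


_cache = weakref.WeakValueDictionary()   # weak values: the cache keeps nothing alive


def get_parent(key):
    """Return the cached parent for ``key``, creating it only if necessary."""
    parent = _cache.get(key)
    if parent is None:
        parent = ToyParent(key)
        _cache[key] = parent
    return parent


p = get_parent("QQ[x]")
cache_hit = get_parent("QQ[x]") is p   # same object while p is alive
del p
gc.collect()                            # nothing else (e.g. a leaked map) holds it
q = get_parent("QQ[x]")                 # has to be constructed again
creations = ToyParent.creations
```

As long as some leaked strong reference kept the first parent alive, the second lookup would have been a cache hit; once the leak is gone, code that drops and re-requests the same parent pays for a new construction each time.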
CC: @simon-king-jena @nbruin @nthiery @anneschilling @zabrocki
Component: number fields
Keywords: QuadraticField
Author: Simon King, Travis Scrimshaw, Jean-Pierre Flori
Branch:
00b3e2f
Reviewer: Nils Bruin, Jean-Pierre Flori
Issue created by migration from https://trac.sagemath.org/ticket/14711