Weak references in the coercion graph

sagemath / sage

Main repository of SageMath

https://www.sagemath.org


Weak references in the coercion graph #14711

Closed jpflori closed 10 years ago

jpflori commented 11 years ago

The following quickly eats up memory:

sage: for D in xrange(2,2**32):
....:     QuadraticField(-D);
....:

(This is with 5.10.rc0)

Problem analysis

The quadratic field (call it Q) is created with a coerce embedding into CLF. At the same time, this coerce embedding is stored in CLF._coerce_from_hash:

sage: phi = CLF.coerce_map_from(Q)
sage: phi is Q.coerce_embedding()
True
sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
True

The _coerce_from_hash attribute is a MonoDict and hence holds only a weak reference to the key (Q, in this case). However, there is still a strong reference from CLF to the coerce map phi, and phi has a strong reference to its domain, that is, to Q. Hence, the existence of CLF prevents garbage collection of Q.

And there is a second chain of strong references from CLF to Q: From CLF to phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
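
The first reference chain can be reproduced outside Sage with a minimal sketch. Everything in it is a stand-in chosen for illustration: `ToyParent`, `ToyMap` and `leaks` are hypothetical names, and `weakref.WeakKeyDictionary` from the standard library plays the role of `MonoDict`. It is not Sage's actual implementation.

```python
import gc
import weakref


class ToyParent:
    """Stand-in for a Sage parent; holds a weak-keyed coercion cache."""

    def __init__(self, name):
        self.name = name
        # Weak keys only: the cache itself does not keep a domain alive.
        self.coerce_from_hash = weakref.WeakKeyDictionary()


class ToyMap:
    """Stand-in for a coerce map with strong references to both ends."""

    def __init__(self, domain, codomain):
        self.domain = domain
        self.codomain = codomain


def leaks(make_map):
    """Return True if the domain survives after the user drops it."""
    clf = ToyParent("CLF")
    q = ToyParent("Q")
    clf.coerce_from_hash[q] = make_map(q, clf)
    probe = weakref.ref(q)
    del q
    gc.collect()
    return probe() is not None


# The cached value (the map) strongly references the key (the domain),
# so the weak key cannot die while CLF is alive.
print(leaks(ToyMap))
```

The weak key alone is not enough: as long as the cached value reaches the key through a strong reference, the entry stays alive together with the codomain.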

Suggested solution

We cannot turn the reference from CLF to phi into a weak reference, because then even a strong reference to Q would not prevent phi from being garbage collected. Hence, we need to break the above-mentioned reference chains at two points. In the attached branch, maps generally keep a strong reference to the codomain (this is important in composite maps and actions), but those used in the coercion system (and only there!!) will only have a weak reference to the domain, and they set the cdef ._parent attribute to None (hence, we also override .parent(), so that it reconstructs the homset if the weak reference to the domain is still valid).

To preserve the domain()/codomain() interface, I have removed the method domain() and have replaced it by a cdef public attribute that will either hold a weak reference (which returns the domain when called, hence, the interface does not change) or a ConstantFunction (which should actually be faster to call than a method). Since accessing a cdef attribute is still faster, the cdef attribute _codomain is kept (since this will always be a strong reference), but _domain has been removed.
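
A rough Python sketch of this idea follows. It assumes a toy `ToyMap` class whose `domain` attribute is always a zero-argument callable: a closure (standing in for `ConstantFunction`) when the reference is strong, and a `weakref.ref` when it is weak. The method names `make_weak` and `make_strong` are hypothetical and chosen only for this example.

```python
import gc
import weakref


class Parent:
    """Hypothetical stand-in for a parent structure."""

    def __init__(self, name):
        self.name = name


class ToyMap:
    """Toy map whose `domain` attribute is always a zero-argument callable."""

    def __init__(self, domain, codomain):
        self._codomain = codomain        # always a strong reference
        self.domain = lambda: domain     # plays the role of ConstantFunction

    def codomain(self):
        return self._codomain

    def make_weak(self):
        """Swap the strong domain reference for a weak one."""
        self.domain = weakref.ref(self.domain())

    def make_strong(self):
        """Restore a strong domain reference, if the domain still exists."""
        domain = self.domain()
        if domain is None:
            raise RuntimeError("defunct map: the domain was garbage collected")
        self.domain = lambda: domain


q, clf = Parent("Q"), Parent("CLF")
phi = ToyMap(q, clf)
phi.make_weak()
same_interface = phi.domain() is q   # callers still write phi.domain()
del q
gc.collect()
domain_after_gc = phi.domain()       # None: phi no longer keeps Q alive
```

Because both a closure and a `weakref.ref` are called with no arguments, code that writes `phi.domain()` does not need to know which kind of reference is in place.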

This "weakening of references" is done for the coercions found by discover_coerce_map_from() stored into _coerce_from_hash. So, this mainly happens for things done with _coerce_map_from_() and with composite maps. Similarly for _convert_from_hash.

Weakening is not used on the maps that are explicitly registered by .register_embedding() and .register_coercion(). This is in order to preserve the connectivity of the coercion graph. The register_* methods are only used on selected maps, that are of particular importance for the backtrack search in discover_coerce_map_from(). These strong registrations do not propagate: Compositions of strongly registered coercions found by discover_coerce_map_from() will be weakened.

Since weakened maps should not be used outside of the coercion system, their string representation shows a warning to replace them by a copy. The attached branch implements copying of maps in some additional cases.

SchemeMorphism cannot inherit from Morphism, because of a bug with multiple inheritance of a Python class from Cython extension classes. But once this bug is fixed, we surely want to make SchemeMorphism inherit from Morphism. This transition is prepared here.

Weakened maps should only be used in the coercion system: A weakened map can become invalid by garbage collection, and the coercion system has the job to remove a map from the coercion cache as soon as it becomes invalid.

Maps outside of the coercion system should be safe against invalidation. Hence, when we take a coerce map, we had better create a non-weakened copy. The branch also provides copying (and pickling) for all kinds of maps and morphisms (hopefully no map/morphism class went unnoticed).

In any case, the commit messages should give a concise description of what has been done.

TODO in future tickets

Provide documentation of the use of weak references in coercion, and of the different ways of registering coercions, with their different impacts on garbage collection.
Provide a version of .register_coercion() that weakens the coercion map. It would hence have the same effect as returning a map by ._coerce_map_from_(), but of course ._coerce_map_from_() could not easily be changed in an interactive session.

Effects on the overall functioning of Sage

It is conceivable that some parts of Sage still suppose implicitly that stuff cached with UniqueRepresentation is permanently cached, even though the seemingly permanent cache was not more than a consequence of a memory leak in the coercion system. With the attached branch, garbage collection of parent structures will much more often become possible. Hence, code that relied on a fake-permanent cache would now need to create the same parent repeatedly.

I (Simon) have tested how many additional parent creations occur with the attached branch when running sage -t --all. The findings are summarised in comment:107: The number of parent creations increased by no more than 1% for all but two parent classes (both related to tableaux). I also found that the time to run the tests did not significantly increase.

Jean-Pierre has occasionally stated that some of his computations have been infeasible with the memory leak in the above example. I hope that his computations will now succeed.

CC: @simon-king-jena @nbruin @nthiery @anneschilling @zabrocki

Component: number fields

Keywords: QuadraticField

Author: Simon King, Travis Scrimshaw, Jean-Pierre Flori

Branch: 00b3e2f

Reviewer: Nils Bruin, Jean-Pierre Flori

Issue created by migration from https://trac.sagemath.org/ticket/14711

simon-king-jena commented 11 years ago

Commit: 05fb569

simon-king-jena commented 11 years ago

New commits:

`[changeset:05fb569]`	`Change SchemeMorphism back (to cope with a Cython bug), copying the new code from sage.categories.map.Map`
`[changeset:8fd09d5]`	`Copying of PolynomialBaseringInjection and FormalCompositeMap`
`[changeset:be37145]`	`Let SchemeMorphism inherit from Morphism, not from Element`
`[changeset:0f38a2c]`	`Keep strong reference to codomain of weakened coerce maps Keep strong reference to domains of registered coercions`
`[changeset:a53261d]`	`Keep a strong reference to the codomain of PrecomposedAction`
`[changeset:1ff6f3f]`	`Add generic copy of maps. Fix copy of elements. Replace _(co)domain everywhere`
`[changeset:61d818c]`	`Replace Map.(co)domain by constant functions, remove ._(co)domain`
`[changeset:ebe82df]`	`Use a proper WeakValueDictionary for number fields`
`[changeset:4685c73]`	`convert_map_from() should only store weak references Similar to coerce_map_from, the detected morphism should be stored only in a weak dictionary, not in a list.`

simon-king-jena commented 11 years ago

comment:99

Replying to @nbruin:

I agree that there is a place for such strong connections, but I have severe reservations about declaring that it [.register_coercion()] is the only way or even the default way to inform the system about coercions.

Well, I have mentioned ._coerce_map_from_(...) in several previous posts, and if you look at my thematic tutorial on categories and coercion, you'll find that I consider this the default. And it only yields weak caching.

I have severe reservations about declaring that this code "will never be memory efficient" in sage.

I think that we want some particularly important coercions to be tied to the lifetime of the codomain, and thus we use .register_coercion(), and we want other coercions to be tied to the minimum of the lifetimes of domain and codomain, and thus we use ._coerce_map_from_(). I don't think we have a problem here.

Consider:
Qx.<x>=QQ[]
K.<a>=NumberField(x^4-2)
L.<b>=NumberField(x^2-2,embedding=a^2)
This fits perfectly in the "unchanging universe" model. Also note that the coercion system does not need to let L keep K alive, since the construction parameters, which get kept alive for the life of L by CachedRepresentation or something analogous, refer to K already.

It isn't CachedRepresentation, but this doesn't matter.

Now consider
M.<b>=NumberField(x^2-2)
In the "unchanging universe" (and in sage as well) we have that M is distinct from L. However, I think it's unrealistic to expect that all "embeddings" etc. can be specified at construction time.

Again, nobody has claimed that everything needs to be declared at construction time. There are some particularly important coercions registered at construction time, namely the coerce embedding (if it exists then it is unique) and those installed by .register_coercion(). Everything else is dynamical, based on _coerce_map_from_().

In my description from comment:84, note that the digraph is not totally static. It has static parts (corresponding to coerce embeddings and coercions fixed by .register_coercion()) and dynamic shortcuts (corresponding to _coerce_map_from_).

So I think, even though it's not possible currently in sage, that one should allow for
m1=Hom(M,K)([a^2])
m2=Hom(M,K)([-a^2])
M.register_embedding(m1)

I don't know if this is reasonable, but at least it is against what people originally wanted with the coerce embedding. If you declare the coerce embedding phi from a number field K to, say, CC, then you consider K as a subfield of CC. If you provide another number field L, which is isomorphic to K, with a different embedding psi into CC, then adding an element of K to an element of L is done by embedding both into CC and then adding complex numbers.

Since we think of K and L as different subfields of CC and not as abstract fields, we must consider K and L as different objects, and so the different embedding must play a role in the cache key for K and L. This is why they have to be provided at construction time.
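
As a toy illustration of why the embedding has to be part of the cache key, consider the following sketch. It is not Sage's number field factory: `toy_number_field` and `ToyNumberField` are hypothetical, and plain strings stand in for polynomials and embeddings. The weak-valued cache is in the spirit of the WeakValueDictionary mentioned in the commit list.

```python
import weakref

# Weak values: the cache does not keep a field alive on its own.
_cache = weakref.WeakValueDictionary()


class ToyNumberField:
    def __init__(self, polynomial, embedding):
        self.polynomial = polynomial
        self.embedding = embedding


def toy_number_field(polynomial, embedding=None):
    """Return the unique field for this construction data, creating it if needed."""
    key = (polynomial, embedding)
    field = _cache.get(key)
    if field is None:
        field = ToyNumberField(polynomial, embedding)
        _cache[key] = field
    return field


L = toy_number_field("x^2 - 2", embedding="a^2")
L_again = toy_number_field("x^2 - 2", embedding="a^2")
M = toy_number_field("x^2 - 2")
M_neg = toy_number_field("x^2 - 2", embedding="-a^2")
```

With the embedding in the key, `L` and `L_again` are the same object, while `M` (no embedding) and `M_neg` (a different embedding) are distinct parents, which is the behaviour argued for above.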

It would be a totally different way of thinking if you tried to do the same with CC.register_coercion(phi/psi) or with CC._coerce_map_from_(K/L). Namely, in both cases, you would not be able to add elements of K and L, because neither K nor L would know about the embedding. And in fact you would consider K and L as abstract fields, and you would in fact want K is L (at least if you fancy unique parents, which I do...). And then the axioms for coercion would strike: There can be at most one coercion from K (i.e., L) to CC. Hence, you could not simultaneously declare different embeddings of K into CC as coercions.

Since it pretty much seems to me that number theorists want to comfortably compute with different isomorphic subfields of CC, it would thus simply not be feasible to restrict oneself to .register_coercion and _coerce_map_from_: One needs coerce embeddings, and one needs them to be part of the defining data of a number field.

Note that the choice of m1 or m2 here leads to different relations between M and K and hence different universes. In other words, our concept of "globally unique" is not powerful enough to capture the full identity of objects, which would include the coercion relations with objects that haven't been "discovered" yet.

I would state it differently. In order to define K (a subfield of CC), there is no way around providing the embedding during creation. "Discovering" a coercion relation seems the wrong approach here.

And speaking about memory: The embedding of K into CC is stored as an attribute of K, not of CC. Hence, K keeps CC alive, but CC does not prevent K from garbage collection. So, I really don't understand where you see a problem.

In practice, we can usually work around that by for instance changing the names of generators and hence create artificially differently labelled objects but that's already not a possibility for creating different copies of ZZ^n, since there are no generator names to choose there.

Well, if one has obvious distinguishing data, such as an embedding, then there is nothing artificial when using them.

I think one has to accept the reality here: what we have is a collection of objects whose relations do change in time.

No. I don't see anything dynamic in your "embedded numberfield" examples. A subfield is a subfield is a subfield.

That's not the only thing coercion does. It may also find "common covering structures", which may lead to construction of new parents. Those definitely don't deserve to get nailed into memory. Yet, the code that creates these parents will look (to the coercion system) as a "user", so it would be using these strong-referencing coercion registration routines.

What you seem to mention here is the pushout construction. It mainly relies on "construction functors". I don't even know if it takes the coerce embeddings into account at all.

Anyway, the new parents created by pushouts indeed play the same role as parents created by the user. Let's try to be more concrete. Let P and Q be parents, you want to add an element p of P to an element q of Q, and the pushout construction finds a parent R such that both P and Q coerce into R, allowing to perform the addition in R, resulting in r=R(p)+R(q).

Now, it could be that R.register_coercion(P) and R.register_coercion(Q) are both executed in R.__init__ (but see the remark below). In the current code (also in my branch), this would imply a strong reference chain from R to both P and Q. Hence, even if you did del p,q,P,Q, P and Q could not be garbage collected.

But I don't think we should see this as problematic, for several reasons:

Pushout constructions don't arise particularly often. Normally, either P coerces into Q or Q coerces into P, or both embed into the same parent anyway, and as I have mentioned above: With a coerce embedding, the existence of R would not prevent P and Q from garbage collection (plus, it has nothing to do with pushout anyway...)
Is register_coercion really used so often? I think _coerce_map_from_ is more commonly used, and then the existence of R would not prevent P and Q from garbage collection.

I think it may well be a feature, not a bug, that one at some point can just be left with the shortcuts and that the intermediates have fallen out.

How would you guarantee that you kept in mind enough shortcuts to not change connectivity by throwing away intermediates?

The natural way of staying close to "discovering a permanent universe" is by never throwing away anything that has been discovered, and I think we agree that's a "memory leak".

No, we disagree.

It is a memory leak if a connected component (not taking into account shortcuts) of the coercion graph can not be garbage collected, even though there is no external strong reference (which may be a coerce embedding) to any vertex of the connected component.

Note that this was exactly the problem with the example from the ticket description! As I have pointed out in comment:18, it was not the case that the problem lay in _coerce_from_list_, because this was empty. In particular, it was not the case that .register_coercion() was to blame.

Instead, the memory leak came from short-cuts, i.e., from the stuff stored in _coerce_from_hash, which can also be seen in attachment: chain.png.

Hence, the quadratic field and CC did belong to different connected components of the coercion graph, but the shortcut kept Q alive.

And there surely is a means available to add a coercion that doesn't tie the lifespan of two parents too closely: Implement P._coerce_map_from_(Q), which can return a map or simply "True" (in the latter case, conversion is used to coerce Q into P). The result is cached in P._coerce_from_hash, but not in P._coerce_from_list.

You mean: implement P._install_coerce_map_from_(m), which does: _coerce_from_hash[Domain(m)]=m. I think it is quite important to be able to manipulate the coercion graph without having to modify library code.

This might be a good idea. So, we would have .register_coercion() for permanent pathways, and _install_coerce_map_from() (with similar semantics, i.e., you can either just provide the parent or a whole morphism) for impermanent shortcuts.

simon-king-jena commented 11 years ago

comment:100

Let me try to summarise what is (or may be) left to do:

Add a section explaining the current weak coercion model, to facilitate maintenance,
I think I forgot to add some doc strings when I changed SchemeMorphism,
Add _install_coerce_map_from().
Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
Perhaps: Re-introduce a cdef public attribute _codomain, since this would allow faster access than calling .codomain(), and since the codomain will be strongly referenced anyway.

Anything I forgot?

simon-king-jena commented 11 years ago

Changed work issues from Fix elliptic curves code to none

simon-king-jena commented 11 years ago

comment:102

And I think I should do a further test: I will modify Parent.__init__ so that it prints the type of self to a log file, and so I'll see how many parents are created with and without the patch. If we see a sudden change in the statistics for some types, then it might point us to code that implicitly relies on a permanent cache.
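
A minimal sketch of such an instrumentation, under stated assumptions: the `Parent` class below is a hypothetical stand-in rather than `sage.structure.parent.Parent`, and the counts are kept in an in-memory `collections.Counter` instead of being written to a log file.

```python
from collections import Counter

# Maps "module.ClassName" to the number of instances created so far.
creation_counts = Counter()


class Parent:
    """Hypothetical base class standing in for Sage's Parent."""

    def __init__(self):
        cls = type(self)
        creation_counts[f"{cls.__module__}.{cls.__qualname__}"] += 1


class Homset(Parent):
    pass


class Tableaux(Parent):
    pass


for _ in range(3):
    Homset()
Tableaux()

for name, count in creation_counts.most_common():
    print(count, name)
```

Running the same workload with and without a patch and comparing the two counters would show which types are created more often.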

nbruin commented 11 years ago

comment:103

Replying to @simon-king-jena:

Good, I think we at least are in sufficient agreement for the practical implications of what we need.

Let me try to summarise what is (or may be) left to do:

Add a section explaining the current weak coercion model, to facilitate maintenance,

Add _install_coerce_map_from().

To clarify this point (and it might be helpful to put something along these lines in the documentation), it seems to me there would be 4 ways to put coercions in place:

A programmatic way, by supplying code in _coerce_map_from_. Since it's programmatic, it seems it can be rediscovered easily when parents get garbage collected and recreated, so it seems appropriate maps stemming from here do not lead to lifetime implications.
A way to put a coercion in that ensures that the codomain keeps the domain alive (.register_coercion)
A way to put a coercion in that ensures that the domain keeps the codomain alive (register_embedding does that, but can only accommodate one per domain)
A way to put a coercion in that does not imply any life support between domain and codomain. Someone who starts out should probably not use this, because garbage collection can lead to surprising results. It may be required to avoid memory problems.

I think the fourth point is desirable because the alternative, programmatic solutions via _coerce_map_from_, feel much more heavy-weight (subclassing a whole parent just to extend _coerce_map_from_ may be appropriate for someone who is concerned with developing sage, but seems inappropriate to me for someone who is thinking about using sage to do a complicated computation).
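
The lifetime implications of these registration styles can be sketched with a toy model. The class and method names below only echo the Sage names; the bodies are assumptions made for illustration, and `install_shortcut` is a hypothetical name for the fourth way. The programmatic route via `_coerce_map_from_` is not modelled separately, since it is described above as having no lifetime implications either.

```python
import gc
import weakref


class ToyParent:
    def __init__(self, name):
        self.name = name
        self.strong_domains = []            # like register_coercion
        self.embedding_codomain = None      # like register_embedding
        self.shortcuts = weakref.WeakSet()  # no life support in either direction

    def register_coercion(self, domain):
        """The codomain (self) keeps the domain alive."""
        self.strong_domains.append(domain)

    def register_embedding(self, codomain):
        """The domain (self) keeps the codomain alive; only one is allowed."""
        if self.embedding_codomain is not None:
            raise ValueError("only one embedding per domain")
        self.embedding_codomain = codomain

    def install_shortcut(self, domain):
        """Neither side keeps the other alive."""
        self.shortcuts.add(domain)


def survives(link):
    """Link a fresh domain P to a codomain R, drop P, report whether P survives."""
    p, r = ToyParent("P"), ToyParent("R")
    link(p, r)
    probe = weakref.ref(p)
    del p
    gc.collect()
    return probe() is not None


results = {
    "register_coercion": survives(lambda p, r: r.register_coercion(p)),
    "register_embedding": survives(lambda p, r: p.register_embedding(r)),
    "shortcut": survives(lambda p, r: r.install_shortcut(p)),
}
print(results)
```

In this model only `register_coercion` keeps the dropped domain alive while the codomain exists; the embedding and the shortcut both let it be collected.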

Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.

I think yes: Due to cyclic references, parents will usually survive until the next GC, which may be quite a while after the last reference is lost. So the place where the map becomes liable to turn defunct may be quite distant from the place where the map is found to be defunct. People deserve a reminder about that.

simon-king-jena commented 11 years ago

comment:104

Replying to @nbruin:

Replying to @simon-king-jena:

Let me try to summarise what is (or may be) left to do:

Add a section explaining the current weak coercion model, to facilitate maintenance,

Add _install_coerce_map_from().

To clarify this point (and it might be helpful to put something along these lines in the documentation), ...

But where?

I think the fourth point is desirable because the alternative, programmatic solutions via _coerce_map_from_, feel much more heavy-weight (subclassing a whole parent just to extend _coerce_map_from_ may be appropriate for someone who is concerned with developing sage, but seems inappropriate to me for someone who is thinking about using sage to do a complicated computation).

OK. But then, this method should be visible, hence, not starting with an underscore.

Perhaps: Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.

I think yes: Due to cyclic references, parents will usually survive until the next GC, which may be quite a while after the last reference is lost. So the place where the map becomes liable to turn defunct may be quite distant from the place where the map is found to be defunct. People deserve a reminder about that.

OK. Perhaps: If the map is weak but the domain reference is still available, then show the map as

"""WARNING: This %s map from %s to %s
may become defunct after the next garbage collection.
For usage outside of the coercion system, try to create a copy,
or apply the method `_make_strong_references()`"""%(self._repr_type(), self.domain(), self.codomain())

and if the domain is unavailable, then show the map as

"Defunct %s map"%self._repr_type()

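The two representations proposed in comment:104 could be wired into a `__repr__` roughly as follows. This is a self-contained sketch with hypothetical toy classes (`Parent`, `WeakenedMap`); only the message texts are taken from the comment.

```python
import gc
import weakref


class Parent:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class WeakenedMap:
    """Toy map with a weak domain reference and a strong codomain reference."""

    def __init__(self, domain, codomain):
        self.domain = weakref.ref(domain)
        self._codomain = codomain

    def codomain(self):
        return self._codomain

    def _repr_type(self):
        return "Generic"

    def __repr__(self):
        domain = self.domain()
        if domain is None:
            # The weak reference is dead: the map can no longer be used.
            return "Defunct %s map" % self._repr_type()
        return (
            "WARNING: This %s map from %s to %s\n"
            "may become defunct after the next garbage collection.\n"
            "For usage outside of the coercion system, try to create a copy,\n"
            "or apply the method `_make_strong_references()`"
            % (self._repr_type(), domain, self.codomain())
        )


q, clf = Parent("Q"), Parent("CLF")
phi = WeakenedMap(q, clf)
live_text = repr(phi)
del q
gc.collect()
defunct_text = repr(phi)
```

Here `live_text` carries the warning while the domain is alive, and `defunct_text` switches to the short "Defunct" form once the domain has been collected.
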
simon-king-jena commented 11 years ago

comment:105

Replying to @simon-king-jena:

To clarify this point (and it might be helpful to put something along these lines in the documentation), ...

But where?

The thematic tutorial coercion_and_categories would be a natural place, but would it be enough? Granted, the doc of register_embedding, register_coercion and install_coercion should refer to each other and elaborate on the different use cases and should also mention _coerce_map_from_.

simon-king-jena commented 11 years ago

comment:106

With vanilla public/sage-git/master, I find that sage -t --all results in 1083022 calls to Parent.__init__, while with the branch from here it is called 1129534 times.

Hence, there is an increase in the number of parents being created. No surprise, since this ticket is about making parents garbage collectable in some situations.
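
For orientation, the overall relative increase follows directly from the two totals quoted above. The variable names are only for this example.

```python
calls_master = 1083022   # Parent.__init__ calls on public/sage-git/master
calls_branch = 1129534   # Parent.__init__ calls with the ticket branch

additional = calls_branch - calls_master
relative_increase = additional / calls_master

# Prints: 46512 additional creations (4.29%)
print(f"{additional} additional creations ({relative_increase:.2%})")
```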

Nevertheless, it might make sense to see whether some types of parents show a particularly strong increase, so that we can then decide whether we should have some stronger cache for these types.

simon-king-jena commented 11 years ago

comment:107

I studied the differences in parent creation during sage -t --all in more detail.

Absolute differences

Here are the 10 classes that have the most additional creations in the ticket branch compared with the public/sage-git/master branch (the list shows the absolute number of additional creations and the name of the class):

(19873, 'sage.rings.homset.RingHomset_generic')
(16597, 'sage.categories.homset.Homset')
(2839, 'sage.rings.finite_rings.integer_mod_ring.IntegerModRing_generic')
(2270, 'sage.rings.homset.RingHomset_quo_ring')
(2137, 'sage.rings.finite_rings.homset.FiniteFieldHomset')
(1960, 'sage.rings.number_field.morphism.NumberFieldHomset')
(1279, 'sage.sets.positive_integers.PositiveIntegers')
(851, 'sage.combinat.tableau.Tableaux_all')
(831, 'sage.modules.free_module_homspace.FreeModuleHomspace')
(481, 'sage.rings.polynomial.polynomial_ring.PolynomialRing_dense_mod_p')

Here are the "bottom 10" classes. As you can see, there are parents for which we have considerably fewer creations with the ticket than without, which comes as a surprise to me:

(-57, 'sage.sets.family.LazyFamily')
(-134, 'sage.combinat.words.words.Words_all')
(-134, 'sage.combinat.permutation.StandardPermutations_all')
(-136, 'sage.combinat.permutation.Permutations_set')
(-142, 'sage.combinat.subset.Subsets_sk')
(-158, 'sage.sets.non_negative_integers.NonNegativeIntegers')
(-166, 'sage.combinat.cartesian_product.CartesianProduct_iters')
(-170, 'sage.combinat.integer_list.IntegerListsLex')
(-253, 'sage.combinat.skew_partition.SkewPartitions_rowlengths')
(-3838, 'sage.sets.set.Set_object_enumerated')

Relative differences

Here are the 10 classes that have the biggest relative increase in number of creations (ticket compared with master):

+14.67% sage.combinat.tableau.Tableaux_all
+3.79% sage.combinat.skew_tableau.SemistandardSkewTableaux_all
+1.00% sage.combinat.skew_tableau.SkewTableaux
+1.00% sage.combinat.partition_tuple.PartitionTuples_all
+0.91% sage.rings.homset.RingHomset_quo_ring
+0.75% sage.categories.examples.sets_cat.PrimeNumbers_Facade
+0.67% sage.combinat.crystals.affine.AffineCrystalFromClassicalAndPromotion
+0.66% sage.groups.matrix_gps.homset.MatrixGroupHomset
+0.64% sage.combinat.partition_tuple.PartitionTuples_level
+0.60% sage.structure.list_clone_demo.IncreasingIntArrays

Here are the 10 classes with the biggest relative decrease in the number of creations:

-0.25% sage.combinat.crystals.kirillov_reshetikhin.KR_type_A2_with_category
-0.25% sage.combinat.crystals.kirillov_reshetikhin.KR_type_A2
-0.25% sage.categories.examples.finite_monoids.IntegerModMonoid
-0.33% sage.sets.integer_range.IntegerRangeEmpty
-0.33% sage.combinat.affine_permutation.AffinePermutationGroupTypeG
-0.33% sage.combinat.affine_permutation.AffinePermutationGroupTypeC
-0.38% sage.combinat.crystals.infinity_crystals.InfinityCrystalOfTableauxTypeD
-0.39% sage.combinat.permutation.CyclicPermutations
-0.40% sage.combinat.vector_partition.VectorPartitions
-0.54% sage.combinat.composition_tableau.CompositionTableaux_all
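
Rankings of this kind can be derived from two per-class creation counts along the following lines. This is a sketch only: the helper names `compare_counts` and `top` are hypothetical, and the counts are made-up toy numbers, not the measurements reported in this comment.

```python
from collections import Counter


def compare_counts(master, branch):
    """Return (absolute, relative) creation differences per class name."""
    absolute = {}
    relative = {}
    for name in set(master) | set(branch):
        diff = branch.get(name, 0) - master.get(name, 0)
        absolute[name] = diff
        if master.get(name, 0) > 0:
            # Relative change is only defined for classes seen on master.
            relative[name] = diff / master[name]
    return absolute, relative


def top(differences, n=10):
    """Return the n largest differences, biggest first."""
    return sorted(differences.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up toy counts, not the measurements from this ticket.
master = Counter({"toy.Homset": 10000, "toy.Tableaux": 100, "toy.SetObject": 500})
branch = Counter({"toy.Homset": 10200, "toy.Tableaux": 115, "toy.SetObject": 450})

absolute, relative = compare_counts(master, branch)
for name, share in top(relative, 3):
    print(f"{share:+.2%} {name}")
```

In the toy data the homset has the largest absolute increase but only a small relative one, the same pattern that the lists above show for the real counts.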

Conclusion

Even though the absolute differences in the creation of various kinds of homsets seem to be dramatic, the relative differences suggest that there is no serious problem here. There are only four classes that show an increase of at least 1%. Three of them are related to tableaux, which is why I have added Nicolas to the ticket: Perhaps we want to change the cache for tableaux?

simon-king-jena commented 11 years ago

comment:108

Concerning a new method install_coercion: Wouldn't it be easier to provide register_coercion with an optional argument permanent=True, so that using the method with permanent=False would do what you suggested for install_coercion? I guess having two methods install_coercion and register_coercion could confuse the user.

simon-king-jena commented 11 years ago

comment:109

Concerning documentation: I just found that the underscore methods of sage.structure.parent.Parent are documented in the reference manual. Hence it should be no problem to add documentation of _coerce_map_from_ directly in-place.

simon-king-jena commented 11 years ago

comment:110

And I just notice that the documentation of the module sage.structure.parent starts with a "simple example of registering coercions", which I find rather obscure and which does things in a way that we would do differently today. E.g., it does not initialise the category, but overrides the method category(). And it calls self._populate_coercion_lists_(), which I have never seen in code created in the past few years.

Hence, I'll update this example.

simon-king-jena commented 11 years ago

comment:111

Replying to @simon-king-jena:

And I just notice that the documentation of the module sage.structure.parent starts with a "simple example of registering coercions", which I find rather obscure and which does things in a way that we would do differently today. E.g., it does not initialise the category, but overrides the method category(). And it calls self._populate_coercion_lists_(), which I have never seen in code created in the past few years.

Hence, I'll update this example.

Hm. I am undecided.

Perhaps it would be better to focus here on fixing the memory leak (which, I think, succeeded), only documenting with examples that it has worked.

Hence, on this ticket, I would just

provide missing docs for SchemeMorphism
Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.
Perhaps: Re-introduce a cdef public attribute _codomain, since this would allow faster access than calling .codomain(), and since the codomain will be strongly referenced anyway.

Everything else should perhaps better be done on a follow-up ticket:

Add documentation explaining the current weak coercion model, to facilitate maintenance,
Add permanent=True option to register_coercion().

What do you think?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

[changeset:452d216] Add docs to SchemeMorphism

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Changed commit from 05fb569 to 452d216

simon-king-jena commented 11 years ago

comment:113

I think there is a further technical thing I could do in the next commit: I have implemented __copy__ for some types of morphisms. But there already exist methods called _extra_slots() and _update_slots(), and I think in order to implement copying one should update these. This might (on a different ticket) also help to provide a default pickling for maps.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Changed commit from 452d216 to 5168cfd

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

[changeset:5168cfd] Generic copy method for maps, using _update_slots. Use a cdef _codomain, since the codomain is strongly refed anyway. Add doctests.

simon-king-jena commented 11 years ago

comment:115

Replying to @simon-king-jena:

Perhaps it would be better to focus here on fixing the memory leak (which, I think, succeeded), only documenting with examples that it has worked.

Hence, on this ticket, I would just

provide missing docs for SchemeMorphism

Done in the current commit.

Perhaps: Re-introduce a cdef public attribute _codomain, since this would allow faster access than calling .codomain(), and since the codomain will be strongly referenced anyway.

Done in the current commit.

In addition, I changed the new generic __copy__ method of maps so that it uses _update_slots and _extra_slots. This complies with how currently pickling is implemented by default. For several types of maps, I implemented copying accordingly.
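
The slot-based copying described here can be pictured with a plain-Python analogue. The method names `_extra_slots` and `_update_slots` are taken from the comment, but the signatures and bodies below are assumptions for illustration, and `ToyMap` and `ScaledMap` are hypothetical classes.

```python
import copy


class ToyMap:
    """Toy map; subclasses extend the two slot hooks instead of __copy__."""

    def __init__(self, domain, codomain):
        self.domain = domain
        self.codomain = codomain

    def _extra_slots(self, slots):
        """Collect the attributes that define this map into `slots`."""
        slots["domain"] = self.domain
        slots["codomain"] = self.codomain
        return slots

    def _update_slots(self, slots):
        """Restore the attributes of this map from `slots`."""
        self.domain = slots["domain"]
        self.codomain = slots["codomain"]

    def __copy__(self):
        cls = type(self)
        other = cls.__new__(cls)   # create the instance without calling __init__
        other._update_slots(self._extra_slots({}))
        return other


class ScaledMap(ToyMap):
    def __init__(self, domain, codomain, factor):
        super().__init__(domain, codomain)
        self.factor = factor

    def _extra_slots(self, slots):
        slots = super()._extra_slots(slots)
        slots["factor"] = self.factor
        return slots

    def _update_slots(self, slots):
        super()._update_slots(slots)
        self.factor = slots["factor"]


f = ScaledMap("ZZ", "QQ", 3)
g = copy.copy(f)
```

The point of the design is that a subclass only declares its extra data once, in the two hooks, and both copying and any pickling built on the same hooks pick it up.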

Let the string representation of a weakened map consist of a warning to not use this map outside of the coercion framework.

Still todo.

Everything else should perhaps better be done on a follow-up ticket:

Add documentation explaining the current weak coercion model, to facilitate maintenance,

Add permanent=True option to register_coercion().

Do you agree that this shall be on a different ticket?

nthiery commented 11 years ago

comment:116

Replying to @simon-king-jena:

Even though the absolute differences in the creation of various kinds of homsets seem to be dramatic, the relative differences suggest that there is no serious problem here. There are only four classes that show an increase of at least 1%. Three of them are related to tableaux, which is why I have added Nicolas to the ticket: Perhaps we want to change the cache for tableaux?

Could you post a quick summary (say in the ticket description/title) of what the current patch does?

Thanks!

simon-king-jena commented 11 years ago

Description changed:

--- 
+++ 
@@ -6,3 +6,105 @@
 ....:

 (This is with 5.10.rc0)
+
+Problem analysis
+
+The quadratic field is created with a coerce embedding into CLF. At the same
+time, this coerce embedding is stored in CLF._coerce_from_hash:
+
+
+sage: phi = CLF.coerce_map_from(Q)
+sage: phi is Q.coerce_embedding()
+True
+sage: Q in CLF._introspect_coerce()['_coerce_from_hash']
+True
+
+The "coerce_from_hash" is a MonoDict, hence, has only a weak reference to the key
+(Q, in this case). However, there still is a strong reference from
+CLF to the coerce map phi. And phi has a strong reference to its
+domain, thus, to Q. Hence, the existence of CLF prevents garbage collection of
+Q.
+
+And there is a second chain of strong references from CLF to Q: From CLF to
+phi to the parent of phi (i.e., a homset) to the domain Q of this homset.
+
+Suggested solution
+
+We can not turn the reference from CLF to phi into a weak reference, because
+then even a strong reference to Q would not prevent phi from garbage
+collection. Hence, we need to break the above mentioned reference chains in
+two points. In the attached branch, maps generally keep a strong reference to
+the codomain (this is important in composite maps and actions), but those used
+in the coercion system (and only there!!) will only have a weak
+reference to the domain, and they set the cdef ._parent attribute to None
+(hence, we also override .parent(), so that it reconstructs the homset if
+the weak reference to the domain is still valid).
+
+To preserve the domain()/codomain() interface, I have removed the method
+domain() and have replaced it by a cdef public attribute that will either
+hold a weak reference (which returns the domain when called, hence, the
+interface does not change) or a ConstantFunction (which should actually be
+faster to call than a method). Since accessing a cdef attribute is still
+faster, the cdef attribute _codomain is kept (since this will always be a
+strong reference), but _domain has been removed.
+
+This "weakening of references" is done for the coercions found by
+discover_coerce_map_from() stored into _coerce_from_hash. So, this mainly
+happens for things done with _coerce_map_from_() and with composite
+maps. Similarly for _convert_from_hash.
+
+Weakening is not used on the maps that are explicitly registered by
+.register_embedding() and .register_coercion(). This is in order to
+preserve the connectivity of the coercion graph. The register_* methods
+are only used on selected maps, that are of particular importance for the
+backtrack search in discover_coerce_map_from(). These strong
+registrations do not propagate: Compositions of strongly registered
+coercions found by discover_coerce_map_from() will be weakened.
+
+Since weakened maps should not be used outside of the coercion system, its
+string representation shows a warning to replace them by a copy. The attached
+branch implements copying of maps in some additional cases.
+
+SchemeMorphism can not inherit from Morphism, because of a bug with
+multiple inheritance of a Python class from Cython extension classes. But once
+this bug is fixed, we surely want to make SchemeMorphism inherit from
+Morphism. This transition is prepared here.
+
+In any case, the commit messages should give a concise description of what has
+been done.
+
+Still TODO
+
+Let the string representation of weakened maps point the user to the need of
+creating a copy.
+
+TODO in future tickets
+
+- Provide a documentation of the use of weak references in coercion, and of
+  different ways of registering coercions, with their different impacts on
+  garbage collection.
+- Provide a version of .register_coercion() that weakens the coercion
+  map. It would hence have the same effect as returning a map by
+  ._coerce_map_from_(), but of course ._coerce_map_from() could not easily
+  be changed in an interactive session.
+- provide copying for all kinds of maps.
+
+Effects on the overall functioning of Sage
+
+It is conceivable that some parts of Sage still suppose implicitly that stuff
+cached with UniqueRepresentation is permanently cached, even though the
+seemingly permanent cache was not more than a consequence of a memory leak in
+the coercion system. With the attached branch, garbage collection of parent
+structures will much more often become possible. Hence, code that relied on a
+fake-permanent cache would now need to create the same parent repeatedly.
+
+I (Simon) have tested how many additional parent creations occur with the
+attached branch when running sage -t --all. The findings are summarised in
+comment:107: The number of additional parent creations increased by not more
+than 1% for all but two parent classes (both related with tableaux). I also
+found that the time to run the tests did not significantly increase.
+
+Jean-Pierre has occasionally stated that some of his computations have been
+infeasible with the memory leak in the above example. I hope that his
+computations will now succeed.

simon-king-jena commented 11 years ago

comment:117

Replying to @nthiery:

Could you post a quick summary (say in the ticket description/title) of what the current patch does?

Done. OK, the summary is actually not quick. Sorry.

simon-king-jena commented 11 years ago

Work Issues: String repr. of weakened maps; copying/pickling of maps

simon-king-jena commented 11 years ago

comment:118

I think changing the string representation of weakened maps should be done here. And then, in a couple of tests, one needs to copy the map in order to get the test to pass.

Therefore, I suggest implementing copying for all maps here as well, not on a different ticket. After all, it is not difficult: One just looks at the list of cdef attributes, and implements _extra_slots and _update_slots taking exactly these attributes into account. The only difficulty is to really catch all kinds of maps.

Note that in most cases phi == loads(dumps(phi)) would return False, but this is because comparison of maps is often not implemented, and that is something I will certainly not attempt to implement here.
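For readers unfamiliar with the slot-based copying pattern mentioned above, here is a minimal plain-Python sketch. It assumes a toy class hierarchy (`SketchMap`, `ScaledMap` and the `factor` attribute are hypothetical names, not Sage classes); only the method names `_extra_slots` and `_update_slots` are taken from the discussion, and their behaviour in Sage may differ in detail.

```python
import copy


class SketchMap:
    """Toy stand-in for a map; this is not Sage's Map class."""

    def __init__(self, domain, codomain):
        self.domain = domain
        self.codomain = codomain

    def _extra_slots(self, slots):
        # Collect every attribute that defines this map.
        slots["domain"] = self.domain
        slots["codomain"] = self.codomain
        return slots

    def _update_slots(self, slots):
        # Restore the attributes collected by _extra_slots.
        self.domain = slots["domain"]
        self.codomain = slots["codomain"]

    def __copy__(self):
        # Create an empty instance and fill it from the collected slots.
        new = self.__class__.__new__(self.__class__)
        new._update_slots(self._extra_slots({}))
        return new


class ScaledMap(SketchMap):
    """A subclass only has to account for its own additional attribute."""

    def __init__(self, domain, codomain, factor):
        super().__init__(domain, codomain)
        self.factor = factor

    def _extra_slots(self, slots):
        slots["factor"] = self.factor
        return super()._extra_slots(slots)

    def _update_slots(self, slots):
        self.factor = slots["factor"]
        super()._update_slots(slots)


original = ScaledMap("A", "B", 3)
duplicate = copy.copy(original)
```

The point of the pattern is that each class lists exactly its own attributes, so a forgotten attribute in one subclass is the only way for a copy to be incomplete.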

simon-king-jena commented 11 years ago

Description changed:

--- 
+++ 
@@ -75,8 +75,9 @@

 **__Still TODO__**

-Let the string representation of weakened maps point the user to the need of
+- Let the string representation of weakened maps point the user to the need of
 creating a copy.
+- Provide copying for *all* kinds of maps.

 **__TODO in future tickets__**

@@ -87,7 +88,6 @@
   map. It would hence have the same effect as returning a map by
   `._coerce_map_from_()`, but of course `._coerce_map_from()` could not easily
   be changed in an interactive session.
-- provide copying for *all* kinds of maps.

 **__Effects on the overall functioning of Sage__**

simon-king-jena commented 11 years ago

comment:119

I wonder: Would it make sense to implement a generic comparison for maps, based on the dictionary returned by self._extra_slots({})? Namely, these data are used for pickling and copying of maps, and thus it seems reasonable to me that two maps are equal if and only if the pickling data coincide.

What do you think? Worth trying? Better be done on a different ticket?
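A rough plain-Python sketch of the idea, under the assumption that the pickling data fully describes a map. The class `SketchMap` and the attributes `factor` and `weakened` are hypothetical and only serve the illustration; nothing here is Sage's implementation.

```python
class SketchMap:
    """Toy map whose equality is defined by its pickling data."""

    def __init__(self, domain, codomain, factor=1):
        self.domain = domain
        self.codomain = codomain
        self.factor = factor
        self.weakened = False  # deliberately not part of the pickling data

    def _extra_slots(self, slots):
        slots.update(domain=self.domain, codomain=self.codomain,
                     factor=self.factor)
        return slots

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        # Two maps are equal if and only if their pickling data coincide.
        return self._extra_slots({}) == other._extra_slots({})

    __hash__ = None  # equality depends on mutable data, so no hashing here


a = SketchMap("A", "B", factor=2)
b = SketchMap("A", "B", factor=2)
b.weakened = True            # a "weakened" counterpart still compares equal
c = SketchMap("A", "B", factor=3)
```

In this sketch the weak/strong status is left out of the comparison on purpose, so a weakened map equals its non-weakened counterpart.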

nbruin commented 11 years ago

comment:120

Replying to @simon-king-jena:

I wonder: Would it make sense to implement a generic comparison for maps, based on the dictionary returned by self._extra_slots({})? Namely, these data are used for pickling and copying of maps, and thus it seems reasonable to me that two maps are equal if and only if the pickling data coincide.

And the "weakened" copies of maps (with in the near future an easily distinguished string rep) would be equal to their counterparts? That may well be a desirable choice, but by no means uncontroversial. I don't think in coercion we ever depend on equality testing on maps, do we? I think it's better done on a separate ticket.

Another suggestion about how to get the strong references in for register_coercion. If the maps put in by register_coercion are used afterwards many times to derive other cached coercion maps from, it would perhaps be preferable to have them in a form that is readily usable for that, i.e., as "weakened" maps (it means the map can go straight into map compositions etc.). Otherwise we may well end up making copies repeatedly in the coercion framework.

We could get the strong connections in by, for instance, referencing the domains explicitly, say on an attribute _domains_with_registered_coercions_to_here. The coercion map itself could simply live in the normal cache, as a weakened map.

The same applies to _register_embedding, although perhaps the coercion discovery treats the store differently, and storing a strong reference to even a weakened map implies a strong reference to the codomain.

Other question, does _parent=None imply a weakened map? (I guess isinstance(_domain,weakref.ref) definitely does) or are there other reasons for the parent to be unset?

simon-king-jena commented 11 years ago

comment:121

Replying to @nbruin:

And the "weakened" copies of maps (with in the near future an easily distinguished string rep) would be equal to their counter parts?

Of course. Equality does not (and should not) depend on weak references, I think.

I don't think in coercion we ever depend on equality testing on maps do we?

We don't. Otherwise, people would have had comparison implemented already.

I think it's better done on a separate ticket.

Agreed.

Concerning "near future": I am already testing a new commit that provides

- special string representation for weakened maps
- copying for all maps and morphisms (at least those that I was able to find in the sources). It was very dull work: Look up the cdef slots, implement _extra_slots and _update_slots, add a test...
- Use the copy functionality on most tests that expose coerce maps. Hence, I replace
```
sage: R.coerce_map_from(P)
...
```
by
```
sage: copy(R.coerce_map_from(P))
```
and also add a link to this ticket. Not everywhere, but in most places.

Another suggestion about how to get the strong references in for register_coercion. If the maps put in by register_coercion are used afterwards many times to derive other cached coercion maps from, it would perhaps be preferable to have them in a form that is readily usable for that, i.e., as "weakened" maps (it means the map can go straight into map compositions etc.). Otherwise we may well end up making copies repeatedly in the coercion framework.

Why do you think they would/should be copied inside of the coercion model? I thought we already had agreed that copying is needed when exposing a coerce map to the user (this is why I suggested that the string repr contains a warning!). But certainly not internally. This would be by far too slow.

Suppose you have two non-weakened maps phi and psi, and then do chi = phi*psi (a composite map). When you then weaken chi, neither phi nor psi would be changed. So, why copying?

We could get the strong connections in by, for instance, referencing the domains explicitly, say on an attribute _domains_with_registered_coercions_to_here.

This is already done in my current branch, and it is called _registered_domains (simply a list).

The coercion map itself could simply live in the normal cache, as a weakened map.

No, because we need some container that only stores those maps that are considered in the backtracking algorithm. So, the current separate list _coerce_from_list must be preserved.

The same applies to _register_embedding, although perhaps the coercion discovery treats the store differently, and storing a strong reference to even a weakened map implies a strong reference to the codomain.

All weakened maps still have a strong reference to the codomain. Only the reference to the domain will be weak. And register_embedding is still simply assigning the embedding to an attribute _embedding of the domain.

Other question, does _parent=None imply a weakened map? (I guess isinstance(_domain,weakref.ref) definitely does) or are there other reasons for the parent to be unset?

I guess one could test self._parent is None, rather than typetest stuff. This should actually be faster.

Concerning "reasons": The parent is unset, because otherwise we have a chain of references from the map to the domain, namely via the parent (i.e., the homset). Hence, having a weak reference from the map to the domain would be futile if there is a reference from the map to its parent. Note that alternatively one could have a weak reference from the homset to the domain. But I think we have agreed above that we don't want this as a default.
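To make the two reference chains concrete, here is a small plain-Python sketch. It is only a model: `Parent`, `SketchMap`, `make_weak` and `is_weakened` are hypothetical names, the tuple stored in `_parent` merely stands in for a homset, and the immediate collection relies on CPython's reference counting.

```python
import gc
import weakref


class Parent:
    """Toy parent; it only needs to be weak-referenceable."""

    def __init__(self, name):
        self.name = name


class SketchMap:
    def __init__(self, domain, codomain):
        self._domain = domain              # strong by default
        self._codomain = codomain          # always strong
        self._parent = (domain, codomain)  # stand-in for the homset

    def domain(self):
        d = self._domain
        # A weak reference returns the domain (or None) when called.
        return d() if isinstance(d, weakref.ref) else d

    def codomain(self):
        return self._codomain

    def make_weak(self):
        # Break both strong chains to the domain: the direct reference
        # and the one that goes through the parent (homset).
        if self._parent is None:
            return
        self._domain = weakref.ref(self._domain)
        self._parent = None

    def is_weakened(self):
        # Testing the parent is cheaper than type-testing the domain.
        return self._parent is None


A, B = Parent("A"), Parent("B")
phi = SketchMap(A, B)
phi.make_weak()

watch_A = weakref.ref(A)
del A
gc.collect()

domain_collected = watch_A() is None    # the map no longer keeps A alive
domain_now = phi.domain()               # None once A is gone
codomain_kept = phi.codomain() is B     # the codomain stays strongly held
```

If `make_weak` only replaced `_domain` and left `_parent` in place, the tuple would still hold `A`, which is the futility described above.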

nbruin commented 11 years ago

comment:122

Replying to @simon-king-jena:

Suppose you have two non-weakened maps phi and psi, and then do chi = phi*psi (a composite map). When you then weaken chi, neither phi nor psi would be changed. So, why copying?

Well, if you'd do that then chi wouldn't really be a weakened map. Assuming maps act on the left, i.e. chi.domain()=psi.domain(), the resulting structure would have a strong reference to its domain, via psi.domain().

The converse, making a strong composite out of weakened maps, shouldn't be a problem at all (except that if people start looking at the components, they'd be able to get their hands on weakened maps).

I think the coercion system makes a lot of map compositions, and they usually would have to be weakened. That's why it might be worth ensuring that the maps stored internally are already weakened.

This is already done in my current branch, and it is called _registered_domains (simply a list).

And the maps inserted into _coerce_from_hash are weakened or not? Conceptually it would be a little easier if all of them are. Perhaps enforcing such a rule (or at least change most code to comply) is too costly, though.

No, because we need some container that only stores those maps that are considered in the backtracking algorithm. So, the current separate list _coerce_from_list must be preserved.

Ah. I didn't realize that. You say that _coerce_from_hash is not considered by backtracking. Indeed, that changes things. In that case, _coerce_from_list could be a "weak set" (e.g., a WeakValueDictionary with trivial keys), since the maps are kept alive by their entries in _coerce_from_hash, where the key is kept alive by the _registered_domains. This would get rid of the garbage collection problem, if we ever want to have maps that help coercion discovery but don't have lifetime implications.

(this should go on a different ticket)
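As an illustration of the "weak set" idea, the following plain-Python sketch uses `weakref.WeakValueDictionary` with trivial keys. The names `coerce_from_hash` and `coerce_from_list` only mirror the attributes discussed here; in this sketch the first is an ordinary dict rather than a MonoDict, and `SketchMap` is a hypothetical placeholder class.

```python
import gc
import weakref


class SketchMap:
    """Placeholder for a coercion map."""

    def __init__(self, name):
        self.name = name


coerce_from_hash = {}                             # strong store for the maps
coerce_from_list = weakref.WeakValueDictionary()  # "weak set", trivial keys

psi = SketchMap("psi")
coerce_from_hash["B"] = psi
coerce_from_list[id(psi)] = psi
del psi
gc.collect()

# The map is still listed, because the strong store keeps it alive.
listed_while_cached = len(coerce_from_list)

# Once the cache entry disappears, the map drops out of the weak set too.
del coerce_from_hash["B"]
gc.collect()
listed_after_eviction = len(coerce_from_list)
```

With this layout the list consulted by backtracking has no lifetime implications of its own; whatever keeps the cache entry alive decides how long the map is available.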

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

[changeset:364b985] Add warning to string repr of weakened maps. Implement copying for *all* kinds of maps.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 11 years ago

Changed commit from 5168cfd to 364b985

simon-king-jena commented 11 years ago

comment:124

Replying to @nbruin:

Replying to @simon-king-jena:

Suppose you have two non-weakened maps phi and psi, and then do chi = phi*psi (a composite map). When you then weaken chi, neither phi nor psi would be changed. So, why copying?

Well, if you'd do that then chi wouldn't really be a weakened map. Assuming maps act on the left, i.e. chi.domain()=psi.domain(), the resulting structure would have a strong reference to its domain, via psi.domain().

Correct. Would this be a problem? Let's see:

Let psi: A -> B and phi: B -> C be coerce maps, hence, chi=phi*psi: A -> C is a coerce map as well. Assume that we have done B.register_coercion(psi), so that B prevents A from garbage collection. Assume that phi is only stored in C._coerce_from_hash, i.e., C would not prevent B from garbage collection. In other words, we assume that phi is weakened but psi isn't.

Let us assume first that we did not discover chi as a coercion yet. If we have a strong reference to C but no external reference to B, then B and A could of course be garbage collected.

Now, let us assume that we did discover that chi is a coercion and put it into C._coerce_from_hash, in the attempt of weakening it. C would have a strong reference to chi, which has a strong reference to its first map, psi, which has a strong reference to both its domain A and codomain B. Hence, C would prevent both A and B from garbage collection.

I think this indeed qualifies as a memory leak, according to the definition I gave in some post above.

Difficult. Can this be solved, even with copying? I have to think about it.

This is already done in my current branch, and it is called _registered_domains (simply a list).

And the maps inserted into _coerce_from_hash are weakened or not?

In my current branch, it is not weakened. Perhaps it should be. It would indeed be conceptually easier if a map is in the coercion system if and only if it is weakened. One could do it, since _registered_domains would keep the domain alive. Note, however, that this would not suffice for fixing the memory leak described above. We would still have that chi refers to psi, which strongly refers to its codomain B (the codomain is always strong), and then B._registered_domains strongly refers to A.

In this situation,

- we want to have a strong reference from B to A, since we used B.register_coercion(mor).
- we must not have a strong reference from C to A, since chi was discovered but not registered.
- we must not have a strong reference from C to B, since phi was not registered.
- we could live with a strong reference from A to B, I guess.

If C is alive, then we want that it does not prevent A or B from garbage collection. But if both A and C are alive, then chi must remain a valid map, hence, B must be prevented from garbage collection. It follows that if C is alive then either A and B get collected together, or they both stay alive.
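The reference chain in this puzzle can be reproduced with a small plain-Python model. This is a sketch under assumptions: `Parent`, `Map` and `Composite` are hypothetical toy classes, `weakref.WeakKeyDictionary` stands in for the MonoDict `_coerce_from_hash`, and the names psi, phi and chi follow the notation of this comment.

```python
import gc
import weakref


class Parent:
    def __init__(self, name):
        self.name = name
        # Stand-in for _coerce_from_hash: weak keys, strong values.
        self.cache = weakref.WeakKeyDictionary()


class Map:
    def __init__(self, domain, codomain, weak_domain=False):
        self._domain = weakref.ref(domain) if weak_domain else domain
        self.codomain = codomain
        self.weak = weak_domain

    def domain(self):
        return self._domain() if self.weak else self._domain


class Composite(Map):
    """'Weakened' composite: weak domain, but strong references to its parts."""

    def __init__(self, first, second):
        super().__init__(first.domain(), second.codomain, weak_domain=True)
        self.first, self.second = first, second


A, B, C = Parent("A"), Parent("B"), Parent("C")
psi = Map(A, B)                    # registered on B: strong domain
phi = Map(B, C, weak_domain=True)  # only cached on C: weak domain
C.cache[A] = Composite(psi, phi)   # chi = phi*psi, stored "weakened"

watch_A, watch_B = weakref.ref(A), weakref.ref(B)
del A, B, psi, phi
gc.collect()

# C -> chi -> psi -> A and B: both survive although only C is referenced.
leaked = (watch_A() is not None, watch_B() is not None)

del C
gc.collect()
freed = (watch_A() is None, watch_B() is None)
```

In this model, weakening the composite's own domain reference is not enough, because the first factor psi still holds A and B strongly.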

I should catch some sleep now, perhaps I'll find a solution to the puzzle tomorrow.

simon-king-jena commented 11 years ago

Description changed:

--- 
+++ 
@@ -70,14 +70,12 @@
 this bug is fixed, we surely want to make `SchemeMorphism` inherit from
 `Morphism`. This transition is prepared here.

+Weakened maps should only be used in the coercion system: A weakened map can become invalid by garbage collection, and the coercion system has the job to remove a map from the coercion cache as soon as it becomes invalid.
+
+Maps outside of the coercion system should be safe against invalidation. Hence, when we take a coerce map, then we should better create a non-weakened copy. The branch also provides copying (and pickling) for *all* kinds of maps and morphisms (hopefully no map/morphism class went unnoticed).
+
 In any case, the commit messages should give a concise description of what has
 been done.
-
-**__Still TODO__**
-
-- Let the string representation of weakened maps point the user to the need of
-creating a copy.
-- Provide copying for *all* kinds of maps.

 **__TODO in future tickets__**

simon-king-jena commented 11 years ago

Changed work issues from String repr. of weakened maps; copying/pickling of maps to none

simon-king-jena commented 11 years ago

comment:125

With the new commit I have pushed today, all doctests should pass.

simon-king-jena commented 11 years ago

comment:126

Let me elaborate a bit more on the memory leak from comment:124.

First of all, this leak is not introduced by my branch. Hence, it would probably be better to attempt a fix on a different ticket, as the changes introduced in my branch already are big enough.

Now for a deeper analysis of what happens. I want to argue that **there is only one scenario in which this leak occurs. This scenario rarely occurs and can easily be avoided.**

Let phi: A -> B and psi: B -> C be maps (sorry for changing the names compared with comment:124...), and define chi = psi*phi: A -> C. We assume that phi and psi are coerce maps, and thus chi is a coerce map as well, but initially Sage is not aware of chi.

chi could be registered (i.e., C.register_coercion(chi)), it could be that C._coerce_map_from_(A) provides a shortcut, or it could be that chi is discovered by the backtracking algorithm of the coercion system.

Registering chi

Of course, if chi is explicitly registered as a coercion, then C will (with the current code!!) keep A alive, and in order to not invalidate chi, B will be kept alive as well. I don't consider this a memory leak, since it is an explicit registration.

_coerce_map_from_

Typically, C._coerce_map_from_(A) just returns True, None or False, and not a map. If it returns True, then a direct conversion chi' from A to C is stored as coercion. Note that chi' is not a composite map. So, we would be in a totally different situation. Since chi' has no reference to B, to phi or to psi, there is no leak in this case.

Theoretically, C._coerce_map_from_(A) could return the composite map chi. This would be possible, and it would create a memory leak. Hence, we learn that _coerce_map_from_ should better not return a composite map. I don't think we can automatically avoid a leak in this case.

Discovering chi by backtracking

We need to distinguish cases, since there are different ways of how the coercion system became aware of phi and psi.

The punchline is: **If chi is discovered by backtracking then psi is stored in C._coerce_from_list.** Without psi being on this list, backtracking won't find chi.

Hence, C.register_coercion(psi) has been done. With the current code, it means that C will keep B alive.

There remain only three cases:

- Assume that phi is a registered coercion. Hence, with the current code, B keeps A alive, and still C will keep B alive. Hence, C also keeps A alive. Adding chi to C._coerce_from_hash[A] won't change these lifetime dependencies. No leak.
- Assume that phi is a coerce embedding. Hence, A will keep B alive, and still C will keep B alive, but neither C nor B keep A alive. In particular, phi is weakened, so that there is no strong reference from phi to A. Adding chi to C._coerce_from_hash[A] will not change these lifetime dependencies, since the key A is only weakly referenced. No leak.
- Assume that phi has been discovered by backtracking or has been provided by B._coerce_map_from_(A) as a short-cut. In particular, it is weak and only has a weak reference to A. Then, still C keeps B alive, but B does not keep A alive, nor does A keep B alive, and C will also not keep A alive. If we put C._coerce_from_hash[A]=chi, then again C will not prevent A from garbage collection, since A is only weakly referenced in the MonoDict, and if there is no external reference to C, then a strong reference to A will not be enough to keep B alive. No leak.

Conclusion

A composite map can only arise in the coercion system, (1) if it is explicitly registered, or (2) if the second map of the composition is explicitly registered, or (3) if the composite map is returned by _coerce_map_from_.

I think case (1) does not constitute a leak. I have shown that there is no memory leak in case (2). Case (3) is a leak, but this case can easily be avoided by returning a "simple" map that is mathematically equivalent to the composite map.

simon-king-jena commented 11 years ago

comment:127

PS: Since teaching will start next week for me, I will probably not be able to fix the leak from comment:124, even if you succeed in convincing me that it really is a leak. So, I guess it is safe to start reviewing the attached branch, I think it will not change in the next few days...

simon-king-jena commented 11 years ago

comment:128

PPS: I just found that Sage's coercion system is clever enough to find a composite map if phi is registered as coerce embedding and psi is a short-cut:

sage: A = Aclass()
sage: B = Bclass()
sage: C = Cclass()
sage: phi = sage.categories.map.Map(A,B)
sage: A.register_embedding(phi)
sage: psi = C.coerce_map_from(B)
sage: print psi
Generic map:
  From: <class '__main__.Bclass'>
  To:   <class '__main__.Cclass'>

        WARNING: This map has apparently been used internally
        in the coercion system. It may become defunct in the next
        garbage collection. Please use a copy.
sage: print phi
Generic map:
  From: <class '__main__.Aclass'>
  To:   <class '__main__.Bclass'>

        WARNING: This map has apparently been used internally
        in the coercion system. It may become defunct in the next
        garbage collection. Please use a copy.
sage: C.coerce_map_from(A)
Composite map:
  From: <class '__main__.Aclass'>
  To:   <class '__main__.Cclass'>

        WARNING: This map has apparently been used internally
        in the coercion system. It may become defunct in the next
        garbage collection. Please use a copy.

Here, before discovering and caching the composite map, A keeps B alive because of the embedding, and C keeps neither A nor B alive. After caching the composite map, A still keeps B alive, and C still does not keep A alive, because A only occurs as a weak key in a MonoDict.

But we have

sage: del psi, B, A
sage: import gc
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
1

So, why is B not garbage collected? To be investigated, I need to hurry now.

simon-king-jena commented 11 years ago

comment:129

Replying to @simon-king-jena:

sage: del psi, B, A
sage: import gc
sage: _ = gc.collect()
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
1

So, why is B not garbage collected? To be investigated, I need to hurry now.

Argh. Simply because I forgot to delete phi...

Let's try again, this time without leaving a reference to the maps.

sage: import gc
sage: class Aclass(Parent): pass                  
sage: class Bclass(Parent): pass                  
sage: class Cclass(Parent):                       
....:    def _coerce_map_from_(self, P):
....:        if isinstance(P, Bclass):
....:            return sage.categories.map.Map(P,self)
....:         
sage: A = Aclass()
sage: B = Bclass()
sage: C = Cclass()
sage: A.register_embedding(sage.categories.map.Map(A,B))
sage: C.has_coerce_map_from(A)
True
sage: del A,B
sage: gc.collect()
862
sage: len([x for x in gc.get_objects() if isinstance(x,Aclass)])
0
sage: len([x for x in gc.get_objects() if isinstance(x,Bclass)])
0

So, no leak.

I think my analysis is now complete: A memory leak caused by composite coerce maps will only arise if C._coerce_map_from_(A) returns the composite map. I have also shown that there is no need to let C._coerce_map_from_(A) return a composite map.

Hence, I would argue that C._coerce_map_from_(A) returning a composite map is a misuse. Granted, it is far from obvious that it is a misuse. I feel tempted to investigate how often composite maps are actually returned by _coerce_map_from_.

simon-king-jena commented 11 years ago

comment:130

Replying to @simon-king-jena:

Hence, I would argue that C._coerce_map_from_(A) returning a composite map is a misuse. Granted, it is far from obvious that it is a misuse. I feel tempted to investigate how often composite maps are actually returned by _coerce_map_from_.

For example, I found that during startup of Sage composite maps are returned by Q._coerce_map_from_(P) for the following values of P, Q:

Coercion Rational Field to Complex Field with 2 bits of precision
Coercion Rational Field to Complex Field with 53 bits of precision
Coercion <type 'int'> to Univariate Polynomial Ring in x over Integer Ring
Coercion Integer Ring to Complex Field with 2 bits of precision
Coercion <type 'int'> to Complex Field with 53 bits of precision
Coercion <type 'int'> to Real Interval Field with 53 bits of precision
Coercion <type 'int'> to Univariate Polynomial Ring in x over Rational Field
Coercion <type 'int'> to Real Interval Field with 64 bits of precision
Coercion Complex Lazy Field to Complex Double Field
Coercion <type 'int'> to Univariate Polynomial Ring in x over Algebraic Real Field

I guess in all of these cases the domain and the "middle parent" will be immortal anyway.

nbruin commented 11 years ago

comment:131

Replying to @simon-king-jena:

First of all, this leak is not introduced by my branch. Hence, it would probably be better to attempt a fix on a different ticket, as the changes introduced in my branch already are big enough.

Agreed.

Typically, C._coerce_map_from_(A) just returns True, None or False, and not a map. If it returns True, then a direct conversion chi' from A to C is stored as coercion.

Hm, I agree that the references are in that case out of reach of what we're considering here, but all this is saying is "it is okay to use conversion as a coercion from A to C". This conversion still has to be programmed/stored/discovered somewhere, so I'd expect that some conversion cache might be liable to hold a strong reference.

The punchline is: **If chi is discovered by backtracking then psi is stored in C._coerce_from_list.** Without psi being on this list, backtracking won't find chi.

Hence, C.register_coercion(psi) has been done. With the current code, it means that C will keep B alive.

Right, I was already expecting something along these lines when you explained the function of coerce_from_list: The backbone of the coercion framework presently requires lifetime specifications to be explicit and it seems this is not just a by-product of the implementation, it seems to be part of the spec. That's fine by itself. Whether having such implications is sufficient for sage in the future remains to be seen, but changing that is a redesign problem that would need to be carefully considered (just as it might be desirable to allow multiple embeddings to be registered)

As a consequence, in the present model, register_coercion(...,strong=false) would not be advisable.

Case (3) is a leak, but this case can easily be avoided by returning a "simple" map that is mathematically equivalent to the composite map.

This would be a necessary step to avoid the leak, and all the coercion system can do, but if programmed in a generic way, the references causing the leak would likely still be present internally.

simon-king-jena commented 11 years ago

comment:132

Replying to @nbruin:

The punchline is: **If chi is discovered by backtracking then psi is stored in C._coerce_from_list.** Without psi being on this list, backtracking won't find chi.

Modulo the oversight I have corrected in my previous posts:

If a composed map chi is discovered by backtracking, then either the second map is registered as a coercion (hence, C keeps B alive) or the first map is the coerce embedding of A (hence, A keeps B alive). But, as I have shown, storing the composed map as coercion from A to C (in C._coerce_from_hash[A]) does not cause a leak, at least not with the attached branch.

Right, I was already expecting something along these lines when you explained the function of coerce_from_list: The backbone of the coercion framework presently requires lifetime specifications to be explicit and it seems this is not just a by-product of the implementation, it seems to be part of the spec. That's fine by itself. Whether having such implications is sufficient for sage in the future remains to be seen, but changing that is a redesign problem that would need to be carefully considered

Agreed. Currently, the coercion system operates on a virtual digraph, and I could actually imagine that this digraph could become an actual object with a fast graph backend. This might give more flexibility for our coercion system. But this would require a major rewrite.

(just as it might be desirable to allow multiple embeddings to be registered)

Why? What should be done with these embeddings?

nbruin commented 11 years ago

comment:133

Replying to @simon-king-jena:

Why? What should be done with these embeddings?

I don't have a direct application in mind (hence the might), but just for symmetry it seems appropriate.

One possible example would be someone working on some weak approximation problem, having a whole bunch of number fields K with specified embeddings in CC as well as Qp (for some p). In this application, Qp may well be just as immortal as CC is, so using register_coercion would not express the right lifetime implications: The K should get deleted while Qp remains, just as CC remains.

This would be accomplished by letting K have embeddings in both CC and Qp. I'm not claiming at this point that using coercion is the most appropriate tool to express the relations in this scenario.

There is one benefit one gets from having maps recognized as coercions: A lot of derived structures can now automatically get built with the appropriate maps between them via pushout constructions. If you just have some maps lying around, constructing the corresponding derived maps will be a lot of work.

That's my reason to really care about an expressive coercion system. My experience with Magma, which tends to have a much more restricted notion of coercion, has taught me that building these maps can be a lot of silly work. It would be great if one could "borrow" the coercion system for that every now and again (this is one of the reasons why I think some context manager that can put in "temporary" coercions would be great: inside the context manager one would request the derived map, store it, and then return the coercion system to its original state).
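A purely hypothetical sketch of what such a context manager could look like, written against a toy registry rather than Sage's coercion model. `ToyCoercionRegistry`, `temporary` and `coerce_map` are invented names; a real implementation would also have to invalidate any caches that were filled while the temporary coercion was active.

```python
from contextlib import contextmanager


class ToyCoercionRegistry:
    """Toy registry mapping (domain, codomain) pairs to callables."""

    def __init__(self):
        self._maps = {}

    def coerce_map(self, domain, codomain):
        return self._maps.get((domain, codomain))

    @contextmanager
    def temporary(self, domain, codomain, func):
        key = (domain, codomain)
        missing = object()
        previous = self._maps.get(key, missing)
        self._maps[key] = func
        try:
            yield self
        finally:
            # Return the registry to its original state.
            if previous is missing:
                del self._maps[key]
            else:
                self._maps[key] = previous


registry = ToyCoercionRegistry()
with registry.temporary("K", "Qp", lambda x: ("Qp", x)):
    # Request the derived map inside the block and keep it.
    derived = registry.coerce_map("K", "Qp")

value = derived(5)                        # the stored map remains usable
afterwards = registry.coerce_map("K", "Qp")  # the registry no longer knows it
```

The stored map survives the block because the caller holds it, while the registry itself carries no lasting lifetime implication.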

nbruin commented 11 years ago

comment:134

I'm not so sure that the coercion system was designed with embeddings as an alternative to registered coercions: If you register a map only as an embedding (and not also as a coercion on the codomain) you can end up with the coercion system yielding non-transitive results:

class pA(Parent): pass
class pB(Parent): pass
class pC(Parent): pass

A=pA()
B=pB()
C=pC()

BtoA=Hom(B,A)(lambda x: A(x))
AtoC=Hom(A,C)(lambda x: C(x))
A.register_coercion(BtoA)
A.register_embedding(AtoC)

C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!

(other combinations of using register_coercion and register_embedding do lead to the appropriate discovery)

So, if we want to view the coercion framework as a digraph and valid coercions as paths in this digraph, then an arbitrary combination of register_embedding and register_coercion may lead to invalid manipulations of the graph, i.e., to a state where the system fails to provide consistent (transitive) results.
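The gap between "paths in a digraph" and what a lookup actually sees can be illustrated with a small pure-Python toy. This is a simplified model and not Sage's discovery algorithm: it assumes that a registered coercion is stored only on the codomain, that an embedding is stored only on the domain, and that a lookup either follows the source's embedding or walks backwards from the target's registered coercions. All names (`ToyParent`, `local_lookup`, `graph_search`) are hypothetical.

```python
from collections import deque


class ToyParent:
    def __init__(self, name):
        self.name = name
        self.coerce_from = []  # domains registered on this codomain
        self.embedding = None  # codomain registered on this domain


def register_coercion(codomain, domain):
    codomain.coerce_from.append(domain)


def register_embedding(domain, codomain):
    domain.embedding = codomain


def local_lookup(src, dst):
    """Lookup using only what src and dst store (no cycle guard; toy only)."""
    if src is dst:
        return [src.name]
    if src.embedding is not None:
        tail = local_lookup(src.embedding, dst)
        if tail is not None:
            return [src.name] + tail
    for mid in dst.coerce_from:
        head = local_lookup(src, mid)
        if head is not None:
            return head + [dst.name]
    return None


def graph_search(src, dst, parents):
    """Breadth-first search over all edges, wherever they were registered."""
    edges = {p.name: [] for p in parents}
    for p in parents:
        for d in p.coerce_from:
            edges[d.name].append(p.name)
        if p.embedding is not None:
            edges[p.name].append(p.embedding.name)
    queue, seen = deque([[src.name]]), {src.name}
    while queue:
        path = queue.popleft()
        if path[-1] == dst.name:
            return path
        for nxt in edges[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


A, B, C = ToyParent("A"), ToyParent("B"), ToyParent("C")
register_coercion(A, B)   # B -> A, stored on A
register_embedding(A, C)  # A -> C, stored on A

a_to_c = local_lookup(A, C)             # ['A', 'C']
b_to_a = local_lookup(B, A)             # ['B', 'A']
b_to_c = local_lookup(B, C)             # None: neither B nor C stores an edge
b_to_c_graph = graph_search(B, C, [A, B, C])  # ['B', 'A', 'C']
```

In this toy, both edges live on A, so a lookup that starts from B and C never sees them, even though the path B -> A -> C exists in the graph.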

This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.

simon-king-jena commented 11 years ago

comment:135

Replying to @nbruin:

{{{
A.register_coercion(BtoA)
A.register_embedding(AtoC)

C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!
}}}

IF this is a bug then it should be dealt with on a new ticket. Note, however, that C can not know about the embedding of A into C, and not even about the mere existence of A. So, how could it possibly be aware of a coercion from B to C via A? Hence, I am not sure if this qualifies as a bug or as a misuse.

This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.

I don't quite follow this argument. Anyway, I should now do some more project related work, namely #12630.

nbruin commented 11 years ago

comment:136

Replying to @simon-king-jena:

IF this is a bug then it should be dealt with on a new ticket.

Agreed.

Note, however, that C can not know about the embedding of A into C, and not even about the mere existence of A.

Are you suggesting that C.coerce_map_from(A) already fails? If we follow the digraph model for coercion, then whether there's a path from A to C is a property of the graph, not something the vertices "know" about. Perhaps the following more symmetrically formulated code (which does the same thing anyway) is more convincing:

sage: G=get_coercion_model()
sage: G.discover_coercion(A,C)
(Generic morphism ..., None)
sage: G.discover_coercion(B,A) #currently unweakened with your patch!
(Generic morphism ..., None)
sage: G.discover_coercion(B,C)
None

So, how could it possibly be aware of a coercion from B to C via A? Hence, I am not sure if this qualifies as a bug or as a misuse.

Are you claiming that register_embedding is not supposed to add edges to the same graph as register_coercion does? I don't see how that would lead to a useful model.

This wasn't such a problem before, but since we are now tying lifetime implications to how a coercion is registered, I think it now becomes apparent.

I don't quite follow this argument.

As the documentation of register_embedding shows, it was originally considered to be a rather non-essential component; just some coercion map that gets "blessed" as a particularly canonical one, but doesn't actually have a very different effect (in fact, as we see, really a more limited effect), so people would just have used register_coercion or (even more flexible) _coerce_map_from_.

With fewer strong references, there is more reason to use register_embedding: it expresses that the domain should keep the codomain alive rather than the other way around; the kind of thing that wouldn't be expressed usefully before anyway.
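The lifetime direction described here can be sketched with plain Python weak references. This is only an analogy under stated assumptions, not how Sage stores its maps: `ToyParent`, `embedding_codomain` and `known_domains` are hypothetical names. The domain holds a strong reference to the codomain, while the codomain only remembers the domain weakly.

```python
import gc
import weakref


class ToyParent:
    def __init__(self, name):
        self.name = name
        self.embedding_codomain = None          # strong: domain -> codomain
        self.known_domains = weakref.WeakSet()  # weak: codomain -> domain


def register_embedding(domain, codomain):
    domain.embedding_codomain = codomain
    codomain.known_domains.add(domain)


CC = ToyParent("CC")  # plays the role of an "immortal" parent
K = ToyParent("K")    # plays the role of a short-lived number field
register_embedding(K, CC)

probe = weakref.ref(K)
known_before = len(CC.known_domains)

del K         # drop the only strong reference to the domain
gc.collect()  # not strictly needed in CPython, but makes the intent explicit

k_collected = probe() is None
known_after = len(CC.known_domains)
```

Here the existence of `CC` does not keep `K` alive, whereas `K` would have kept `CC` alive for as long as `K` itself was referenced.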

nbruin commented 11 years ago

comment:137

Replying to @simon-king-jena:

Replying to @nbruin:

A.register_coercion(BtoA)
A.register_embedding(AtoC)

C.coerce_map_from(A) #finds the right map
A.coerce_map_from(B) #finds the right map
C.coerce_map_from(B) #returns none!

IF this is a bug then it should be dealt with on a new ticket.

This is now #15303.

darijgr commented 10 years ago

comment:138

Any info on what exactly slowed down the tableaux classes?
