sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Memleak in UniqueRepresentation, @cached_method #12215

Closed vbraun closed 11 years ago

vbraun commented 12 years ago

The documentation says that UniqueRepresentation uses weak refs, but this was switched over to the @cached_method decorator. The latter does currently use strong references, so unused unique parents stay in memory forever:

import sage.structure.unique_representation
len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)

for i in range(2,1000):
    ring = ZZ.quotient(ZZ(i))
    vectorspace = ring^2

import gc
gc.collect()
len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)
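
The difference between a strong and a weak cache can be reproduced without Sage. The following is a minimal sketch in plain Python, not Sage's actual implementation: `Parent` and `make_factory` are hypothetical names, and the immediate cleanup of the weak cache relies on CPython's reference counting.

```python
import gc
import weakref


class Parent:
    """Stand-in for a Sage parent; purely illustrative."""

    def __init__(self, key):
        self.key = key


def make_factory(cache):
    """Return a constructor that consults `cache` before building a new object."""
    def get(key):
        try:
            return cache[key]
        except KeyError:
            obj = cache[key] = Parent(key)
            return obj
    return get


strong_cache = {}                                # keeps every value alive
weak_cache = weakref.WeakValueDictionary()       # values may be collected
strong_get = make_factory(strong_cache)
weak_get = make_factory(weak_cache)

for i in range(2, 1000):
    strong_get(i)   # result discarded, but the dict still references it
    weak_get(i)     # result discarded, so the entry can disappear

gc.collect()
print(len(strong_cache), len(weak_cache))   # 998 0 on CPython
```

The strong dictionary grows with every distinct key, which is the behaviour reported above; the weak dictionary only holds entries that are still referenced elsewhere.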

Related tickets:

Further notes:

Apply

CC: @simon-king-jena @jdemeyer @mwhansen @vbraun @jpflori

Component: memleak

Keywords: UniqueRepresentation cached_method caching

Author: Simon King

Reviewer: Nils Bruin

Merged: sage-5.7.beta1

Issue created by migration from https://trac.sagemath.org/ticket/12215

simon-king-jena commented 12 years ago
comment:48

Sorry, I was mistaken! It is not two times the same! The first time it is over the rational field, the second time over the integer ring. So, forget my previous questions.

simon-king-jena commented 12 years ago
comment:49

Now I understand the problem:

SymmetricFunctions is a subclass of UniqueRepresentation. By my patch, UniqueRepresentation is using weak references. Apparently SymmetricFunctions therefore can be garbage collected, but - and now comes the strange point - the coercion system still recalls that a coercion to them has already been registered.

Anyway, with my patch, SymmetricFunctions(ZZ) and SymmetricFunctions(QQ) are created repeatedly, and that's bad.

simon-king-jena commented 12 years ago
comment:50

I updated the second patch, which should solve the problem!!

First of all: The segfault in the tests of sage/libs/pari/gen.pyx was due to my test for the new dealloc method. Following Jeroen's advice, I removed it and stated in the docs that Sage not crashing at exit is an indirect doctest.

Then, all failures in sage/combinat could be fixed by using a strong cache for SymmetricFunctions(...). So, I simply overrode the __classcall__ method inherited from UniqueRepresentation.
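
The idea of keeping a weak cache in the base class while one subclass opts into a strong cache can be sketched in plain Python. This is only an illustration under stated assumptions: `WeaklyUnique`, `StronglyUnique` and the `get` classmethod are hypothetical stand-ins for UniqueRepresentation, SymmetricFunctions and `__classcall__`, and the construction counts rely on CPython's reference counting.

```python
import gc
import weakref


class WeaklyUnique:
    """Toy analogue of a base class whose instances are cached weakly."""
    _cache = weakref.WeakValueDictionary()
    constructions = 0

    @classmethod
    def get(cls, *args):
        key = (cls, args)
        try:
            return WeaklyUnique._cache[key]
        except KeyError:
            obj = cls(*args)
            WeaklyUnique._cache[key] = obj
            return obj

    def __init__(self, *args):
        type(self).constructions += 1
        self.args = args


class StronglyUnique(WeaklyUnique):
    """Subclass that overrides the lookup with its own strong cache."""
    _strong_cache = {}
    constructions = 0

    @classmethod
    def get(cls, *args):
        key = (cls, args)
        if key not in cls._strong_cache:
            cls._strong_cache[key] = super().get(*args)
        return cls._strong_cache[key]


for _ in range(3):
    WeaklyUnique.get("ZZ")      # nothing keeps the result alive
    StronglyUnique.get("ZZ")    # the strong cache keeps the result alive
    gc.collect()

print(WeaklyUnique.constructions, StronglyUnique.constructions)   # 3 1 on CPython
```

The weakly cached object is rebuilt on every request because no caller holds on to it, whereas the strongly cached one is built once, at the price of never being freed.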

I just tested that with the new patch all tests in sage/combinat and sage/libs/pari/gen.pyx pass. The others passed even with the old patch version, so that I am confident that they will pass as well (of course, one must try!).

simon-king-jena commented 12 years ago

Changed work issues from fix it... to none

simon-king-jena commented 12 years ago
comment:51

FWIW, make ptest succeeded.

simon-king-jena commented 12 years ago
comment:52

As I said in my previous post, the tests pass with this patch. The tests also pass with the patch from #12313. However, there are three segfaults that occur when both patches are applied. I am having difficulty tracking them down.

simon-king-jena commented 12 years ago
comment:53

Cc to Volker, because I expect he has enough knowledge to give me some advice on how I could trace down the following segfault.

With #12313 and the patch from here, sage -t -verbose -force_lib "devel/sage/doc/en/bordeaux_2008/half_integral.rst" segfaults. By inspection of the core file, I found that the segfault occurs during deallocation of a functor.

For debugging, I added a __dealloc__ method to sage.categories.functor.Functor that writes the type and the address of self and of the two cdef attributes __domain and __codomain to some file. The same is done during initialisation of the functor.

And the last lines of the resulting file (before the segfault) are:

Dealloc Functor <type 'sage.structure.coerce_actions.LeftModuleAction'> at 71023056
  Domain <class 'sage.categories.groupoid.Groupoid'> at 75636560
  Codom. <class 'sage.categories.commutative_rings.CommutativeRings'> at 15429144
Dealloc Functor <type 'sage.structure.coerce_actions.LeftModuleAction'> at 71023056
  Domain <type 'NoneType'> at 140661532564960
  Codom. <type 'NoneType'> at 140661532564960

In other words, the functor is deallocated twice, which is a legitimate reason to segfault.

How can I find out why Sage tries to deallocate it twice?

vbraun commented 12 years ago
comment:54

Is it actually being finalized twice? To me, it seems that just the malloc bin was reused for a second LeftModuleAction instance. In particular, why would domain and codomain be different in the second destructor call?

simon-king-jena commented 12 years ago
comment:55

Replying to @vbraun:

In particular, why would domain and codomain be different in the second destructor call.

Because domain and codomain were deleted the first time. The second time, they are already None (hence the type NoneType in the log).

simon-king-jena commented 12 years ago
comment:56

Replying to @vbraun:

Is it actually being finalized twice? To me, it seems that just the malloc bin was reused for a second LeftModuleAction instance.

And I do believe it is the same instance. Namely, if what you say was right, then we should see a call to "init" between the two deallocations (I made both init and dealloc write to the same log file). But the two deallocations followed directly: No initialisation and no other deallocation in between.

simon-king-jena commented 12 years ago
comment:57

No progress on my side. For my project, it probably means that I have to pick between two evils: Either live with the memleak that would be fixed in #12313, or live with the memleak that would be fixed here. Bad.

simon-king-jena commented 12 years ago
comment:58

Now that's weird:

When I define

    def __dealloc__(self):
        if self.__domain is not None:
            Py_INCREF(self.__domain)
        if self.__codomain is not None:
            Py_INCREF(self.__codomain)

for sage.categories.functor.Functor, then the segfault disappears.

Can this be a solution? It looks weird.

simon-king-jena commented 12 years ago

Description changed:

--- 
+++ 
@@ -14,9 +14,15 @@

Related tickets:

simon-king-jena commented 12 years ago
comment:60

I have updated the second patch, which was about fixing segfaults anyway.

As I already stated: I find it weird that the problem is solved by incrementing the reference count of the domain and codomain of an action when the action is deallocated. But it works, i.e., the doctests that used to segfault with #12313 and the old version of the patches run fine with the new patch version.

I need an expert opinion, though, and the full test suite is also to be run.

Concerning memleaks, here is the example from the ticket description.

With #12313 and the patches from here:

sage: import sage.structure.unique_representation
sage: len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)
135
sage: 
sage: for i in range(2,1000):
....:     ring = ZZ.quotient(ZZ(i))
....:     vectorspace = ring^2
....: 
sage: import gc
sage: gc.collect()
16641
sage: len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)
227

With #12313 only:

sage: import sage.structure.unique_representation
sage: len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)
151
sage: 
sage: for i in range(2,1000):
....:     ring = ZZ.quotient(ZZ(i))
....:     vectorspace = ring^2
....: 
sage: import gc
sage: gc.collect()
3805
sage: len(sage.structure.unique_representation.UniqueRepresentation.__classcall__.cache)
5142

So, it is clear progress, and IIRC the patches comprise tests against at least one memory leak that is fixed. Needs review!

Apply trac12215_weak_cached_function.patch trac12215_segfault_fixes.patch

simon-king-jena commented 12 years ago

Work Issues: Fix two tests

simon-king-jena commented 12 years ago
comment:61

With sage-5.0.prealpha0 plus #11780, #11290, #715, #11521, #12313 and the patches from here, make ptest results in

        sage -t  -force_lib devel/sage/sage/combinat/sf/sf.py # 1 doctests failed
        sage -t  -force_lib devel/sage/sage/categories/category.py # 1 doctests failed

So, it needs work (because all tests pass when the patches from here are not applied), but it should hopefully be easy to fix.

vbraun commented 12 years ago
comment:62

I tried the following in cdef class Action:

    def __cinit__(self):
        print 'Action __cinit__ ' + str(id(self))

    def __dealloc__(self):
        print 'Action __dealloc__ ' + str(id(self))

Then I do occasionally get a reused id (= memory address in CPython), for example

    Action __cinit__ 105376976
    Action __dealloc__ 105376976
    Action __cinit__ 105376976
    Action __dealloc__ 105376976

But I don't see any double finalizers without the object being constructed in-between. I also don't get any segfault in bordeaux_2008/half_integral.rst.

vbraun commented 12 years ago
comment:63

For the record, I have these patches applied on top of sage-4.8.rc0:

12221_debug.patch
trac_12247_var_construction.patch
9138_flat.patch
trac11900_category_speedup_combined.patch
trac11900_only_fix_singleton_hash.patch
trac11900_doctest.patch
11115_flat.patch
trac_11115_docfix.patch
trac12215_weak_cached_function.patch
trac12215_segfault_fixes.patch

removed the Py_INCREF(self.__domain) and Py_INCREF(self.__codomain) bandaid. Still no segfault.

simon-king-jena commented 12 years ago
comment:64

Replying to @vbraun:

For the record, I have these patches applied on top of sage-4.8.rc0:

12221_debug.patch
trac_12247_var_construction.patch
9138_flat.patch
trac11900_category_speedup_combined.patch
trac11900_only_fix_singleton_hash.patch
trac11900_doctest.patch
11115_flat.patch
trac_11115_docfix.patch
trac12215_weak_cached_function.patch
trac12215_segfault_fixes.patch

removed the Py_INCREF(self.__domain) and Py_INCREF(self.__codomain) bandaid. Still no segfault.

Sure. As I stated in some post above, the segfault only results when applying both #12313 (hence, its dependency #715 as well) and the (old) patches from here.

If you only have the (old or new) patches from here or only have #715+#12313 then there is no segfault.

vbraun commented 12 years ago
comment:65

I ran all doctests and there are a few crashes in functor.so elsewhere. I didn't have to apply any additional patches. It dies with

Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __cinit__ 84628736
Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __dealloc__ 84546128
/home/vbraun/opt/sage-4.8.rc0/local/lib/libcsage.so(print_backtrace+0x31)[0x7fcc0db1adf6]
/home/vbraun/opt/sage-4.8.rc0/local/lib/libcsage.so(sigdie+0x14)[0x7fcc0db1ae28]
/home/vbraun/opt/sage-4.8.rc0/local/lib/libcsage.so(sage_signal_handler+0x20c)[0x7fcc0db1aa76]

It seems that it's just memory corruption that manifests itself by freeing the object twice. But the error is presumably elsewhere. Also the gdb stack trace is completely corrupted.
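
Telling address reuse apart from a genuine double deallocation can be automated by replaying the log. The following is a small sketch in plain Python; the helper name `find_double_deallocs` is hypothetical, and it assumes the three-word `Action <event> <address>` line format used in the debug output above.

```python
def find_double_deallocs(log_lines):
    """Return addresses that were deallocated while not recorded as live."""
    live = set()
    suspicious = []
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != "Action":
            continue  # skip backtrace lines and other noise
        _, event, address = parts
        if event == "__cinit__":
            live.add(address)
        elif event == "__dealloc__":
            if address in live:
                live.remove(address)       # normal end of life
            else:
                suspicious.append(address)  # no construction since last free
    return suspicious


# Address reuse only: every __dealloc__ is preceded by a __cinit__.
healthy_log = """\
Action __cinit__ 105376976
Action __dealloc__ 105376976
Action __cinit__ 105376976
Action __dealloc__ 105376976
""".splitlines()

# The log that ends in the crash.
crash_log = """\
Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __cinit__ 84628736
Action __cinit__ 84546128
Action __dealloc__ 84546128
Action __dealloc__ 84546128
""".splitlines()

print(find_double_deallocs(healthy_log))   # []
print(find_double_deallocs(crash_log))     # ['84546128']
```

A reused address shows up as alternating construction and destruction, while the crashing log contains a destruction with no construction in between.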

vbraun commented 12 years ago
comment:66

Here is the stack trace:

#0  0x00007ffaadb88511 in __pyx_tp_dealloc_4sage_10categories_7functor_Functor (o=0x63ed250) at sage/categories/functor.c:2845
#1  0x00007ffaad970cc8 in __pyx_tp_dealloc_4sage_10categories_6action_Action (o=0x63ed250) at sage/categories/action.c:5943
#2  0x00007ffaad5485a0 in __pyx_tp_dealloc_4sage_9structure_14coerce_actions_ModuleAction (o=0x63ed250) at sage/structure/coerce_actions.c:7505
#3  0x00007ffabbcf8f0c in type_call (type=<optimized out>, args=0x63e09e0, kwds=0x0) at Objects/typeobject.c:748
#4  0x00007ffabbca27a3 in PyObject_Call (func=0x7ffaad754ec0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2492
#5  0x00007ffaad53ffbb in __pyx_pf_4sage_9structure_14coerce_actions_1detect_element_action (__pyx_self=0x0, __pyx_args=0x63fbb40, __pyx_kwds=0x0)
    at sage/structure/coerce_actions.c:4616
#6  0x00007ffabbca27a3 in PyObject_Call (func=0x2683dd0, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2492
#7  0x00007ffaaeb0ea32 in __pyx_f_4sage_9structure_6parent_6Parent_discover_action (__pyx_v_self=0x644ab00, __pyx_v_S=0x6448770, 
    __pyx_v_op=0x7ffab525aea8, __pyx_v_self_on_left=1) at sage/structure/parent.c:16618
#8  0x00007ffaaed48057 in __pyx_f_4sage_9structure_10parent_old_6Parent_get_action_c_impl (__pyx_v_self=0x644ab00, __pyx_v_S=0x6448770, 
    __pyx_v_op=0x7ffab525aea8, __pyx_v_self_on_left=1) at sage/structure/parent_old.c:3312
#9  0x00007ffaaed47ea2 in __pyx_pf_4sage_9structure_10parent_old_6Parent_4get_action_impl (__pyx_v_self=0x644ab00, __pyx_args=0x63fb910, 
    __pyx_kwds=0x0) at sage/structure/parent_old.c:3258
#10 0x00007ffabbca27a3 in PyObject_Call (func=0x636a5a8, arg=<optimized out>, kw=<optimized out>) at Objects/abstract.c:2492
#11 0x00007ffaaed46ee7 in __pyx_f_4sage_9structure_10parent_old_6Parent_get_action_c (__pyx_v_self=0x644ab00, __pyx_v_S=0x6448770, 
    __pyx_v_op=0x7ffab525aea8, __pyx_v_self_on_left=1, __pyx_skip_dispatch=0) at sage/structure/parent_old.c:2935
#12 0x00007ffaaed4f19d in __pyx_f_4sage_9structure_10parent_old_6Parent__get_action_ (__pyx_v_self=0x644ab00, __pyx_v_other=0x6448770, 
    __pyx_v_op=0x7ffab525aea8, __pyx_v_self_on_left=1, __pyx_skip_dispatch=0) at sage/structure/parent_old.c:6228
#13 0x00007ffaaeb0b17c in __pyx_f_4sage_9structure_6parent_6Parent_get_action (__pyx_v_self=0x644ab00, __pyx_v_S=0x6448770, __pyx_skip_dispatch=0, 
    __pyx_optional_args=0x7fff38b8e2f0) at sage/structure/parent.c:15635
#14 0x00007ffaae1fa2e6 in __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_discover_action (__pyx_v_self=0x26286d0, __pyx_v_R=0x644ab00, 
    __pyx_v_S=0x6448770, __pyx_v_op=0x7ffab525aea8, __pyx_skip_dispatch=0) at sage/structure/coerce.c:12473
#15 0x00007ffaae1f6564 in __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_get_action (__pyx_v_self=0x26286d0, __pyx_v_R=0x644ab00, 
    __pyx_v_S=0x6448770, __pyx_v_op=0x7ffab525aea8, __pyx_skip_dispatch=0) at sage/structure/coerce.c:11424
#16 0x00007ffaae1e64e2 in __pyx_f_4sage_9structure_6coerce_24CoercionModel_cache_maps_bin_op (__pyx_v_self=0x26286d0, __pyx_v_x=0x6354b48, 
    __pyx_v_y=0x63e36b0, __pyx_v_op=0x7ffab525aea8, __pyx_skip_dispatch=0) at sage/structure/coerce.c:6583
#17 0x00007ffaae448f03 in __pyx_pf_4sage_9structure_7element_6Vector_1__mul__ (__pyx_v_left=0x6354b48, __pyx_v_right=0x63e36b0)
    at sage/structure/element.c:16130
#18 0x00007ffabbc9dc5f in binary_op1 (v=0x6354b48, w=0x63e36b0, op_slot=16) at Objects/abstract.c:917
#19 0x00007ffabbca0cc8 in PyNumber_Multiply (v=0x6354b48, w=0x63e36b0) at Objects/abstract.c:1188
#20 0x00007ffa9be33b68 in __pyx_f_4sage_5rings_13residue_field_12ReductionMap__call_ (__pyx_v_self=0x63f10e8, __pyx_v_x=0x6405108, 
    __pyx_skip_dispatch=0) at sage/rings/residue_field.c:8140

Within coercion_model.bin_op() (frame 16) there are calls to Python methods (PyObject_Call), and in there the garbage collector is free to run. I suspect that this is what is happening somewhere...

simon-king-jena commented 12 years ago
comment:67

Replying to @vbraun:

I ran all doctests and there are a few crashes in functor.so elsewhere. I didn't have to apply any additional patches.

What exactly do you mean? Do you have the old patches from here applied (i.e., without the new __dealloc__ method), or does the segfault even occur with the new patches?

Is it normal that both you and I see segfaults, and that they seem to be analogous problems (namely double deallocation), but we see them in different examples and with different patches (namely, even with the old patches from here, all tests pass for me)?

It dies with ... It seems that it's just memory corruption that manifests itself by freeing the object twice.

So, you can confirm that it is the same object.

But the error is presumably elsewhere. Also the gdb stack trace is completely corrupted.

That sounds like one should write a complete log of all Python code executed, following your suggestion that the error occurs somewhere during a call to a Python method.

simon-king-jena commented 12 years ago
comment:68

Here is some more info on the segfault.

Setting: I have sage-5.0.prealpha0 plus #11780, #11290, #715, #11521, #12313 and the patches from here, removing the __dealloc__ method introduced by the last patch.

The segfault is triggered by doing

sage: half_integral_weight_modform_basis(DirichletGroup(16,QQ).1, 3, 10)
[]
sage: half_integral_weight_modform_basis(DirichletGroup(16,QQ).1, 5, 10)
/home/simon/SAGE/sage-5.0.prealpha0/local/lib/libcsage.so(print_backtrace+0x31)[0x7fe047add9c6]
/home/simon/SAGE/sage-5.0.prealpha0/local/lib/libcsage.so(sigdie+0x14)[0x7fe047add9f8]
/home/simon/SAGE/sage-5.0.prealpha0/local/lib/libcsage.so(sage_signal_handler+0x20c)[0x7fe047add646]
/lib64/libpthread.so.0(+0xfd00)[0x7fe04cd80d00]
...

When I swap the two lines, the computation succeeds, but Sage crashes on exit:

sage: half_integral_weight_modform_basis(DirichletGroup(16,QQ).1, 5, 10)
[q - 2*q^3 - 2*q^5 + 4*q^7 - q^9 + O(q^10)]
sage: half_integral_weight_modform_basis(DirichletGroup(16,QQ).1, 3, 10)
[]
sage: quit
Exiting Sage (CPU time 0m2.02s, Wall time 0m20.49s).

**********************************************************************

Oops, Sage crashed. We do our best to make it stable, but...

A crash report was automatically generated with the following information:
  - A verbatim copy of the crash traceback.
  - A copy of your input history during this session.
  - Data on your current Sage configuration.

It was left in the file named:
        '/home/simon/.sage/ipython/Sage_crash_report.txt'
If you can email this file to the developers, the information in it will help
them in understanding and correcting the problem.

You can mail it to: sage-support at sage-support@googlegroups.com
with the subject 'Sage Crash Report'.

If you want to do it now, the following command will work (under Unix):
mail -s 'Sage Crash Report' sage-support@googlegroups.com < /home/simon/.sage/ipython/Sage_crash_report.txt

To ensure accurate tracking of this issue, please file a report about it at:
http://trac.sagemath.org/sage_trac

Press enter to exit:

I was tracing all python commands for the first variant of the segfault. The last few lines of the log are as follows:

sage.categories.pushout:__call__:2125         if self.p == other.p:
sage.categories.pushout:__call__:2126             from sage.all import Infinity
sage.categories.pushout:__call__:2127             if self.prec == other.prec:
sage.categories.pushout:__call__:2128                 extras = self.extras.copy()
sage.categories.pushout:__call__:3102     except CoercionException:
sage.categories.pushout:__call__:3104     except (TypeError, ValueError, AttributeError, NotImplementedError), ex:
sage.categories.pushout:__call__:3108         raise CoercionException(ex)
weakref:__call__:49             self = selfref()
weakref:__call__:50             if self is not None:
weakref:__call__:51                 del self.data[wr.key]
sage.rings.power_series_ring:__call__:556         s = "Power Series Ring in %s over %s"%(self.variable_name(), self.base_ring())
sage.rings.power_series_ring:__call__:557         if self.is_sparse():
sage.rings.power_series_ring:__call__:562         return self.__is_sparse
sage.rings.power_series_ring:__call__:559         return s

So, indeed it seems that the problem has something to do with weak references. There is an item of a weak value dictionary deleted right before segfaulting.

To do: Find out what item of what dictionary is deleted, why it is deleted, and how deletion can be prevented.

simon-king-jena commented 12 years ago
comment:69

I was also tracing the deletion of items of weak value dictionaries: I was writing the key to a log file whenever an item was deleted.
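
Such a trace can be obtained with a weak reference callback. The following is a minimal sketch in plain Python, not the instrumentation actually used in Sage: `LoggingWeakValueCache` and the toy `Groupoid` class are hypothetical, and the timing of the callbacks relies on CPython's reference counting.

```python
import gc
import weakref


class LoggingWeakValueCache:
    """Minimal weak-value cache that records the key of every entry it drops."""

    def __init__(self):
        self._refs = {}          # key -> weak reference to the value
        self.deleted_keys = []   # keys whose values were garbage collected

    def __setitem__(self, key, value):
        def on_collect(ref, key=key):
            # Drop the entry only if it still points at the dead reference;
            # the key may have been re-populated in the meantime.
            if self._refs.get(key) is ref:
                del self._refs[key]
            self.deleted_keys.append(key)
        self._refs[key] = weakref.ref(value, on_collect)

    def __getitem__(self, key):
        value = self._refs[key]()
        if value is None:
            raise KeyError(key)
        return value


class Groupoid:
    """Toy stand-in for a cached category object."""

    def __init__(self, base):
        self.base = base


cache = LoggingWeakValueCache()
key = (Groupoid, "Rational Field")
for _ in range(3):
    cache[key] = Groupoid("Rational Field")   # no strong reference survives
    gc.collect()

print(len(cache.deleted_keys))   # 3 on CPython
```

The same key is logged once per construction, which is the pattern of repeated deletions described in this comment.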

Already when starting sage, we see that the same key (and presumably the same value as well) is deleted repeatedly:

...
((<class 'sage.categories.category.JoinCategory'>, (Category of semirings, Category of infinite enumerated sets)), ())
((<class 'sage.categories.groupoid.Groupoid'>, Integer Ring), ())
((<class 'sage.categories.groupoid.Groupoid'>, Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Complex Lazy Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Rational Field), ())

When issuing the first line of the crashing example and repeating it, we see something like

...
((<class 'sage.categories.groupoid.Groupoid'>, Complex Lazy Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Complex Lazy Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Cyclotomic Field of order 4 and degree 2), ())
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Rational Field, 0, 0, False), ())
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Rational Field, 10, 0, False), ())
((5, 0, 'prealpha0'), (Rational Field, 0, False, None))
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Rational Field, 0, 0, False), ())
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Rational Field, 10, 0, False), ())

And at crashing, one has

((<class 'sage.matrix.matrix_space.MatrixSpace'>, Ring of integers modulo 46337, 4, 10, False), ())
((<class 'sage.categories.vector_spaces.VectorSpaces'>, Ring of integers modulo 46337), ())
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Integer Ring, 4, 10, False), ())
((<class 'sage.matrix.matrix_space.MatrixSpace'>, Rational Field, 4, 10, False), ())
((<class 'sage.categories.groupoid.Groupoid'>, Power Series Ring in q over Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Power Series Ring in q over Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Power Series Ring in q over Rational Field), ())
((<class 'sage.categories.groupoid.Groupoid'>, Power Series Ring in q over Integer Ring), ())

Conclusion:

The occurring keys indicate that the deletions occur in UniqueRepresentation. While using weak references for UniqueRepresentation fixes memory leaks, it seems that far too often stuff is removed that would actually still be needed. Certainly it is bad for speed, and it seems that it is also responsible for the segmentation faults.

I am not sure how that problem should best be addressed.

simon-king-jena commented 12 years ago

Changed work issues from Fix two tests to Fix a coercion problem in sage.combinat.sf.sf

simon-king-jena commented 12 years ago
comment:70

I think I have not properly stated that with the latest patches applied to sage-5.0.prealpha0, the segfault is gone. However, at least when I also have a couple of other tickets applied (#11780, #12290, #715, #11521, #12313, #12357, #12351, #7797), I get one coercion error in sage.combinat.sf.sf.

To be precise, I do not get that error when I only have all the other patches. So, it really seems to be caused by the patches from here. Trying to track it down...

simon-king-jena commented 12 years ago
comment:71

That's odd. The failing test is from the __call__ method in sage.combinat.sf.sf. When I execute things in the command line, I get the following:

sage: Sym = SymmetricFunctions(QQ[x])
sage: p = Sym.p(); s = Sym.s()
sage: P = p[1].parent()
sage: S = s[1].parent()
sage: P.coerce_map_from(S)
Generic morphism:
  From: Symmetric Function Algebra over Univariate Polynomial Ring in x over Rational Field, Schur symmetric functions as basis
  To:   Symmetric Function Algebra over Univariate Polynomial Ring in x over Rational Field, Power symmetric functions as basis
sage: S.coerce_map_from(P)
Generic morphism:
  From: Symmetric Function Algebra over Univariate Polynomial Ring in x over Rational Field, Power symmetric functions as basis
  To:   Symmetric Function Algebra over Univariate Polynomial Ring in x over Rational Field, Schur symmetric functions as basis

However, when the same is executed as a doctest, then there is no coercion map between S and P. Could it be that some other doctest is messing with the coercion maps, and my patch (perhaps in combination with #715 and #11521) reveals it?

simon-king-jena commented 12 years ago
comment:72

That's even odder. With #11780, #12290, #715, #11521, #12313, #12357, #12351, #7797 and #12645 (so, adding #12645, which only changes the rst markup in sage/combinat/sf/sf.py), all tests in sage/combinat pass.

Anyway. Since the second patch is in conflict with #12645 anyway, I am rebasing it. Since the doctest error has vanished, I put it back to "needs review", even though I wish I knew what was the reason for the temporary problem.

simon-king-jena commented 12 years ago

Changed dependencies from #11115 #11900 to #11115 #11900 #12645

simon-king-jena commented 12 years ago

Changed work issues from Fix a coercion problem in sage.combinat.sf.sf to none

simon-king-jena commented 12 years ago
comment:73

Bad. Meanwhile I work on top of sage-5.0.beta7. This time, it is the first patch that creates a coercion error in sage/combinat/sf/sf.py. Needs work.

simon-king-jena commented 12 years ago

Work Issues: coercion in symmetric function algebras

simon-king-jena commented 12 years ago
comment:74

Even worse: After applying related tickets (#715, #11521, #12313, #12357) to sage-5.0.beta13, 16 out of 18 hunks fail to apply. So, I need to find out where the problem comes from.

simon-king-jena commented 12 years ago

Changed dependencies from #11115 #11900 #12645 to #11115 #11900 #12645 #11599

simon-king-jena commented 12 years ago

Changed work issues from coercion in symmetric function algebras to Rebase wrt #11599. Coercion in symmetric function algebras

simon-king-jena commented 12 years ago
comment:75

It comes from #11599, which fixes the same docstring misformattings that I fix in my patch as well...

simon-king-jena commented 12 years ago
comment:76

Arrgh. With #715, #11521, #12313, #11943, #11935, #12357 and #7797 on top of sage-5.0.beta13, all tests pass. But adding the (rebased) patch from here, I get failures in

        sage -t  -force_lib "devel/sage/sage/structure/coerce_dict.pyx"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/macdonald.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/llt.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/jack.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/kschur.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/hall_littlewood.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/sfa.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/multiplicative.py"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/schur.py"
        sage -t  -force_lib "devel/sage/sage/combinat/species/library.py"
        sage -t  -force_lib "devel/sage/sage/combinat/combinatorial_algebra.py"
        sage -t  -force_lib "devel/sage/sage/categories/homset.py"

That's not good.

simon-king-jena commented 12 years ago
comment:77

Oops, I had only the first of the two patches from here applied. Nevertheless, it doesn't look good.

simon-king-jena commented 12 years ago
comment:78

I have rebased the first patch relative to #11599.

With both patches, one "only" has errors in

        sage -t  -force_lib "devel/sage/sage/structure/coerce_dict.pyx"
        sage -t  -force_lib "devel/sage/sage/combinat/sf/sf.py"
        sage -t  -force_lib "devel/sage/sage/categories/homset.py"

So, it still needs work, but it is less bad than I thought...

Apply trac12215_weak_cached_function.patch trac12215_segfault_fixes.patch

simon-king-jena commented 12 years ago

Changed work issues from Rebase wrt #11599. Coercion in symmetric function algebras to Coercion in symmetric function algebras

simon-king-jena commented 12 years ago
comment:79

A bit more detail: The tests in coerce_dict.pyx and homset.py fail even if only the first patch is applied. But the tests in sf.py pass if only the first patch is applied.

simon-king-jena commented 12 years ago
comment:80

I tested whether the problem comes from the combination of this ticket with #12357. But it turns out that the following test

        sage: K = GF(1<<55,'t')
        sage: for i in range(50):
        ...     a = K.random_element()
        ...     E = EllipticCurve(j=a)
        ...     P = E.random_point()
        ...     Q = 2*P
        sage: import gc
        sage: n = gc.collect()
        sage: from sage.schemes.elliptic_curves.ell_finite_field import EllipticCurve_finite_field
        sage: LE = [x for x in gc.get_objects() if isinstance(x, EllipticCurve_finite_field)]
        sage: len(LE)    # indirect doctest
        1

still fails. The test has been introduced in #12313. And of course it is not acceptable that #12313 makes a memory leak disappear, but #12215 makes it show up again.
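
The counting technique used in that doctest also works outside Sage. The following is a minimal sketch in plain Python; `Curve` and `live_instances` are hypothetical names, and the strong dictionary merely simulates a leaking cache.

```python
import gc


class Curve:
    """Stand-in for an expensive parent object; purely illustrative."""

    def __init__(self, j):
        self.j = j


def live_instances(cls):
    """Count instances of `cls` that the garbage collector still tracks."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))


leaky_cache = {}
for j in range(50):
    leaky_cache[j] = Curve(j)     # strong cache: every curve survives
leaked = live_instances(Curve)

leaky_cache.clear()
for j in range(50):
    curve = Curve(j)              # only the loop variable refers to the curve
remaining = live_instances(Curve)

print(leaked, remaining)          # 50 1
```

As in the doctest, a count of 1 is expected in the healthy case, because the last object created is still bound to a variable.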

simon-king-jena commented 12 years ago

Changed work issues from Coercion in symmetric function algebras to Keep the fix from #12313. Coercion in symmetric function algebras

simon-king-jena commented 12 years ago
comment:81

I think I located the problem. By some patch, I had introduced a weak dictionary in sage.structure.factory. But somehow I managed to remove the corresponding hunk from the patch. Now, I need to find out where that has happened...

simon-king-jena commented 12 years ago
comment:82

Aha! It turns out that I introduced the WeakValueDictionary in the first patch from here, but somehow I managed to delete it. Now the leak remains fixed, the patch is updated.

Apply trac12215_weak_cached_function.patch trac12215_segfault_fixes.patch

simon-king-jena commented 12 years ago
comment:83

With the updated version of the first patch (applied on top of #715, #11521, #12313, #11943 and #11935), the tests in sage/structure/coerce_dict and sage/categories/homset pass.

There remains the problem with symmetric functions, but this is due to the second patch...

simon-king-jena commented 12 years ago
comment:84

What exactly is the problem?

It is

            sage: S = SymmetricFunctions(ZZ)
            sage: S.inject_shorthands()
            doctest:...: RuntimeWarning: redefining global value `e`
            doctest:...: RuntimeWarning: redefining global value `m`
            sage: s[1] + e[2] * p[1,1] + 2*h[3] + m[2,1]
            s[1] - 2*s[1, 1, 1] + s[1, 1, 1, 1] + s[2, 1] + 2*s[2, 1, 1] + s[2, 2] + 2*s[3] + s[3, 1]

The last line fails with an error when doctesting, but works fine when doing the same in an interactive session.

simon-king-jena commented 12 years ago
comment:85

The failure is really strange. If one does

sage: S = SymmetricFunctions(ZZ)
sage: S.inject_shorthands()
sage: e.has_coerce_map_from(m)

on the command line, then one gets the answer "True". Doing the same in a separate doctest, one still gets "True". But doing the same in line 384 of sage.combinat.sf.sf.py, one gets "False". So, there seems to be a nasty, difficult-to-debug side effect, which apparently was introduced by the second patch.

simon-king-jena commented 12 years ago
comment:86

The error disappears if one does not override the __classcall__ method of symmetric function algebras. However, with the first patch, it then uses a weak cache, which results in many errors elsewhere...

But if I recall correctly, there has been a recent ticket dealing with coercion for symmetric functions. Perhaps a miracle occurs and the strongly cached custom __classcall__ can be cancelled (count the words that start with "c"...)?

simon-king-jena commented 12 years ago
comment:87

How unfortunate. If I remove the custom (strongly cached) __classcall__ of symmetric function algebras, I get

The following tests failed:

    sage -t  -force_lib "devel/sage/sage/combinat/sf/macdonald.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/llt.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/jack.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/kschur.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/hall_littlewood.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/classical.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/sfa.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/elementary.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/multiplicative.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/schur.py"
    sage -t  -force_lib "devel/sage/sage/combinat/sf/homogeneous.py"
    sage -t  -force_lib "devel/sage/sage/combinat/species/library.py"
    sage -t  -force_lib "devel/sage/sage/combinat/combinatorial_algebra.py"

But if one has a custom strong cache for symmetric function algebras, then one has the single failure in

    sage -t -force_lib "devel/sage/sage/combinat/sf/sf.py"