sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Sage crashes when inverting/dividing large number field elements #20693

Closed a6be518e-e92a-4ea3-ad2e-c326b54a0773 closed 8 years ago

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

This ticket used to be about a crash that occurred when computing newforms for a certain character of modulus 23 in sage 7.2. Here's how to reproduce it (you have to wait 10 minutes or so until the crash happens):

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: N=Newforms(c,6, names='a')

It turned out that this was due to NTL running out of FFT primes when inverting number field elements with humongous denominators. Moreover, it turned out that we only ran into this problem in the example (and other examples in the comments) because the function _invert_c_() of a number field element was doing unnecessary work.

Component: number fields

Author: Stephan Ehlen

Branch/Commit: eb3da68

Reviewer: Peter Bruin, Fredrik Stromberg

Issue created by migration from https://trac.sagemath.org/ticket/20693

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,9 +1,11 @@
 Computing newforms for a certain character of modulus 23 fails in sage 7.2.
 Here's how to reproduce it (you have to wait 10 minutes or so until the crash happens):

+```
 sage: D=DirichletGroup(23)
 sage: c=D.gen()^2
 sage: N=Newforms(c,6, names='a')
+```

 Then I get
a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:2

Let me add something that might be helpful in finding the bug: I get the same / a similar crash when doing the following, which is how I discovered it in the first place. It confirms what the crash looks like above, that the real bug lies somewhere in the linear algebra and/or number field code.

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: M=ModularSymbols(c,6,sign=1)
sage: S=M.cuspidal_subspace().new_subspace()
sage: A=D[0]
sage: v  = A.dual_eigenvector(names='a', lift=False)

And crash...

------------------------------------------------------------------------
0   signals.so                          0x00000001067855c5 print_backtrace + 37
------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
[1]    7238 abort      ~/Documents/math/devel/sage-git/sage

And let me also add that I tried a binary distribution of sage 7.1 on yet another machine as well with the Newforms(...) function and got the same crash.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:4

I would like to add a remark: something really extremely bad happened with the implementation of relative Number Fields. It was always slow but now, and this might be related to this crash here, spaces that I computed using older versions of sage (6.1 to be concrete and give a pointer) within minutes or up to an hour or a bit more, do not finish to be computed within a day now. A concrete example is Gamma0(19), character [zeta18^2], weight 14. I had the modular symbols space computed with sage 6.1 and stored the cputime used to compute it: 3780s. And now it didn't finish within 7 hours. I believe this happens in the very same function computing the dual_eigenvector. To be precise, it happens when coercing the elements of the base ring into the extension created within that code. I'm not sure if it is related to the crash or if it is a different problem but both of the problems are in fact problems with relative number fields, it seems

nbruin commented 8 years ago
comment:5

I confirm the crash and I'm getting the same traceback.

For regressions in relative number field performance: it would be nice to have a smaller example where both 6.1 and 7.2 run in reasonable time. We can then just profile the code. There's a good chance that something will be sticking out there, leading to the place of the regression.

jdemeyer commented 8 years ago
comment:6

@sagetrac-ehlen: are you writing code or intend to write code to fix this? For me, seeing an Author filled in is a good reason not to investigate the bug.

jdemeyer commented 8 years ago
comment:7

Replying to @sagetrac-ehlen:

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: M=ModularSymbols(c,6,sign=1)
sage: S=M.cuspidal_subspace().new_subspace()
sage: A=D[0]
sage: v  = A.dual_eigenvector(names='a', lift=False)

For me, this gives

AttributeError: 'DirichletGroup_class_with_category.element_class' object has no attribute 'dual_eigenvector'

and no crash...

jdemeyer commented 8 years ago

Replying to @sagetrac-ehlen:

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: N=Newforms(c,6, names='a')

I let this run for a few minutes and didn't get a crash. Do you have a simpler crashing example?

pjbruin commented 8 years ago
comment:9

To investigate the crash, it may help to add sig_on()...sig_off() around the NTL calls in sage.rings.number_field.number_field_element.NumberFieldElement._invert_c_().

For the slowdown, I tried the following with increasing weights:

sage: G = DirichletGroup(17)
sage: %prun Newforms(G[2], 8, names='a')

Indeed, the method dual_eigenvector() takes up more and more of the time; more precisely, most of it is spent in a number of calls to the PARI function eltreltoabs():

      403   21.366    0.053   21.366    0.053 {method '_eltreltoabs' of 'sage.libs.pari.gen.gen' objects}
       17    1.843    0.108    2.290    0.135 {method 'echelon_form' of 'sage.matrix.matrix_cyclo_dense.Matrix_cyclo_dense' objects}
        1    1.706    1.706    1.963    1.963 {method 'nonpivots' of 'sage.matrix.matrix0.Matrix' objects}
        1    1.549    1.549    4.381    4.381 {method 'height' of 'sage.matrix.matrix_cyclo_dense.Matrix_cyclo_dense' objects}
        1    1.135    1.135   24.522   24.522 module.py:1076(dual_eigenvector)
     4056    0.603    0.000   14.743    0.004 matrix_space.py:1270(matrix)
72532/72531    0.477    0.000    0.928    0.000 number_field.py:9303(_element_constructor_)
65119/65111    0.457    0.000    1.202    0.000 polynomial_ring_constructor.py:50(PolynomialRing)
[...]

Perhaps #18727, #18740 and/or #252 could be relevant for this problem?

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:10

@jdemeyer

I'm sorry, I guess I misinterpreted the "Author" field. I probably won't write code for this as I think the bugs and performance problems come from relative extensions of number fields and I don't know much about the code (and most of it is pari in some way, I guess).

My example in #20693 comment:2 was missing lines, sorry again. I guess what I meant was

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: M=ModularSymbols(c,6,sign=1)
sage: S=M.cuspidal_subspace().new_subspace()
sage: Dec = S.decomposition()
sage: A=Dec[0]
sage: v =A.dual_eigenvector(names='a', lift=False)

To get a crash you have to let it run quite some time (I don't remember how long exactly it was, maybe 15 minutes, I can restart it and let you know). I'm not sure if there much simpler/faster examples that crash but I can check.

I can come up with more examples for sure. Indeed you have to wait quite a bit until you see a crash sometimes, I had a computation running for Gamma0(19) and weight 14 for 2 days or so until I saw the crash (but I didn't run out of memory or anything else trivial and it also used to work on the same machine with the same space (and sage should really be able to do it!).

@nbruin I'll try to find some time to check out sage 6.1 again and compute some spaces that also run in sage 7.2 and then get back to you with some profiling results.

I did similar profiling tests as you did, @pjbruin and discovered that the main reason for this being slow is that elements in the base ring (a cyclotomic field) get coerced into a relative extension (obtained by adjoining the roots of a hecke polynomial). This is damn slow for some reason (note that it is stated in the documentation of relative number fields that doing arithmetic in number field towers is very slow but it seems that it got worse than it used to be). I discovered that such operations can be accelerated by very stupid means. For instance, instead of coercing an element e of K into an extension L directly, write it in terms of a power basis (e.list()), coerce the generator of K into L and create the element corresponding to e in L directly. I can document some examples later.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

Changed author from Stephan Ehlen to none

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:12

@jdemeyer, OK, so the corrected example I gave in #20693 comment:10 crashes after at most 43 minutes for me. I can probably give examples that crash faster, but not sure. The runtime is most certainly related to the degree of the relative extension (the dimension of the modular symbols space) of the cyclotomic field and it seems that we need to have this large enough to really cause a crash.

pjbruin commented 8 years ago
comment:13

Replying to @sagetrac-ehlen:

I did similar profiling tests as you did, @pjbruin and discovered that the main reason for this being slow is that elements in the base ring (a cyclotomic field) get coerced into a relative extension (obtained by adjoining the roots of a hecke polynomial). This is damn slow for some reason (note that it is stated in the documentation of relative number fields that doing arithmetic in number field towers is very slow but it seems that it got worse than it used to be). I discovered that such operations can be accelerated by very stupid means. For instance, instead of coercing an element e of K into an extension L directly, write it in terms of a power basis (e.list()), coerce the generator of K into L and create the element corresponding to e in L directly. I can document some examples later.

It turns out that NumberField_relative.__base_inclusion() uses a PARI function that is too general (and hence too slow) for this purpose. I opened #20749 to address this.

pjbruin commented 8 years ago
comment:14

Replying to @pjbruin:

For the slowdown, I tried the following with increasing weights:

sage: G = DirichletGroup(17)
sage: %prun Newforms(G[2], 8, names='a')

Indeed, the method dual_eigenvector() takes up more and more of the time; more precisely, most of it is spent in a number of calls to the PARI function eltreltoabs():

After #20749, dual_eigenvector() only takes up 28% of the time; before, it was 69%.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:15

My old examples still crash with #20749 applied. Unfortunately, it does not seem like these were very easy to compute with sage 6.1; everything I tested so far runs in fact much slower in sage 6.1 (which is good, I guess). [Let me mention, although a bit off-topic, that magma computes the newforms in weight 4 in the example above in 0.5s and in weight 6 in 7.5s.]

Now, as for the crash, first of all there is no crash with #20749 when running

N=Newforms(DirichletGroup(23).gen()^2,6, names='a')

However, I now get the more informative

PariError: the PARI stack overflows (current size: 2147483648; maximum size: 2147483648)
You can use pari.allocatemem() to change the stack size and try again.

After doing so:

sage: pari.allocatemem(2147483648*4)

I get a crash again. For some reason this crash is not informative at all on OSX, I will have to run it on Linux to see the backtrace, I guess.

Also, your level 17 example is quite good because by increasing the weight only by one to 9, I the same behaviour (PARI stack overflow, crash when increasing it). Try:

sage: c=DirichletGroup(17).gen()
sage: %time Newforms(c,9, names='a')
pjbruin commented 8 years ago
comment:16

From the traceback it appears that NTL runs out of primes for its FFT-based ZZX_XGCD function. We should catch the NTL error in NumberFieldElement._invert_c_() and use an alternative extended GCD implementation if that happens.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:17

Replying to @pjbruin:

From the traceback it appears that NTL runs out of primes for its FFT-based ZZX_XGCD function. We should catch the NTL error in NumberFieldElement._invert_c_() and use an alternative extended GCD implementation if that happens.

Indeed, after finally managing to get sage run in gdb on OSX, I get

#4  0x00000001076cc353 in NTL::UseFFTPrime(long) () from /Users/stephan/sage/local/lib/libntl.25.dylib
#5  0x00000001077bfaf8 in NTL::zz_pContext::zz_pContext(NTL::INIT_FFT_STRUCT const&, long) ()
   from /Users/stephan/sage/local/lib/libntl.25.dylib
#6  0x00000001077bfd07 in NTL::zz_p::FFTInit(long) () from /Users/stephan/sage/local/lib/libntl.25.dylib
#7  0x0000000107743dba in NTL::resultant(NTL::ZZ&, NTL::ZZX const&, NTL::ZZX const&, long) ()
   from /Users/stephan/sage/local/lib/libntl.25.dylib
#8  0x0000000107749594 in NTL::XGCD(NTL::ZZ&, NTL::ZZX&, NTL::ZZX&, NTL::ZZX const&, NTL::ZZX const&, long) ()
   from /Users/stephan/sage/local/lib/libntl.25.dylib
#9  0x000000018b253179 in __pyx_f_4sage_5rings_12number_field_20number_field_element_18NumberFieldElement__invert_c_(__pyx_obj_4sage_5rings_12number_field_20number_field_element_NumberFieldElement*, NTL::ZZX*, NTL::ZZ*) ()

Is this a bug in NTL? Is it documented anywhere that this might fail? Can we know in advance that this will happen so that we don't have to run into _invert_c_c()?

pjbruin commented 8 years ago
comment:18

After applying #20749, #20759 and #20791, the examples in the ticket description and in comment:10 crash already after about 1-2 minutes; on this ticket we can now focus on the crash instead of the slowness.

pjbruin commented 8 years ago
comment:19

Replying to @sagetrac-ehlen:

I would like to add a remark: something really extremely bad happened with the implementation of relative Number Fields. It was always slow but now, and this might be related to this crash here, spaces that I computed using older versions of sage (6.1 to be concrete and give a pointer) within minutes or up to an hour or a bit more, do not finish to be computed within a day now. A concrete example is Gamma0(19), character [zeta18^2], weight 14. I had the modular symbols space computed with sage 6.1 and stored the cputime used to compute it: 3780s. And now it didn't finish within 7 hours.

With the same branches applied as in comment:18, I tried

%prun Newforms(DirichletGroup(19)[2], 14, names='a')

and got a similar NTL crash as in comment:17 after about 1.5 hours.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:20

I tried to debug this and catch the NTL exception but I failed. Can someone help me? How do I properly catch the exception that NTL is probably raising? I tried adding except + to the declarations of ZZX_XGCD in ZZX.pxd and then subsequently to _invert_c and wrapped the call to _invert_c in try:...:except: but that didn't work... I'm probably doing something stupid.

pjbruin commented 8 years ago
comment:21

Replying to @sagetrac-ehlen:

I tried to debug this and catch the NTL exception but I failed. Can someone help me? How do I properly catch the exception that NTL is probably raising? I tried adding except + to the declarations of ZZX_XGCD in ZZX.pxd and then subsequently to _invert_c and wrapped the call to _invert_c in try:...:except: but that didn't work... I'm probably doing something stupid.

The method _invert_c() should be declared except * instead of except +, and you have to wrap the NTL calls in sig_on()...sig_off(). I made the following changes (not sure if the except + on ZZX_XGCD is actually necessary):

--- a/src/sage/libs/ntl/ZZX.pxd
+++ b/src/sage/libs/ntl/ZZX.pxd
@@ -35,7 +35,7 @@ cdef extern from "sage/libs/ntl/ntlwrap.cpp":
     void ZZX_div_ZZ "div"( ZZX_c x, ZZX_c a, ZZ_c b)
     long ZZX_deg "deg"( ZZX_c x )
     void ZZX_rem "rem"(ZZX_c r, ZZX_c a, ZZX_c b)
-    void ZZX_XGCD "XGCD"(ZZ_c r, ZZX_c s, ZZX_c t, ZZX_c a, ZZX_c b, long deterministic)
+    void ZZX_XGCD "XGCD"(ZZ_c r, ZZX_c s, ZZX_c t, ZZX_c a, ZZX_c b, long deterministic) except +
     void ZZX_content "content"(ZZ_c d, ZZX_c f)
     void ZZX_factor "factor"(ZZ_c c, vec_pair_ZZX_long_c factors, ZZX_c f, long verbose, long bnd)

--- a/src/sage/rings/number_field/number_field_element.pxd
+++ b/src/sage/rings/number_field/number_field_element.pxd
@@ -26,7 +26,7 @@ cdef class NumberFieldElement(FieldElement):
     cdef void _ntl_coeff_as_mpz(self, mpz_t z, long i)
     cdef void _ntl_denom_as_mpz(self, mpz_t z)

-    cdef void _invert_c_(self, ZZX_c *num, ZZ_c *den)
+    cdef void _invert_c_(self, ZZX_c *num, ZZ_c *den) except *
     cdef void _reduce_c_(self)
     cpdef ModuleElement _add_(self, ModuleElement right)
     cpdef ModuleElement _sub_(self, ModuleElement right)
--- a/src/sage/rings/number_field/number_field_element.pyx
+++ b/src/sage/rings/number_field/number_field_element.pyx
@@ -2258,7 +2258,7 @@ cdef class NumberFieldElement(FieldElement):
         """
         return long(self.polynomial())

-    cdef void _invert_c_(self, ZZX_c *num, ZZ_c *den):
+    cdef void _invert_c_(self, ZZX_c *num, ZZ_c *den) except *:
         """
         Computes the numerator and denominator of the multiplicative
         inverse of this element.
@@ -2276,11 +2276,13 @@ cdef class NumberFieldElement(FieldElement):
         """
         cdef ZZX_c t # unneeded except to be there
         cdef ZZX_c a, b
+        sig_on()
         ZZX_mul_ZZ( a, self.__numerator, self.__fld_denominator.x )
         ZZX_mul_ZZ( b, self.__fld_numerator.x, self.__denominator )
         ZZX_XGCD( den[0], num[0],  t, a, b, 1 )
         ZZX_mul_ZZ( num[0], num[0], self.__fld_denominator.x )
         ZZX_mul_ZZ( num[0], num[0], self.__denominator )
+        sig_off()

     def __invert__(self):
         """

Then I get

sage: D=DirichletGroup(23)
sage: c=D.gen()^2
sage: N=Newforms(c,6, names='a')
---------------------------------------------------------------------------
NTLError                                  Traceback (most recent call last)
<ipython-input-3-e9e314cc3f45> in <module>()
----> 1 N=Newforms(c,Integer(6), names='a')

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/modform/constructor.pyc in Newforms(group, weight, base_ring, names)
    452 
    453     """
--> 454     return CuspForms(group, weight, base_ring).newforms(names)
    455 
    456 

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/modform/space.pyc in newforms(self, names)
   1680             names = 'a'
   1681         return [ element.Newform(self, factors[i], names=(names+str(i)) )
-> 1682                  for i in range(len(factors)) ]
   1683 
   1684     def eisenstein_submodule(self):

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/modform/element.pyc in __init__(self, parent, component, names, check)
   1070             if not component.is_simple():
   1071                 raise ValueError("component must be simple")
-> 1072         extension_field = component.eigenvalue(1,name=names).parent()
   1073         if extension_field != parent.base_ring(): # .degree() != 1 and rings.is_NumberField(extension_field):
   1074             assert extension_field.base_field() == parent.base_ring()

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/hecke/module.pyc in eigenvalue(self, n, name)
   1304         if (arith.is_prime(n) or n==1):
   1305             Tn_e = self._eigen_nonzero_element(n)
-> 1306             an = self._element_eigenvalue(Tn_e, name=name)
   1307             _dict_set(ev, n, name, an)
   1308             return an

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/hecke/module.pyc in _element_eigenvalue(self, x, name)
    694         if not x in self.ambient_hecke_module():
    695             raise ArithmeticError("x must be in the ambient Hecke module.")
--> 696         v = self.dual_eigenvector(names=name)
    697         return v.dot_product(x.element())
    698 

/home/bruinpj/src/sage/local/lib/python2.7/site-packages/sage/modular/hecke/module.pyc in dual_eigenvector(self, names, lift, nz)
   1197             x = self._eigen_nonzero_element()
   1198         alpha = w_lift.dot_product(x.element())
-> 1199         beta = ~alpha
   1200         w_lift = w_lift * beta
   1201         w = w * beta

/home/bruinpj/src/sage/src/sage/rings/number_field/number_field_element.pyx in sage.rings.number_field.number_field_element.NumberFieldElement.__invert__ (/home/bruinpj/src/sage/src/build/cythonized/sage/rings/number_field/number_field_element.cpp:22816)()
   2301         cdef NumberFieldElement x
   2302         x = self._new()
-> 2303         self._invert_c_(&x.__numerator, &x.__denominator)
   2304         x._reduce_c_()
   2305         return x

/home/bruinpj/src/sage/src/sage/rings/number_field/number_field_element.pyx in sage.rings.number_field.number_field_element.NumberFieldElement._invert_c_ (/home/bruinpj/src/sage/src/build/cythonized/sage/rings/number_field/number_field_element.cpp:22657)()
   2277         cdef ZZX_c t # unneeded except to be there
   2278         cdef ZZX_c a, b
-> 2279         sig_on()
   2280         ZZX_mul_ZZ( a, self.__numerator, self.__fld_denominator.x )
   2281         ZZX_mul_ZZ( b, self.__fld_numerator.x, self.__denominator )

/home/bruinpj/src/sage/src/sage/libs/ntl/error.pyx in sage.libs.ntl.error.NTL_error_callback (/home/bruinpj/src/sage/src/build/cythonized/sage/libs/ntl/error.cpp:794)()
     39 
     40 cdef void NTL_error_callback(const char* s) except *:
---> 41     raise NTLError(s)
     42 
     43 

NTLError: FFT prime index too large
jdemeyer commented 8 years ago
comment:22

Instead of cdef void ... except *, it's more efficient to use cdef int ... except -1. It behaves the same in practice but it doesn't need a call to PyErr_Occurred().

pjbruin commented 8 years ago
comment:23

Another solution, using PARI as a fallback:

--- a/src/sage/rings/number_field/number_field_element.pyx
+++ b/src/sage/rings/number_field/number_field_element.pyx
@@ -40,6 +40,8 @@ from sage.libs.gmp.mpz cimport *
 from sage.libs.gmp.mpq cimport *
 from sage.libs.mpfi cimport mpfi_t, mpfi_init, mpfi_set, mpfi_clear, mpfi_div_z, mpfi_init2, mpfi_get_prec, mpfi_set_prec
 from sage.libs.mpfr cimport mpfr_less_p, mpfr_greater_p, mpfr_greaterequal_p
+from sage.libs.ntl.error import NTLError
+
 from cpython.object cimport Py_EQ, Py_NE, Py_LT, Py_GT, Py_LE, Py_GE
 from sage.structure.sage_object cimport rich_to_bool

@@ -2297,9 +2299,14 @@ cdef class NumberFieldElement(FieldElement):
         if IsZero_ZZX(self.__numerator):
             raise ZeroDivisionError
         cdef NumberFieldElement x
-        x = self._new()
-        self._invert_c_(&x.__numerator, &x.__denominator)
-        x._reduce_c_()
+        try:
+            x = self._new()
+            sig_on()
+            self._invert_c_(&x.__numerator, &x.__denominator)
+            x._reduce_c_()
+            sig_off()
+        except NTLError:
+            x = self._parent(~self._pari_())
         return x

     def _integer_(self, Z=None):

It no longer crashes and returns an answer after about 20 minutes...

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:24

@pjbruin: Great! Thanks a lot for your work!

However, the pari alternative seems to be kind of slow (which is much better than crashing, of course). With the example I gave in #20749, the pari inversion takes about 7 minutes but the code below, which mimics the _invert_c function, runs in about 1.5 minutes for me:

d = alpha.polynomial().denominator()
D = alpha.parent().absolute_polynomial().denominator()
r,s,tt = xgcd(alpha.polynomial().numerator()*D,alpha.parent().absolute_polynomial().numerator()*d)
b = s*d/D
F = alpha.parent()
beta = F(F.polynomial_quotient_ring()(b))

I don't like the last line of the code and I still have to make sure I understood everything correctly and it works in all cases but then I could submit a patch for the number field element to use a variant of this code as an alternative. What do you think?

pjbruin commented 8 years ago
comment:25

Replying to @sagetrac-ehlen:

However, the pari alternative seems to be kind of slow (which is much better than crashing, of course).

I would have hoped for PARI to be quite fast, but it seems this is not particularly optimised in PARI. In any case I didn't try hard to make it fast; it was just the easiest solution that didn't crash.

With the example I gave in #20749, the pari inversion takes about 7 minutes but the code below, which mimics the _invert_c function, runs in about 1.5 minutes for me:

d = alpha.polynomial().denominator()
D = alpha.parent().absolute_polynomial().denominator()
r,s,tt = xgcd(alpha.polynomial().numerator()*D,alpha.parent().absolute_polynomial().numerator()*d)
b = s*d/D
F = alpha.parent()
beta = F(F.polynomial_quotient_ring()(b))

I don't like the last line of the code and I still have to make sure I understood everything correctly and it works in all cases but then I could submit a patch for the number field element to use a variant of this code as an alternative. What do you think?

This is definitely a good start. The xgcd is implemented using FLINT, which suggests a slightly more direct approach: use the functions from sage.libs.flint.ntl_interface to convert alpha.__numerator and alpha.__denominator to FLINT fmpz_poly and fmpz objects, respectively (and similarly for the defining polynomial of the number field), then call the FLINT function fmpz_poly_xgcd, and then convert back to NTL. It could even be reasonable to simply always use FLINT instead of NTL in __invert__.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:26

Thanks for the hints - now working on it.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:27

Sorry for the delay.

It's a bit weird but I really must have made a mistake when I timed the implementation in comment:24, although I really don't know what went wrong. Anyway, I cannot reproduce the results and instead all my tests indicate that pari inversion is about as fast as xgcd and also as the following more direct approach using flint which can be used in number_field_element.pyx

cdef void _invert_flint_(self, ZZX_c *num, ZZ_c *den):
    cdef ZZX_c a, b
    ZZX_mul_ZZ( a, self.__numerator, self.__fld_denominator.x )
    ZZX_mul_ZZ( b, self.__fld_numerator.x, self.__denominator )
    cdef fmpz_poly_t a_f, b_f, t_f
    cdef fmpz_poly_t num_f #flint numerator
    cdef fmpz_t den_f #flint denominator
    fmpz_poly_init(a_f); fmpz_poly_init(b_f); 
    fmpz_poly_init(t_f); fmpz_poly_init(num_f); fmpz_init(den_f)
    fmpz_poly_set_ZZX(a_f, a) #convert to flint
    fmpz_poly_set_ZZX(b_f, b) #convert to flint
    fmpz_poly_xgcd_modular(den_f, num_f, t_f, a_f, b_f)
    fmpz_poly_get_ZZX(num[0], num_f) #convert back to NTL
    fmpz_get_ZZ(den[0], den_f) #convert back to NTL
    #finishing up
    ZZX_mul_ZZ(num[0], num[0], self.__fld_denominator.x )
    ZZX_mul_ZZ(num[0], num[0], self.__denominator )

So, I'm not sure if it's worth it to include the flint alternative. But I found something else that even avoids the NTLError to happen and makes the newforms computations way faster in many cases! I'm cleaning up my branch now after a lot of testing and will then push the changes, including your pari alternative.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

Branch: u/ehlen/sage_crashes_when_computing_newforms

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Commit: 9218fb3

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

9218fb3removed unnecessary import
a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:30

OK, so I really tried a lot of things and I came to the conclusion that what is currently done in _invert_c_ of number_field_element.pyx is too much work, "causes" the NTLError in the cases at hand and inversion can be made much much faster with a minimal change.

The main point is this: Currently, given a number field element of the form x/d, where (if I understood it correctly), x is a polynomial with integer coeffs and d is an integer, modulo a polynomial M/D, where again M is a poly with integer coeffs and M an integer, the code sets

a = x*D
b = M*d

and computes r, s, t, such that

  r = s*a + t*b.

Thus, s/r is the inverse of a=x*D modulo b=M*d (with rational coefficients, in Q[X]/(dM)). Now, I don't really now why this is done and maybe I misunderstand something but I think taking dM here is unnecessary and slows everything down. Anyway, the result is that

  1 = s/r*D*d*x/d + t*D/(r*d)*d*M/D

And so the code returns s/r*D*d as the inverse.

What I did was to instead set

a = x*D
b = M

and then again compute r,s,t as above so that now s/r is the inverse of a modulo M with rational coefficients, i.e. in Q[X]/(M). This means that we get

  1 = s/r*D*d*x/d + t*D/r*M/D

which shows that modulo M/D, we get that s/r*D*d is the inverse. Now, after I wrote this down, it also seems that multiplying x by D is not needed. I will try this out and push the changes immediately.

Btw, all doctests pass.

I also added some doctests to make sure we always test that this issue here is resolved. However, since the example I had at hand is so large, it adds a lot of junk to the source code which I don't like very much. Is there any way to add an external test, that's not necessary in the dotstring in sage (no nosetests at all???)?

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:31

These changes, together with what was done before have an enormous impact on computing newforms! Example, in the current branch:

sage: %time N=Newforms(DirichletGroup(17).gen(), 7, names='a')
CPU times: user 34.1 s, sys: 372 ms, total: 34.5 s
Wall time: 34.4 s

use to run for over 6 minutes in sage 7.1.

Moreover, as a remark: In fact, I tested a few cases against magma and it seems we are now able to beat magma even though sage provides somewhat more information since we have the relative extensions over the cyclotomic fields and magma provides absolute number fields. To compute the same space as above in magma takes only less than 2s but to get to to Fourier coefficients takes additional computing time. In sage I get 100 coefficients now in an additional 9s. In magma it takes 44s (although on a different machine which might very well be a bit slower). Anyway, subsequent calls to get more Fourier coefficients scale now very well in sage. To get 200 coefficients with 100 precomputed takes just 9s more but in magma it takes me 90s! So the investment made to compute the space in the beginning with sage pays certainly off if you need more Fourier coefficients.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

Author: Stephan Ehlen

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:33

I made the change mentioned above but currently I'm unable to push:

    STDERR: ssh: connect to host trac.sagemath.org port 22: No route to host

I will try again later. All tests pass (and the computation mentioned above is one second faster now but this might well be within the usual variation).

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e4ba18dremove further unneded operations when inverting
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 9218fb3 to e4ba18d

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:35

OK, so pushing worked now again. All tests pass and I did some randomized test for a couple of number fields which all worked (trying to make sure I didn't miss anything with the inversion, but maybe I did and don't see it in the test I do, so it would be very good if someone could check this thoroughly). Anyway, in these tests, I generated in a for loop 1000 times 2 elements r,s in a number field, and test if indeed r/s == ~(s/r) and ~r is the inverse of r and of course ~s the inverse of s... In sage 7.1 the routine usually (using timeit) runs 1.5s and in this branch about a second.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from e4ba18d to a26dd5f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a26dd5fSimpler doctest that crashes in sage 7.1 but works now.
pjbruin commented 8 years ago
comment:38

Changing the summary and component of this ticket to reflect the fact that this is purely a bug in the number field code.

Is there a doctest to check that Newforms now works correctly that takes less time? (It takes about 5 minutes on the system that I tested this on.) Alternatively, would you agree with just removing this doctest (given that the bug is in NumberFieldElement)?

The current branch just fixes inversion; division is still susceptible to the same bug. I will upload a commit that fixes _div_() in a similar way, and gets rid of the _invert_c_() method.

jdemeyer commented 8 years ago
comment:39

Replying to @pjbruin:

Changing the summary and component of this ticket to reflect the fact that this is purely a bug in the number field code.

Please also update the ticket description.

pjbruin commented 8 years ago
comment:40

Replying to @jdemeyer:

Replying to @pjbruin:

Changing the summary and component of this ticket to reflect the fact that this is purely a bug in the number field code.

Please also update the ticket description.

This appears to be impossible at the moment due to Trac server problems...

pjbruin commented 8 years ago

Changed branch from u/ehlen/sage_crashes_when_computing_newforms to u/pbruin/20693-sage_crashes_when_computing_newforms

pjbruin commented 8 years ago

Changed author from Stephan Ehlen to Stephan Ehlen, Peter Bruin

pjbruin commented 8 years ago

Changed commit from a26dd5f to 1620359

pjbruin commented 8 years ago
comment:41

Pushing my commit and changing the branch does seem to work.

pjbruin commented 8 years ago
comment:42

I did some cleaning up of _div_() and __invert__(). There seems to be no reason anymore to have a separate _invert_c_() method; I removed this. Note that _div_() is just __invert__() followed by _mul_(), with an intermediate reduction removed.

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:43

@pjbruin Very good, I didn't look at _div_(), which was stupid ;-) Anyway, do we really want to have the code for inverting and multiplying duplicated in _div_()? I guess this is for performance reasons? I.e. would there be anything bad about just doing

return self._mul_(right.__invert__())

in _div_()?

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-Computing newforms for a certain character of modulus 23 fails in sage 7.2.
+This ticket used to be about a crash that occurred when computing newforms for a certain character of modulus 23 in sage 7.2.
 Here's how to reproduce it (you have to wait 10 minutes or so until the crash happens):

@@ -7,114 +7,4 @@ sage: N=Newforms(c,6, names='a')


-Then I get
-
-```
-/usr/local/sage/sage-1/local/lib/python2.7/site-packages/cysignals/signals.so(+0x4715)[0x7fc7327c4715]
-/usr/local/sage/sage-1/local/lib/python2.7/site-packages/cysignals/signals.so(+0x4765)[0x7fc7327c4765]
-/usr/local/sage/sage-1/local/lib/python2.7/site-packages/cysignals/signals.so(+0x6f41)[0x7fc7327c6f41]
-/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fc73aae0330]
-/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fc73a73ec37]
-/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fc73a742028]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL13TerminalErrorEPKc+0x2b)[0x7fc72603512b]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL11UseFFTPrimeEl+0x33)[0x7fc725ef13c3]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL11zz_pContextC1ERKNS_15INIT_FFT_STRUCTEl+0x28)[0x7fc725fd3838]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL4zz_p7FFTInitEl+0x17)[0x7fc725fd3967]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL9resultantERNS_2ZZERKNS_3ZZXES4_l+0x184)[0x7fc725f602f4]
-/usr/local/sage/sage-1/local/lib/libntl.so.19(_ZN3NTL4XGCDERNS_2ZZERNS_3ZZXES3_RKS2_S5_l+0x50)[0x7fc725f64b20]
-/usr/local/sage/sage-1/local/lib/python2.7/site-packages/sage/rings/number_field/number_field_element.so(+0x91149)[0x7fb7396e1149]
-/usr/local/sage/sage-1/local/lib/python2.7/site-packages/sage/rings/number_field/number_field_element.so(+0xff39)[0x7fb73965ff39]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1efa)[0x7fc73adf5caa]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0x84425)[0x7fc73ad74425]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fc73ad42733]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0x61205)[0x7fc73ad51205]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fc73ad42733]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0xbdfaf)[0x7fc73adadfaf]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0xbcb3f)[0x7fc73adacb3f]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fc73ad42733]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1d51)[0x7fc73adf5b01]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7fc73adf9fb2]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dae)[0x7fc73adf8b5e]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0x84425)[0x7fc73ad74425]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fc73ad42733]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1426)[0x7fc73adf51d6]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(+0x84425)[0x7fc73ad74425]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7fc73ad42733]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1426)[0x7fc73adf51d6]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7fc73adf9fb2]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dae)[0x7fc73adf8b5e]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x43a1)[0x7fc73adf8151]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x7fc73adf9e7d]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7fc73adf9fb2]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x92)[0x7fc73ae236f2]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd9)[0x7fc73ae24c39]
-/usr/local/sage/sage-1/local/lib/libpython2.7.so.1.0(Py_Main+0xc7f)[0x7fc73ae3ac6f]
-/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc73a729f45]
-python[0x40071e]
-------------------------------------------------------------------------
-Attaching gdb to process id 52504.
-
-Saved trace to /home/ehlen/.sage/crash_logs/crash_BoCUvt.log
-------------------------------------------------------------------------
-Unhandled SIGABRT: An abort() occurred.
-This probably occurred because a *compiled* module has a bug
-in it and is not properly wrapped with sig_on(), sig_off().
-Python will now terminate.
-------------------------------------------------------------------------
-Aborted (core dumped)
-```
-
-The file home/ehlen/.sage/crash_logs/crash_BoCUvt.log is empty.
-
-```
-sage: version()
-'SageMath version 7.2, Release Date: 2016-05-15'
-```
-
-Sage has been compiled from source in this case.
-
-I tested the same commands on a sage 7.1 compiled from source on a different machine and got a crash as well!
-
-```
-sage: N=Newforms(c,6, names='a')
-------------------------------------------------------------------------
-0   signals.so                          0x000000010dc3b5c5 print_backtrace + 37
-------------------------------------------------------------------------
-Unhandled SIGABRT: An abort() occurred.
-This probably occurred because a *compiled* module has a bug
-in it and is not properly wrapped with sig_on(), sig_off().
-Python will now terminate.
-------------------------------------------------------------------------
-```
+It turned out that this was due to NTL running out of FFT primes when inverting number field elements with humongous denominators. Moreover, it turned out that we only ran into this problem in the example (and other examples in the comments) because the function `_invert_c_()` of a number field element was doing unnecessary work.
a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:45

@pjbruin I can remove the doctest. What I really miss in sage is some more unit tests that would check many many cases and maybe even some random cases to work that should be separate from the doctests but this should be discussed elsewhere.

If you agree that simplifying _div_() makes sense, I could push both changes at any time. (I made some randomized tests and I don't see a performance advantage of the currently duplicated code versus the suggested simplification.)

Also, did you check very carefully that removing those additional multiplications that used to be done in _invert_c() do not cause any trouble anywhere? I wonder how someone could even come up with this.... could it really be that there was really no reason at all for this??? (This is exactly why I think there should be (even) more (complex) testing in sage.) Maybe we should ask the author Joel B. Mohler? However, I don't see activity by him on the github repo and the code has been sitting there since 2009... It seems really trivial to remove it and I can't find any problems but it just seems so hard for me to believe that someone did such a complicated thing without any reason ;-)

a6be518e-e92a-4ea3-ad2e-c326b54a0773 commented 8 years ago
comment:46

PS: my branch u/ehlen/sage_crashes_when_computing_newforms includes these changes now (I didn't change the branch of the ticket since I wanted to wait for your answer because you might have had a good reason for not simplifying div as much as I did).