sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.33k stars 453 forks source link

Make FiniteField_pari_ffelt the default for generic finite fields #14888

Closed pjbruin closed 10 years ago

pjbruin commented 11 years ago

Ticket #12142 implements an interface to PARI's t_FFELT type for finite fields, which is much faster than the one currently used by Sage, in which finite field elements are PARI objects like Mod(Mod(1, 3)*a, Mod(1, 3)*a^17 + Mod(2, 3)*a + Mod(1, 3)). The attached patch makes the new implementation the default for those finite fields that currently use the slow PARI implementation, namely those of cardinality pn with p > 2 prime, n > 1 and pn > 216.

The actual switch is just a trivial change in the FiniteField constructor. The only other real addition is construction of Integer from t_FFELT; the rest of the patch fixes doctests. Almost all fixes are for doctests that assumed FiniteField_elt_pari to be the default. One somewhat notable point is that certain finite elements now compare differently (see the changed doctest in polynomial_zz_pex.py). This is because gcmp_sage (in sage/libs/pari/misc.h) compares all non-real PARI types via their string representations. With the new FiniteFieldElement_pari_ffelt, string comparison gives the same result as lexicographic comparison of polynomials expressing finite field elements in terms of the chosen generator, which is nice.

Apply:

Depends on #12142 Depends on #15124 Depends on #15125

Component: finite rings

Keywords: FiniteField performance

Author: Peter Bruin, Jeroen Demeyer

Reviewer: Jean-Pierre Flori, Jeroen Demeyer, Peter Bruin

Merged: sage-5.13.beta0

Issue created by migration from https://trac.sagemath.org/ticket/14888

pjbruin commented 11 years ago

Attachment: trac_14888-switch_to_pari_ffelt.patch.gz

switch to FiniteField_pari_ffelt; based on 5.11.beta3 + #12142

pjbruin commented 11 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,6 @@
 Ticket #12142 implements an interface to PARI's t_FFELT type for finite fields, which is much faster than the one currently used by Sage, in which finite field elements are PARI objects like `Mod(Mod(1, 3)*a, Mod(1, 3)*a^17 + Mod(2, 3)*a + Mod(1, 3))`.  The attached patch makes the new implementation the default for those finite fields that currently use the slow PARI implementation, namely those of cardinality *p<sup>n</sup>* with *p* > 2 prime, *n* > 1 and *p<sup>n</sup>* > 2<sup>16</sup>.

 The actual switch is just a trivial change in the `FiniteField` constructor.  The only other real addition is construction of `Integer` from `t_FFELT`; the rest of the patch fixes doctests.  Almost all fixes are for doctests that assumed `FiniteField_elt_pari` to be the default.  One somewhat notable point is that certain finite elements now compare differently (see the changed doctest in `polynomial_zz_pex.py`).  This is because `gcmp_sage` (in `sage/libs/pari/misc.h`) compares all non-real PARI types via their string representations.  With the new `FiniteFieldElement_pari_ffelt`, string comparison gives the same result as lexicographic comparison of polynomials expressing finite field elements in terms of the chosen generator, which is nice.
+
+Apply: [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
+
jpflori commented 11 years ago

Reviewer: Jean-Pierre Flori

jpflori commented 11 years ago
comment:2

Looks good and works fine for what I tried, good work Peter!

jdemeyer commented 11 years ago
comment:3

Haven't read the code of #12142, but does pickling work properly and has it been doctested? If not, then #14888 should be set back to needs_work.

pjbruin commented 11 years ago
comment:4

There shouldn't be any problems with pickling, see the comment I just made at #12142.

jdemeyer commented 11 years ago
comment:5

Doctest failures on 32-bit systems:

sage -t --long devel/sage/sage/graphs/generic_graph.py
**********************************************************************
File "devel/sage/sage/graphs/generic_graph.py", line 14898, in sage.graphs.generic_graph.GenericGraph.plot
Failed example:
    H = S.hecke_matrix(2)
Exception raised:
    Traceback (most recent call last):
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 479, in _run
        self.execute(example, compiled, test.globs)
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 838, in execute
        exec compiled in globs
      File "<doctest sage.graphs.generic_graph.GenericGraph.plot[78]>", line 1, in <module>
        H = S.hecke_matrix(Integer(2))
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/modular/ssmod/ssmod.py", line 1042, in hecke_matrix
        SS, II = self.supersingular_points()
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/modular/ssmod/ssmod.py", line 906, in supersingular_points
        neighbors = Phi_polys(2,X,ss_points[pos]).roots()
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/modular/ssmod/ssmod.py", line 206, in Phi_polys
        return x.parent()([j_pow[3] - 162000*j_pow[2] + 8748000000*j_pow[1] -
      File "element.pyx", line 1701, in sage.structure.element.RingElement.__mul__ (sage/structure/element.c:14531)
      File "coerce.pyx", line 803, in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:7330)
      File "coerce.pyx", line 917, in sage.structure.coerce.CoercionModel_cache_maps.canonical_coercion (sage/structure/coerce.c:8529)
      File "coerce_maps.pyx", line 82, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (sage/structure/coerce_maps.c:3856)
      File "coerce_maps.pyx", line 77, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (sage/structure/coerce_maps.c:3757)
      File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/rings/finite_rings/finite_field_pari_ffelt.py", line 501, in _element_constructor_
        return self.element_class(self, x)
      File "element_pari_ffelt.pyx", line 145, in sage.rings.finite_rings.element_pari_ffelt.FiniteFieldElement_pari_ffelt.__init__ (sage/rings/finite_rings/element_pari_ffelt.c:2975)
      File "element_pari_ffelt.pyx", line 206, in sage.rings.finite_rings.element_pari_ffelt.FiniteFieldElement_pari_ffelt.construct_from (sage/rings/finite_rings/element_pari_ffelt.c:3341)
    OverflowError: Python int too large to convert to C long
**********************************************************************

(the same error occurs in other places)

sage -t --long devel/sage/sage/groups/generic.py
**********************************************************************
File "devel/sage/sage/groups/generic.py", line 422, in sage.groups.generic.bsgs
Failed example:
    P=E.lift_x(a); P
Expected:
    (a : 9*a^4 + 22*a^3 + 23*a^2 + 30 : 1)
Got:
    (a : 28*a^4 + 15*a^3 + 14*a^2 + 7 : 1)
**********************************************************************
File "devel/sage/sage/groups/generic.py", line 851, in sage.groups.generic.discrete_log_lambda
Failed example:
    P=E.lift_x(a); P
Expected:
    (a : 28*a^4 + 15*a^3 + 14*a^2 + 7 : 1)
Got:
    (a : 9*a^4 + 22*a^3 + 23*a^2 + 30 : 1)
**********************************************************************

(due to a different square root being returned)

jdemeyer commented 11 years ago
comment:6

The first problem is in the following code:

        elif isinstance(x, int) or isinstance(x, long):
            g = (<pari_gen>self._parent._gen_pari).g
            sig_on()
            x_GEN = stoi(x)
            self.construct(INT_to_FFELT(g, x_GEN))

The Python long type doesn't convert to a C long (confusingly, a C long corresponds to a Python int).

To see the problem on 64 bits (let this inspire a doctest):

sage: GF(7^20, 'a')(long(2^63))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-18-d85771ca8953> in <module>()
----> 1 GF(Integer(7)**Integer(20), 'a')(long(Integer(2)**Integer(63)))

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/structure/parent.so in sage.structure.parent.Parent.__call__ (sage/structure/parent.c:8372)()

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/structure/coerce_maps.so in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (sage/structure/coerce_maps.c:3856)()

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/structure/coerce_maps.so in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (sage/structure/coerce_maps.c:3757)()

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/rings/finite_rings/finite_field_pari_ffelt.pyc in _element_constructor_(self, x)
    499             return x
    500         else:
--> 501             return self.element_class(self, x)
    502 
    503     def _coerce_map_from_(self, R):

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/rings/finite_rings/element_pari_ffelt.so in sage.rings.finite_rings.element_pari_ffelt.FiniteFieldElement_pari_ffelt.__init__ (sage/rings/finite_rings/element_pari_ffelt.c:2975)()

/mazur/release/merger/sage-5.12.beta4/local/lib/python2.7/site-packages/sage/rings/finite_rings/element_pari_ffelt.so in sage.rings.finite_rings.element_pari_ffelt.FiniteFieldElement_pari_ffelt.construct_from (sage/rings/finite_rings/element_pari_ffelt.c:3341)()

OverflowError: Python int too large to convert to C long
jdemeyer commented 11 years ago
comment:7

Working on it...

jdemeyer commented 11 years ago
comment:8

There are more serious issues with element_pari_ffelt.pyx. For example, it really requires sig_on() to be the sig_on() from libs/pari/gen.pyx. So I feel like this ticket isn't quite ready yet.

jdemeyer commented 11 years ago

Work Issues: PARI sig_on()

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -2,5 +2,7 @@

 The actual switch is just a trivial change in the `FiniteField` constructor.  The only other real addition is construction of `Integer` from `t_FFELT`; the rest of the patch fixes doctests.  Almost all fixes are for doctests that assumed `FiniteField_elt_pari` to be the default.  One somewhat notable point is that certain finite elements now compare differently (see the changed doctest in `polynomial_zz_pex.py`).  This is because `gcmp_sage` (in `sage/libs/pari/misc.h`) compares all non-real PARI types via their string representations.  With the new `FiniteFieldElement_pari_ffelt`, string comparison gives the same result as lexicographic comparison of polynomials expressing finite field elements in terms of the chosen generator, which is nice.

-Apply: [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
+Apply:
+- [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
+- [attachment: long_to_ffelt.patch​](https://github.com/sagemath/sage/files/ticket14888/a6fcd3561e6c701b1c87d85b45f13d70.gz)
jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -4,5 +4,3 @@

 Apply:
 - [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
-- [attachment: long_to_ffelt.patch​](https://github.com/sagemath/sage/files/ticket14888/a6fcd3561e6c701b1c87d85b45f13d70.gz)
-
jdemeyer commented 11 years ago

Changed dependencies from #12142 to #12142, #15124, #15125

jdemeyer commented 11 years ago

Attachment: 14888_fix_32bit.patch.gz

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -4,3 +4,4 @@

 Apply:
 - [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
+- [attachment: 14888_fix_32bit.patch​](https://github.com/sagemath/sage/files/ticket14888/5d0f39f6280407ec3bcc07def8aee925.gz)
jdemeyer commented 11 years ago

Changed reviewer from Jean-Pierre Flori to Jean-Pierre Flori, Jeroen Demeyer

jdemeyer commented 11 years ago

Changed work issues from PARI sig_on() to none

jdemeyer commented 11 years ago
comment:13

Somebody should review attachment: 14888_fix_32bit.patch which fixes doctests on 32-bit systems.

pjbruin commented 11 years ago
comment:14

Looks good and still passes tests on my 64-bit system. I would give attachment: 14888_fix_32bit.patch a positive review except for the fact that the main change in it is specifically for 32-bit systems; at least somebody with a 32-bit system should verify that specific doctest.

jdemeyer commented 11 years ago
comment:15

I assure you that doctests pass on 32-bit.

jdemeyer commented 10 years ago
comment:16

This happens sometimes:

**********************************************************************
File "devel/sage/sage/schemes/plane_conics/con_finite_field.py", line 110, in sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?
Failed example:
    C.has_rational_point(point = True)  # output is random
Exception raised:
    Traceback (most recent call last):
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 479, in _run
        self.execute(example, compiled, test.globs)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 838, in execute
        exec compiled in globs
      File "<doctest sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?[9]>", line 1, in <module>
        C.has_rational_point(point = True)  # output is random
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/plane_conics/con_finite_field.py", line 132, in has_rational_point
        s, pt = self.has_singular_point(point = True)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/plane_conics/con_field.py", line 595, in has_singular_point
        return True, self.point(Sequence(D.right_kernel().gen()))
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/plane_conics/con_field.py", line 886, in point
        p = ProjectiveCurve_generic.point(self, v, check=check)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/generic/scheme.py", line 370, in point
        return self.point_homset() (v, check=check)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/generic/homset.py", line 263, in __call__
        return Set_generic.__call__(self, *args, **kwds)
      File "parent.pyx", line 1011, in sage.structure.parent.Parent.__call__ (sage/structure/parent.c:8392)
      File "coerce_maps.pyx", line 427, in sage.structure.coerce_maps.ListMorphism._call_with_args (sage/structure/coerce_maps.c:8035)
      File "coerce_maps.pyx", line 100, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_with_args (sage/structure/coerce_maps.c:4278)
      File "coerce_maps.pyx", line 90, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_with_args (sage/structure/coerce_maps.c:4089)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/generic/homset.py", line 455, in _element_constructor_
        return self.codomain()._point(self, v, **kwds)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/generic/algebraic_scheme.py", line 583, in _point
        return self.__A._point(*args, **kwds)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/projective/projective_space.py", line 835, in _point
        return SchemeMorphism_point_projective_finite_field(*args, **kwds)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/projective/projective_point.py", line 764, in __init__
        X.extended_codomain()._check_satisfies_equations(v)
      File "/scratch/release/merger/sage-5.13.beta0/local/lib/python2.7/site-packages/sage/schemes/generic/algebraic_scheme.py", line 967, in _check_satisfies_equations
        raise TypeError, "Coordinates %s do not define a point on %s"%(coords,self)
    TypeError: Coordinates [0, 0, 1] do not define a point on Projective Conic Curve over Finite Field in a of size 7^20 defined by x^2 + (a)*y^2 + 2*z^2
**********************************************************************

Note the code path in the traceback showing that Sage thinks the conic is singular.

jdemeyer commented 10 years ago
comment:17

Adding extra doctests shows

sage -t devel/sage/sage/schemes/plane_conics/con_finite_field.py
**********************************************************************
File "devel/sage/sage/schemes/plane_conics/con_finite_field.py", line 110, in sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?
Failed example:
    C._coefficients
Expected:
    [1, 0, 0, a, 0, 2]
Got:
    [1, 0, 0, a, 0, 0]
**********************************************************************
File "devel/sage/sage/schemes/plane_conics/con_finite_field.py", line 112, in sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?
Failed example:
    D = C.symmetric_matrix(); D
Expected:
    [1 0 0]
    [0 a 0]
    [0 0 2]
Got:
    [1 0 0]
    [0 a 0]
    [0 0 0]
**********************************************************************
jdemeyer commented 10 years ago
comment:18

Sorry, this one seems to be hard to debug. And it doesn't help that Singular is involved too because of the defining polynomial.

jdemeyer commented 10 years ago
comment:19

Note that I'm not even sure that this patch is what actually causes this...

pjbruin commented 10 years ago
comment:20

Hmm, this is annoying. How often does it occur? I just ran doctests on this file several dozens of times and did not get any failures. (System: x86_64 with 5.11.beta3 and this patch applied, and several others which probably shouldn't influence this.)

Do I understand correctly that there are no failures in the doctest checking that the equation is x^2 + (a)*y^2 + 2*z^2? If so, then the problem can hardly anywhere else than in the initialisation of C._coefficients, i.e. (probably) when converting the coefficient of z^2 from Singular to PARI.

I did adapt the conversion function (formerly si2sa_GFqPari, now si2sa_GFq_generic) in #12142; the changes seem harmless, but this function does look like a possible culprit to me.

jdemeyer commented 10 years ago
comment:21

Replying to @pjbruin:

Hmm, this is annoying. How often does it occur? I just ran doctests on this file several dozens of times and did not get any failures.

That might not be enough times. Try a few hundred or 1000 times.

Do I understand correctly that there are no failures in the doctest checking that the equation is x^2 + (a)*y^2 + 2*z^2? If so, then the problem can hardly anywhere else than in the initialisation of C._coefficients

Exactly.

The first thing I want to do now is to make sure that it is really this ticket which causes the bug.

jdemeyer commented 10 years ago
comment:22

It seems the problem only appears if #14957 and #14958 are also applied.

pjbruin commented 10 years ago
comment:23

Replying to @jdemeyer:

It seems the problem only appears if #14957 and #14958 are also applied.

Unfortunately I still haven't been able to reproduce it. I tried a different x86_64 system with these patches applied and ran the test 10,000 times, but it succeeds every time.

jdemeyer commented 10 years ago
comment:24

I tracked down the problem to this line in devel/sage/sage/libs/singular/singular.pyx:

ret = ret + c

where ret is a pari_ffelt element equal to zero, c is the C long equal to 2. The result is 0, while it should be 2.

jpflori commented 10 years ago
comment:25

Great work Jeroen!

pjbruin commented 10 years ago
comment:26

Replying to @jdemeyer:

I tracked down the problem to this line in devel/sage/sage/libs/singular/singular.pyx:

ret = ret + c

where ret is a pari_ffelt element equal to zero, c is the C long equal to 2. The result is 0, while it should be 2.

I wondered about the same line yesterday evening when I was looking at the surrounding function, but thought that it hardly looked like something that could fail with some small probability. Does this line always give the wrong result (and for some reason just isn't called most of the time)?

I noticed that c is defined by cdef int c (i.e. C int, not Python int), but is later used as a long; is that something to be suspicious about?

If not, then (since c is wrapped into a Python int before being added to ret) it must be the conversion from Python int to pari_ffelt that is causing the problem, but in that case it is strange that it only fails here.

jdemeyer commented 10 years ago
comment:27

Replying to @pjbruin:

Replying to @jdemeyer:

I tracked down the problem to this line in devel/sage/sage/libs/singular/singular.pyx:

ret = ret + c

where ret is a pari_ffelt element equal to zero, c is the C long equal to 2. The result is 0, while it should be 2.

I wondered about the same line yesterday evening when I was looking at the surrounding function, but thought that it hardly looked like something that could fail with some small probability.

Does this line always give the wrong result (and for some reason just isn't called most of the time)?

No, it rarely gives the wrong result.

I noticed that c is defined by cdef int c (i.e. C int, not Python int), but is later used as a long; is that something to be suspicious about?

It's not the cleanest coding (c = <int>napGetCoeff(z) would be better), but I don't see how that could make a difference, especially since this doesn't seem to be a case of overflow.

it must be the conversion from Python int to pari_ffelt that is causing the problem

I also guess that this is the problem.

pjbruin commented 10 years ago
comment:28

Replying to @jdemeyer:

it must be the conversion from Python int to pari_ffelt that is causing the problem

I also guess that this is the problem.

In the line ret = ret + c, could you replace c by base(c) to do the coercion explicitly, just to make sure it isn't some quirk in the coercion system? Or insert a line verifying that c == base(c) to test directly whether it is the conversion that is at fault?

jdemeyer commented 10 years ago
comment:29

The annoying thing is that this error is damned hard to reproduce. Sometimes adding debugging code makes the error go away. So as long as we haven't found the root of the problem, we're just guessing.

pjbruin commented 10 years ago
comment:30

Here is one more guess: in sage/rings/finite_rings/element_pari_ffelt.pyx the Python int x is first converted into a PARI t_INT as follows (lines 209-210):

pari_catch_sig_on()
x_GEN = (<pari_gen>pari(x)).g

The order of these two lines should be interchanged.

Before #15125, the second line was x_GEN = stoi(x); here pari_catch_sig_on() must be called first since we call the un-wrapped PARI function. However, now we call pari(x), which does its own pari_catch_sig_on()...pari_catch_sig_off(). As we know, the current error-handling code is non-reentrant. I don't see exactly how this could cause our bug, but maybe my imagination is too limited.

There is a similar mistake in the __pow__ method in the same file.

jdemeyer commented 10 years ago
comment:31

Replying to @pjbruin:

Here is one more guess: in sage/rings/finite_rings/element_pari_ffelt.pyx the Python int x is first converted into a PARI t_INT as follows (lines 209-210):

pari_catch_sig_on()
x_GEN = (<pari_gen>pari(x)).g

The order of these two lines should be interchanged.

Before #15125, the second line was x_GEN = stoi(x); here pari_catch_sig_on() must be called first since we call the un-wrapped PARI function. However, now we call pari(x), which does its own pari_catch_sig_on()...pari_catch_sig_off(). As we know, the current error-handling code is non-reentrant. I don't see exactly how this could cause our bug, but maybe my imagination is too limited.

I agree that is a mistake, but on the other hand, we are not handling any errors here, so the non-reentrant error handling code isn't called. So that cannot explain the problem.

In the line ret = ret + c, could you replace c by base(c)

With that change, I no longer get the error (which unfortunately doesn't imply that the bug is fixed).

pjbruin commented 10 years ago
comment:32

Replying to @jdemeyer:

I agree that is a mistake, but on the other hand, we are not handling any errors here, so the non-reentrant error handling code isn't called. So that cannot explain the problem.

Probably true; the only sort of thing that I could imagine happening here is that PARI runs out of stack space, the call has to be restarted, and the error handler does a longjmp() to the wrong stack frame. However, this can hardly be the problem, because nothing is using a lot of PARI memory here.

In the line ret = ret + c, could you replace c by base(c)

With that change, I no longer get the error (which unfortunately doesn't imply that the bug is fixed).

Mysterious. I checked that the coercion system does exactly the same thing internally, as it should (convert the Python int c to pari_ffelt and then add this to ret in base), so there is no other intermediate conversion that could go wrong.

jdemeyer commented 10 years ago
comment:33

Some more news from the debugging camp: I added

diff --git a/sage/rings/finite_rings/element_pari_ffelt.pyx b/sage/rings/finite_rings/element_pari_ffelt.pyx
--- a/sage/rings/finite_rings/element_pari_ffelt.pyx
+++ b/sage/rings/finite_rings/element_pari_ffelt.pyx
@@ -450,6 +450,7 @@
         pari_catch_sig_on()
         x.construct(FF_add((<FiniteFieldElement_pari_ffelt>self).val,
                            (<FiniteFieldElement_pari_ffelt>right).val))
+        print "%s + %s = %s"%(self, right, x)
         return x

     cpdef ModuleElement _sub_(FiniteFieldElement_pari_ffelt self, ModuleElement right):
@@ -482,6 +483,7 @@
         pari_catch_sig_on()
         x.construct(FF_mul((<FiniteFieldElement_pari_ffelt>self).val,
                            (<FiniteFieldElement_pari_ffelt>right).val))
+        print "%s * %s = %s"%(self, right, x)
         return x

     cpdef RingElement _div_(FiniteFieldElement_pari_ffelt self, RingElement right):

and adjusted some doctests. This clearly shows the error in the conversion to PARI:

sage -t devel/sage/sage/schemes/plane_conics/con_finite_field.py
**********************************************************************
File "devel/sage/sage/schemes/plane_conics/con_finite_field.py", line 108, in sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?
Failed example:
    C = Conic([1, a, -5]); C
Expected:
    a * 1 = a
    0 + a = a
    0 + 2 = 2
    Projective Conic Curve over Finite Field in a of size 7^20 defined by x^2 + (a)*y^2 + 2*z^2
Got:
    a * 0 = 0
    0 + 0 = 0
    0 + 2 = 2
    Projective Conic Curve over Finite Field in a of size 7^20 defined by x^2 + (a)*y^2 + 2*z^2
**********************************************************************
File "devel/sage/sage/schemes/plane_conics/con_finite_field.py", line 113, in sage.schemes.plane_conics.con_finite_field.ProjectiveConic_finite_field.?
Failed example:
    C._coefficients
Expected:
    [1, 0, 0, a, 0, 2]
Got:
    [1, 0, 0, 0, 0, 2]
**********************************************************************
jdemeyer commented 10 years ago
comment:34

If you have access to boxen, the build I use with these problems is in /release/sage-5.13.beta0. You could probably copy that and experiment with it.

jdemeyer commented 10 years ago
comment:35

Replying to @pjbruin:

Here is one more guess: in sage/rings/finite_rings/element_pari_ffelt.pyx the Python int x is first converted into a PARI t_INT as follows (lines 209-210):

pari_catch_sig_on()
x_GEN = (<pari_gen>pari(x)).g

The order of these two lines should be interchanged.

I just tried that, and it doesn't make the error go away.

jdemeyer commented 10 years ago
comment:36

Replying to @pjbruin:

In the line ret = ret + c, could you replace c by base(c)

With that change, I no longer get the error (which unfortunately doesn't imply that the bug is fixed).

Mysterious. I checked that the coercion system does exactly the same thing internally

How did you check? I think it's more and more likely that the coercion system is the problem here, because

1) The coercion system is notorious for generating hard-to-reproduce bugs since it relies on (hashes of) memory addresses.

2) Making the conversions explicit (see diff below) fixes the problem.

diff --git a/sage/libs/singular/singular.pyx b/sage/libs/singular/singular.pyx
--- a/sage/libs/singular/singular.pyx
+++ b/sage/libs/singular/singular.pyx
@@ -232,9 +232,9 @@
         c = <long>napGetCoeff(z)
         e = napGetExpFrom(z,1, _ring)
         if e == 0:
-            ret = ret + c
+            ret = ret + base(c)
         elif c != 0:
-            ret = ret  + c * a**e
+            ret = ret  + (a**e) * base(c)
         z = <napoly*>pNext(<poly*>z)
     return ret
pjbruin commented 10 years ago
comment:37

Replying to @jdemeyer:

If you have access to boxen, the build I use with these problems is in /release/sage-5.13.beta0. You could probably copy that and experiment with it.

OK, got it. It gave me a warning about unsupported ssse3 instructions, but at least the files in question don't seem to be affected.

pjbruin commented 10 years ago
comment:38

Replying to @jdemeyer:

Mysterious. I checked that the coercion system does exactly the same thing internally

How did you check? I think it's more and more likely that the coercion system is the problem here

I did the following (and I agree that this doesn't prove that the coercion always goes like this):

sage: F.<a>=FiniteField(7^20)
sage: from sage.structure.coerce import CoercionModel_cache_maps
sage: cm=CoercionModel_cache_maps()
sage: cm.explain(a,int(0))
Coercion on right operand via
    Conversion map:
      From: Set of Python objects of type 'int'
      To:   Finite Field in a of size 7^20
Arithmetic performed after coercions.
Result lives in Finite Field in a of size 7^20
Finite Field in a of size 7^20
jdemeyer commented 10 years ago
comment:39

Got it! The following line is no good:

x_GEN = (<pari_gen>pari(x)).g

Python can destroy the object pari(x) because there are no references to it (the pointer x_GEN doesn't count as a reference for Python's garbage collecting).

jdemeyer commented 10 years ago
comment:40

Patch coming up...

jdemeyer commented 10 years ago

Attachment: 14888_review.patch.gz

jdemeyer commented 10 years ago

Changed author from Peter Bruin to Peter Bruin, Jeroen Demeyer

jdemeyer commented 10 years ago

Description changed:

--- 
+++ 
@@ -5,3 +5,4 @@
 Apply:
 - [attachment: trac_14888-switch_to_pari_ffelt.patch](https://github.com/sagemath/sage-prod/files/10658130/trac_14888-switch_to_pari_ffelt.patch.gz)
 - [attachment: 14888_fix_32bit.patch​](https://github.com/sagemath/sage/files/ticket14888/5d0f39f6280407ec3bcc07def8aee925.gz)
+- [attachment: 14888_review.patch](https://github.com/sagemath/sage-prod/files/10658132/14888_review.patch.gz)
pjbruin commented 10 years ago
comment:42

Replying to @jdemeyer:

Got it! The following line is no good:

x_GEN = (<pari_gen>pari(x)).g

Python can destroy the object pari(x) because there are no references to it (the pointer x_GEN doesn't count as a reference for Python's garbage collecting).

Beautiful! I checked the C code generated by Cython and it does indeed decrease the reference count of this temporary object too early.

I have never been able to reproduce the bug, but I'm convinced that you nailed the problem. I'll run doctests and then give this a positive review.