Open zimmermann6 opened 16 years ago
Description changed:
---
+++
@@ -1,12 +1,6 @@
I propose the following strategy for factor(integer):
-1) do trial division by all primes up to say 1000. This can be done efficiently by a single
- gcd with the product of all those primes.
-2) use GMP-ECM, starting from say B1=100, and increasing B1 by sqrt(B1) at each step, until
- one reaches the _recommended_B1_list value which corresponds to 1/3 of the size of the
- number to be factored. Thus for a 90-digit input, one will stop at B1=250000.
-3) try GMP-ECM P-1 and P+1 with respectively 9*B1 and 3*B1 where B1 is the last value tried
- for ECM. The corresponding cost of those runs will be approximately the same as the last
- ECM curve, thus this will not slow down the average computation, and might find a few factors.
-4) run MPQS or GNFS. You might want to issue a warning to the user (if called from toplevel) at
- that time.
+1. do trial division by all primes up to say 1000. This can be done efficiently by a single gcd with the product of all those primes.
+2. use GMP-ECM, starting from say B1=100, and increasing B1 by sqrt(B1) at each step, until one reaches the _recommended_B1_list value which corresponds to 1/3 of the size of the number to be factored. Thus for a 90-digit input, one will stop at B1=250000.
+3. try GMP-ECM P-1 and P+1 with respectively 9*B1 and 3*B1 where B1 is the last value tried for ECM. The corresponding cost of those runs will be approximately the same as the last ECM curve, thus this will not slow down the average computation, and might find a few factors.
+4. run MPQS or GNFS. You might want to issue a warning to the user (if called from toplevel) at that time.
Comments from some experts:
On 12/11/2007, Robert Bradshaw <robertwb@math.washington.edu> wrote:
> I don't have much expertise in the area, but it looks like a sound
> proposal to me. The trial division bound seems a bit low (and perhaps
> should be adjusted for the size of the input). Is this similar to
> what Pari does? Would it make sense to parallelize ECM/QS by default
> on a multi-core system?
>
> - Robert
>
Bill Hart <goodwillhart@googlemail.com>
This proposal is absolutely correct. It is *exactly* what I would do,
with one exception. I would not issue a warning that MPQS is going to
start, unless the factorisation is over say 70 digits (~90s).
Bill.
I've just discussed this with Paul Zimmerman and here is what we should do (eventually):
1) Trial factoring for primes up to ~1000 2) SQUFOF for numbers up to ~32-40 bits (use the fast implementation in FLINT - it's 4x faster than anything else) 3) Pollard Rho if we have an efficient implementation 4) tinyQS in FLINT if the number of digits is small enough, i.e. if that is going to be faster than GMP-ECM (we have to profile this to determine the cutoff) 5) GMP-ECM with bound up to n^(1/3) 6) p-1 and p+1 with large bound (it works rarely, but saves MPQS if it happens to work - the runtime compared to MPQS is tiny) 7) MPQS up to ~90 digits if large prime variant, ~100 digits if double large prime variant or ~130 digits if triple large prime variant QS 8) GNFS 9) You are the NSA, so you know what to do.
Steps 2) and 3) of my original proposal might be easily implemented using the "finer ECM interface" proposed in #1421.
since no progress was done in 2 years, I close that ticket.
Make sure you understand the procedure for closing tickets. See this section of the Developer's Guide for more information.
This is still an issue--I have code that I need to clean up and post here (though probably won't get around to it 'till this summer).
Make sure you understand the procedure for closing tickets.
sorry, I wasn't aware of that.
Attachment: evaluate_1145.sage.gz
Apply first: #5945, #5310
Patch makes use of SQUFOF, ecmfactor, cunningham_table and sympy.factorint.
To pass all tests cunningham database should become a standard package or the warning in cunningham_tables.py should be removed.
To evalutate run attachment: evaluate_1145.sage
This should also fix #9463.
I have personally been considering moving all the integer factorisation code to a new module. I think it makes sense because a lot of new code is being added and sage/rings/integer.pyx
is already large enough.
What do you think?
+1 to moving all the code to a new factor module. I've just very briefly looked at your code. Also, in terms of code structure, there's no need for all the functions to start with an underscore, and it's probably worth making them cpdef functions for ease of testing. (There's no overhead for module-level methods, as they can't be overridden, so the cdef portion is just as fast.) How does sympy.factorint compare to, say, pari's sieves? What about the quadratic sieve from flint?
Okay, I will move the code to a new module. How should it be called? Should it stay in sage/rings
?
sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.
quadratic sieve from flint is currently not installed, so I couldn't use it.
Replying to @a-andre:
Okay, I will move the code to a new module. How should it be called? Should it stay in
sage/rings
?sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.
Personally, I wouldn't care that much about huge perfect powers. I agree with the PARI people that this is essentially a non-issue.
Replying to @a-andre:
Okay, I will move the code to a new module. How should it be called? Should it stay in
sage/rings
?
I would call it sage/rings/factorint.pyx
or so. It would be nice if you would move all the integer factoring code there, not only the newly added code.
I would love to help with this ticket, but I'm already involved in too many tickets...
Perfect power checking doesn't cost that much but it improves considerably the speed if it's a perfect power.
Replying to @a-andre:
Perfect power checking doesn't cost that much but it improves considerably the speed if it's a perfect power.
Trial division doesn't cost that much but it improves considerably the speed if it has lots of small factors.
Now guess which is more likely for a random number...
Besides, if you want to check for a power, just check for a perfect power. No need to call sympy.factorint() for that.
That's what I do.
if n.is_power():
from sympy import factorint
return [(Integer(p),e) for p,e in factorint(n).items()]
That doesn't look good. If n
is a power, you should write n
as b^k
and then factor b
and multiply all exponents by k
instead of using the slower sympy.factorint()
.
Replying to @a-andre:
Okay, I will move the code to a new module. How should it be called? Should it stay in
sage/rings
?
Yes, that'd be a fine place for it.
sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.
Then -1 from moving away from Pari. We can check for perfect powers very quickly using gmp/mpir first.
quadratic sieve from flint is currently not installed, so I couldn't use it.
Well, expect some followup patches then :).
I moved the integer factorization code to factorint.pyx. The patch is attached to #5945 because adding factor_using_flint is a minor change than this one, so review should be easier.
Currently arith.factor
links to factorint.factor
. We can create a ticket to get rid of it after fixing #5945. But for now it would be to much I think.
Attachment: trac_1145_cunningham_warning.patch.gz
I changed the things mentioned above.
Description changed:
---
+++
@@ -4,3 +4,10 @@
2. use GMP-ECM, starting from say B1=100, and increasing B1 by sqrt(B1) at each step, until one reaches the _recommended_B1_list value which corresponds to 1/3 of the size of the number to be factored. Thus for a 90-digit input, one will stop at B1=250000.
3. try GMP-ECM P-1 and P+1 with respectively 9*B1 and 3*B1 where B1 is the last value tried for ECM. The corresponding cost of those runs will be approximately the same as the last ECM curve, thus this will not slow down the average computation, and might find a few factors.
4. run MPQS or GNFS. You might want to issue a warning to the user (if called from toplevel) at that time.
+
+## Apply
+1. #5310
+2. #5945
+3. [attachment: trac_1145_integer_factorization.patch](https://github.com/sagemath/sage-prod/files/10638295/trac_1145_integer_factorization.patch.gz)
+4. [attachment: trac_1145_cunningham_warning.patch](https://github.com/sagemath/sage-prod/files/10638294/trac_1145_cunningham_warning.patch.gz)
+
if somebody wants to add GNFS to the list of algorithms available from Sage, CADO-NFS 1.0 was just released: http://cado-nfs.gforge.inria.fr/
Depends on #5310 and #5945
Apply trac_1145_integer_factorization.patch and trac_1145_cunningham_warning.patch
Description changed:
---
+++
@@ -8,6 +8,6 @@
## Apply
1. #5310
2. #5945
-3. [attachment: trac_1145_integer_factorization.patch](https://github.com/sagemath/sage-prod/files/10638295/trac_1145_integer_factorization.patch.gz)
-4. [attachment: trac_1145_cunningham_warning.patch](https://github.com/sagemath/sage-prod/files/10638294/trac_1145_cunningham_warning.patch.gz)
+3. #10623
+4. [attachment: trac_1145_integer_factorization.patch](https://github.com/sagemath/sage-prod/files/10638295/trac_1145_integer_factorization.patch.gz)
Attachment: trac_1145_integer_factorization.patch.gz
Yafu (http://sourceforge.net/projects/yafu/) might be an interesting software to include, although I have difficulties finding its license. It includes a multi-thread implementation of SIQS, which might be interesting in the 70-100 digit range.
Paul Zimmermann
I know that it's waiting on #5310, but your patch is doubled: you exported to a file that already contained the patch. Also, we should figure out how to merge these factoring improvements with those in #12125 and #12133; it shouldn't be too hard.
A couple other comments: you should use the perfect_power
method of Integers rather than writing your own base_exponent, and you should pass on the use_pollard
parameter when you recurse.
See also #14970
I propose the following strategy for factor(integer):
Apply
5310
5945
10623
CC: @jdemeyer @jpflori
Component: factorization
Issue created by migration from https://trac.sagemath.org/ticket/1145