Factoring stalls on certain inputs

mhostetter commented 2 years ago

As discovered in #185, factoring 2^256 - 1 stalls out which prevents testing if degree-256 polynomials over GF(2) are primitive. This would likely (?) be fixed once the quadratic sieve (#146) is added.

In [1]: import galois                                                                                                                        

# Takes forever... and may never return?
In [2]: galois.factors(2**256 - 1) 
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-6806de99fcaa> in <module>
----> 1 galois.factors(2**256 - 1)

~/repos/galois/galois/_polymorphic.py in factors(value)
    514     """
    515     if isinstance(value, (int, np.integer)):
--> 516         return integer_factor.factors(value)
    517     elif isinstance(value, Poly):
    518         return poly_factor.factors(value)

~/repos/galois/galois/_factor.py in factors(n)
    250     # Step 4
    251     while n > 1 and not is_prime(n):
--> 252         f = pollard_rho(n)  # A non-trivial factor
    253         while f is None:
    254             # Try again with a different random function f(x)

~/repos/galois/galois/_factor.py in pollard_rho(n, c)
    571         a = f(a)
    572         b = f(f(b))
--> 573         d = math.gcd(a - b, n)
    574 
    575     if d == n:

KeyboardInterrupt:

pivis commented 1 year ago

Apologies if all of this is obvious to you or already known/considered. I got here specifically because I really liked this project and was playing with larger GF(2**n) fields.

It looks like 2**256-1 factorization as implemented currently gets to the point where it needs to factorize 340282366920938463463374607431768211457==2**128+1 which is 59649589127497217 * 5704689200685129054721. Unfortunately, it looks like Pollard's rho algorithm is getting much more than usual time for this number for offset c=1 (not sure why - square root of the smallest factor is ~24M but it seems that it takes around 455M steps for Pollard's rho to find the factor). That said running pollard_rho with c=3 finds the factor after 19M iterations which is close to regular working times of Pollard's rho.

I only see 2 things can be done:

Have as many as possible p**n-1 factorizations hardcoded (or in a sqlite db similar to conway_polys.db). A reasonable trick would be to introduce a hardcoded list of primes to check as factors where the list specifically contains large-ish primes that are known factors of 2**n-1 with n up to some value (maybe p**n-1 for other p too).
Use more advanced factorization algorithms so they can get a bit further).

For the 1st approach there seems to be some very interesting public materials regarding mersenne numbers factorizations such as Factorization tables from Cunningham book.

I got a bit confused at first with the book only containing 2**n-1 factorization for odd values of n, but that turned out to be just because otherwise 2**(2k)-1 = (2**k-1) * (2**k +1) and there is a separate section of the book for the 2**n+1 factorization where n=4k. E.g. you can find 59649589127497217 there in factorization of 2**128+1.

So if just prime factors from 2**n-1 and 2**n+1 factorizations (tables called 2- and 2+) are taken from the book and saved aside to be checked e.g. after all factors under 10M and before running more advanced algorithms this should be enough to factorize 2**n-1 for some higher n numbers.

More about the notations they are using and such here. There are some things that are not very convenient (e.g. P40 meaning 40-digit prime number without specifying it, and C200 for not-factored number). The primes hidden under P40 notation are present in separate tables it seems, but they would not be necessarily needed here.

pivis commented 1 year ago

Specifically the link for the notations explanations: https://homes.cerias.purdue.edu/~ssw/cun/notat.txt

mhostetter commented 1 year ago

@pivis thanks for looking into this and responding. I've been hoping someone would help out here -- the prime factorization has always been a weakness in this library. Prime factorization algorithms were new to me when I started this library, so I'm open to review / suggestions / improvements to the factorization algorithms used, and their deployment order in factors().

I got here specifically because I really liked this project and was playing with larger GF(2**n) fields.

Taking too long / forever to factor 2^n and 2^n - 1 is a problem, and has been reported before. In fact, I made and documented some workarounds to avoid those factorizations, if necessary in a pinch. See the "Speed up creation of large finite field classes" note. However, this is only a stopgap and a better solution is definitely needed.

My thoughts on your ideas:

Have as many as possible p**n-1 factorizations hardcoded (or in a sqlite db similar to conway_polys.db)

I think this is a good idea. Finite fields with order 2^n seem to be particularly popular, so adding a database of factorizations for 2^n and 2^n - 1 seems wise. I can work on incorporating that. Thanks for the links. I will definitely use them to construct the database.

Potentially in the future, we can add more p, as you mention. Although, which ones and how many degrees? That gets large quickly...

Use more advanced factorization algorithms so they can get a bit further).

I tried implementing the Quadratic Sieve algorithm twice before, but for various reasons never got it fully working / merged. If you're aware of more powerful algorithms and are interested in contributing, I'd love the help.

I also recently noticed that the perfect power detection (Step 2 in the factors()) seems way too slow (or perhaps I don't have an appropriate expectation), see #443. My suspicion is that iroot() is the culprit. Any help in debugging / diagnosing issues there would be helpful too.

mhostetter commented 1 year ago

@pivis the parsing of those tables was more annoying than anticipated! But I added a database of all the prime factorizations in #452. This will really speed things up!

I'm also patching perfect_power() in #454 to speed up factoring powers of small primes.

Thanks for commenting here, the suggestions, and feedback. I'll release 0.3.2 with the improvement today.

mhostetter commented 1 year ago

I'm closing this. This will now be tracked in the consolidated #456.

mhostetter / galois

Factoring stalls on certain inputs #187