ecm.py should use sympy-seeded random number generator; primefactors should accept `factorint` keywords

sympy / sympy

A computer algebra system written in pure Python

https://sympy.org/

Other

13.04k stars 4.46k forks source link

ecm.py should use sympy-seeded random number generator; primefactors should accept `factorint` keywords #25253

Open arungiridhar opened 1 year ago

arungiridhar commented 1 year ago

Greetings.

Integer factorization using code such as this:

from sympy import primefactors
from timeit import default_timer as timer
start = timer()
print (primefactors(1111111111111111111111111111111111111111111111111111111111111111111111111111111111))
end = timer()
print(end - start)

takes very different runtimes for each call. One user reports runtimes of:

157.78903823095607

98.25994733098196

103.65130478399806

110.66390570497606

309.7015662890044

seconds, all on the same computer and another user reports:

143.1462114790338

293.5698876410024

523.6875524830539

244.05478483595653

41.27652587596094

425.0380747059826

seconds on a different computer.

(Originally discussed and localized here: https://octave.discourse.group/t/symbolic-pkg-has-a-strange-delay/4611/11)

It would be nice if SymPy could return the same runtime for the same input.

smichr commented 1 year ago

probabalistic finding of factors of large numbers will not yield consistent times. You can read to see if there are options that can be supplied to a given method to ensure that it always takes the same time, but is your goal factorization or consistent time in finding the factorization?

arungiridhar commented 1 year ago

Both. It's acceptable for two similar but different inputs to take very different times, but it would be nice if the same input at least took the same time each time. Is it feasible for SymPy to use a deterministic sequence of divisors in Pollard Rho?

smichr commented 1 year ago

ecm uses rgen = random.Random(); it should be using the sympy-seeded generator, as I recall. And within factor_.py there are instantiations like prng = random.Random(seed + retries) which might be ok since they always use the same seed. primefactors should really accept keywords and pass those to factorint. Since it doesn't, call factorint directly (or write your own version of primefactors) and pass use_ecm=False (and a smaller prime to test) and the times will all be about the same. With ecm times to factor 11111111111111111111111111111111111111111111111111 ranged from 0.2 to 3.6; without ecm the times were all about 17 seconds.

asmeurer commented 11 months ago

I wonder if we should implement some retrying in case it gets stuck on a bad seed. We would need some rough estimate of how long a given value should take.

haru-44 commented 11 months ago

I wonder if we should implement some retrying in case it gets stuck on a bad seed.

Is it about _ecm_one_factor or ecm? Each trial ends with a max_curve trial, although there is an element of luck since the sigma is chosen at random.

https://github.com/sympy/sympy/blob/580478123816a986b27d27f379ad3d0482af2c82/sympy/ntheory/ecm.py#L237