vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0

Crashing factorize #278

Open jPhy opened 6 years ago

jPhy commented 6 years ago

Running the following program

off statistics;

Symbols s,t,m,e,c;

#include- expression.txt
.sort:read;

Factorize num;
.end

on the polynomial in the attached expression.txt yields:

FORM 4.2.0 (Apr 11 2018, v4.2.0-47-ga1da4b8) 64-bits  Run: Wed Apr 11 15:06:50 2018
    off statistics;

    Symbols s,t,m,e,c;

    #include- expression.txt
    .sort:read;

    Factorize num;
    .end
ERROR: polynomials too large (> WORDSIZE)
Program terminating at factorize.frm Line 8 --> 
  6.56 sec out of 6.57 sec
jPhy commented 6 years ago

In fact, I ran into the problem when simplifying a rational polynomial. Computing the gcd with a simple polynomial triggers the same error message:

off statistics;

Symbols s,t,m,e,c;

#include- expression.txt
L den = -4*t^2*m^2+s*m^4-6*s*t*m^2+s*t^2-2*s^2*m^2+2*s^2*t+s^3;
.sort:read;

Local gcd = gcd_(num,den);

print gcd;
.end

with output:

ERROR: polynomials too large (> WORDSIZE)
Program terminating at gcd.frm Line 11 --> 
  4.66 sec out of 4.67 sec
vermaseren commented 6 years ago

It is a matter of some default settings that work out very badly in this case. If you put

#: MaxNumberSize 150

at the start of your programs, they will work. Let me explain what is happening: when Form starts it has a maximum term size (40K) and a maximum number size which is half that amount, because for regular terms one could envision a term that is just a coefficient consisting of a numerator and a denominator, each taking half that amount. Then come the polynomials, and the way they have been programmed (I did not do that) they reserve this full number space for each coefficient. The result is that your large polynomial then uses a lot of space internally. Fortunately you can change this maximum number size. The setting I gave above corresponds to 150 words, which is just enough for your first problem, showing after half an hour that the only factor that comes out is 128.
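
For example, a minimal sketch of the first program in this thread with the suggested setup line added at the very top (the rest is unchanged):

#: MaxNumberSize 150
off statistics;

Symbols s,t,m,e,c;

#include- expression.txt
.sort:read;

Factorize num;
.end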

Of course, the way many people use Form these days, the emphasis has shifted from ground-level terms to complicated function arguments. Hence it may be a good idea to change the default for this maximum number size to something more modest, but for instance 100 would be too small for a default. On the other hand 20000 is way too big. Your polynomial is of course pushing the limits a bit. If you make it a bit bigger, it will start causing problems again.

Jos

vsht commented 6 years ago

Are there actually some guidelines on how to increase the default values (i.e. dos and don'ts) when working with large expressions on a modern multi-core machine with a lot of RAM (say a Xeon with 16 cores and 128 GB RAM or even more powerful)?

I fully understand that the defaults are chosen in such a way that Form can run even on very weak computers, but I guess that everyone using Form for big and serious things always ends up switching to TForm running on a big server or even ParForm running on a cluster.

So far, every time I ran into a stage sort or the issues described by jPhy, I tried increasing everything by a factor of 2. For some cases I even needed

* 16x standard values (except for MaxTermSize which is even bigger)
#: MaxTermSize 1200K
#: WorkSpace 640M
#: LargeSize 800M
#: SmallSize 160M
#: ScratchSize 800M
#: HideSize 800M
#: TermsInSmall 1600K
#: LargePatches 4096
#: FilePatches 4096
vermaseren commented 6 years ago

Hi Vladislav,

I would concentrate on SmallSize, LargeSize, TermsInSmall and ScratchSize, unless your terms are of a complexity that needs MaxTermSize to be increased (together with WorkSpace). Suppose your computer has 128 Gbytes and you run TForm. I would put LargeSize at 30G, SmallSize at 5G, TermsInSmall at 10M and ScratchSize at 10G. You should realize that the sort buffers are allocated at least twice and the scratch buffers have either two or three copies. Hence, depending on what you do, the LargeSize may already be a bit too big.

It also depends on the memory configuration: does the machine have swap space? If not, you have to be a bit conservative. A good rule is to leave 10-20% for the OS so it can cache things its way. If you have virtual memory, you can make the buffers a bit larger (not too much). Batch computers usually do not have swap space. When I get a new machine, I ask the people at our computer group to configure it such that there is at least as much swap space as there is memory. This works out well.

Also: it is nice to have at least 4 Gbytes/core, but preferably even more. Most computers that are used for numerics do not exceed 4 or 8 Gbytes/core. Our best Form/TForm computer has 24 Gbytes per core (and 32 cores). Works very well.

What also helps is SSD for the scratch and sort files. But keep in mind that in principle they have a finite number of write cycles. The cheaper types may have to be replaced after a few years, unless you have quite a lot of them. We have the rule that we keep the SSDs empty (no normal user files) and use them exclusively for the temporary files of Form/TForm. That works well.

On the whole, you need to experiment a bit with what buffer sizes are good for your programs. You have to study top a bit to see what is happening.
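
For concreteness, and purely as a starting point to be tuned per machine, the numbers above written in the setup style used earlier in this thread (or as the corresponding lines in a form.set file; if your FORM version does not accept the G suffix, spell the sizes out in M):

* 128 Gbyte machine running TForm (values suggested above)
#: LargeSize 30G
#: SmallSize 5G
#: TermsInSmall 10M
#: ScratchSize 10G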

By the way: if you increase MaxTermSize, especially Tform (but also Form) may increase the LargeSize by itself, because it is also used for many types of caching and with large values for MaxTermSize and a large number of caches it may blow up rather spectacularly.

I hope this helps a bit

Jos

tueda commented 6 years ago

For your information: I have a script that works on 64-bit Linux machines and generates form.set in the current directory based on the amount of available memory, though the result may not be optimal. For example: formset.py -n 16 -o -- MaxTermSize=100K.

jPhy commented 6 years ago

In fact, the file I uploaded is a shortened version of the actual expression I want to simplify. Running gcd_, rem_, and div_ on the original expression, I again experience what @vermaseren pointed out:

If you make it a bit bigger, it will start causing problems again.

Looking around in the source code, I found the line https://github.com/vermaseren/form/blob/master/sources/poly.cc#L150 which restricts the size of temporary internal polynomials to ~4GB. That is probably because the memory is addressed using 32-bit integers (type int, see https://github.com/vermaseren/form/blob/master/sources/poly.cc#L145). Is there a deeper reason why the size of these polynomials is addressed with 32-bit ints (restricting them to <~4GB), or is it possible to use a larger integer type (e.g. long) there?

vermaseren commented 6 years ago

The whole of Form works internally with 32 bit words. I would have to inspect what would happen if we start playing with this limitation in just part of the code. There is a big chance it would not be simple. I did not write that part of Form (none of the .cc files actually). My knowledge of C++ is not very good. In this sense, it would be easier if Ben and/or Takahiro would look at it. The original author was Jan Kuipers, but he left the team more than 4 years ago.

Anyway, all this stuff should happen inside the memory or it will become very slow. At the moment your limit is 2*10^9 words. That is about 8 Gbytes. This does not leave much room for growing (one order of magnitude at best I would say).

Maybe you can try to see what the optimizer makes of it? To my knowledge that one can handle much bigger expressions although it may become rather slow at the highest level of optimizations.
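
For reference, a rough sketch of what trying the optimizer could look like (this assumes the expression is still called num, as in the programs above; O1 is the fastest level, O3 the slowest and most aggressive):

Symbols s,t,m,e,c;

#include- expression.txt
.sort:read;

Format O2;
print num;
.end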

Jos

jPhy commented 6 years ago

My expressions are not quite ready for numerical evaluation yet; the next step is a series expansion in one of the variables. The idea is to cancel as many factors as possible to arrive at a simpler numerator/denominator representation to expand from.

We are currently investigating if dividing the troublesome factors out of the numerator would really lead to a simpler expression. Consider for example

(a^3 - b^3) / (a - b) = (a^2 + a b + b^2)

where the number of terms in the (in our case already quite large) numerator actually grows when dividing out a factor.
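
A tiny FORM sketch of that example (the names num, den and quo are just for illustration):

Symbols a,b;
* num has 2 terms; dividing out (a - b) gives 3 terms
L num = a^3 - b^3;
L den = a - b;
.sort

L quo = div_(num,den);
print quo;
.end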

vsht commented 6 years ago

Dear Jos and Takahiro,

many thanks for sharing your wisdom and code. Highly appreciated!

Cheers, Vladyslav

spj101 commented 6 years ago

Not sure if this is helpful, but it seems possible to find examples where a result cannot be obtained even when tuning MaxNumberSize. Can we do something else in such situations?

Running the file:

#-
#: MaxNumberSize 24 * Overflow in Multiplication
*#: MaxNumberSize 25 * ERROR: polynomials too large (> WORDSIZE)

Off Statistics;
Symbols s,t,c,m;
Symbols nc,nci,tr,e;

#include inexpr.txt
L denfac = (2*t^3*c+12*s*t*m^2-32*s*t*c*m^2+6*s*t^2-12*s*t^2*c-12*s^2*m^2+32*s^2*c*m^2+6*s^2*t-14*s^2*t*c);
.sort:load;

L prod = inexpr*denfac;
.sort:prod;

L proddiv = div_(prod,denfac);
.sort:div;

L diff = inexpr - proddiv;
print+s diff;
.end

With the expression in inexpr.txt.gz (first run gunzip inexpr.txt.gz), this gives Overflow in Multiplication when run with #: MaxNumberSize 24, and ERROR: polynomials too large (> WORDSIZE) when run with #: MaxNumberSize 25, after around 100 seconds.

Here at least we know that the input and output polynomials of the div_ routine are around 50MB in size. Watching the memory usage, I notice that FORM allocates around 5GB before issuing the error.

Thanks!

spj101 commented 6 years ago

Interestingly, just changing the order in which the symbols are declared makes the above example trivial and there is no need to adjust MaxNumberSize on my system. I suppose this alters the order in which div_ considers the variables.

Working example:

#-
Off Statistics;
Symbols t,c,m,s; * <--- altered this line only
Symbols nc,nci,tr,e;

#include inexpr.txt
L denfac = (2*t^3*c+12*s*t*m^2-32*s*t*c*m^2+6*s*t^2-12*s*t^2*c-12*s^2*m^2+32*s^2*c*m^2+6*s^2*t-14*s^2*t*c);
.sort:load;

L prod = inexpr*denfac;
.sort:prod;

L proddiv = div_(prod,denfac);
.sort:div;

L diff = inexpr - proddiv;
print+s diff;
.end

gives:

FORM 4.2.0 (Apr  3 2018, v4.2.0-46-g77ee4ea) 64-bits  Run: Mon Apr 16 18:50:23 2018
    #-

   diff = 0;

  10.29 sec out of 10.29 sec
spj101 commented 6 years ago

The problem encountered in the examples listed above is due to the fact that the input is not univariate and the denominator is not monic (with the chosen variable order); this was pointed out to me by @benruijl.

The C++ polynomial class in FORM is intended to represent sparse multivariate polynomials with integer coefficients. Since the quotient and remainder of the division of multivariate polynomials with integer coefficients are in general multivariate polynomials with rational coefficients, the division routine poly_divmod instead attempts to solve the pseudo-division problem:

LC(g)^i*f = q*g + r

where LC(g) is the leading coefficient of g (defined using lexicographic ordering) and i is a sufficiently high power to ensure that q and r can be expressed as polynomials over the integers. Before passing the result back to the C part of FORM, the routine divides it by LC(g)^i, giving the expected result for multivariate polynomials with rational coefficients.

For the univariate case FORM uses the result found in the literature:

i <= max(0,deg(f) - deg(g) + 1),

to provide a reasonably tight bound on i.
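
As a small worked illustration (not from the thread): take f = x^2 + 1 and g = 2*x + 1, so deg(f) - deg(g) + 1 = 2. Indeed

LC(g)^2*f = 4*x^2 + 4 = (2*x - 1)*(2*x + 1) + 5,

so q = 2*x - 1 and r = 5 have integer coefficients with i = 2, while i = 1 would not suffice (2*x^2 + 2 divided by 2*x + 1 has a non-integer quotient).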

For the multivariate case it is not so straightforward to bound the power i when g is viewed as a multivariate polynomial with integer coefficients, as in FORM. I looked around in the literature for this case and also tried to work it out myself, but I was not successful in finding any reasonable bound on i; does anybody know where something like this may have been considered in the literature? (As an aside, if g is instead viewed as a univariate polynomial in the leading variable, with polynomial coefficients in the other variables, then the bound from the univariate case also applies.)

In this case FORM instead attempts to determine the bound by replacing LC(g) with a new variable and multiplying f by some high power, MAXPOSITIVE, of this variable. FORM then performs the full division (which now has a monic denominator due to the replacement) and looks at the minimum power of the new variable appearing in the result, MINPOW; the bound on i is then given by:

i <= max(0,MAXPOSITIVE-MINPOW).

This is implemented here. After this, the pseudo-division is performed. This algorithm can be very inefficient, as the number of terms in the quotient and remainder of the division with the new variable introduced can be very much larger than the number of terms in the result of the pseudo-division problem we are actually interested in. In the cases mentioned above in this thread, it is the step bounding i which creates a huge intermediate expression during the division with the additional variable; the actual pseudo-division takes only a few seconds.

A neat(ish) example that triggers this swell is:

#-
Symbols y,x;
AutoDeclare Symbols x1,...,x9;

L f =   (2*x*y + x + y)^400; * terms = 80601
L g =   (2*x*y + x + y);     * terms = 3
L res = (2*x*y + x + y)^399; * terms = 80200
.sort

L q = div_(f,g); * terms = 80200
L r = rem_(f,g); * terms = 0
.sort

L diff = q - res;

print diff,r;
.end

Here the division used to determine the bound on i generates millions of terms even though the input and result have only ~80k terms. I am sure it is possible to construct even more compact and dramatic examples.

It would be nice if FORM could avoid this costly division just to bound i. In order to do so, one idea, suggested to me by @KernerM, is to simply try the division with some small i and, if it fails, increase i until the division succeeds. Concretely, we can check whether the division of any coefficient during the poly_divmod routine yields a non-zero remainder; if so, the division has failed over the integers. I have implemented this idea and issued a pull request #281. Obviously this algorithm is also not perfect; we can imagine cases where:

  1. Nearly the full division is performed tens of times before a large enough i is found.
  2. i is set to a value much larger than really necessary for the problem.

Can anybody think of a better algorithm? Alternatively, how simple would it be to extend the polynomial class to support multivariate polynomials with rational coefficients? In that case there are reasonably efficient algorithms for dividing directly; see Algorithm 1 of Sparse polynomial division using a heap (Monagan and Pearce, 2011).

Finally, there are further factors which contribute to the failure of FORM's div_ or rem_ functions:

  1. As explained by @vermaseren in his comment above, the size of polynomials in the C++ part of FORM is quite limited due to the use of 32-bit pointers. I think it is just about feasible to remove this limitation within the C++ part of FORM (changing ~100 lines), but it is not clear to me if these large polynomials can be passed back to the C part of FORM easily.
  2. There is no attempt to reorder the variables to make the denominator monic in cases where this is possible. I am not sure how easy this would be to implement, but it seems like an obvious optimisation.

benruijl commented 6 years ago

Thanks for the analysis!

At the moment I don't have much of a feeling for how to determine the upper bound. I would also be in favor of computing in Q, but that may actually be quite a bit of work. The sparse heap algorithm you refer to is implemented in Form (at least in some form), so at least that part is covered.

Concerning point 2: this is definitely something to look into.