vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
1.14k stars 136 forks source link

What is the fastest equivalent of Foreach in FORM? #187

Closed tueda closed 7 years ago

tueda commented 7 years ago

"Modern" programming languages (C++, Java, Bourne Shell, AWK, Perl, Python, Ruby, Rust, Mathematica etc.) have Foreach-like loop statements that iterate over a list or group of objects:

for (const auto &element : my_list) {
  // do something for each element...
}

Suppose we want to do a similar thing for terms stored in a $-variable. For definiteness, we consider finding one element matching to some condition. Is there any good way to do this?

If there would be a special do-in statement and breakdo statement, I could write

$list = x*y + y*z + z*x;
$result = 0;
do $n in $list;
  if (<some condition>);
    $result = <result>;
    breakdo;
  endif;
enddo;

But for now I may write

$list = x*y + y*z + z*x;
$result = 0;
$list_copy = $list;
label loop;
$n = termsin_($list_copy);
if ($n);
  $element = firstterm_($list_copy);
  if (<some condition>);
    $result = <result>;
  else;
    $list_copy = $list_copy - $element;
    goto loop;
  endif;
endif;

which looks rather miserable, especially when $list has many terms ~ O(10^3-10^4). Probably a code with a do-loop will be messier.

vermaseren commented 7 years ago

Hi,

I guess the easiest way to do this is to extend the #do instruction a bit. At the moment it knows

do i = expression

and then it goes through the terms of the expression. What seems to be needed is that it can also deal with

do i = $variable

unless I understood this wrong.

Jos

On 19 mei 2017, at 15:01, Takahiro Ueda notifications@github.com wrote:

"Modern" programming languages (C++, Java, Bourne Shell, AWK, Perl, Python, Ruby, Rust, Mathematica etc.) have Foreach https://en.wikipedia.org/wiki/Foreach_loop-like loop statements that iterate over a list or group of objects:

for (const auto &element : my_list) { // do something for each element... } Suppose we want to do a similar thing for terms stored in a $-variable. For definiteness, we consider finding one element matching to some condition. Is there any good way to do this?

If there would be a special do-in statement and breakdo statement, I could write

$list = xy + yz + z*x; $result = 0; do $n in $list; if (); $result = ; breakdo; endif; enddo; But for now I may write

$list = xy + yz + z*x; $result = 0; $listcopy = $list; label loop; $n = termsin($listcopy); if ($n); $element = firstterm($list_copy); if (); $result = ; else; $list_copy = $list_copy - $element; goto loop; endif; endif; which looks rather miserable, especially when $list has many terms ~ O(10^3-10^4). Probably a code with a do-loop will be messier.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEtu4qNFLU6JB06Z5M6qdPb-uip5dks5r7ZKagaJpZM4Ngfke.

tueda commented 7 years ago

Hi Jos, unfortunately the items (and their internal orders) in $list depend on each term in an expression.

tueda commented 7 years ago

This direction may be not a FORM way; the following code is too slow. Maybe I have to give up and go for another (old and hard) way...

#-
S x,x1,...,x20;
CF f,distrib;

L F = ranperm_(f,x1,...,x20);
id f(?a$args) = 1;

P "input: %$", $args;

* The number of the arguments.
$n = nargs_($args);

* Introduce the order as f(n,x).
$tmp1 = f(f(1),$args);
inside $tmp1;
  repeat id f(?a,f(x1?),x2?,?b) = f(?a,f(x1,x2),f(x1+1),?b);
  if (match(f(?a$args,x?)) == 0) exit;
endinside;

* Try for all combinations in lexicographical order: k-elements from n.
$k = 1;
label loop1;
  if ($k < $n);
    $list = distrib(1,$k,f,dummy_,$args) * replace_(distrib,distrib_);
label loop2;
      $tmp1 = termsin_($list);
      if ($tmp1);
*       Get a combination.
        $tmp1 = firstterm_($list);
        $list = $list - $tmp1;
        inside $tmp1;
          argument f;
            id f(x1?,x2?) = x2;
          endargument;
          if (match(f(?a$elem)) == 0) exit;
          P "%$", $elem;
*         if (<some condition>);
*           <break the outer loop>;
*         endif;
        endinside;
        goto loop2;
      endif;
    $k = $k + 1;
    goto loop1;
  endif;
.end
tueda commented 7 years ago

FYI: This is not a FORM way (never finishes), but other languages can do in this way. Timing comparison: the same thing can be written in Python as

import itertools
import random

args = ['x' + str(i) for i in range(1, 20 + 1)]
random.shuffle(args)

print('input: {0}'.format(args))

for k in range(1, len(args) + 1):
    for e in itertools.combinations(args, k):
        print(e)

taking 1.2 seconds if the I/O is ignored (>/dev/null), though you may say it is cheating because of the usage of a generator (not constructing a whole list). For Mathematica,

args = Table[Symbol["x" <> ToString[i]], {i, 1, 20}];
args = RandomSample[args];

Print["input: " <> ToString[args]];

allargs = Subsets[args, {1, Length[args]}];
Do[Print[e], {e, allargs}];

takes 10.8 seconds (again the I/O ignored).

vermaseren commented 7 years ago

Hi Takahiro,

So indeed I did not understand it properly. Hence we need an equivalent in the do statement.

Cheers

Jos

On 19 mei 2017, at 16:18, Takahiro Ueda notifications@github.com wrote:

Hi Jos, unfortunately the items (and their internal orders) in $list depends on each term in an expression.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187#issuecomment-302715164, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEn_ElLD2yZuLOPZwzUxqxxy5pHGpks5r7aS0gaJpZM4Ngfke.

tueda commented 7 years ago

Hi Jos, rethinking this problem led to the following code, without a kind of do-in statement:

#-
Off stats;
S x,x1,...,x20;
S l,s,n;
CF f,f1,f2;

L F = f(x1,...,x20);
*id f(?a) = ranperm_(f,?a);
id f(?a$args) = 1;

P "input: %$", $args;

$m = nargs_($args);

* All combinations of n (= 1,...,m) elements from m elements.
do $n = 1, $m;
* Generate vectors (0,0,...,0), (0,0,...,1), ...
  $vec = f($n,$m-$n);
  $loop = 1;
  while ($loop);
    inside $vec;
*     Fill by 0's.
      repeat id f(l?pos_,s?,?a) = f(l-1,s,?a,0);
*     Now a vector is prepared.
      if (match(f(0,s?,?a$a)) == 0) exit;
*     Convert the generated vector to a combination in the original arguments.
      $tmp = f() * f1($a) * f2($args);
      inside $tmp;
        repeat;
          repeat id f1(n?pos_,?a) * f2(x?,?b) = f1(n-1,?a) * f2(?b);
          id f(?a) * f1(0,?b) * f2(x?,?c) = f(?a,x) * f1(?b) * f2(?c);
        endrepeat;
        if (match(f(?a$a)) == 0) exit;
        P "%$", $a;
      endinside;
*     Increment the right-most index.
      id ifmatch->1, f(0,s?pos_,?a,x?) = f(0,s-1,?a,x+1);
      repeat id f(l?,s?,?a,0) = f(l+1,s,?a);
      id ifmatch->1, f(l?,0,?a,x1?,x2?pos_) = f(l+1,x2-1,?a,x1+1);
*     Finished.
      $loop = 0;
label 1;
*     Occasionally exit the inside environment to avoid workspace overflows.
    endinside;
  endwhile;
enddo;

.end

taking 56 seconds without I/O.

Update: Using transform dropargs seems to be faster (39 seconds).

#-
Off stats;
S x,x1,...,x20;
S l,s,n;
CF f,f1,f2;

L F = f(x1,...,x20);
*id f(?a) = ranperm_(f,?a);
id f(?a$args) = 1;

P "input: %$", $args;

$m = nargs_($args);

do $n = 1, $m;
* Generate vectors (0,0,...,0), (0,0,...,1), ...
  $vec = f($n,$m-$n);
  $loop = 1;
  while ($loop);
    inside $vec;
*     Fill 0's.
      repeat id f(l?pos_,s?,?a) = f(l-1,s,?a,0);
*     Now a vector is prepared.
      if (match(f(0,s?,?a$a)) == 0) exit;
*     Convert the generated vector to a combination in the original arguments.
      $tmp = f() * f1($a) * f2($args);
      inside $tmp;
        repeat;
          id ifnomatch->1, f1(n?pos_$nn,?a) = f1(0,?a);
          transform f2,dropargs(1,$nn);
label 1;
          id f(?a) * f1(0,?b) * f2(x?,?c) = f(?a,x) * f1(?b) * f2(?c);
        endrepeat;
        if (match(f(?a$a)) == 0) exit;
        P "%$", $a;
      endinside;
*     Increment the right-most index.
      id ifmatch->2, f(0,s?pos_,?a,x?) = f(0,s-1,?a,x+1);
      repeat id f(l?,s?,?a,0) = f(l+1,s,?a);
      id ifmatch->2, f(l?,0,?a,x1?,x2?pos_) = f(l+1,x2-1,?a,x1+1);
*     Finished.
      $loop = 0;
label 2;
*     Occasionally exit the inside environment to avoid workspace overflows.
    endinside;
  endwhile;
enddo;

.end
vermaseren commented 7 years ago

Hi Takahiro,

What is wrong with this program?

-

S x,x1,...,x20; CF f,f1,f2;

L F = f(x1,...,x20); term; id f(?a) = sum(x,1,nargs(?a),f(x,?a)); id f(x?,?a) = distrib(1,x,f1,f2,?a); id f2(?a) = 1; id f1(?a) = f(nargs(?a),?a); sort; id f(x?,?a$a) = 1; Print +f "%$",$a; endterm; .end

or do I still not understand the essence?

Cheers

Jos

On 19 mei 2017, at 21:29, Takahiro Ueda notifications@github.com wrote:

Hi Jos, rethinking this problem led to the following code, without a kind of do-in statement:

-

Off stats; S x,x1,...,x20; S l,s,n; CF f,f1,f2;

L F = f(x1,...,x20); *id f(?a) = ranperm_(f,?a); id f(?a$args) = 1;

P "input: %$", $args;

$m = nargs_($args);

  • All combinations of n (= 1,...,m) elements from m elements. do $n = 1, $m;
  • Generate vectors (0,0,...,0), (0,0,...,1), ... $vec = f($n,$m-$n); $loop = 1; while ($loop); inside $vec;
  • Fill by 0's. repeat id f(l?pos_,s?,?a) = f(l-1,s,?a,0);
  • Now a vector is prepared. if (match(f(0,s?,?a$a)) == 0) exit;
  • Convert the generated vector to a combination in the original arguments. $tmp = f() f1($a) f2($args); inside $tmp; repeat; repeat id f1(n?pos_,?a) f2(x?,?b) = f1(n-1,?a) f2(?b); id f(?a) f1(0,?b) f2(x?,?c) = f(?a,x) f1(?b) f2(?c); endrepeat; if (match(f(?a$a)) == 0) exit; P "%$", $a; endinside;
  • Increment the right-most index. id ifmatch->1, f(0,s?pos,?a,x?) = f(0,s-1,?a,x+1); repeat id f(l?,s?,?a,0) = f(l+1,s,?a); id ifmatch->1, f(l?,0,?a,x1?,x2?pos) = f(l+1,x2-1,?a,x1+1);
  • Finished. $loop = 0; label 1;
  • Occasionally exit the inside environment to avoid workspace overflows. endinside; endwhile; enddo;

.end taking 56 seconds without I/O.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187#issuecomment-302791357, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEiyhc5hp7D0hJ4LVArVCTrkKAi-Oks5r7e2pgaJpZM4Ngfke.

tueda commented 7 years ago

Hi Jos,

The problem is that the order in the list depends on each term, say

L F = f(x1,x2,x3,x4,x5)*g(...) + f(x3,x2,x5,x1,x4)*g(...) + other many terms

So the combinations for the first term should be

x1
x2
x3
x4
x5
x1,x2
x1,x3
x1,x4
x1,x5
...

while for the second term

x3
x2
x5
x1
x4
x3,x2
x3,x5
x3,x4
x3,x1
...

The orders are chosen by some reason, depending on the arguments of the other function g. For these combinations, I want to find one element (the first one) that matches to some criteria, which also depend on g. When one element is found, I can exit the loop, though in the worst case I have to iterate over all combinations.

vermaseren commented 7 years ago

Hi Takahiro,

If you leave out the sort; statement it probably does what you want. That leaves out the sorting, but keeps the order of first one, then two objects etc.

Jos

On 19 mei 2017, at 22:01, Takahiro Ueda notifications@github.com wrote:

Hi Jos,

The problem is that the order in the list depends on each term, say

L F = f(x1,x2,x3,x4,x5)g(...) + f(x3,x2,x5,x1,x4)g(...) + other many terms So the combinations for the first term should be

x1 x2 x3 x4 x5 x1,x2, x1,x3, x1,x4, x1,x5, ... while for the second term

x3 x2 x5 x1 x4 x3,x2, x3,x5, x3,x4, x3,x1, ... The orders are chosen by some reason, depending on the arguments of the other function g. For these combinations, I want to find one element (the first one) that matches to some criteria, which also depend on g.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187#issuecomment-302797966, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEmnAk9We8s6ytWkfCM4H9cp-Tbm5ks5r7fUHgaJpZM4Ngfke.

vermaseren commented 7 years ago

OK, I tried a bit more and seem to understand it now a bit better. You want it to stop. Hence you do not want to generate the full 2^20 terms. As soon as a certain condition is fulfilled it should jump out and distrib_ cannot do that. Am I right now?

Jos

On 19 mei 2017, at 22:01, Takahiro Ueda notifications@github.com wrote:

Hi Jos,

The problem is that the order in the list depends on each term, say

L F = f(x1,x2,x3,x4,x5)g(...) + f(x3,x2,x5,x1,x4)g(...) + other many terms So the combinations for the first term should be

x1 x2 x3 x4 x5 x1,x2, x1,x3, x1,x4, x1,x5, ... while for the second term

x3 x2 x5 x1 x4 x3,x2, x3,x5, x3,x4, x3,x1, ... The orders are chosen by some reason, depending on the arguments of the other function g. For these combinations, I want to find one element (the first one) that matches to some criteria, which also depend on g.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187#issuecomment-302797966, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEmnAk9We8s6ytWkfCM4H9cp-Tbm5ks5r7fUHgaJpZM4Ngfke.

tueda commented 7 years ago

Yes, better to exit the loop. Maybe something like

do $n = 1, $m;
  $found = 0;
  term;
    id f(a?) = distrib_(1,$n,f,dummy_,?a);
    if ($found == 0);
      ...
    endif;
  endterm;
  if ($found) $n = $m + 1;
enddo;

would be fast enough. (Depends on the speed of distrib_ vs. complicated repeat-id).

tueda commented 7 years ago

Jos, you are right. Generating the combinations by distrib_ without sorting is much faster.

#-
Off stats;
S x,x1,...,x20;
CF f,distrib;

L F = f(x1,...,x20);
id f(?a) = ranperm_(f,?a);
id f(?a$args) = 1;

P "input: %$", $args;

$m = nargs_($args);

$found = 0;
do $n = 1, $m;
  $tmp = 1;
  inside $tmp;
    multiply distrib(1,$n,f,dummy_,$args) * replace_(distrib,distrib_);
    if (match(f(?a$a)) == 0) exit;
    P "%$", $a;
*    if ($found == 0);
*      if ($n == 3) $found = 1;  * Test
*    endif;
  endinside;
  if ($found);
    $n = $m;
  endif;
enddo;

.end

takes only 3 seconds. Thanks!

vermaseren commented 7 years ago

Effectively what you want is to have some flag(s) that can stop further foliation at a point in the tree. Like (this is science fiction) ….. = distrib_(…..); if ( whatever ) setstopflag distrib; This might be implementable and possibly useful for more things

Jos

On 19 mei 2017, at 22:19, Takahiro Ueda notifications@github.com wrote:

Yes, better to exit the loop. Maybe something like

do $n = 1, $m; $found = 0; term; id f(x?) = distrib(1,$n,f,dummy,?a); if ($found == 0); ... endif; endterm; if ($found) $n = $m + 1; enddo; would be fast enough. (Depends on the speed of distrib_ vs. complicated repeat-id).

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/187#issuecomment-302801640, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEuD-6HrwKt9KmJhLPsf86YGOKFERks5r7fk7gaJpZM4Ngfke.