vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
1.12k stars 137 forks source link

[new feature] Transform sort and uniq? #158

Open tueda opened 7 years ago

tueda commented 7 years ago

This would be a bit feature creep, but it would be nice to have sort and uniq operations in the transform statement (the meaning should be clear, like the Unix commands).

Background

Although function objects in FORM are internally stored as variable-length arrays, one can use them to represent other data structures, like a set (and this is presumably the best thing one can do with FORM). Suppose we want to have a set in a function like f(1,x1,x2,x3,...), where we store some additional information as a (small) number in the first argument and the rest is for a set of variables. Sometimes one wants to merge sets, which may be written as

repeat f(n?,?a) * f(n?,?b) = f(n,?a,?b);

but then one needs to canonicalize a set such that there are no duplicated elements. This might be programmed as

S x1,...,x10;
CF f;
L F = f(1,x1,...,x10,x1,...,x10,x1,...,x10) *
      f(2,x1,...,x10,x1,...,x10,x1,...,x10);

symmetrize f;
repeat id f(?a,x1?,x1?,?b) = f(?a,x1,?b);

P;
.end

The symmetrization and pattern matching only with neighbours reduce the cost of matching from O(n^2) to O(n) and actually it gives the correct result:

   F =
      f(1,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)*f(2,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10);

But the above program has two tricky things:

  1. symmetrize doesn't have any option to specify the range of arguments like (2,last). Instead, the above code uses symmetrize for all arguments, with an abuse of the fact that a small number comes before symbols in the FORM sorting system with the default ordering.
  2. If there are many products of f in one term, the pattern matcher needs to try for matching many times also for ones that have been canonicalized. One could do the canonicalization one by one, id once, f(?a) = f1(?a), but it has some overhead to do so.

They are immediately solved if we have sort and uniq.

Example

If we introduce sort and uniq in the transform statement, the above canonicalization can be written in one line:

transform f, sort(2,last), uniq(2,last);

I prefer to have two operations sort and uniq rather than one operation sortuniq because sort itself is a basic operation that may be useful (uniq might be similar), but it would be nice if FORM cleverly recognizes sort(2,last) and uniq(2,last) to use an optimized algorithm for that combination.

tueda commented 7 years ago

25bfe3c35c introduced dedup of the transform statement, removing duplicate arguments with keeping the ordering. Similar to sort and then uniq, though the result is different (not sorted).