vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
1.01k stars 120 forks source link

Improve pattern matching with sets #254

Open tueda opened 6 years ago

tueda commented 6 years ago

Because pattern matching with user-defined sets uses linear search to find whether or not a set contains the target object, it is rather slow. The timing for the following benchmark program is clearly proportional to N:

#define N "2000"
#define M "1000"
#define L "100"
CF f;
Auto V p,v;
Set vecs: v1,...,v`N';
L F = <f(p1,p2,p3,p4,p5)>+...+<f(p`M',p{`M'+1},p{`M'+2},p{`M'+3},p{`M'+4})>;
#do i=1,`L'
  id f(?a,p?vecs,?b) = 0;
#enddo
.end

This can be improved on average O(log(N)) or O(1) by binary trees or hash tables, respectively. For example, if there are many elements in a set, more than some threshold, construct a tree or table at compile-time and then use it at run-time.

Another (but a simple) optimization would be: in many cases, elements in a set have continuous numbers internally. In such cases just check if the target object is in the range.

vermaseren commented 6 years ago

The best solution seems to be to order the elements in the set at compile time and have a second array with its number in the set. That way the binary search is log(N). For indexing purposes the second array is needed. Not for simple pattern matching. The only complication is a set with a mix of numbers and symbols and ranges of numbers or things like >3. Those can be placed in special positions. Same with wildcards.

On 14 Dec 2017, at 20:07, Takahiro Ueda notifications@github.com wrote:

Because pattern matching with user-defined sets uses linear search to find whether or not a set contains the target object, it is rather slow. The timing for the following benchmark program is clearly proportional to N:

define N "2000"

define M "1000"

define L "100"

CF f; Auto V p,v; Set vecs: v1,...,vN'; L F = <f(p1,p2,p3,p4,p5)>+...+<f(pM',p{M'+1},p{M'+2},p{M'+3},p{M'+4})>;

do i=1,`L'

id f(?a,p?vecs,?b) = 0;

enddo

.end This can be improved on average O(log(N)) or O(1) by binary trees or hash tables, respectively. For example, if there are many elements in a set, more than some threshold, construct a tree or table at compile-time and then use it at run-time.

Another (but a simple) optimization would be: in many cases, elements in a set have continuous numbers internally. In such cases just check if the target object is in the range.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/254, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEm_4R_9MAoptKeR9sf0x34t3MmeDks5tAXHdgaJpZM4RChlf.