vermaseren / form

The FORM project for symbolic manipulation of very big expressions

Crash in MergePatches #468

Closed by jodavies 1 month ago

jodavies commented 4 months ago

Hello,

The attached script (crash.tar.gz) crashes in MergePatches (at sort.c L3604, with par=2, on the v4.3.1 tag) most of the time with tform. Increasing the sub-buffer sizes fixes this example. The input expression is "small-ish"...

In some cases, e.g. with tvorm -w2, I can get this to not crash but instead produce nonsense output (i.e. lots of bare terms that are not inside the polyratfun).

Thanks, Josh.

jodavies commented 4 months ago

The problem here arises in the sorting of the polyratfun arguments, like so:

Thread 5 "tvorm-test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff33fff640 (LWP 550611)]
MergePatches (par=par@entry=2) at sort.c:3655
3655            *rr = 0;
(gdb) bt
#0  MergePatches (par=par@entry=2) at sort.c:3655
#1  0x0000555555685196 in EndSort (B=0x7ffc4c000b70, buffer=0x7ffed8323024, par=1) at sort.c:907
#2  0x0000555555635de1 in poly_sort (B=B@entry=0x7ffc4c000b70, a=a@entry=0x7ffed832301c) at polywrap.cc:573
#3  0x000055555563b276 in poly_ratfun_add (B=B@entry=0x7ffc4c000b70, t1=t1@entry=0x7fff3fda3bec, t2=t2@entry=0x7fff6da0fde0) at polywrap.cc:681
#4  0x00005555556b947e in SortBotMerge (B=B@entry=0x7ffc4c000b70) at threads.c:4322
#5  0x00005555556ba699 in RunSortBot (dummy=<optimised out>) at threads.c:1964
#6  0x00007ffff778fac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#7  0x00007ffff7821850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The crash is due to AR.CompressPointer being null. It reminds me of #211, but those problems were due to stage-sorting in the sub-buffers, whereas in this example I think we did not reach that stage?

So, in this sort where should AR.CompressPointer have been set?
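
For illustration only, the kind of write that fails at sort.c:3655 (`*rr = 0;`) can be modelled in a few lines of C. This is a toy sketch, not FORM's code: `ToyThreadState`, `compress_pointer` and `toy_terminate` are hypothetical names, with `compress_pointer` standing in for AR.CompressPointer. It assumes the pointer is only valid once some earlier setup step has assigned it, and shows a guard that reports the unset pointer instead of writing through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-thread sort state; NOT FORM's real struct. */
typedef struct {
    int *compress_pointer; /* plays the role of AR.CompressPointer */
} ToyThreadState;

/*
 * Write a terminating zero through the compress pointer.
 * Returns 0 on success, -1 if the pointer was never set.
 */
static int toy_terminate(ToyThreadState *st)
{
    int *rr;

    assert(st != NULL); /* caller error, not the failure being illustrated */

    rr = st->compress_pointer;
    if (rr == NULL) {
        /* An unguarded "*rr = 0;" here is where the segfault would occur. */
        return -1;
    }
    *rr = 0;
    return 0;
}
```

A guard like this only turns the crash into a detectable error; it does not answer where the pointer should have been set for this sort.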

jodavies commented 1 month ago

This example has two failure modes, actually. The one in the previous comment is fixed by this commit, I think: https://github.com/jodavies/form/commit/243d6e49406fa9178c4be6f8927284f0b1d8d6e6

Now with higher numbers of threads, I just get the -w2 behaviour, where polyratfun argument terms end up at ground level (with 2 workers, the sortbots are not used, so that makes sense...). This failure mode does not come with any valgrind errors.

jodavies commented 1 month ago

Some more details: the remaining issue (terms escaping the polyratfun argument) happens only when the run calls MergePatches. This doesn't happen on every run, since it depends on which thread receives which terms.

The problematic sort is due to poly_ratfun_add calling poly_sort and then needing to merge patches, from inside SplitMerge. The stack looks like this:

#0  MergePatches (par=par@entry=2) at sort.c:3678
#1  0x000055555567ecb7 in EndSort (B=0x7fff34000b70, buffer=0x7fff1f04b024, par=1) at sort.c:942
#2  0x000055555562f575 in poly_sort (B=B@entry=0x7fff34000b70, a=a@entry=0x7fff1f04b01c) at polywrap.cc:581
#3  0x0000555555634a22 in poly_ratfun_add (B=B@entry=0x7fff34000b70, t1=t1@entry=0x7fff054911e4, t2=t2@entry=0x7fff05825e24) at polywrap.cc:689
#4  0x000055555567b946 in AddPoly (B=B@entry=0x7fff34000b70, ps1=ps1@entry=0x7ffeb6ef24d0, ps2=ps2@entry=0x7ffeb6ef2600) at sort.c:2173
#5  0x000055555567be60 in SplitMerge (B=B@entry=0x7fff34000b70, Pointer=Pointer@entry=0x7ffeb6ef24d0, number=number@entry=76) at sort.c:3376
#6  0x000055555567bcda in SplitMerge (B=B@entry=0x7fff34000b70, Pointer=Pointer@entry=0x7ffeb6ef2270, number=152) at sort.c:3362
#7  0x000055555567e364 in EndSort (B=B@entry=0x7fff34000b70, buffer=0x7fff04e5f3e8, par=par@entry=0) at sort.c:750
#8  0x00005555556b793c in RunThread (dummy=<optimised out>) at threads.c:1472
#9  0x00007ffff778fac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#10 0x00007ffff7821850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

So it seems in this case some buffer is set incorrectly, and the terms are written into the wrong small buffer.
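
To make the suspected failure mode concrete, here is a toy model in C of nested sort levels, each with its own small buffer. It is only a sketch under assumptions: `ToySortStack`, `toy_new_sort`, `toy_store` and `toy_end_sort` are hypothetical names and do not correspond to FORM's actual data structures. The point is that if a nested sort (such as the one for a polyratfun argument) stores a term using a stale level index, the term lands in the outer buffer, which would look like a bare term at ground level.

```c
#include <stddef.h>

#define TOY_MAX_TERMS  8
#define TOY_MAX_LEVELS 4

/* Hypothetical per-level small buffer; not FORM's actual sort struct. */
typedef struct {
    int    terms[TOY_MAX_TERMS];
    size_t count;
} ToyLevel;

typedef struct {
    ToyLevel levels[TOY_MAX_LEVELS];
    size_t   current; /* index of the active sort level, 0 = ground level */
} ToySortStack;

/* Enter a nested sort. Returns 0 on success, -1 if nesting is too deep. */
static int toy_new_sort(ToySortStack *s)
{
    if (s->current + 1 >= TOY_MAX_LEVELS) return -1;
    s->current++;
    s->levels[s->current].count = 0; /* start with an empty buffer */
    return 0;
}

/* Store a term in the buffer of the given level. Returns 0 or -1. */
static int toy_store(ToySortStack *s, size_t level, int term)
{
    ToyLevel *lv;

    if (level >= TOY_MAX_LEVELS) return -1;
    lv = &s->levels[level];
    if (lv->count >= TOY_MAX_TERMS) return -1;
    lv->terms[lv->count++] = term;
    return 0;
}

/* Leave the nested sort and return to the enclosing level. */
static void toy_end_sort(ToySortStack *s)
{
    if (s->current > 0) s->current--;
}
```

In this model, calling `toy_store(&s, s.current, term)` inside a nested sort keeps the term in the inner buffer, while passing a level index captured before `toy_new_sort` puts it in the outer one without any memory error, which would be consistent with the absence of valgrind errors.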