jodavies closed this issue 1 month ago.
The problem here arises in the sorting of the polyratfun arguments, like so:
```
Thread 5 "tvorm-test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff33fff640 (LWP 550611)]
MergePatches (par=par@entry=2) at sort.c:3655
3655 *rr = 0;
(gdb) bt
#0 MergePatches (par=par@entry=2) at sort.c:3655
#1 0x0000555555685196 in EndSort (B=0x7ffc4c000b70, buffer=0x7ffed8323024, par=1) at sort.c:907
#2 0x0000555555635de1 in poly_sort (B=B@entry=0x7ffc4c000b70, a=a@entry=0x7ffed832301c) at polywrap.cc:573
#3 0x000055555563b276 in poly_ratfun_add (B=B@entry=0x7ffc4c000b70, t1=t1@entry=0x7fff3fda3bec, t2=t2@entry=0x7fff6da0fde0) at polywrap.cc:681
#4 0x00005555556b947e in SortBotMerge (B=B@entry=0x7ffc4c000b70) at threads.c:4322
#5 0x00005555556ba699 in RunSortBot (dummy=<optimised out>) at threads.c:1964
#6 0x00007ffff778fac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#7 0x00007ffff7821850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
The crash is due to `AR.CompressPointer` being null. It reminds me of #211, but those problems were due to stage-sorting in the sub-buffers, and in this example I think we did not reach that stage?
So, in this sort, where should `AR.CompressPointer` have been set?
This example has two failure modes, actually. The one in the previous comment is fixed by this commit, I think: https://github.com/jodavies/form/commit/243d6e49406fa9178c4be6f8927284f0b1d8d6e6
Now, with higher numbers of threads, I just get the `-w2` behaviour, where polyratfun argument terms end up at ground level (with 2 workers the sortbots are not used, so that makes sense...). This failure mode does not come with any valgrind errors.
Some more details: the remaining issue (terms escaping the polyratfun argument) happens only when the run calls `MergePatches`. This doesn't happen in every run, since it depends on which thread receives which terms.
The problematic sort comes from `poly_ratfun_add` calling `poly_sort`, which then needs to merge patches, from inside `SplitMerge`. The stack looks like this:
```
#0 MergePatches (par=par@entry=2) at sort.c:3678
#1 0x000055555567ecb7 in EndSort (B=0x7fff34000b70, buffer=0x7fff1f04b024, par=1) at sort.c:942
#2 0x000055555562f575 in poly_sort (B=B@entry=0x7fff34000b70, a=a@entry=0x7fff1f04b01c) at polywrap.cc:581
#3 0x0000555555634a22 in poly_ratfun_add (B=B@entry=0x7fff34000b70, t1=t1@entry=0x7fff054911e4, t2=t2@entry=0x7fff05825e24) at polywrap.cc:689
#4 0x000055555567b946 in AddPoly (B=B@entry=0x7fff34000b70, ps1=ps1@entry=0x7ffeb6ef24d0, ps2=ps2@entry=0x7ffeb6ef2600) at sort.c:2173
#5 0x000055555567be60 in SplitMerge (B=B@entry=0x7fff34000b70, Pointer=Pointer@entry=0x7ffeb6ef24d0, number=number@entry=76) at sort.c:3376
#6 0x000055555567bcda in SplitMerge (B=B@entry=0x7fff34000b70, Pointer=Pointer@entry=0x7ffeb6ef2270, number=152) at sort.c:3362
#7 0x000055555567e364 in EndSort (B=B@entry=0x7fff34000b70, buffer=0x7fff04e5f3e8, par=par@entry=0) at sort.c:750
#8 0x00005555556b793c in RunThread (dummy=<optimised out>) at threads.c:1472
#9 0x00007ffff778fac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#10 0x00007ffff7821850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
So it seems that in this case some buffer is set incorrectly, and the terms are written into the wrong small buffer.
Hello,
The attached script crashes in `MergePatches` (at sort.c L3604, with par=2, v4.3.1 tag) most of the time with tform. Increasing the sub-buffer sizes fixes this example. The input expression is "small-ish"...
crash.tar.gz
In some cases, e.g. `tvorm -w2`, I can actually get this to not crash, but instead produce nonsense output (i.e., lots of bare terms, not inside the polyratfun).
Thanks, Josh.