vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
985 stars 118 forks source link

Better errors when (Sub)SmallExtension is filled. #513

Open jodavies opened 1 month ago

jodavies commented 1 month ago

Previously, FORM warns only about SmallExtension, even if it is a sub-buffer sort.

This prompts the user to adjust the wrong buffer in the setup.

Fixes #512

jodavies commented 1 month ago

It looks like parform hangs after printing the error here.

coveralls commented 1 month ago

Coverage Status

coverage: 50.069% (+0.07%) from 49.999% when pulling 710d340774fb664db37bdb81f096f9d03e7229dd on jodavies:full-subsmallextension into 83e3d4185efca2e5938c665a6df9d67d6d9492ca on vermaseren:master.

tueda commented 1 month ago

Hmm...For some reason tform on macos-11 doesn't print "Please increase SubSmallExtension setup parameter." before terminating.

jodavies commented 1 month ago

I have no way to test if it crashes for some other reason.

tueda commented 1 month ago

This may be irrelevant, but on MacOS (Sonoma 14.4.1, Apple M2, Apple clang version 15.0.0), I got some malloc errors for Issue512_2:

./build/sources/vorm -D TEST=Issue512_2 check/fixes.frm 
FORM 5.0.0-beta.1 (May  4 2024, v5.0.0-beta.1-57-g1a75289)  Run: Sat May 11 17:47:37 2024 
    #ifndef `TEST' 
      #message Use -D TEST=XXX 
      #terminate 
    #else 
      #include `NAME_' # `TEST' 
    #- 

Time =       0.00 sec    Generated terms =          1 
            test         Terms in output =          1 
                         Bytes used      =         20 
Please increase SubSmallExtension setup parameter. 
Program terminating at check/fixes.frm Line 2722 -->  
  0.12 sec out of 0.13 sec 
vorm(25084,0x1fcafbac0) malloc: *** error for object 0x300000001: pointer being freed was not allocated 
vorm(25084,0x1fcafbac0) malloc: *** set a breakpoint in malloc_error_break to debug 

The corresponding Valgrind run on Linux is:

valgrind ./build/sources/vorm -D TEST=Issue512_2 check/fixes.frm
FORM 5.0.0-beta.1 (May  4 2024, v5.0.0-beta.1-57-g1a75289)  Run: Sat May 11 17:55:02 2024
...
Time =       0.05 sec    Generated terms =          1
            test         Terms in output =          1
                         Bytes used      =         20
==118620== Invalid write of size 4
==118620==    at 0x20C263: GarbHand (sort.c:3599)
==118620==    by 0x20CA39: AddPoly (sort.c:2199)
==118620==    by 0x20CD2F: SplitMerge (sort.c:3345)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBBB: SplitMerge (sort.c:3331)
==118620==    by 0x20CBBB: SplitMerge (sort.c:3331)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20E7EB: StoreTerm (sort.c:4440)
==118620==    by 0x1DF55F: Generator (proces.c:3366)
==118620==  Address 0x4bf8a5b8 is 0 bytes after a block of size 22,857,080 alloc'd
==118620==    at 0x4849839: malloc (in /home/linuxbrew/.linuxbrew/Cellar/valgrind/3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==118620==    by 0x22C03F: Malloc1 (tools.c:2322)
==118620==    by 0x206335: AllocSort (setfile.c:916)
==118620==    by 0x209261: NewSort (sort.c:641)
==118620==    by 0x126FDB: CoIdExpression (comexpr.c:824)
==118620==    by 0x12765A: CoId (comexpr.c:398)
==118620==    by 0x13ADF2: CompileStatement (compiler.c:658)
==118620==    by 0x1D76E1: PreProcessor (pre.c:1123)
==118620==    by 0x214BAE: main (startup.c:1690)
==118620==
Please increase SubSmallExtension setup parameter.
Program terminating at check/fixes.frm Line 2722 -->
==118620== Invalid read of size 4
==118620==    at 0x21443B: Terminate (startup.c:1792)
==118620==    by 0x4BF78A6F: ???
==118620==    by 0x20CA86: AddPoly (sort.c:2249)
==118620==    by 0x20CD2F: SplitMerge (sort.c:3345)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20CBBB: SplitMerge (sort.c:3331)
==118620==    by 0x20CBBB: SplitMerge (sort.c:3331)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==    by 0x20E7EB: StoreTerm (sort.c:4440)
==118620==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==118620==
  6.78 sec out of 6.79 sec
==118620== Invalid free() / delete / delete[] / realloc()
==118620==    at 0x484C89F: free (in /home/linuxbrew/.linuxbrew/Cellar/valgrind/3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==118620==    by 0x22C0B0: M_free (tools.c:2449)
==118620==    by 0x1404D8: ClearSortGZIP (compress.c:731)
==118620==    by 0x20FCDC: CleanUpSort (sort.c:4737)
==118620==    by 0x2140EB: CleanUp (startup.c:1710)
==118620==    by 0x21440E: Terminate (startup.c:1850)
==118620==    by 0x214A53: onErrSig (startup.c:1558)
==118620==    by 0x4D8E51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==118620==    by 0x21443A: Terminate (startup.c:1790)
==118620==    by 0x4BF78A6F: ???
==118620==    by 0x20CA86: AddPoly (sort.c:2249)
==118620==    by 0x20CD2F: SplitMerge (sort.c:3345)
==118620==  Address 0x1f00000098 is not stack'd, malloc'd or (recently) free'd
==118620==
==118620== Invalid free() / delete / delete[] / realloc()
==118620==    at 0x484C89F: free (in /home/linuxbrew/.linuxbrew/Cellar/valgrind/3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==118620==    by 0x22C0B0: M_free (tools.c:2449)
==118620==    by 0x1404E8: ClearSortGZIP (compress.c:732)
==118620==    by 0x20FCDC: CleanUpSort (sort.c:4737)
==118620==    by 0x2140EB: CleanUp (startup.c:1710)
==118620==    by 0x21440E: Terminate (startup.c:1850)
==118620==    by 0x214A53: onErrSig (startup.c:1558)
==118620==    by 0x4D8E51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==118620==    by 0x21443A: Terminate (startup.c:1790)
==118620==    by 0x4BF78A6F: ???
==118620==    by 0x20CA86: AddPoly (sort.c:2249)
==118620==    by 0x20CD2F: SplitMerge (sort.c:3345)
==118620==  Address 0x41fffffff0 is not stack'd, malloc'd or (recently) free'd
==118620==
==118620== Syscall param unlink(pathname) points to unaddressable byte(s)
==118620==    at 0x4E620AB: unlink (syscall-template.S:120)
==118620==    by 0x4DADE6C: remove (remove.c:33)
==118620==    by 0x20FCF4: CleanUpSort (sort.c:4741)
==118620==    by 0x2140EB: CleanUp (startup.c:1710)
==118620==    by 0x21440E: Terminate (startup.c:1850)
==118620==    by 0x214A53: onErrSig (startup.c:1558)
==118620==    by 0x4D8E51F: ??? (in /usr/lib/x86_64-linux-gnu/libc.so.6)
==118620==    by 0x21443A: Terminate (startup.c:1790)
==118620==    by 0x4BF78A6F: ???
==118620==    by 0x20CA86: AddPoly (sort.c:2249)
==118620==    by 0x20CD2F: SplitMerge (sort.c:3345)
==118620==    by 0x20CBA2: SplitMerge (sort.c:3330)
==118620==  Address 0x5 is not stack'd, malloc'd or (recently) free'd

Something wrong seems to be happening before calling Terminate()...

jodavies commented 1 month ago

I'll have a look at it. It might be related to https://github.com/vermaseren/form/issues/468#issuecomment-2104900922

jodavies commented 1 month ago

The valgrind error here is due to GarbHand making no attempt to avoid writing out of bounds. The old content of the small buffer, plus the stuff in SplitScratch, is copied back to the small buffer here: https://github.com/vermaseren/form/blob/0112aa6b21b48616575c6cd205ec00c7797dbc93/sources/sort.c#L3581-L3611

The function certainly knows things have gone wrong; if you define TESTGARB it prints "We are in deep trouble", and form is certainly going to Terminate after return. But this can be made "valgrind clean" by comparing s2+i with sTop2 in each iteration of the while loops (maybe just the second: the original content of the small buffer should certainly fit again), before the NCOPY.

That we get a valgrind error here means that the content of the old small buffer + SplitScratch has blown through the small extension, and also through the PObuffer, which follows the small buffer in the allocation of AllocSort.

jodavies commented 1 month ago

(maybe just the second: the original content of the small buffer should certainly fit again)

This is actually not the case here; it is the first loop which causes the problem. I don't understand why, though. The general idea is,

  1. In AddCoef and AddPoly, try to put a new term into the small buffer
  2. If it doesn't fit within the SmallExtension (sTop2), call GarbHand
  3. Check total size of existing sPointer and SplitScratch terms. If they fit in the spare space in the LargeBuffer use that, otherwise allocate space
  4. Copy the terms of sPointer and SplitScratch to the temporary space, skipping holes.
  5. Copy terms back to the SmallBuffer+Extension, (again skipping holes?)

I would expect that the original terms pointed to by sPointer should be guaranteed to fit in the SmallBuffer+Extension?

jodavies commented 3 weeks ago

I added some prints to try to work out what is going on here. For a "working" call of GarbHand, I get something like the following:

sort.c AddPoly 2196, make new term (S->sFill 932624) (new term 7106) --> 939730
sort.c AddPoly 2196, make new term (S->sFill 939730) (new term 9192) --> 948922
sort.c AddPoly 2196, make new term (S->sFill 948922) (new term 12614) --> 961536
------Garbage collection-------
in:  S->sFill = 948922, S->sTop2 = 960008
total = 926812, nterms = 0, SmallSize = 640004, SmallESize = 960008
out:  S->sFill = 926812, S->sTop2 = 960008

so as soon as sFill plus the size of the new term is >= sTop2, GarbHand is called which clears out the gaps in the small buffer (this is all during a SplitMerge so terms are being merged a lot).

At the problematic call:

sort.c AddPoly 2196, make new term (S->sFill 959886) (new term 72) --> 959958
sort.c AddPoly 2196, make new term (S->sFill 959958) (new term 48) --> 960006
sort.c AddPoly 2196, make new term (S->sFill 960006) (new term 48) --> 960054
------Garbage collection-------
in:  S->sFill = 960006, S->sTop2 = 960008
total = 982988, nterms = 0, SmallSize = 640004, SmallESize = 960008
==96593== Invalid write of size 4
==96593==    at 0x221EB0: GarbHand (sort.c:3597)
==96593==    by 0x22277E: AddPoly (sort.c:2206)
==96593==    by 0x222B07: SplitMerge (sort.c:3321)

Here "total" is larger than sTop2 (I changed the print compared to the master branch, which prints 2*total for some reason). But sFill was 960006 before the call, and it was only trying to add a term of 48 words. So I am not sure how a "total" of 982988 is possible, unless the sPointer array contains terms unaccounted for in sFill -- as I understand, this should not be the case.

I am testing on top of #529 so writes over sTop2 are an immediate crash, unlike in the master branch which needs a write beyond PObuffer also (the test rebased onto master actually doesn't cause a valgrind error anymore, due to the increased value of SIOsize from #515).

jodavies commented 3 weeks ago

The issue is that "NEWSPLITMERGE" is not strict about setting pointers to zero, for terms which have merged (or cancel, but in this test nothing cancels). If we use the old code, GarbHand correctly computes the size of the terms which remain.

Maybe this was done for speed, but what is means is that the "new" code fails in some cases where the old succeeds because GarbHand cannot work properly. If you run Issue512_2 with a SubSmallExtension of 5090K, it will run to completion with the old code, but crash with the new because it determines (incorrectly) that SubSmallExtension is too small.. The Timsort code is commented for this tests, but it doesn't change anything.

jodavies commented 3 weeks ago

@tueda can you test this on your mac?

In case it is needed again, I used the following print code (and diff) in the two versions of SplitMerge, to make sure the pointer array now looks the same in both cases.

// Print current state
printf("%4ld: ", number);
for (int ii = 0; ii < S->sTerms; ii++) {
       if ( (S->sPointer)[ii] ) {
               printf("X");
       }
       else {
               printf(".");
       }
}
printf("\n");
fflush(stdout);
tueda commented 3 weeks ago

Now it works on my MacBook Air.

jodavies commented 3 weeks ago

OK, that is good. I have run a few mincer benchmarks and did not see any performance impact. I think in FORM the term comparison is so expensive that no amount of pointer operations during the sort is noticable.

I have a "re-written" version of SplitMerge with some variable name changes and a bunch of commentary to make sure I understand it. Maybe the "improvement in the style of Timsort" can be further improved; at the moment it checks the right-hand half of the left array for the merge point of the right array with a binary search. In principle one could check the entire left array for the merge point, instead of just giving up immediately if the terms are not ordered at the midpoint. Maybe Jos already tested this at the time, and found it not to help? I will test this, but it would be for a new PR.