segmentation fault on freebsd-amd64 & MacOS

bakul commented 4 years ago

./k_test segfaults after printing t:700

git bisect says this is the first bad commit: c62db0361db582

    frame #0: 0x0000000100033009 k_test`unpool(r=6) at km.c:186:11
   183      }//Low lanes subdivide pages. no divide op
   184      *L=z;
   185    }
-> 186    z=*L;*L=*z;*z=0;
   187    mUsed+=k; if(mUsed>mMax)mMax=mUsed;
   188    R z;
   189  }

I verified the previous commit does work. Both FreeBSD and MacOS stop at the same line.

tavmem commented 4 years ago

Don't have access to my MacOS machine till sometime next week.

tavmem commented 4 years ago

The segfault is caused by the test added for issue #571. Commenting out that test eliminates the segfault:

$ git diff
diff --git a/src/tests.c b/src/tests.c
index e67bcb3..2238f8c 100644
--- a/src/tests.c
+++ b/src/tests.c
@@ -313,7 +313,7 @@ Z I tests02()
   TC( -1, 4: (.((`a;1);(`b;2)))(,`a) )            // issue 561
   TC_( "1 2", "(1 2 1)\\1" )                      // issue 572
   TC_( "12 6 3", "0 1 1 3 2 5 3 7 4 9 5 11 6\\ 12" )       // issue 572
-  TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
+  //TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
   TC( .[;;;;], .[;;;;] )                          // issue 543  testing for leaks
   TC( .[], .[] )                                  // issue 543  testing for leaks

I will look into why the segfault is happening.

tavmem commented 4 years ago

This is interesting:

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1))
value error
d:.((`a;0);(`b;1))
 ^
>  \
  d:.((`a;0);(`b;1))
  d
.((`a;0;)
  (`b;1;))
  `d . `b
1

Execute the first command of the test, and it fails with a "value error". Execute it again, and it works ... and the second command gives the correct result !!!!

tavmem commented 4 years ago

The strange thing is that the 1st command has nothing to do with the symbolic indexing of issue #571. The command is just setting up an ordinary dictionary.

bakul commented 4 years ago

It dies in this test (on freebsd):

Breakpoint 3, tests02 () at src/tests.c:319
319       TC(.[*; (3;4); :], (0;12) )
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x0000000000245098 in unpool (r=6) at src/km.c:186
186       z=*L;*L=*z;*z=0;

But it dies elsewhere under MacOS. If you run the failing test expressions by themselves in k, they work fine, so I am thinking this is some sort of memory corruption.
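For readers unfamiliar with this failure mode, here is a minimal sketch of how a statement shaped like `z=*L;*L=*z;*z=0;` can fault long after the real bug. It assumes that line 186 pops a cell from an intrusive free list (the first word of a free cell holds the link to the next one). That reading is inferred from the listing above, not confirmed against km.c. The names `arena`, `pool_pop`, `pool_push`, and `demo_use_after_free` are hypothetical and not part of kona.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CELL_WORDS 4            /* hypothetical cell size, not kona's */
#define NCELLS     4

static void *arena[NCELLS * CELL_WORDS];
static void **freelist;         /* plays the role of *L in the listing */

/* Thread every cell onto the free list; the first word of a free cell
   stores the address of the next free cell (NULL at the end). */
static void pool_init(void) {
    freelist = NULL;
    for (int i = NCELLS - 1; i >= 0; i--) {
        void **cell = &arena[i * CELL_WORDS];
        *cell = freelist;
        freelist = cell;
    }
}

/* Same shape as the faulting statement: z=*L; *L=*z; *z=0; */
static void *pool_pop(void) {
    void **z = freelist;        /* z = *L                              */
    if (!z) return NULL;
    freelist = *z;              /* *L = *z : faults when z is garbage  */
    *z = NULL;                  /* *z = 0                              */
    return z;
}

static void pool_push(void *p) {
    void **cell = p;
    *cell = freelist;
    freelist = cell;
}

/* Is p the address of a cell in the arena? Used to show the damage
   without actually dereferencing a bad pointer. */
static int in_arena(const void *p) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)arena;
    return a >= lo && a < lo + sizeof arena
        && (a - lo) % (CELL_WORDS * sizeof(void *)) == 0;
}

/* Returns 1 if a write through a stale pointer leaves a bogus list head.
   The pop that follows the stale write still succeeds; only the pop
   after that one would dereference the bogus head and crash. */
static int demo_use_after_free(void) {
    pool_init();
    void *a = pool_pop();
    pool_push(a);                   /* a is free again; word 0 is a link  */
    uintptr_t junk = 0x106;         /* illustrative small non-pointer value */
    memcpy(a, &junk, sizeof junk);  /* stale write clobbers the link      */
    void *b = pool_pop();           /* hands out a, loads junk as new head */
    return b == a && freelist != NULL && !in_arena(freelist);
}
```

In this sketch the faulty write and the eventual crash are separated by a successful allocation, which would be consistent with a test passing on its own and a later, unrelated test dying inside the allocator.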

tavmem commented 4 years ago

I agree. Also strange in MacOS:

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1))
value error
d:.((`a;0);(`b;1))
 ^
>  \
  d
.((`a;0;)
  (`b;1;))

Although "value error" is reported, the dictionary did get set up without executing the command a second time.

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1))
value error
d:.((`a;0);(`b;1))
 ^
>  \
  `d . `b
1

tavmem commented 4 years ago

Memory corruption would explain the inconsistent behavior and the different failures on FreeBSD and MacOS experienced by @bakul and myself. It might also be the cause of the different failures experienced by @hkinds and myself when using Windows, as documented in issue #581.

tavmem commented 4 years ago

On FreeBSD, git bisect says this is the first bad commit: c62db03. k_test does work on FreeBSD using the prior commit of Jan 8 2020.

However, on MacOS, using the prior commit of Jan 8 2020:

$ pwd
/Users/tavmem/k200108
$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1))
value error
d:.((`a;0);(`b;1))
 ^
>

This indicates that the problem begins earlier.

bakul commented 4 years ago

I don't get that error on the commit of Jan 8 (a50105f5) on the mac and no tests fail. FYI this is on version 10.14.6 (Mojave).

tavmem commented 4 years ago

My Mac is on Sierra (10.12.6). I don't update it (or use it) very often.

tavmem commented 4 years ago

My results on Sierra (10.12.6) show that the first problematic commit is 48be0eb9768137ce124e03fac6dc8f84920d8aff of Nov 15, 2019. However, given the conflicting results we have so far, who knows?

What is particularly striking is that this commit was supposed to be purely stylistic. It was only meant to give the code better clarity ... no content updates. Given that ... it still may be the culprit.

However, there is no denying that Sierra has a problem with this and all later commits. Nevertheless, the actual problem may originate in an earlier commit.

My plan (as a next step) is to take the prior commit of Nov 13, 2019 and selectively apply the stylistic changes of the Nov 15 commit to see which (if any) of them cause the problem in Sierra.

tavmem commented 4 years ago

Oh ... by the way ... k_test works (with no errors) on my Mac with Sierra using the Nov 15 commit.

tavmem commented 4 years ago

After commit fae36da488089dee7e549f91a60a558e36218789 of Feb 11, 2020 (on MacOS w Sierra 10.12.6):

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1)); `d .`b              
1

before this commit, MacOS w Sierra produced:

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1)); `d .`b
1
value error
d:.((`a;0);(`b;1)); `d .`b
at execution instance 1 of "."

however, we still get

$ ./k_test
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
t:650
t:700
Segmentation fault: 11

tavmem commented 4 years ago

Now, the status on MacOS Sierra (10.12.6) agrees with MacOS Mojave (10.14.6): The first bad commit is: c62db03 of Jan 22, 2020.

tavmem commented 4 years ago

As we have probably noted before, if you comment out the test for issue 571 (in MacOS):

$ git diff
diff --git a/src/tests.c b/src/tests.c
index e67bcb3..2238f8c 100644
--- a/src/tests.c
+++ b/src/tests.c
@@ -313,7 +313,7 @@ Z I tests02()
   TC( -1, 4: (.((`a;1);(`b;2)))(,`a) )            // issue 561
   TC_( "1 2", "(1 2 1)\\1" )                      // issue 572
   TC_( "12 6 3", "0 1 1 3 2 5 3 7 4 9 5 11 6\\ 12" )       // issue 572
-  TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
+  //TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
   TC( .[;;;;], .[;;;;] )                          // issue 543  testing for leaks
   TC( .[], .[] )                                  // issue 543  testing for leaks

then k_test works with no errors. But (in both versions of MacOS) that test works when run by itself:

$ ./k
kona      \ for help. \\ to exit.

  d:.((`a;0);(`b;1)); `d .`b
1

At this point, it may pay to examine more closely the problem in FreeBSD and/or in Windows.

bakul commented 4 years ago

It is possible that the bug is present on every platform but manifests itself only on FreeBSD and MacOS. Suggest using valgrind. Running valgrind ./k_test reveals this:


...
t:650
t:700
==882== Invalid read of size 8
==882==    at 0x245098: ??? (in /tmp/kona/k_test)
==882==    by 0x2457C9: ??? (in /tmp/kona/k_test)
==882==    by 0x2447CF: ??? (in /tmp/kona/k_test)
==882==    by 0x2446E4: ??? (in /tmp/kona/k_test)
==882==    by 0x220614: ??? (in /tmp/kona/k_test)
==882==    by 0x222C2B: ??? (in /tmp/kona/k_test)
==882==    by 0x2208F0: ??? (in /tmp/kona/k_test)
==882==    by 0x222E41: ??? (in /tmp/kona/k_test)
==882==    by 0x2208F0: ??? (in /tmp/kona/k_test)
==882==    by 0x21FE5B: ??? (in /tmp/kona/k_test)
==882==    by 0x22F610: ??? (in /tmp/kona/k_test)
==882==    by 0x22F5DA: ??? (in /tmp/kona/k_test)
==882==  Address 0x106 is not stack'd, malloc'd or (recently) free'd
==882==
==882==
==882== Process terminating with default action of signal 11 (SIGSEGV)
==882==  Access not within mapped region at address 0x106
==882==    at 0x245098: ??? (in /tmp/kona/k_test)
==882==    by 0x2457C9: ??? (in /tmp/kona/k_test)
==882==    by 0x2447CF: ??? (in /tmp/kona/k_test)
==882==    by 0x2446E4: ??? (in /tmp/kona/k_test)
==882==    by 0x220614: ??? (in /tmp/kona/k_test)
==882==    by 0x222C2B: ??? (in /tmp/kona/k_test)
==882==    by 0x2208F0: ??? (in /tmp/kona/k_test)
==882==    by 0x222E41: ??? (in /tmp/kona/k_test)
==882==    by 0x2208F0: ??? (in /tmp/kona/k_test)
==882==    by 0x21FE5B: ??? (in /tmp/kona/k_test)
==882==    by 0x22F610: ??? (in /tmp/kona/k_test)
==882==    by 0x22F5DA: ??? (in /tmp/kona/k_test)
==882==  If you believe this happened as a result of a stack
==882==  overflow in your program's main thread (unlikely but
==882==  possible), you can try to increase the size of the
==882==  main thread stack using the --main-stacksize= flag.
==882==  The main thread stack size used in this run was 16777216.
==882==
==882== HEAP SUMMARY:
==882==     in use at exit: 0 bytes in 0 blocks
==882==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==882==
==882== All heap blocks were freed -- no leaks are possible
==882==
==882== For counts of detected and suppressed errors, rerun with: -v
==882== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
[2]    882 segmentation fault  valgrind k_test

tavmem commented 4 years ago

Thanks ... will do. Here is some other info:

If you comment out all other test batches, leaving only the batch tests02:

diff --git a/src/tests.c b/src/tests.c
index e67bcb3..fc98a99 100644
--- a/src/tests.c
+++ b/src/tests.c
@@ -65,11 +65,11 @@ I tc(S a, S b) //test comparison .  R 0,1,2
 I test()
 { testtime=clock();

-  testsBook();
-  tests01();
+  //testsBook();
+  //tests01();
   tests02();
-  testsIO();  //could become slow - in the future may not want to test by default
-  K x; x=_(567);if(!tp(x && *kI(x)==567))fprintf(stderr,"\n\nK string execution broken\n\n"); cd(x);
+  //testsIO();  //could become slow - in the future may not want to test by default
+  //K x; x=_(567);if(!tp(x && *kI(x)==567))fprintf(stderr,"\n\nK string execution broken\n\n"); cd(x);

 //done:
   testtime=(clock()-testtime)/CLOCKS_PER_SEC;
$

then k_test works (including the test for 571)

$ ./k_test
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
Test pass rate: 1.0000, Total: 601, Passed: 587, Skipped: 14, Failed: 0, Time: 0.439583s
OK

It seems that there is some interaction with another test that gets run before the 571 test when using the full complement of tests.

tavmem commented 4 years ago

If you run only 2 batches

  testsBook();
  //tests01();
  tests02();
  //testsIO();  //could become slow - in the future may not want to test by default
  //K x; x=_(567);if(!tp(x && *kI(x)==567))fprintf(stderr,"\n\nK string execution broken\n\n"); cd(x);

then you get the segfault

tavmem commented 4 years ago

Also ... The test for issue 571 is about the 208th test in the batch tests02. If you move that test to be the 1st test in batch tests02, then all tests pass.

This appears to be more evidence that either

the problem is caused by the interaction of the tests, or
the problem is somewhere in the tests.c code (inadequate cleanup between tests).
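One way to tell these two possibilities apart is to validate the allocator's free list after every test and report the first test that leaves it damaged, instead of waiting for a later allocation to crash. The sketch below shows the idea on a toy pool. It assumes an intrusive free list like the one line 186 of km.c appears to pop from, and every name in it (`freelist_check`, `first_corrupting_test`, the two sample tests) is hypothetical rather than taken from kona.

```c
#include <stddef.h>
#include <stdint.h>

#define CELL_WORDS 4            /* hypothetical cell size, not kona's */
#define NCELLS     8

static void *arena[NCELLS * CELL_WORDS];
static void **freelist;

static void pool_init(void) {
    freelist = NULL;
    for (int i = NCELLS - 1; i >= 0; i--) {
        void **cell = &arena[i * CELL_WORDS];
        *cell = freelist;       /* word 0 of a free cell is the next link */
        freelist = cell;
    }
}

static void *pool_pop(void) {
    void **z = freelist;
    if (!z) return NULL;
    freelist = *z;
    *z = NULL;
    return z;
}

static void pool_push(void *p) {
    void **cell = p;
    *cell = freelist;
    freelist = cell;
}

static int in_arena(const void *p) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)arena;
    return a >= lo && a < lo + sizeof arena
        && (a - lo) % (CELL_WORDS * sizeof(void *)) == 0;
}

/* Walk the free list without trusting it. Returns its length, or -1 if
   a link points outside the arena or the list is longer than the pool
   (which would mean a cycle). */
static int freelist_check(void) {
    int n = 0;
    for (void **p = freelist; p; p = (void **)*p) {
        if (!in_arena(p) || ++n > NCELLS) return -1;
    }
    return n;
}

typedef void (*test_fn)(void);

/* A well-behaved test: allocate, then release. */
static void test_clean(void) {
    void *a = pool_pop();
    pool_push(a);
}

/* A buggy test: keeps writing to a cell after releasing it. */
static void test_stale_write(void) {
    void **a = pool_pop();
    pool_push(a);
    a[0] = (void *)(uintptr_t)0x106;   /* clobbers the free-list link */
}

/* Run the tests in order and return the index of the first one that
   leaves the free list damaged, or -1 if none does. */
static int first_corrupting_test(const test_fn *tests, int ntests) {
    pool_init();
    for (int i = 0; i < ntests; i++) {
        tests[i]();
        if (freelist_check() < 0) return i;
    }
    return -1;
}
```

With a check like this, the culprit is reported at the test that does the damage, regardless of which later test happens to trip over it or how the tests are ordered.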

tavmem commented 4 years ago

Here is another experiment that indicates that the problem is not test specific. (Again, it looks more like the problem may be with the program running the tests.)

So far, the segfault has occurred on the 718th test:

TC( .[;;;;], .[;;;;] ) // issue 543 testing for leaks

and it disappears if you omit the 717th test:

TC( 1, d:.((`a;0);(`b;1)); `d .`b ) // issue 571

However, change the following 2 lines in the codebase:

$ git diff
diff --git a/src/kx.c b/src/kx.c
index ba005c6..a2b9164 100644
--- a/src/kx.c
+++ b/src/kx.c
@@ -593,7 +593,7 @@ Z V ex_(V a, I r)   //Expand wd()->7-0 types, expand and evaluate brackets.   Co
   cd(y); R z; }

 K ex(K a)   //Input is (usually, but not always) 7-0 type from wd()
-{ U(a);
+{ O("sd_(a,2):");sd_(a,2); U(a);
   if(a->t==7 && kVC(a)>(K)DT_SIZE && 7==kVC(a)->t && 6==kVC(a)->n) fwh=1;
   if(a->t==7)
   { if(prnt==0)
diff --git a/src/p.c b/src/p.c
index a565bda..01ee1de 100644
--- a/src/p.c
+++ b/src/p.c
@@ -251,7 +251,7 @@ I mark(I*m,I k,I t){ DO(k, m[i]=i?t:-t) R k; }
 //      so the check probably has to do with whether some useful symbol occurred on the line already
 //other errors: syntax error

-K wd(S s, int n){ lineA=s; fdc=0; R wd_(s,n,denameD(&KTREE,d_,1),0); }
+K wd(S s, int n){ O("\n****************** s:%s\n",s); lineA=s; fdc=0; R wd_(s,n,denameD(&KTREE,d_,1),0); }

 K wd_(S s, int n, K*dict, K func) //parse: s input string, n length;
 { //assumes: s does not contain a }])([{ mismatch, s is a "complete" expression
$

The change to src/p.c displays what test is fed to the parse module. The change to src/kx.c displays the input fed to the execution module.

Now run ./k_test. It segfaults on the 17th test: TC(4 3, ^ (1 2 3; "abc"; `x `y `z; 5.4 1.2 -3.56))

NB: This was run on MacOS Sierra 10.12.6. Running this on another OS may not give the same results.

tavmem commented 4 years ago

(Again, all tests in this comment were run on MacOS Sierra 10.12.6)

As mentioned in an earlier comment, if you move the test for issue 571 to be the first test in the batch tests02, then all the tests pass. However, if you run this version with valgrind you get an interesting result:

$ valgrind ./k_test
==1131== Memcheck, a memory error detector
==1131== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1131== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1131== Command: ./k_test
==1131== 
--1131-- run: /usr/bin/dsymutil "./k_test"
==1131== Conditional jump or move depends on uninitialised value(s)
==1131==    at 0x100D395E8: _platform_memchr$VARIANT$Base (in /usr/lib/system/libsystem_platform.dylib)
==1131==    by 0x100AE54D1: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==1131==    by 0x100AEF898: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==1131==    by 0x100B15422: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==1131==    by 0x100AEB33D: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==1131==    by 0x100AE9247: printf (in /usr/lib/system/libsystem_c.dylib)
==1131==    by 0x100054F84: tc (tests.c:32)
==1131==    by 0x10006CD3D: testsBook (tests.c:995)
==1131==    by 0x1000554E7: test (tests.c:68)
==1131==    by 0x10002203D: kinit (kc.c:169)
==1131==    by 0x100054DCC: main (main.c:6)
==1131== 
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
t:650
t:700
t:750
t:800
t:850
t:900
t:950
t:1000
t:1050
t:1100
Test pass rate: 1.0000, Total: 1125, Passed: 1092, Skipped: 33, Failed: 0, Time: 11.274427s
OK
--1131-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--1131-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--1131-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--1131-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
kona      \ for help. \\ to exit.

  ==1131== Thread 2:
==1131== Invalid read of size 4
==1131==    at 0x100D4C899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==1131==    by 0x100D4C886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==1131==    by 0x100D4C08C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==1131==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==1131== 
==1131== 
==1131== Process terminating with default action of signal 11 (SIGSEGV)
==1131==  Access not within mapped region at address 0x18
==1131==    at 0x100D4C899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
==1131==    by 0x100D4C886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
==1131==    by 0x100D4C08C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
==1131==  If you believe this happened as a result of a stack
==1131==  overflow in your program's main thread (unlikely but
==1131==  possible), you can try to increase the size of the
==1131==  main thread stack using the --main-stacksize= flag.
==1131==  The main thread stack size used in this run was 8388608.
==1131== 
==1131== HEAP SUMMARY:
==1131==     in use at exit: 37,659 bytes in 337 blocks
==1131==   total heap usage: 56,789 allocs, 56,452 frees, 168,385 bytes allocated
==1131== 
==1131== LEAK SUMMARY:
==1131==    definitely lost: 4 bytes in 1 blocks
==1131==    indirectly lost: 0 bytes in 0 blocks
==1131==      possibly lost: 1,970 bytes in 94 blocks
==1131==    still reachable: 13,757 bytes in 87 blocks
==1131==         suppressed: 21,928 bytes in 155 blocks
==1131== Rerun with --leak-check=full to see details of leaked memory
==1131== 
==1131== Use --track-origins=yes to see where uninitialised values come from
==1131== For lists of detected and suppressed errors, rerun with: -s
==1131== ERROR SUMMARY: 19 errors from 2 contexts (suppressed: 4 from 4)
Segmentation fault: 11
$

Although all tests pass, you still get a segfault in Thread 2, which does not show up UNLESS you use valgrind.

I decided to check what happens in a very early version of kona. I went back to the commit of June 23, 2011, 611fdca7d77fd43fe1f30089cf18d8edef6b415b. The results (on MacOS Sierra) are:

$ ./k_test
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
t:650
t:700
Segmentation fault: 11
$

The results using valgrind are:

$ valgrind ./k_test
==1196== Memcheck, a memory error detector
==1196== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1196== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1196== Command: ./k_test
==1196== 
--1196-- run: /usr/bin/dsymutil "./k_test"
==1196== Conditional jump or move depends on uninitialised value(s)
==1196==    at 0x100D395E8: _platform_memchr$VARIANT$Base (in /usr/lib/system/libsystem_platform.dylib)
==1196==    by 0x100AE54D1: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==1196==    by 0x100AEF898: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==1196==    by 0x100B15422: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==1196==    by 0x100AEB33D: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==1196==    by 0x100AE9247: printf (in /usr/lib/system/libsystem_c.dylib)
==1196==    by 0x100054F84: tc (tests.c:32)
==1196==    by 0x10006CD3D: testsBook (tests.c:994)
==1196==    by 0x1000554E7: test (tests.c:68)
==1196==    by 0x10002203D: kinit (kc.c:169)
==1196==    by 0x100054DCC: main (main.c:6)
==1196== 
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
t:650
t:700
==1196== Invalid read of size 8
==1196==    at 0x100033B16: unpool (km.c:186)
==1196==    by 0x100034249: kallocI (km.c:155)
==1196==    by 0x10003318F: kalloc (km.c:160)
==1196==    by 0x1000330A1: newK (km.c:141)
==1196==    by 0x100033EC3: Kd (km.c:331)
==1196==    by 0x100033F51: Kv (km.c:333)
==1196==    by 0x10002F4D6: ex1 (kx.c:786)
==1196==    by 0x100028E9A: ex0 (kx.c:664)
==1196==    by 0x10002EF09: ex_ (kx.c:592)
==1196==    by 0x10002C0FE: ex2 (kx.c:811)
==1196==    by 0x10002F305: ex1 (kx.c:770)
==1196==    by 0x100028529: ex0 (kx.c:624)
==1196==  Address 0x106 is not stack'd, malloc'd or (recently) free'd
==1196== 
==1196== 
==1196== Process terminating with default action of signal 11 (SIGSEGV)
==1196==  Access not within mapped region at address 0x106
==1196==    at 0x100033B16: unpool (km.c:186)
==1196==    by 0x100034249: kallocI (km.c:155)
==1196==    by 0x10003318F: kalloc (km.c:160)
==1196==    by 0x1000330A1: newK (km.c:141)
==1196==    by 0x100033EC3: Kd (km.c:331)
==1196==    by 0x100033F51: Kv (km.c:333)
==1196==    by 0x10002F4D6: ex1 (kx.c:786)
==1196==    by 0x100028E9A: ex0 (kx.c:664)
==1196==    by 0x10002EF09: ex_ (kx.c:592)
==1196==    by 0x10002C0FE: ex2 (kx.c:811)
==1196==    by 0x10002F305: ex1 (kx.c:770)
==1196==    by 0x100028529: ex0 (kx.c:624)
==1196==  If you believe this happened as a result of a stack
==1196==  overflow in your program's main thread (unlikely but
==1196==  possible), you can try to increase the size of the
==1196==  main thread stack using the --main-stacksize= flag.
==1196==  The main thread stack size used in this run was 8388608.
==1196== 
==1196== HEAP SUMMARY:
==1196==     in use at exit: 24,661 bytes in 262 blocks
==1196==   total heap usage: 46,183 allocs, 45,921 frees, 123,786 bytes allocated
==1196== 
==1196== LEAK SUMMARY:
==1196==    definitely lost: 4 bytes in 1 blocks
==1196==    indirectly lost: 0 bytes in 0 blocks
==1196==      possibly lost: 1,689 bytes in 81 blocks
==1196==    still reachable: 1,040 bytes in 25 blocks
==1196==         suppressed: 21,928 bytes in 155 blocks
==1196== Rerun with --leak-check=full to see details of leaked memory
==1196== 
==1196== Use --track-origins=yes to see where uninitialised values come from
==1196== For lists of detected and suppressed errors, rerun with: -s
==1196== ERROR SUMMARY: 16 errors from 2 contexts (suppressed: 4 from 4)
Segmentation fault: 11
$

This is the exact same problem that we are addressing today. It has been an unrecognized problem (in MacOS, and maybe in FreeBSD) since the earliest days of kona.

tavmem commented 4 years ago

Well, not completely unrecognized. Issue #274 addresses this in the case of an individual test (not ./k_test) using OSX 10.7.5. This issue was opened on Nov 23, 2014 and was resolved by commit 7ec26c9cdbc05558524e2394a6dd81daaaa32a9f on the same day.

Also, (which may be relevant) the segfault of issue #274 for the individual test began to appear after the commit 068fc66eb33e8b577965c2b9ba1fa5fa0ae8aa9f of Mar 11, 2014 to fix issue #239 opened on March 10, 2014.

However, using the commit 7ec26c9cdbc05558524e2394a6dd81daaaa32a9f ./k_test works with no errors at all (even when using valgrind):

$ valgrind ./k_test
==1340== Memcheck, a memory error detector
==1340== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1340== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1340== Command: ./k_test
==1340== 
--1340-- run: /usr/bin/dsymutil "./k_test"
==1340== Conditional jump or move depends on uninitialised value(s)
==1340==    at 0x100D1E5E8: _platform_memchr$VARIANT$Base (in /usr/lib/system/libsystem_platform.dylib)
==1340==    by 0x100ACA4D1: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==1340==    by 0x100AD4898: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==1340==    by 0x100AFA422: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==1340==    by 0x100AD033D: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==1340==    by 0x100ACE247: printf (in /usr/lib/system/libsystem_c.dylib)
==1340==    by 0x100042464: tc (tests.c:32)
==1340==    by 0x100050E6D: testsBook (tests.c:698)
==1340==    by 0x100042844: test (tests.c:67)
==1340==    by 0x10001C6AC: kinit (kc.c:120)
==1340==    by 0x1000422AC: main (main.c:6)
==1340== 
t:0
t:50
t:100
t:150
t:200
t:250
t:300
t:350
t:400
t:450
t:500
t:550
t:600
t:650
t:700
t:750
t:800
t:850
Test pass rate: 1.0000, Total: 895, Passed: 869, Skipped: 26, Failed: 0, Time: 7.135506s
OK
K Console - Enter \ for help

So, now it seems that the segfault in ./k_test (when using valgrind on MacOS 10.12.6)

existed early on in kona
then disappeared
and has now reappeared.

I guess the next questions are:

when did it disappear, and what caused it to disappear?
when does it first make its latest reappearance (when using valgrind on MacOS 10.12.6), and what caused it to reappear?

tavmem commented 4 years ago

The earliest commit in which the usual make command works is d013b1d2364cf3865c36781990f0154dd8ad98d0 of Mar 29, 2011. In this commit, ./k_test and valgrind ./k_test both fail with Segmentation fault: 11. This segfault is resolved by commit d19eb90468a0910efa97709f087282f7e5d1f4c1 of Nov 11, 2011 for both ./k_test and valgrind ./k_test. The resolving commit did not add or change any tests in tests.c (which is relevant because, later, we find that whether the segfault occurs at all can depend on the order of tests in tests.c).

valgrind ./k_test begins to segfault in Thread 2 in commit 665252fc54a18b4a4a8d5d5edaac76ddc3f2a5c0 of Oct 25, 2015.

The current segfault in ./k_test begins with commit c62db0361db582015e62f147da197091d3eca24c of Jan 22, 2020.

tavmem commented 4 years ago

The following changes

provide a temporary workaround for the segfault in ./k_test when using MacOS by moving the test for issue 571 to be the 1st test in batch tests02. All tests then run in MacOS. Hopefully, this workaround suffices in FreeBSD also.

actually fix the segfault in valgrind ./k_test in Thread 2 when there is no segfault in ./k_test by eliminating the separate timer thread.


$ git diff
diff --git a/src/kc.c b/src/kc.c
index 2a06a0c..4d6c2ac 100644
--- a/src/kc.c
+++ b/src/kc.c
@@ -413,9 +413,9 @@ I attend() {  //K3.2 uses fcntl somewhere
   FD_SET(listener, &master);
   fdmax = listener; }

-  pthread_t thread;
-  if(pthread_create(&thread, NULL, timer_thread, NULL)){
-    perror("Create timer thread"); abort(); }
+  //pthread_t thread;
+  //if(pthread_create(&thread, NULL, timer_thread, NULL)){
+  //  perror("Create timer thread"); abort(); }

   fln=1;
   for(;;) { // main loop
diff --git a/src/tests.c b/src/tests.c
index e67bcb3..dbeb51c 100644
--- a/src/tests.c
+++ b/src/tests.c
@@ -91,6 +91,7 @@ Z I testsIO()

 Z I tests02()
 {
+  TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571 ... workaround
   TC(`b,(`a`b)[1])
   TC(2, {1+1} 0)
   TC(2, {a:1;a+a} _n )
@@ -313,7 +314,7 @@ Z I tests02()
   TC( -1, 4: (.((`a;1);(`b;2)))(,`a) )            // issue 561
   TC_( "1 2", "(1 2 1)\\1" )                      // issue 572
   TC_( "12 6 3", "0 1 1 3 2 5 3 7 4 9 5 11 6\\ 12" )       // issue 572
-  TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
+  //TC( 1, d:.((`a;0);(`b;1)); `d .`b )             // issue 571
   TC( .[;;;;], .[;;;;] )                          // issue 543  testing for leaks
   TC( .[], .[] )                                  // issue 543  testing for leaks

I will commit these changes and continue to research the underlying cause of the segfault that occurs in ./k_test on the test for issue 543, when the test for issue 571 is run in its original spot.
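As an aside on eliminating the timer thread: the change above only comments out the thread creation. If periodic timer events were still wanted, one thread-free option is to service them from the main loop by giving select() a timeout. The sketch below illustrates that general technique under stated assumptions. It is not kona's code, it assumes the main loop in attend() is select()-based (suggested by the FD_SET call in the diff), and the names `now_ms`, `ms_until_tick`, `wait_ms`, and `run_ticks` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= */
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Milliseconds on a monotonic clock. */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Time left before the next tick is due; 0 if it is already overdue. */
static long long ms_until_tick(long long now, long long last, long long period) {
    long long due = last + period;
    return due > now ? due - now : 0;
}

/* Sleep for at most ms milliseconds using select() with no descriptors.
   A real event loop would pass its fd_set of sockets here and handle
   any descriptors that become ready before the timeout expires. */
static int wait_ms(long long ms) {
    struct timeval tv;
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return select(0, NULL, NULL, NULL, &tv);
}

/* Fire on_tick once per period from a single thread until nticks have
   fired. Returns the number of ticks fired. */
static int run_ticks(int nticks, long long period_ms, void (*on_tick)(void)) {
    int fired = 0;
    long long last = now_ms();
    while (fired < nticks) {
        long long left = ms_until_tick(now_ms(), last, period_ms);
        if (left > 0) {          /* not due yet: wait, then re-check */
            wait_ms(left);
            continue;
        }
        last += period_ms;       /* schedule from the due time, not from now */
        if (on_tick) on_tick();
        fired++;
    }
    return fired;
}
```

Because the tick handler runs on the same thread as the interpreter, it cannot race with the allocator, at the cost of timer events being delayed while a long-running expression executes.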

kevinlawler / kona

segmentation fault on freebsd-amd64 & MacOS #582