mxk / go-sqlite

SQLite Library for Go
BSD 3-Clause "New" or "Revised" License
59 stars 20 forks source link

non-deterministic segfault #6

Open rwcarlsen opened 10 years ago

rwcarlsen commented 10 years ago

I've been using your driver to do some post-processing analysis on various sqlite databases I have. The analysis involves many queryies, table creation, insertions, etc. To the best of my knowledge, there is no concurrency in the post-process.

To reproduce:

Other info:

Sample failure output:

SIGSEGV: segmentation violation
PC=0x40e52e
signal arrived during cgo execution

runtime.cgocall(0x405d83, 0x7f9d2aebc2c8)
    /home/r/go/src/pkg/runtime/cgocall.c:149 +0x11b fp=0x7f9d2aebc2b0
github.com/mxk/go-sqlite/sqlite3._Cfunc_sqlite3_step(0x2817bc8, 0x7f9d2aebc430)
    github.com/mxk/go-sqlite/sqlite3/_obj/_cgo_defun.c:608 +0x31 fp=0x7f9d2aebc2c8
github.com/mxk/go-sqlite/sqlite3.(*Stmt).step(0xc2100836e0, 0x7f9d2aebc430, 0x2)
    /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:959 +0x65 fp=0x7f9d2aebc310
github.com/mxk/go-sqlite/sqlite3.(*Stmt).exec(0xc2100836e0, 0x7f9d2aebc430, 0x2, 0x2, 0x0, ...)
    /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:870 +0xdb fp=0x7f9d2aebc360
github.com/mxk/go-sqlite/sqlite3.(*Stmt).Query(0xc2100836e0, 0x7f9d2aebc430, 0x2, 0x2, 0x536ba0, ...)
    /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:771 +0x9e fp=0x7f9d2aebc398
github.com/rwcarlsen/cyan/post.(*Context).getNewOwners(0xc21005d180, 0x501, 0x0, 0x0, 0x0, ...)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:294 +0x178 fp=0x7f9d2aebc478
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc21005d180, 0xc2100d8b10)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:259 +0x508 fp=0x7f9d2aebc648
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc21005d180, 0xc2100d8a80)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:288 +0x991 fp=0x7f9d2aebc818
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc21005d180, 0xc2100d8a20)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:288 +0x991 fp=0x7f9d2aebc9e8
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc21005d180, 0xc2100d89c0)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:288 +0x991 fp=0x7f9d2aebcbb8
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc21005d180, 0xc21008e840)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:288 +0x991 fp=0x7f9d2aebcd88
github.com/rwcarlsen/cyan/post.(*Context).WalkAll(0xc21005d180, 0x0, 0x0)
    /home/r/gopath/src/github.com/rwcarlsen/cyan/post/walker.go:196 +0x3f0 fp=0x7f9d2aebce88
main.main()
    /home/r/gopath/src/github.com/rwcarlsen/cyan/cmd/cycpost/main.go:39 +0x1d6 fp=0x7f9d2aebcf48
runtime.main()
    /home/r/go/src/pkg/runtime/proc.c:220 +0x11f fp=0x7f9d2aebcfa0
runtime.goexit()
    /home/r/go/src/pkg/runtime/proc.c:1394 fp=0x7f9d2aebcfa8

goroutine 3 [syscall]:
runtime.goexit()
    /home/r/go/src/pkg/runtime/proc.c:1394

goroutine 4 [finalizer wait]:
runtime.park(0x48f920, 0x8cdc68, 0x8c0d68)
    /home/r/go/src/pkg/runtime/proc.c:1342 +0x66
runfinq()
    /home/r/go/src/pkg/runtime/mgc0.c:2279 +0x84
runtime.goexit()
    /home/r/go/src/pkg/runtime/proc.c:1394

rax     0xee478523a12a5b8b
rbx     0x2671428
rcx     0x267de00
rdx     0x267de00
rdi     0x2671428
rsi     0xd
rbp     0x1a0
rsp     0x7f9d2a0b1870
r8      0x1
r9      0x0
r10     0x45
r11     0x7f9d240008c8
r12     0x2671428
r13     0x0
r14     0x26b54eb
r15     0xc
rip     0x40e52e
rflags  0x10286
cs      0x33
fs      0x0
gs      0x0
rwcarlsen commented 10 years ago

I'll also add that I've tried compiling with both clang v3.4.1 and gcc v4.9. Both exhibit the same problem

mxk commented 10 years ago

So far I haven't been able to reproduce this on Windows 8.1 or CentOS 6.5 amd64 in more than 1,000 iterations. Can you follow Ian's instructions to see what is at PC=0x40e52e? We need to find out where in the sqlite code the crash happens. Feel free to send me the core dump.

rwcarlsen commented 10 years ago

Here is a full backtrace:

Program received signal SIGSEGV, Segmentation fault.
sqlite3DbMallocRaw (n=<optimized out>, db=0x6e6338, n=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20086
20086           db->lookaside.pFree = pBuf->pNext;
#0  sqlite3DbMallocRaw (n=<optimized out>, db=0x6e6338, n=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20086
#1  sqlite3IndexAffinityStr (pIdx=<optimized out>, v=<optimized out>, pIdx=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:28530
#2  sqlite3Pragma (pParse=<optimized out>, pId1=<optimized out>, pId2=<optimized out>, pValue=<optimized out>, minusFlag=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:33593
#3  0x00000000004bfc1b in sqlite3VdbeSorterWrite (db=<optimized out>, pCsr=<optimized out>, pVal=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:75163
#4  sqlite3VdbeExec (p=0x8661d8) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:6577
#5  0x000000000040fd79 in sqlite3Step (p=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:66192
#6  sqlite3_step (pStmt=0x8661d8) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:722
#7  0x0000000000415208 in sqlite3_exec (db=0x6e6338, zSql=<optimized out>, xCallback=0x0, pArg=0x0, pzErrMsg=0x0) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:96070
#8  0x000000000040648c in _cgo_0d46fcb81049_Cfunc_sqlite3_exec (v=0xc2080d3c28) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:510
#9  0x00000000005023b1 in runtime.asmcgocall () at /home/robert/go/src/pkg/runtime/asm_amd64.s:705
#10 0x0000000000000000 in ?? ()

And here is another one I got that is different :/

Program received signal SIGSEGV, Segmentation fault.
sqlite3DbMallocRaw (n=<optimized out>, db=0x6e6308, n=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20086
20086           db->lookaside.pFree = pBuf->pNext;
#0  sqlite3DbMallocRaw (n=<optimized out>, db=0x6e6308, n=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20086
#1  sqlite3IndexAffinityStr (pIdx=<optimized out>, v=<optimized out>, pIdx=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:28530
#2  sqlite3Pragma (pParse=<optimized out>, pId1=<optimized out>, pId2=<optimized out>, pValue=<optimized out>, minusFlag=<optimized out>)
    at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:33593
#3  0x0000000000462e71 in sqlite3DbMallocZero (db=<optimized out>, n=<optimized out>, db=<optimized out>, n=<optimized out>)
    at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20045
#4  sqlite3ExprListAppend (pParse=0x829008, pList=<optimized out>, pExpr=0x6ebfd8) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:13159
#5  yy_reduce (yypParser=0x894518, yyruleno=157) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:53438
#6  0x0000000000460929 in sqlite3Parser (yyp=0x894518, yymajor=1, pParse=<optimized out>, yyminor=...)
    at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:119808
#7  0x0000000000416a62 in sqlite3RunParser (pParse=0x829008, 
    zSql=0x842878 "SELECT tr.ReceiverId, tr.Time FROM Transactions AS tr\n\t\t\t\t  WHERE tr.ResourceId = ? AND tr.SimId = ?\n\t\t\t\t  ORDER BY tr.Time ASC;", 
    pzErrMsg=0x7fffffffe150) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:120626
#8  0x00000000004bc6b0 in sqlite3Prepare (db=0x6e6308, 
    zSql=0x842878 "SELECT tr.ReceiverId, tr.Time FROM Transactions AS tr\n\t\t\t\t  WHERE tr.ResourceId = ? AND tr.SimId = ?\n\t\t\t\t  ORDER BY tr.Time ASC;", 
    nBytes=<optimized out>, saveSqlFlag=0, pReprepare=<optimized out>, ppStmt=0x7fffffffe240, pzTail=<optimized out>)
    at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100334
#9  0x0000000000415c6e in sqlite3LockAndPrepare (db=0x6e6308, 
    zSql=0x842878 "SELECT tr.ReceiverId, tr.Time FROM Transactions AS tr\n\t\t\t\t  WHERE tr.ResourceId = ? AND tr.SimId = ?\n\t\t\t\t  ORDER BY tr.Time ASC;", nBytes=-1, 
    saveSqlFlag=0, pOld=0x83a018, ppStmt=0x7fffffffe240, pzTail=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100426
#10 0x0000000000410728 in sqlite3Reprepare (p=<optimized out>) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100456
#11 sqlite3_step (pStmt=0x83a018) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:724
#12 0x00000000004065cc in _cgo_0d46fcb81049_Cfunc_sqlite3_step (v=0xc2080c12b0) at /home/robert/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:640
mxk commented 10 years ago

What does thread apply all bt give you in gdb? Are there any other threads that were calling sqlite functions at the same time? To make sure that this isn't the cause, you can try building the package with SQLITE_THREADSAFE set to 1 instead of 2 in sqlite3.go.

It's good that the crash happens in the same location, so we at least know that it's something related to the lookaside memory allocator. The two pointers in question are db->lookaside and pBuf, and I'm guessing that it's the later that is referring to an invalid (non-zero) address.

What happens if you build the package with SQLITE_OMIT_LOOKASIDE?

rwcarlsen commented 10 years ago

SQLITE_THREADSAFE=1 seems stable - no segfaults. Setting SQLITE_OMIT_LOOKASIDE also (without threadsafe=1) seems stable. Here are the multiple threads:

Thread 5 (Thread 0x7ffff5f0c700 (LWP 19637)):
#0  scanblock () at /home/r/go/src/pkg/runtime/mgc0.c:902
#1  0x00007ffff5f0d000 in ?? ()
#2  0x0000000000685428 in bufferList ()
#3  0x0000000000685428 in bufferList ()
#4  0x00007ffff5f0d000 in ?? ()
#5  0x0000000000684d70 in bufferList ()
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7ffff680d700 (LWP 19636)):
#0  runtime.findfunc () at /home/r/go/src/pkg/runtime/symtab.goc:263
#1  0x00000000004d7e6d in github.com/mxk/go-sqlite/sqlite3.func�002 (~r0=...)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:239
#2  0x00000000005da001 in func.* ()
#3  0x0000000000619b28 in pclntab ()
#4  0x00000000004a39ed in runtime.gentraceback ()
    at /home/r/go/src/pkg/runtime/traceback_x86.c:176
#5  0x00000000004d1d24 in github.com/mxk/go-sqlite/sqlite3.(*Conn).Exec (
    c=0x697638 <runtime.mheap+55832>, sql=..., args=..., ~r2=...)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:262
#6  0x00000000004d7e6d in github.com/mxk/go-sqlite/sqlite3.func�002 (~r0=...)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:239
#7  0x0000000000697638 in runtime.mheap ()
#8  0x0000000000000001 in ?? ()
#9  0x0000000000000004 in ?? ()
#10 0x00007fff00000042 in ?? ()
#11 0x000000c208048850 in ?? ()
#12 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ffff700e700 (LWP 19635)):
#0  runtime.futex () at /home/r/go/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000496d72 in runtime.futexsleep () at /home/r/go/src/pkg/runtime/os_linux.c:55
#2  0x00007ffff7e34f60 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffff780f700 (LWP 19634)):
#0  runtime.futex () at /home/r/go/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000496d07 in runtime.futexsleep () at /home/r/go/src/pkg/runtime/os_linux.c:49
#2  0x000000000067ab18 in runtime.sched ()
#3  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fd5740 (LWP 19630)):
#0  sqlite3DbMallocRaw (n=<optimized out>, db=0x698308, n=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20086
#1  sqlite3BitvecSet (p=<optimized out>, i=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:37339
#2  0x00000000004587cc in sqlite3DbMallocZero (db=<optimized out>, n=<optimized out>, 
    db=<optimized out>, n=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:20045
#3  sqlite3ExprAlloc (db=<optimized out>, op=<optimized out>, pToken=<optimized out>, 
    dequote=<optimized out>, db=<optimized out>, op=<optimized out>, 
    pToken=<optimized out>, dequote=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:12493
#4  0x0000000000443a7d in sqlite3PExpr (pParse=0x6a9fc8, op=74, pLeft=0x0, pRight=0x0, 
    pToken=0x6c0030)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:78120
#5  0x000000000043fd80 in spanExpr (pParse=<optimized out>, op=135, pValue=0x6c0030, 
    pOut=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:116447
#6  yy_reduce (yypParser=0x6bfe98, yyruleno=192)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:53559
#7  0x000000000043c477 in sqlite3Parser (yyp=0x6bfe98, yymajor=23, pParse=<optimized out>, 
    yyminor=...)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:119808
#8  0x0000000000410c16 in sqlite3RunParser (pParse=0x6a9fc8, 
    zSql=0xc20802e300 "INSERT INTO TimeList VALUES (?, ?);", pzErrMsg=0x7fffffffe5d8)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:120626
#9  0x0000000000474201 in sqlite3Prepare (db=0x698308, 
    zSql=0xc20802e300 "INSERT INTO TimeList VALUES (?, ?);", nBytes=<optimized out>, 
    saveSqlFlag=1, pReprepare=<optimized out>, ppStmt=0xc208052028, pzTail=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100334
#10 0x00000000004103c9 in sqlite3LockAndPrepare (db=0x698308, 
    zSql=0xc20802e300 "INSERT INTO TimeList VALUES (?, ?);", nBytes=-1, saveSqlFlag=1, 
    pOld=0x0, ppStmt=0xc208052028, pzTail=0xc208052030)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100426
#11 0x000000000041000f in sqlite3_prepare_v2 (db=<optimized out>, zSql=<optimized out>, 
    nBytes=<optimized out>, ppStmt=<optimized out>, pzTail=<optimized out>, 
    db=<optimized out>, zSql=<optimized out>, nBytes=<optimized out>, 
    ppStmt=<optimized out>, pzTail=<optimized out>)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/lib/sqlite3.c:100502
#12 0x000000000040630d in _cgo_ce65b26af501_Cfunc_sqlite3_prepare_v2 (v=0x7ffff7e16940)
    at /home/r/gopath/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:618
#13 0x00000000004b0d91 in runtime.asmcgocall ()
    at /home/r/go/src/pkg/runtime/asm_amd64.s:705
#14 0x0000000000000000 in ?? ()
rwcarlsen commented 10 years ago

Also FWIW - the post-process used your interface directly. I rewrote it to use database/sql from the go std library with your driver as the backend, and it seems to be stable that way.

rwcarlsen commented 10 years ago

the old segfaulting post-process is now on in my repo github.com/rwcarlsen/cyan under the segfault branch.

mxk commented 10 years ago

drh suggests building the package with SQLITE_OMIT_LOOKASIDE and then running the program under valgrind: http://marc.info/?l=sqlite-users&m=140269886427294&w=2

Unfortunately, valgrind doesn't seem to work with Go: https://code.google.com/p/go/issues/detail?id=782

I don't have any good ideas at the moment and I'll be gone for the entire weekend. If you don't figure it out before then, I'll take another look next week. I don't see any reason why switching to the database/sql driver would solve the problem.

rin01 commented 10 years ago

My machine is Linux Mint in VirtualBox, on Windows 7 host: Linux nico-Qiana-Xfce 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux go version go1.3.2 linux/amd64

I downloaded rwcarlsen/cyan, commit 5a27279d4a36d094c424662c4296b4364be672b6, of June 5, 2014.

I have run it for 3000 iterations, and there was no crash at all. I also ran it with GOMAXPROCS=9 for 1000 iterations, but it doesn't change anything.

I just wanted to say that unfortunately, I cannot reproduce the problem.

rin01 commented 10 years ago

I have run the program again, with the following flags in sqlite.go:

//#cgo CFLAGS: -DNDEBUG=1
#cgo CFLAGS: -DSQLITE_DEBUG=1
#cgo CFLAGS: -DSQLITE_MEMDEBUG=1

SQLITE_MEMDEBUG does memory checking similar to Valgrind (in particular, use of freed memory, freing unallocated memory).
SQLITE_DEBUG    trashes all content in the lookaside slot being freed with the 0xaa bit pattern to verify that it is not used after being freed.
                It does also some error checking on mutexes.

Again, with 1300 iterations, no crash happened.

Like you, I think that the problem occurs in pBuf, that is referring to an invalid address. At the time of crash, the value of pBuf is db->lookaside.pFree, as we can see in the code of sqlite3DbMallocRaw(). It points to the first free slot available in the lookaside allocator. At time of crash, this pointer was corrupted.

db->lookaside.pFree is only updated by sqlite3DbMallocRaw() and sqlite3DbFree().

The only way I see to corrupt db->lookaside.pFree is that a slot has been written to after being freed. This overwriting will corrupt the first bytes of the slot, which contain the pointer pNext that points to the next free slot. Eventually, this slot will be allocated and removed from the free list, by the function sqlite3DbMallocRaw(), which will assign the corrupted pNext pointer to db->lookaside.pFree. The next call to sqlite3DbMallocRaw() will crash because db->lookaside.pFree is invalid.

To check if a slot in the free list has been overwritten, I added these two checking functions in sqlite.c.

// checks that all slots in the free list of the lookaside allocator have not been overwritten.
// This is done by checking that they all contain the bit pattern 0xaa.
//
void check_lookaside_freelist(sqlite3 *db) {
  LookasideSlot *pslot;
  int            ii;
  int            bug;

  bug=0;
  pslot = db->lookaside.pFree;

  while (pslot) {
    for (ii=sizeof(LookasideSlot*); ii<db->lookaside.sz; ii++) {
      if (((unsigned char *)pslot)[ii] != 0xaa) {
         bug = 1;
         break;
      }
    }
    assert(bug==0);
    pslot = pslot->pNext;
  }
}

// checks that a buffer from lookaside memory allocator, pointed to by pslot, has not been overwritten.
// This is done by checking that it contains the bit pattern 0xaa.
//
void check_lookaside_slot_is_not_overwritten(sqlite3 *db, LookasideSlot *pslot) {
  int            bug;
  int            ii;

  bug=0;

  for (ii=sizeof(LookasideSlot*); ii<db->lookaside.sz; ii++) {
      if (((unsigned char *)pslot)[ii] != 0xaa) {
         bug = 1;
         break;
      }
  }
  assert(bug==0);
}

I inserted check_lookaside_freelist() in sqlite3DbFree() function, and check_lookaside_slot_is_not_overwritten() in sqlite3DbMallocRaw(), to check that the returned buffer has not been overwritten.

In setupLookaside() that initializes the lookaside allocator, I also put this line in the loop that fills the free list with the 500 slots available in the allocator's memory block, to fill the slots with 0xaa bit pattern: memset(p, 0xaa, db->lookaside.sz);

I ran again the program, and unfortunately, there is no crash. I don't detect any write-after-free operation :-(

I also put runtime.GC() in the cycpost program, as GC may run some terminators in the sqlite driver, but I am still unable to reproduce the issue.

The most probable scenario I see to corrupt the allocator is freing a slot two times.

Inital free list state:
db->lookaside.pFree -----> R ------> B -------> U

Free H slot:
db->lookaside.pFree -----> H -----> R ------> B -------> U

Free H slot again:
db->lookaside.pFree -----> H -----> H (------> H ...)

sqlite3DbMallocRaw() returns pointer to H, but db->lookaside.pFree still points to H:
db->lookaside.pFree -----> H (------> H ...)

The caller will write to H, and the H.pNext pointer seen by the free list becomes corrupt.
So, next call to sqlite3DbMallocRaw() will crash.
rin01 commented 10 years ago

@rwcarlsen If you have time, can you change the following lines in $GOPATH/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go, and run again your program ?

//#cgo CFLAGS: -DNDEBUG=1
#cgo CFLAGS: -DSQLITE_DEBUG=1
#cgo CFLAGS: -DSQLITE_MEMDEBUG=1
#cgo CFLAGS: -DSQLITE_OMIT_LOOKASIDE=1

You said that with SQLITE_OMIT_LOOKASIDE, it is stable. But with SQLITE_DEBUG and SQLITE_MEMDEBUG, if any illegal memory access occurs, it will be detected, even if it doesn't cause a crash.

rin01 commented 10 years ago

Forget my previous messages, I think the cause is the Stmt finalizer. It runs 4 times in the program. More on this tomorrow...

rin01 commented 10 years ago

I can reproduce the error.

It is a race condition between Stmt finalizer dedicated goroutine and the main goroutine. If they call sqlite3DbMallocRaw() or sqlite3DbFree() at the same time (on two OS threads), the Lookaside memory allocator gets corrupted.

You can modify sqlite3.c to boost this improbable event.

Line 8350:

#include <assert.h>
#include <stddef.h>
#include <unistd.h>     <----- add this line 

Line 19954: sqlite3DbFree(sqlite3 *db, void *p) definition

pBuf->pNext = db->lookaside.pFree;
usleep(1);                              <----- add this line 
db->lookaside.pFree = pBuf;
db->lookaside.nOut--;
return;

Line 20086: sqlite3DbMallocRaw(sqlite3 *db, int n) definition

if( db->lookaside.bEnabled ){
  if( n>db->lookaside.sz ){
    db->lookaside.anStat[1]++;
  }else if( (pBuf = db->lookaside.pFree)==0 ){
    db->lookaside.anStat[2]++;
  }else{
    usleep(1);                               <------- add this line
    db->lookaside.pFree = pBuf->pNext;
    db->lookaside.nOut++;
    db->lookaside.anStat[0]++;
    if( db->lookaside.nOut>db->lookaside.mxOut ){
      db->lookaside.mxOut = db->lookaside.nOut;
    }
    return (void*)pBuf;
  }
}

When you run

cp orig-segfault.sqlite segfault.sqlite; cycpost segfault.sqlite

the error should occur within 2 seconds, because the finalizers run at the beginning of the program.

If the program runs longer than two seconds, just type CTRL-C and run it again.
The crash occurs approximately once in 15 times.

I obtain:

fatal error: unexpected signal during runtime execution
[signal 0x7 code=0x80 addr=0x0 pc=0x40f436]

runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x40f436
runtime.throw(0x8543c5)
    /usr/local/go/src/pkg/runtime/panic.c:520 +0x69
runtime: unexpected return pc for runtime.sigpanic called from 0x40f436
runtime.sigpanic()
    /usr/local/go/src/pkg/runtime/os_linux.c:222 +0x3d

goroutine 16 [syscall]:
runtime.cgocall(0x406bdc, 0x7f1c14cd98f8)
    /usr/local/go/src/pkg/runtime/cgocall.c:143 +0xe5 fp=0x7f1c14cd98e0 sp=0x7f1c14cd9898
github.com/mxk/go-sqlite/sqlite3._Cfunc_sqlite3_step(0x22ebf18, 0x7f1c14cd9ad8)
    github.com/mxk/go-sqlite/sqlite3/_obj/_cgo_defun.c:610 +0x31 fp=0x7f1c14cd98f8 sp=0x7f1c14cd98e0
github.com/mxk/go-sqlite/sqlite3.(*Stmt).step(0xc208083ef0, 0x0, 0x0)
/home/nico/z_go/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:973 +0x7c fp=0x7f1c14cd9940 sp=0x7f1c14cd98f8
github.com/mxk/go-sqlite/sqlite3.(*Stmt).exec(0xc208083ef0, 0x7f1c14cd9ad8, 0x2, 0x2, 0x0, 0x0)
    /home/nico/z_go/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:884 +0xfa fp=0x7f1c14cd99b8 sp=0x7f1c14cd9940
github.com/mxk/go-sqlite/sqlite3.(*Stmt).Query(0xc208083ef0, 0x7f1c14cd9ad8, 0x2, 0x2, 0x0, 0x0)
/home/nico/z_go/src/github.com/mxk/go-sqlite/sqlite3/sqlite3.go:785 +0xb0 fp=0x7f1c14cd9a00 sp=0x7f1c14cd99b8
github.com/rwcarlsen/cyan/post.(*Context).getNewOwners(0xc208028180, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/nico/z_go/src/github.com/rwcarlsen/cyan/post/walker.go:297 +0x17c fp=0x7f1c14cd9b00 sp=0x7f1c14cd9a00
github.com/rwcarlsen/cyan/post.(*Context).walkDown(0xc208028180, 0xc208025410)
    /home/nico/z_go/src/github.com/rwcarlsen/cyan/post/walker.go:260 +0x504 fp=0x7f1c14cd9d18 sp=0x7f1c14cd9b00
github.com/rwcarlsen/cyan/post.(*Context).WalkAll(0xc208028180, 0x0, 0x0)
    /home/nico/z_go/src/github.com/rwcarlsen/cyan/post/walker.go:197 +0x408 fp=0x7f1c14cd9e18 sp=0x7f1c14cd9d18
main.main()
    /home/nico/z_go/src/github.com/rwcarlsen/cyan/cmd/cycpost/main.go:39 +0x260 fp=0x7f1c14cd9f50 sp=0x7f1c14cd9e18
runtime.main()
    /usr/local/go/src/pkg/runtime/proc.c:247 +0x11a fp=0x7f1c14cd9fa8 sp=0x7f1c14cd9f50
runtime.goexit()
    /usr/local/go/src/pkg/runtime/proc.c:1445 fp=0x7f1c14cd9fb0 sp=0x7f1c14cd9fa8
created by _rt0_go
    /usr/local/go/src/pkg/runtime/asm_amd64.s:97 +0x120

See also: https://groups.google.com/forum/#!topic/golang-nuts/QftgD_l81KQ

Compiling with SQLITE_THREADSAFE=1 should fix the problem, because sqlite will allow two OS threads to access the same connection at the same time.

With this option, I obtain no more crash.

mxk commented 10 years ago

Thanks for looking into this! SQLITE_THREADSAFE=1 will reduce the performance in the common case, so I think removing finalizers would be the better option. I added them just as an extra safety net, but I didn't consider the fact that they could run in parallel with other calls on the same connection object. Are there any good reasons for leaving them in?

rin01 commented 10 years ago

Hi, Maxim !

I didn't consider the fact that they could run in parallel

If you did not, it means that very few people do. It is the proof that finalizers are dangerous beasts. I didn't either before hitting my head on this issue ;-)

Personnally, I avoid using finalizers, and it seems I am not alone in this case. But if you remove finalizers in your driver, memory leaks are possible. If the user forgets to Close() Stmts or Conns, the memory they take in C heap will never be freed. And it is really easy to forget to close these objects. I see no easy way to force the user to close all Stmts and Conns properly when he is finished with them. In long running programs, it would be really bad.

Personnally, I am using your driver in a long running program, and just modified it a little bit and put a panic in the finalizer: e.g. panic("You have forgotten to close the Stmt object \"" + stmt.text + "\""). But I imagine it is too rude to do in a driver.

So, for the moment, I see only two possibilities:

In the end, finalizers will even hide huge memory consumption issue if a long-running program is e.g. creating a lot of Stmts and forgets to close them, which is not good. If Stmts are created in a faster rate than finalizers close them, the situation is even worse and memory will explode. So, I think it is better to remove these finalizers, after all...

rin01 commented 10 years ago

Besides, if the user reads only a few records from a table, and let the Stmt go out of scope, the lock on the database will not be released until the the finalizer runs. It would be better to let the user call Close() explicitely to release the database lock. In the end, it is one more reason to remove the finalizers.