maxpert / marmot

A distributed SQLite replicator built on top of NATS
https://maxpert.github.io/marmot/
MIT License

Crash: "double free or corruption (!prev)" #78

Open antiops opened 1 year ago

antiops commented 1 year ago

I've been getting repeated crashes on the master server (in a master node + 1 replica setup). Both nodes run the latest version from the releases page. Uptime is inconsistent: sometimes it stays up for a day before crashing, and sometimes it crashes within a few hours.

Both use basic configs, so I might be missing something important that I'm not aware of.

The replica server has been running fine with no crashes.

Configs:

Master (`config-main.toml`)
```toml
db_path="/home/tik/redis/videos-replica.v2.db"
seq_map_path="/tmp/videos-main.cbor"

node_id=1

publish=true
replicate=false
```

Replica (`config-replica.toml`)
```toml
db_path="/home/rep/tik/videos.v2.db"
seq_map_path="/tmp/videos-replica-1.cbor"

node_id=2

publish=false
replicate=true
```

Details

Each instance is run from the command line like this:

```sh
# Master
./marmot -config config-main.toml -cluster-addr 10.1.0.12:4223 -cluster-peers 'nats://10.1.0.1:14222/'

# Replica
./marmot -config config-replica.toml -cluster-addr 10.1.0.1:14222 -cluster-peers 'nats://10.1.0.12:4223/'
```

The database is 1.8 GB with 4 tables, of which only one (videos_clean) is updated frequently. The master database is itself a replica, kept separate from the production database; a script pushes changes to it every minute.

Below is the output from the most recent crash.

marmot-v0.8.5-master-crashlog.txt

maxpert commented 1 year ago

Reading crash logs:

```
goroutine 12606 [syscall]:
runtime.cgocall(0xdc4280, 0xc00057ed50)
        /opt/hostedtoolcache/go/1.20.7/x64/src/runtime/cgocall.go:157 +0x5c fp=0xc00057ed28 sp=0xc00057ecf0 pc=0x40601c
github.com/mattn/go-sqlite3._Cfunc_sqlite3_close_v2(0x7f1b701351f8)
        _cgo_gotypes.go:631 +0x4c fp=0xc00057ed50 sp=0xc00057ed28 pc=0x884f0c
github.com/mattn/go-sqlite3.(*SQLiteConn).Close.func1(0x0?)
        /home/runner/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.17/sqlite3.go:1772 +0x46 fp=0xc00057ed88 sp=0xc00057ed50 pc=0x8958c6
github.com/mattn/go-sqlite3.(*SQLiteConn).Close(0xc000502840)
        /home/runner/go/pkg/mod/github.com/mattn/go-sqlite3@v1.14.17/sqlite3.go:1772 +0x25 fp=0xc00057edb8 sp=0xc00057ed88 pc=0x8957c5
database/sql.(*driverConn).finalClose.func2()
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:644 +0x3c fp=0xc00057ede0 sp=0xc00057edb8 pc=0x7ee3dc
database/sql.withLock({0x12f7620, 0xc00037a6c0}, 0xc00057ee88)
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:3405 +0x8c fp=0xc00057ee20 sp=0xc00057ede0 pc=0x7fc86c
database/sql.(*driverConn).finalClose(0xc00037a6c0)
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:642 +0x116 fp=0xc00057eec8 sp=0xc00057ee20 pc=0x7ee296
database/sql.finalCloser.finalClose-fm()
        <autogenerated>:1 +0x2b fp=0xc00057eee0 sp=0xc00057eec8 pc=0x7fddcb
database/sql.(*driverConn).Close(0xc00037a6c0)
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:623 +0x13f fp=0xc00057ef28 sp=0xc00057eee0 pc=0x7ee15f
database/sql.(*DB).connectionCleaner(0xc00047e340, 0xc00027b000?)
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:1078 +0x23d fp=0xc00057efc0 sp=0xc00057ef28 pc=0x7efffd
database/sql.(*DB).startCleanerLocked.func1()
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:1048 +0x2a fp=0xc00057efe0 sp=0xc00057efc0 pc=0x7efd8a
runtime.goexit()
        /opt/hostedtoolcache/go/1.20.7/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00057efe8 sp=0xc00057efe0 pc=0x46e9e1
created by database/sql.(*DB).startCleanerLocked
        /opt/hostedtoolcache/go/1.20.7/x64/src/database/sql/sql.go:1048 +0x105
```

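Sounds like some sort of connection cleanup is messing it up. The crashing goroutine is database/sql's `connectionCleaner` (started by `startCleanerLocked`). It closes an idle pooled connection, and that close ends up in go-sqlite3's `SQLiteConn.Close`, which calls `sqlite3_close_v2`. You probably see it only infrequently because it is a race condition (my guess, based on `startCleanerLocked`). I might have to do some deeper digging into github.com/mattn/go-sqlite3.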

maxpert commented 1 year ago

Would you be willing to join the Discord channel and DM me? I am trying to reproduce the issue.

computinglife commented 4 months ago

Is this resolved?

maxpert commented 3 months ago

I haven't been able to reproduce the issue. I'm about to push out a new release built with a newer version of SQLite. Could you try it after that and let me know whether the crash still reproduces?