nspcc-dev / neofs-node

NeoFS is a decentralized distributed object storage integrated with the Neo blockchain
https://fs.neo.org
GNU General Public License v3.0
31 stars 38 forks source link

Panic on deletion from peapod #2813

Open roman-khimov opened 2 months ago

roman-khimov commented 2 months ago

Expected Behavior

No panics.

Current Behavior

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x110 pc=0xa0d5ba]

goroutine 210 [running]:
go.etcd.io/bbolt.(*DB).page(...)
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/db.go:1078
go.etcd.io/bbolt.(*Tx).page(0x0, 0x4)
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/tx.go:533 +0x5a
go.etcd.io/bbolt.(*Bucket).pageNode(0xc00017db00?, 0x2?)
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/bucket.go:748 +0x8d
go.etcd.io/bbolt.(*Cursor).search(0xc0005e7148, {0xc0003c6100, 0x40, 0x40}, 0xab14272b8f4bccb9?)
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/cursor.go:272 +0x54
go.etcd.io/bbolt.(*Cursor).seek(0xc0005e7148, {0xc0003c6100?, 0x2323d16dcfebc235?, 0xef247803e7c508ee?})
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/cursor.go:159 +0x36
go.etcd.io/bbolt.(*Bucket).Get(0xc00017db00, {0xc0003c6100, 0x40, 0x40})
        /home/rik/go/pkg/mod/go.etcd.io/bbolt@v1.3.9/bucket.go:263 +0x8c
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).Delete.func1(0xc0005e7200?)
        /home/rik/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:434 +0x67
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).batch(0xc00025c620, {0xe42db0, 0xc0000380c0}, 0xc0005e72e0)
        /home/rik/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:464 +0x131
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).Delete(0x7f40754a06e8?, {{{0x35, 0xc2, 0xeb, 0xcf, 0x6d, 0xd1, 0x23, 0x23, 0xee, ...}, ...}, ...})
        /home/rik/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:432 +0xa5
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor.(*BlobStor).Delete(0xc00025c690, {{{0x35, 0xc2, 0xeb, 0xcf, 0x6d, 0xd1, 0x23, 0x23, 0xee, ...}, ...}, ...})
        /home/rik/neofs-node/pkg/local_object_storage/blobstor/delete.go:34 +0x15e
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).delete(0xc000112280, {{0xc0002b6e00?, 0x0?, 0x0?}, 0x0?})
        /home/rik/neofs-node/pkg/local_object_storage/shard/delete.go:108 +0xc18
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).removeGarbage(0xc000112280)
        /home/rik/neofs-node/pkg/local_object_storage/shard/gc.go:210 +0x34a
github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).tickRemover(0xc00060e2a0)
        /home/rik/neofs-node/pkg/local_object_storage/shard/gc.go:170 +0xba
created by github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).init
        /home/rik/neofs-node/pkg/local_object_storage/shard/gc.go:107 +0x145
FAIL    github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine 59.857s

Possible Solution

Either it's a BoltDB issue or some really weird test setup problem that triggers it. Let it stay here for a while.

Steps to Reproduce (for bugs)

I have no idea. Just got this during some test runs.

Your Environment

roman-khimov commented 2 months ago

Whoa. Testnet:

Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.575Z        info        log/log.go:12        local object storage operation        {"shard_id": "2xcLzuvvNzKg8jELk1mn4q", "address": "cWaXnFSPr5teUdtUu8MFwsm2LJ3W7NanzBpWWmCCmeF/3PVjGxCNEyTCuhinjVR3iBmp5qhGzSAArkQWxwe6VThC", "op": "DELETE", "type": "peapod", "storage_id": "peapod"}
Apr 19 20:03:44 titan4 neofs-node[419988]: panic: runtime error: invalid memory address or nil pointer dereference
Apr 19 20:03:44 titan4 neofs-node[419988]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x138 pc=0x9a14a9]
Apr 19 20:03:44 titan4 neofs-node[419988]: goroutine 221 [running]:
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*DB).page(...)
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/db.go:1078
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*Tx).page(0xfd0da0?, 0xc075b75260?)
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/tx.go:533 +0x49
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*Bucket).pageNode(0xc00904b680?, 0x1ef?)
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/bucket.go:748 +0x85
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*Cursor).search(0xc005089160, {0xc012199440, 0x40, 0x40}, 0x4866acd85d38d679?)
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/cursor.go:272 +0x48
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*Cursor).seek(0xc005089160, {0xc012199440?, 0x682e1ef6bbc51809?, 0x35f0a6fc0a89190c?})
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/cursor.go:159 +0x2e
Apr 19 20:03:44 titan4 neofs-node[419988]: go.etcd.io/bbolt.(*Bucket).Get(0x682e1ef6bbc51809?, {0xc012199440, 0x40, 0x4866acd85d38d679?})
Apr 19 20:03:44 titan4 neofs-node[419988]:         go.etcd.io/bbolt@v1.3.9/bucket.go:263 +0x58
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).Delete.func1(0xc00904b680)
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:434 +0x65
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).batch(0xc0001f1b90, {0x13d6830, 0x1e57220}, 0xc0050892c8)
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:464 +0x113
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod.(*Peapod).Delete(0x11839a8?, {{{0x9, 0x18, 0xc5, 0xbb, 0xf6, 0x1e, 0x2e, 0x68, 0xc, ...}, ...}, ...})
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/peapod/peapod.go:432 +0x8b
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor.(*BlobStor).Delete(0xc0001f1d50, {{{0x9, 0x18, 0xc5, 0xbb, 0xf6, 0x1e, 0x2e, 0x68, 0xc, ...}, ...}, ...})
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/blobstor/delete.go:34 +0x159
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).delete(0xc00169f9a0, {{0xc075856000?, 0x1d2fab44c39eaa94?, 0xcb5ff16b99ac1623?}, 0x80?})
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/delete.go:108 +0xc18
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*Shard).removeGarbage(0xc00169f9a0)
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/gc.go:210 +0x331
Apr 19 20:03:44 titan4 neofs-node[419988]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).tickRemover(0xc001e206c0)
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/gc.go:170 +0xb3
Apr 19 20:03:44 titan4 neofs-node[419988]: created by github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard.(*gc).init in goroutine 1
Apr 19 20:03:44 titan4 neofs-node[419988]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/shard/gc.go:107 +0x13c
roman-khimov commented 2 months ago

Preceeding the testnet case:

Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.565Z        info        neofs-node/grpc.go:138        stopping gRPC server...        {"name": "NeoFS Control API"}
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.566Z        info        neofs-node/grpc.go:154        gRPC server stopped successfully        {"name": "NeoFS Control API"}
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.566Z        info        neofs-node/storage.go:76        closing components of the storage engine...
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.571Z        info        shard/gc.go:181        waiting for GC workers to stop...
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.571Z        warn        shard/gc.go:117        stop event listener by closed channel
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.571Z        info        log/log.go:12        local object storage operation        {"shard_id": "2xcLzuvvNzKg8jELk1mn4q", "address": "cWaXnFSPr5teUdtUu8MFwsm2LJ3W7NanzBpWWmCCmeF/3GH4MNDCMXpkd8rWw56Q8eGPF3ogxyVqSkYXzXiCnPDd", "op": "DELETE", "type": "peapod", "storage_id": "peapod"}
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.574Z        info        shard/gc.go:181        waiting for GC workers to stop...
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.574Z        warn        shard/gc.go:117        stop event listener by closed channel
Apr 19 20:03:44 titan4 neofs-node[419988]: 2024-04-19T20:03:44.575Z        info        log/log.go:12        local object storage operation        {"shard_id": "2xcLzuvvNzKg8jELk1mn4q", "address": "cWaXnFSPr5teUdtUu8MFwsm2LJ3W7NanzBpWWmCCmeF/3PVjGxCNEyTCuhinjVR3iBmp5qhGzSAArkQWxwe6VThC", "op": "DELETE", "type": "peapod", "storage_id": "peapod"}

It's a concurrent DB Close() issue, GC runs in its own thread and tries to do things after the DB is closed. Not critical for now (we may deprecate peapods after #2814).