zilliztech / milvus-backup

Backup and restore tool for Milvus
Apache License 2.0

[Bug]: when the partition key is enabled, restore will fail with error `execute job not allow to set partition name for collection with partition key` #317

Closed: zhuwenxing closed this issue 1 month ago

zhuwenxing commented 3 months ago

Current Behavior

[2024-03-22T08:12:16.835Z] [2024-03-22 08:11:52 - INFO - ci_test]: restore_backup: {'requestId': '837dbf3f-e823-11ee-a475-7e8047ed0b4a', 'code': 3, 'msg': 'workerpool: execute job not allow to set partition name for collection with partition key: importing data failed', 'data': {'id': '837dcedf-e823-11ee-a475-7e8047ed0b4a', 'state_code': 1, 'start_time': 1711094972, 'collection_restore_tasks': [{'id': '837e1f50-e823-11ee-a475-7e8047ed0b4a', 'state_code': 1, 'start_time': 1711094972, 'target_collection_name': 'restore_backup_7PiyVGVm_bak', 'restored_size': 0, 'to_restore_size': 70393, 'target_db_name': 'default'}], 'restored_size': 0, 'to_restore_size': 0}} (test_restore_backup.py:388)
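
For triage, the response above can be checked programmatically. The following is a minimal sketch, not part of milvus-backup or its test suite: the helper names `is_partition_key_restore_failure` and `unrestored_collections` are hypothetical, the field names are copied from the logged response, and treating `restored_size < to_restore_size` as "not restored" is an assumption.

```python
from typing import List

# Substring copied from the 'msg' field of the restore response logged above.
PARTITION_KEY_ERROR = (
    "not allow to set partition name for collection with partition key"
)


def is_partition_key_restore_failure(response: dict) -> bool:
    """Return True if a restore response carries the partition-key import error."""
    # A response without 'msg' is treated as "no match" rather than an error.
    return PARTITION_KEY_ERROR in str(response.get("msg", ""))


def unrestored_collections(response: dict) -> List[str]:
    """List target collections whose restored_size is below to_restore_size.

    Assumption: a task with restored_size < to_restore_size did not finish.
    """
    tasks = response.get("data", {}).get("collection_restore_tasks", [])
    return [
        task["target_collection_name"]
        for task in tasks
        if task.get("restored_size", 0) < task.get("to_restore_size", 0)
    ]


# Trimmed copy of the response from the log line above.
response = {
    "code": 3,
    "msg": (
        "workerpool: execute job not allow to set partition name for "
        "collection with partition key: importing data failed"
    ),
    "data": {
        "state_code": 1,
        "collection_restore_tasks": [
            {
                "target_collection_name": "restore_backup_7PiyVGVm_bak",
                "restored_size": 0,
                "to_restore_size": 70393,
                "target_db_name": "default",
            }
        ],
    },
}

print(is_partition_key_restore_failure(response))
print(unrestored_collections(response))
```

Matching on the message substring rather than on `code` avoids assuming what each numeric code means, since the log shows only that `code` was 3 for this failure.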

balazik commented 2 months ago

We hit the same bug with a collection that uses an INT64 partition key and has 10 partitions. The milvus-backup tool creates the backup successfully, but the restore fails with the error `not allow to set partition name for collection with partition key: importing data failed`. To reproduce, create a collection with an INT64 partition key and 10 partitions, populate it with random data (including random partition key values), back it up, and then attempt a restore.

Is there a workaround for restoring a collection with a partition key and multiple partitions, or do we need to wait for a bug fix?

It's possible that the issue originated here: https://github.com/milvus-io/milvus/issues/25586

bigsheeper commented 2 months ago

/assign

bigsheeper commented 1 month ago

/assign @zhuwenxing
/unassign

Please help to verify.

zhuwenxing commented 1 month ago
[2024/05/09 02:40:10.286 +00:00] [INFO] [storage/remote_chunk_manager.go:324] ["finish walk through objects"] [prefix=backup/backup_2b8CF5BY/binlogs/insert_log/449620733881434714/449620733881434745/449620733881434765/449620733881434765/] [recursive=true]
[2024/05/09 02:40:10.303 +00:00] [ERROR] [conc/options.go:54] ["Conc pool panicked"] [panic="runtime error: invalid memory address or nil pointer dereference"] [stack="github.com/milvus-io/milvus/pkg/util/conc.(*poolOption).antsOptions.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/options.go:54\ngithub.com/panjf2000/ants/v2.(*goWorker).run.func1.1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:54\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\ngithub.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1.1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:74\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:884\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:260\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:841\ngithub.com/milvus-io/milvus/internal/datanode/importv2.GetRowsStats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/hash.go:117\ngithub.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).readFileStat\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:221\ngithub.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func2\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:171\ngithub.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func3\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:186\ngithub.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\ngithub.com/panjf2000/ants/v2.(*goWorker).run.func1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67"]
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4269287]

goroutine 2054 [running]:
panic({0x4eb70e0, 0x7a75fb0})
    /usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc0034134f8 sp=0xc003413438 pc=0x1c15f1b
github.com/milvus-io/milvus/pkg/util/conc.(*poolOption).antsOptions.func1({0x4eb70e0, 0x7a75fb0})
    /go/src/github.com/milvus-io/milvus/pkg/util/conc/options.go:56 +0x15b fp=0xc0034135c0 sp=0xc0034134f8 pc=0x368225b
github.com/panjf2000/ants/v2.(*goWorker).run.func1.1()
    /go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:54 +0x75 fp=0xc003413638 sp=0xc0034135c0 pc=0x367f375
runtime.deferCallSave(0xc003413708, 0xc003413fb8?)
    /usr/local/go/src/runtime/panic.go:796 +0x88 fp=0xc003413648 sp=0xc003413638 pc=0x1c15b08
runtime.runOpenDeferFrame(0xc001930be0)
    /usr/local/go/src/runtime/panic.go:769 +0x1b4 fp=0xc003413690 sp=0xc003413648 pc=0x1c15934
panic({0x4eb70e0, 0x7a75fb0})
    /usr/local/go/src/runtime/panic.go:884 +0x213 fp=0xc003413750 sp=0xc003413690 pc=0x1c15d73
github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1.1()
    /go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:74 +0x97 fp=0xc0034137b0 sp=0xc003413750 pc=0x4833717
runtime.deferCallSave(0xc003413880, 0xc003413f50?)
    /usr/local/go/src/runtime/panic.go:796 +0x88 fp=0xc0034137c0 sp=0xc0034137b0 pc=0x1c15b08
runtime.runOpenDeferFrame(0xc001931770)
    /usr/local/go/src/runtime/panic.go:769 +0x1b4 fp=0xc003413808 sp=0xc0034137c0 pc=0x1c15934
panic({0x4eb70e0, 0x7a75fb0})
    /usr/local/go/src/runtime/panic.go:884 +0x213 fp=0xc0034138c8 sp=0xc003413808 pc=0x1c15d73
runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:260
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:841 +0x37d fp=0xc003413928 sp=0xc0034138c8 pc=0x1c2e21d
github.com/milvus-io/milvus/internal/datanode/importv2.GetRowsStats({0x5b64b00, 0xc00047f7a0}, 0xc0019adb80)
    /go/src/github.com/milvus-io/milvus/internal/datanode/importv2/hash.go:117 +0x3e7 fp=0xc003413b00 sp=0xc003413928 pc=0x4269287
github.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).readFileStat(0xc0010d9040, {0x5b16c50, 0xc002c8cba0}, {0x5b64b00, 0xc00047f7a0}, 0xc002cdad20?)
    /go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:221 +0x3a7 fp=0xc003413cc0 sp=0xc003413b00 pc=0x426c487
github.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func2(0xc0028fcfc8?, 0xc0015e3840)
    /go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:171 +0x1bf fp=0xc003413f08 sp=0xc003413cc0 pc=0x426bcdf
github.com/milvus-io/milvus/internal/datanode/importv2.(*scheduler).PreImport.func3()
    /go/src/github.com/milvus-io/milvus/internal/datanode/importv2/scheduler.go:186 +0x25 fp=0xc003413f28 sp=0xc003413f08 pc=0x426bae5
github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1()
    /go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81 +0xbc fp=0xc003413f88 sp=0xc003413f28 pc=0x48335dc
github.com/panjf2000/ants/v2.(*goWorker).run.func1()
    /go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67 +0x97 fp=0xc003413fe0 sp=0xc003413f88 pc=0x367f277
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc003413fe8 sp=0xc003413fe0 pc=0x1c4fec1
created by github.com/panjf2000/ants/v2.(*goWorker).run
    /go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:48 +0x65

@bigsheeper With image tag master-20240508-17a79f4c-amd64, restoring a backup of a collection that has the partition key enabled makes Milvus panic.

zhuwenxing commented 1 month ago

/unassign

/assign @bigsheeper

bigsheeper commented 1 month ago

/assign @zhuwenxing
/unassign

Please help to verify.

zhuwenxing commented 1 month ago

Verified and fixed.