vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero restart due to code panic on a rare occasion during restore on Azure provider #5935

Closed by danfengliu 1 year ago

danfengliu commented 1 year ago

What steps did you take and what happened:

Recently, all nightly Azure pipelines have been hitting this issue intermittently: 1 or 2 random test cases out of the 40 fail at the restore phase. After reproducing the failure and debugging with the Velero log, I found that the restore failure is caused by a Velero restart, and the restart is caused by the panic shown in the log below.

Panic log:

time="2023-03-01T08:45:31Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:545" restore=velero/ns-mp-1ns-mp-274751300-6bc7-4d29-8fb8-5a550c8e1574
time="2023-03-01T08:45:32Z" level=info msg="Using storage account key: true" cmd=/plugins/velero-plugin-for-microsoft-azure logSource="/go/src/velero-plugin-for-microsoft-azure/velero-plugin-for-microsoft-azure/object_store.go:360" pluginName=velero-plugin-for-microsoft-azure restore=velero/ns-mp-1ns-mp-274751300-6bc7-4d29-8fb8-5a550c8e1574
[restart-panic.txt](https://github.com/vmware-tanzu/velero/files/10859847/restart-panic.txt)

panic: runtime error: index out of range [43] with length 30

goroutine 1756 [running]:
compress/flate.(*huffmanBitWriter).indexTokens(0xc00074a000, {0xc000df4000, 0xbab, 0x0?})
    /usr/local/go/src/compress/flate/huffman_bit_writer.go:551 +0x2a5
compress/flate.(*huffmanBitWriter).writeBlock(0xc00074a000, {0xc000df4000?, 0x80?, 0x4cc?}, 0x0, {0x0, 0x0, 0x0})
    /usr/local/go/src/compress/flate/huffman_bit_writer.go:440 +0xcf
compress/flate.(*compressor).writeBlock(0xc000f7e000, {0xc000df4000?, 0x0?, 0x157?}, 0x157?)
    /usr/local/go/src/compress/flate/deflate.go:170 +0x9c
compress/flate.(*compressor).deflate(0xc000f7e000)
    /usr/local/go/src/compress/flate/deflate.go:415 +0x6d9
compress/flate.(*compressor).write(0xc000f7e000, {0xc0000c9200?, 0x157, 0x157?})
    /usr/local/go/src/compress/flate/deflate.go:554 +0x82
compress/flate.(*Writer).Write(...)
    /usr/local/go/src/compress/flate/deflate.go:712
compress/gzip.(*Writer).Write(0xc000553600, {0xc0000c9200, 0x157, 0x424})
    /usr/local/go/src/compress/gzip/gzip.go:196 +0x34a
io.(*multiWriter).Write(0xc000371200?, {0xc0000c9200, 0x157, 0x424})
    /usr/local/go/src/io/multi.go:60 +0x86
github.com/sirupsen/logrus.(*Entry).write(0xc000473260)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:286 +0x15b
github.com/sirupsen/logrus.(*Entry).log(0xc0004731f0, 0x4, {0xc0005adac0, 0x1f})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:251 +0x3da
github.com/sirupsen/logrus.(*Entry).Log(0xc0004731f0, 0x4, {0xc000db9d98?, 0xc000db9da8?, 0xc000db9da8?})
    /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x4f
github.com/sirupsen/logrus.(*Entry).Info(...)
    /go/pkg/mod/github.com/sirupsen/logrus@v1.8.1/entry.go:310
github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt/process.(*logrusAdapter).Info(0xc0009f3620, {0xc0005adaa0, 0x1f}, {0xc000122a00?, 0x2?, 0xc000f4c610?})
    /go/src/github.com/vmware-tanzu/velero/pkg/plugin/clientmgmt/process/logrus_adapter.go:80 +0x9c
github.com/hashicorp/go-plugin.(*Client).logStderr(0xc00063c2c0, {0x2a86f60?, 0xc000606238?})
    /go/pkg/mod/github.com/hashicorp/go-plugin@v1.4.3/client.go:1035 +0xc07
created by github.com/hashicorp/go-plugin.(*Client).Start
    /go/pkg/mod/github.com/hashicorp/go-plugin@v1.4.3/client.go:611 +0x144c
..Kubectl logs = /usr/local/bin/kubectl logs -n velero velero-5fc7d87f64-94q5t



Lyndon-Li commented 1 year ago

Fixed by #5956