Closed tuusberg closed 1 year ago
I forgot to add that we tried enabling logging in the plugin by rebuilding its image but it does not show any logs (tried both fmt.Print
and logrus logger.
@tuusberg Would you provide Velero log bundle? We need more info to troubleshoot the problem. Thanks.
@tuusberg
Plugins log are printed into Velero server's log file.
Just use velero debug
command to collect the all the information used for debug, then upload the generated tarbarll file.
If the plugins failed to clean snapshots, I think that should happen during backup deletion. Please check whether there are some error information in the DeleteBackupRequest's status.
hey @Lyndon-Li @blackpiglet. I am attaching a velero debug bundle: bundle-2022-12-05-15-03-42.tar.gz
we don't see any logs coming from the plugin because it constantly gets killed. The bundle also does not have any plugin logs.
Plugins are not running as a Daemon process like the Velero server. They are binaries called by Velero server. After the task finished, the plugins will exist,so there are some logs about plugins exist, and IMO, that's normal. It doesn't mean the plugin process is caught by something like crash.
I found some logs related to GCP plugin from the debug bundle. Most of the logs are printed by the Go plugin framework github.com/hashicorp/go-plugin
. They are printed when plugin exists.
From the log, I hadn't find any thing that can make GCP plugins fail to delete snapshot.
time="2022-12-05T18:03:53Z" level=debug msg="received EOF, stopping recv loop" backup=velero/velero-hourly-20221205180037 cmd=/velero err="rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75" pluginName=stdio
time="2022-12-05T18:03:53Z" level=debug msg="plugin process exited" backup=velero/velero-hourly-20221205180037 cmd=/velero logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75" path=/velero pid=54201
time="2022-12-05T18:03:53Z" level=debug msg="plugin exited" backup=velero/velero-hourly-20221205180037 cmd=/velero logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75"
time="2022-12-05T18:03:53Z" level=debug msg="received EOF, stopping recv loop" backup=velero/velero-hourly-20221205180037 cmd=/plugins/velero-plugin-for-gcp err="rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75" pluginName=stdio
time="2022-12-05T18:03:53Z" level=debug msg="plugin process exited" backup=velero/velero-hourly-20221205180037 cmd=/plugins/velero-plugin-for-gcp logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75" path=/plugins/velero-plugin-for-gcp pid=54210
time="2022-12-05T18:03:53Z" level=debug msg="plugin exited" backup=velero/velero-hourly-20221205180037 cmd=/plugins/velero-plugin-for-gcp logSource="pkg/plugin/clientmgmt/logrus_adapter.go:75"
@blackpiglet thank you for your response. We suspect that backups are left behind because of this error, the plugin gets restarted constantly only in this particular environment. Could it be the reason we are seeing hundreds of thousands snapshots left untouched after Velero backups are deleted?
No. I don't think these errors will make plugins fail to delete related snapshot. These logs are normal for plugin existing. There should be other reason if snapshots are not deleted during backup deletion, and if there is the case, there should be some error printed, when backup is deleted.
Could you try to create a backup with snapshot, then delete it to verify whether related snapshots are left behind, and collect the debug bundle?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
What steps did you take and what happened:
We are running Velero in multiple environments and recently noticed that the number of GCP Compute Engine snapshots is constantly growing in one of them. Velero does not delete some of the snapshots and we believe that it has to do with the fact that the plugin we are using is getting restarted constantly:
What did you expect to happen:
That error message should not be there (we don't see it happening in other envs) and all snapshots should be deleted when their backups expire.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle, and attach to this issue, more options please refer tovelero debug --help
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
orkubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
orkubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
It only happens in one environment, all other envs are fine.
Environment:
velero version
): v1.9.3velero client config get features
): NOT SETkubectl version
): v1.22.15/etc/os-release
):Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.