Open foosinn opened 4 years ago
More detailed output for the restic logs would be in the pods on the node that the restic job ran on.
Velero's implementation of restic support is in a DaemonSet, with pods on each node so that the restic pods can access all workload volumes on that node. Right now, the logs for each of those pods is on the node, and not aggregated. So finding the node is a matter of looking for the failed PodVolumeBackups
and the relevant pod, and inspecting the pod for it's node name, then finding the restic pod on that node and looking at the kubectl logs <restic pod name> -n velero
.
I was unable to find any hints why the backup failed in the output. Any idears?
Thanks a lot for your help!
Restic pod log:
time="2020-09-26T22:08:25Z" level=error msg="Error running command=restic backup --repo=***HIDDEN*** --password-file=/tmp/velero-restic-credentials-ot-richcoon666888759 --cache-dir=/scratch/.cache/restic . --tag=ns=ot-richcoon --tag=pod=ot-richcoon-mariadb-0 --tag=pod-uid=6c04f2a8-fc15-11ea-a0e6-9cdc71c0dd8e --tag=pvc-uid=8e0fd405-adf2-11e9-bacf-9cdc71c0df4c --tag=volume=data --tag=backup=daily-20200926220020 --tag=backup-uid=ab6e46ab-0043-11eb-a0e6-9cdc71c0dd8e --host=velero --json --parent=66d571c2, stdout={\"message_type\":\"status\",\"percent_done\":0,\"total_files\":1,\"total_bytes\":2752740}\n{\"message_type\":\"status\",\"percent_done\":0.029610298414193936,\"total_files\":101,\"files_done\":4,\"total_bytes\":129750229,\"bytes_done\":3841943,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"percent_done\":0.07830972577420471,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":11982754,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.1062338167551264,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":16255627,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.15110612585520725,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":23121873,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.172560523050957,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":26404770,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.2800502031653999,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":42852566,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.30185872338750297,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":46189650,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.3775959581287736,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":57778768,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.41558080407865655,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":63591112,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.4479426897755702,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":68543045,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.5027193860506478,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":76924835,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.5354016469873961,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":81925791,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.5838848581465317,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":89344568,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.6193810139671462,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":94776099,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.673913692310229,\"total_files\":129,\"files_done\":4,\"total_bytes\":153017443,\"bytes_done\":103120550,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.6759880963374875,\"total_files\":129,\"files_done\":5,\"total_bytes\":153017443,\"bytes_done\":103437970,\"current_files\":[\"/data/ib_logfile0\",\"/data/ib_logfile1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":1,\"seconds_remaining\":8,\"percent_done\":0.8421503553683092,\"total_files\":129,\"files_done\":36,\"total_bytes\":153017443,\"bytes_done\":128863694,\"current_files\":[\"/data/ib_logfile0\",\"/data/ibdata1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":2,\"percent_done\":0.8451642862702914,\"total_files\":129,\"files_done\":47,\"total_bytes\":153017443,\"bytes_done\":129324878,\"current_files\":[\"/data/ib_logfile0\",\"/data/ibdata1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":2,\"percent_done\":0.8451642862702914,\"total_files\":129,\"files_done\":48,\"total_bytes\":153017443,\"bytes_done\":129324878,\"current_files\":[\"/data/ib_logfile0\",\"/data/ibdata1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":2,\"percent_done\":0.8489080947457736,\"total_files\":129,\"files_done\":103,\"total_bytes\":153017443,\"bytes_done\":129897746,\"current_files\":[\"/data/ibdata1\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":3,\"percent_done\":0.9927619166920728,\"total_files\":129,\"files_done\":125,\"total_bytes\":153017443,\"bytes_done\":151909890,\"current_files\":[\"/data/ibdata1\",\"/data/richcoon/sessions.ibd\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":3,\"percent_done\":0.9927619166920728,\"total_files\":129,\"files_done\":126,\"total_bytes\":153017443,\"bytes_done\":151909890,\"current_files\":[\"/data/ibdata1\",\"/data/richcoon/sessions.ibd\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":3,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443,\"current_files\":[\"/full_dump.gz\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":4,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\n{\"message_type\":\"status\",\"seconds_elapsed\":5,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\n{\"message_type\":\"status\",\"second_new\":0,\"files_changed\":7,\"filesotal_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\ncurrent_files\":[\"/full_dump.gz\"]}\n{\"message_type\":\"status\",\"second_new\":0,\"files_changed\":7,\"filesotal_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\ncurrent_files\":[\"/full_dump.gz\"]}\n{\"message_type\":\"status\",\"seconds_elapsed\":4,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"{\"message_type\":\"status\",\"seconds_elapsed\":5,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\n{\"message_type\":\"status\",\"seconds_elapsed\":4,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"{\"message_type\":\"status\",\"seconds_elapsed\":5,\"percent_done\":1,\"total_files\":129,\"files_done\":129,\"total_bytes\":153017443,\"bytes_done\":153017443}\n, stderr=" backup=velero/daily-20200926220020 controller=pod-volume-backup error="unable to find summary in restic backup command output" error.file="/github.com/vmware-tanzu/velero/pkg/restic/exec_commands.go:175" error.function=github.com/vmware-tanzu/velero/pkg/restic.getSummaryLine logSource="pkg/controller/pod_volume_backup_controller.go:283" name=daily-20200926220020-nwgvb namespace=velero
Pretty stdout:
{
"message_type": "status",
"percent_done": 0,
"total_files": 1,
"total_bytes": 2752740
}
{
"message_type": "status",
"percent_done": 0.029610298414193936,
"total_files": 101,
"files_done": 4,
"total_bytes": 129750229,
"bytes_done": 3841943,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"percent_done": 0.07830972577420471,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 11982754,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.1062338167551264,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 16255627,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.15110612585520725,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 23121873,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.172560523050957,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 26404770,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.2800502031653999,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 42852566,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.30185872338750297,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 46189650,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.3775959581287736,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 57778768,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.41558080407865655,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 63591112,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.4479426897755702,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 68543045,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.5027193860506478,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 76924835,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.5354016469873961,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 81925791,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.5838848581465317,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 89344568,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.6193810139671462,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 94776099,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsedparse error: Invalid numeric literal at line 28, column 14
": 1,
"seconds_remaining": 8,
"percent_done": 0.673913692310229,
"total_files": 129,
"files_done": 4,
"total_bytes": 153017443,
"bytes_done": 103120550,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.6759880963374875,
"total_files": 129,
"files_done": 5,
"total_bytes": 153017443,
"bytes_done": 103437970,
"current_files": [
"/data/ib_logfile0",
"/data/ib_logfile1"
]
}
{
"message_type": "status",
"seconds_elapsed": 1,
"seconds_remaining": 8,
"percent_done": 0.8421503553683092,
"total_files": 129,
"files_done": 36,
"total_bytes": 153017443,
"bytes_done": 128863694,
"current_files": [
"/data/ib_logfile0",
"/data/ibdata1"
]
}
{
"message_type": "status",
"seconds_elapsed": 2,
"percent_done": 0.8451642862702914,
"total_files": 129,
"files_done": 47,
"total_bytes": 153017443,
"bytes_done": 129324878,
"current_files": [
"/data/ib_logfile0",
"/data/ibdata1"
]
}
{
"message_type": "status",
"seconds_elapsed": 2,
"percent_done": 0.8451642862702914,
"total_files": 129,
"files_done": 48,
"total_bytes": 153017443,
"bytes_done": 129324878,
"current_files": [
"/data/ib_logfile0",
"/data/ibdata1"
]
}
{
"message_type": "status",
"seconds_elapsed": 2,
"percent_done": 0.8489080947457736,
"total_files": 129,
"files_done": 103,
"total_bytes": 153017443,
"bytes_done": 129897746,
"current_files": [
"/data/ibdata1"
]
}
{
"message_type": "status",
"seconds_elapsed": 3,
"percent_done": 0.9927619166920728,
"total_files": 129,
"files_done": 125,
"total_bytes": 153017443,
"bytes_done": 151909890,
"current_files": [
"/data/ibdata1",
"/data/richcoon/sessions.ibd"
]
}
{
"message_type": "status",
"seconds_elapsed": 3,
"percent_done": 0.9927619166920728,
"total_files": 129,
"files_done": 126,
"total_bytes": 153017443,
"bytes_done": 151909890,
"current_files": [
"/data/ibdata1",
"/data/richcoon/sessions.ibd"
]
}
{
"message_type": "status",
"seconds_elapsed": 3,
"percent_done": 1,
"total_files": 129,
"files_done": 129,
"total_bytes": 153017443,
"bytes_done": 153017443,
"current_files": [
"/full_dump.gz"
]
}
{
"message_type": "status",
"seconds_elapsed": 4,
"percent_done": 1,
"total_files": 129,
"files_done": 129,
"total_bytes": 153017443,
"bytes_done": 153017443
}
{
"message_type": "status",
"seconds_elapsed": 5,
"percent_done": 1,
"total_files": 129,
"files_done": 129,
"total_bytes": 153017443,
"bytes_done": 153017443
}
{
"message_type": "status",
"second_new": 0,
"files_changed": 7,
"filesotal_files": 129,
"files_done": 129,
"total_bytes": 153017443,
"bytes_done": 153017443
}
What steps did you take and what happened:
We have setup Velero 1.4 in our Openshift Cluster. The majority of our Backups work fine. Sometimes the restic backup fails taking almost no time. This happens randomly over all our PVCs.
What did you expect to happen:
Consistent backup results
The output of the following commands will help us better understand what's going on:
Velero Backup Logs https://paste.kube.f2o.io/raw/F6ko42DoVWLMo6LjTqUog
Anything else you would like to add:
Is there a way to see the full output of the restic command?
Environment:
velero version
):velero client config get features
):kubectl version
):Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.