rustic-rs / rustic

rustic - fast, encrypted, and deduplicated backups powered by Rust
https://rustic.cli.rs
Apache License 2.0
1.94k stars 70 forks source link

Rustic backup to B2 dies after error instead of retrying #1326

Open ociaw opened 2 weeks ago

ociaw commented 2 weeks ago

Rustic keeps dying during my backups to B2, using opendal:b2. As a result, I have to repeatedly re-run rustic to complete the backup. During the backup, sometimes when B2 returns an error (usually no tomes available), rustic abruptly stops with no error message and an exit code of 141. However, this doesn't always occur - other times, it correctly retries again and continues with the backup.

I'm unsure why there's no error message, or what the significance of the 141 exit code is. I've run rustic with the log level set to DEBUG, but that hasn't seemed to reveal anything more. Is there something more I can do to debug this better?

Tested Versions: 0.8.0, 0.9.1, 0.9.3 OS: TrueNas 13.1-RELEASE-p9

Truncated rustic log with Log Level = DEBUG ``` 2024-10-11T06:22:49.795402039Z [DEBUG] (240) hyper_util::client::legacy::pool: reuse idle connection for ("https", api000.backblazeb2.com) 2024-10-11T06:22:49.846311897Z [DEBUG] (240) reqwest::connect: starting new connection: https://pod-000-1075-04.backblaze.com/ 2024-10-11T06:22:49.846506576Z [DEBUG] (240) hyper_util::client::legacy::connect::dns: resolve; host=pod-000-1075-04.backblaze.com 2024-10-11T06:22:49.846686687Z [DEBUG] (3) hyper_util::client::legacy::pool: pooling idle connection for ("https", api000.backblazeb2.com) 2024-10-11T06:22:49.846892617Z [DEBUG] (477) hyper_util::client::legacy::connect::dns: resolving host="pod-000-1075-04.backblaze.com" 2024-10-11T06:22:49.858056306Z [DEBUG] (1) rustic_core::archiver::tree_archiver: new file: "/file/name/is/redacted.flac" 2024-10-11T06:22:49.939037195Z [DEBUG] (240) hyper_util::client::legacy::connect::http: connecting to 149.137.132.124:443 2024-10-11T06:22:49.979073034Z [DEBUG] (240) hyper_util::client::legacy::connect::http: connected to 149.137.132.124:443 2024-10-11T06:22:50.138864052Z [WARN] will retry after 1.914994597s because: Unexpected (temporary) at Writer::close, context: { uri: https://pod-000-1075-04.backblaze.com/b2api/v2/b2_upload_file/192976a3bf05eebe6a08d3ac/c 000_v0001075_t0004, response: Parts { status: 503, version: HTTP/1.1, headers: {"server": "nginx", "date": "Fri, 11 Oct 2024 06:22:50 GMT", "content-type": "application/json;charset=utf-8", "content-length": "87", "connect ion": "keep-alive", "cache-control": "max-age=0, no-cache, no-store", "strict-transport-security": "max-age=63072000"} }, service: b2, path: data/13/13a1ac8ec8f274cf7ac57483255dd7e465a636d03f24a0af8a39900ae7ecaea8, written : 72223669 } => B2Error { status: 503, code: "service_unavailable", message: "no tomes available" } ```
My anonymized config ```toml [backup] globs = [ "!*/Thumbs.db", "!*/.recycle/*" ] exclude-if-present = [".nobackup"] [[backup.snapshots]] label = "home/user1" sources = [ "/mnt/home/user1" ] [[backup.snapshots]] label = "home/user2" sources = [ "/mnt/home/user2" ] [[backup.snapshots]] label = "home/user3" sources = [ "/mnt/home/user3" ] [[backup.snapshots]] label = "home/user4" sources = [ "/mnt/home/user4" ] [[backup.snapshots]] label = "shared/user1" sources = [ "/mnt/shared/user1/documents", "/mnt/shared/user1/music", "/mnt/shared/user1/pictures" ] [[backup.snapshots]] label = "shared/user4" sources = [ "/mnt/shared/user4/Documents", "/mnt/shared/user4/Music" ] [[backup.snapshots]] label = "shared/user2" sources = [ "/mnt/shared/user2" ] [[backup.snapshots]] label = "shared/user3" sources = [ "/mnt/shared/user3" ] [repository] repository = "opendal:b2" password = "password" [repository.options] application_key_id = "id" application_key = "key" bucket = "bucket" bucket_id = "192976a3bf05eebe6a08d3ac" ```
aawsome commented 2 weeks ago

Thanks for opening this issue @ociaw

In fact, yes rustic should retry here as this is classified as a temporary error - and it is even saying it is retrying.

But in fact it is just stopping instead of retrying. The return code 141 indicates that rustic was terminated by a SIGPIPE signal. Can you show how you are calling the rustic binary? could it be that you pipe the output to some other program?

ociaw commented 2 weeks ago

Hmm, no piping, but I'm running it inside screen, with fish shell if it matters. The exact command is rustic backup --log-level Debug --log-file ~/rustic-log.log, though I only added the log-level and log-file options trying to track down this issue.