kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

btrfs-receive complains about not being able to open its temporary file(s) (e.g. o123802-148688-0) #147

Open msoltyspl opened 6 years ago

msoltyspl commented 6 years ago

(I also posted this to the mailing list, but it will get (or already got) buried under other emails, so I'm adding the issue here as well.)

I have a small btrfs filesystem (originally used for systemd's containers, mounted over /var/lib/machines as a loopback). The filesystem is fully clean, or at least seems so: no errors of any kind are reported by btrfs-scrub or btrfs-check.

The structure is as follows:

17:35 # btrfs subvolume list /src -qu
ID 263 gen 141492 top level 5 parent_uuid -                                    uuid e3a77929-0e4d-6744-9aed-d4d6a4de5f78 path xenial
ID 340 gen 134847 top level 5 parent_uuid e3a77929-0e4d-6744-9aed-d4d6a4de5f78 uuid f05f0983-3954-e440-ba54-3cc1d458f317 path xenial2
ID 461 gen 160056 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid 3292eba1-4129-b841-9ab9-8a4b03d79187 path orig1
ID 462 gen 160532 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid c806ca4d-40c1-e04a-b33f-7799648f8aff path tr
ID 464 gen 160056 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid 51de47cb-9fbb-a64b-84c9-0ed7614cba22 path edge1
ID 465 gen 160056 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid bcfe5026-d781-b04e-83e1-c20b9b57f1c6 path edge2
ID 466 gen 160542 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid bc3035c5-1609-be4f-959a-2cf89559df08 path anal1
ID 475 gen 160062 top level 5 parent_uuid f05f0983-3954-e440-ba54-3cc1d458f317 uuid 5e23ff06-dbbb-7743-966f-75408084be36 path back1
ID 488 gen 160056 top level 5 parent_uuid 5e23ff06-dbbb-7743-966f-75408084be36 uuid 06a5956f-864d-4243-b303-3e9768a24a31 path back2

The interesting (problematic) part is xenial -> xenial2 -> back1 -> back2

As I recreated my main filesystem as btrfs, I wanted to move the old stuff with btrfs send/receive - and everything worked fine except back2.

mount /dev/loop0 /src -o noatime
for x in /src/*; do btrfs property set "$x" ro; done
btrfs send                 /src/xenial  | btrfs receive -v -e /var/lib/machines/
btrfs send -p /src/xenial  /src/xenial2 | btrfs receive -v -e /var/lib/machines/
btrfs send -p /src/xenial2 /src/orig1   | btrfs receive -v -e /var/lib/machines/
# ...
btrfs send -p /src/xenial2 /src/back1   | btrfs receive -v -e /var/lib/machines/
# and the failing one:
btrfs send -p /src/back1   /src/back2   | btrfs receive -v -e /var/lib/machines/

The last of the above commands exits with ERROR: cannot open /var/lib/machines/back2/o123802-148688-0: No such file or directory.

This looks like a bug, though perhaps I'm doing something fundamentally wrong. The sequence is 100% reproducible (I keep the source fs intact - so I can provide any debug output or assist as necessary).

Tested (so far) with:

kernel version: 4.18.3, 4.18.9
btrfs-progs: 4.17.1

msoltyspl commented 6 years ago

Additional info:

I've added -E0 and multiple -v options to btrfs-receive; this is how it looks in the debug log:

utimes home/ansible
unlink o123789-148160-0
utimes var/lib/postgresql
utimes var/lib/postgresql
mkfile o123802-148688-0
rename etc/postgresql/9.6/main/conf.d/02-master.conf.example -> o124068-148688-0
rename o123802-148688-0 -> etc/postgresql/9.6/main/conf.d/02-master.conf.example
utimes etc/postgresql/9.6/main/conf.d
ERROR: cannot open /dst/back2/o123802-148688-0: No such file or directory
chown o123802-148688-0 - uid=109, gid=114
ERROR: chown o123802-148688-0 failed: No such file or directory
chmod o123802-148688-0 - mode=0644
ERROR: chmod o123802-148688-0 failed: No such file or directory
utimes o123802-148688-0
ERROR: utimes o123802-148688-0 failed: No such file or directory
unlink o123805-148167-0/id_rsa
mkfile o123804-148688-0
rename o123804-148688-0 -> etc/postgresql/9.6/main/recovery.conf.example
utimes etc/postgresql/9.6/main

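So it looks like btrfs-receive creates o123802-148688-0, renames it to its final path, and then still tries to open, chown, chmod and utimes it under the old temporary name.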

When run with -E0, the receiving side is simply missing the above file.
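
The pattern in the log above (a temporary name still being used after it was renamed away) can be searched for mechanically in a saved log. This is only a minimal sketch. It assumes the verbose log has the line shapes seen in the excerpt (`mkfile NAME`, `rename OLD -> NEW`, `chown NAME ...`) and that paths contain no spaces. `find_stale_refs` is a made-up helper name, not part of btrfs-progs.

```shell
#!/bin/sh
# find_stale_refs: read a verbose btrfs-receive log on stdin and print every
# operation that names a path after that path was renamed to something else.
# Assumption: line formats match the excerpt above; paths have no spaces.
find_stale_refs() {
    awk '
        $1 == "rename" && $3 == "->" {
            if ($2 in gone) print "stale: " $0
            gone[$2] = 1        # the old name no longer exists
            delete gone[$4]     # the new name exists from now on
            next
        }
        $1 == "mkfile" { delete gone[$2]; next }   # name (re)created
        $1 ~ /^(chown|chmod|utimes|unlink)$/ && ($2 in gone) {
            print "stale: " $0
        }
    '
}
```

Fed a saved log, for example `find_stale_refs < receive.log` (`receive.log` being a hypothetical file name), it should list the chown, chmod and utimes lines that still refer to o123802-148688-0 after the rename.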

msoltyspl commented 5 years ago

For the record, still happening with:

kernel: 4.20.12
btrfs-progs: 4.20.1

Gronkdalonka commented 5 years ago

Exactly the same behavior on kernel 5.1.18 / 5.2.1 / 5.2.2 and btrfs-progs 4.20.1 / 5.2

ghost commented 5 years ago

Were you doing any deduplication on the source volume? I've heard that can cause issues: https://github.com/Zygo/bees/issues/115

Gronkdalonka commented 5 years ago

No, no dedupe took place.

msoltyspl commented 5 years ago

For the record I still have the btrfs filesystem mentioned in the first post - in case it could help with this bug.

Gronkdalonka commented 5 years ago

For the time being I created a new subvolume and rsynced the content into it, but I also preserved the original subvolume and snapshots for testing if needed.

mikhailnov commented 4 years ago

+1, also encountered this... I did not use bees or other deduplication utilities.

SudoerReodus commented 4 years ago

I get the same "Error: utimes ...." on a particular file system when using btrfs send and receive (with the -p or -c options) to move snapshots to another filesystem (on another disk). I even tried reformatting the filesystems on both the source and destination disks, but that doesn't solve the problem.

It only happens when using the -p or -c options; if I send the full subvolume (without -p or -c), it works without any problem.

Also, if I use the oldest snapshot as the parent/clone source (with -p or -c), it works without problem. It only happens if I want to use a more recent snapshot as the parent/clone source.

Also, if I delete all snapshots older than the snapshot I want to use as the parent/clone source (i.e. make that snapshot the oldest remaining one), it works without problem.

alialipr commented 4 years ago

I'm encountering a similar problem:

ERROR: unlink foo.bar failed: No such file or directory
ERROR: unlink foo.bar failed: No such file or directory
ERROR: unlink foo.bar failed: No such file or directory

It only occurs for some file systems.

bronger commented 3 years ago

With btrfs-progs v5.4.1, I see

root@myhost:/mnt/mymount# btrfs receive -E 0 . < send-output.btrfs 
At snapshot VirtualBox_brad_latest
ERROR: cannot open /mnt/mymount//VirtualBox_brad_latest/o776-18725303-0: No such file or directory
ERROR: chown o776-18725303-0 failed: No such file or directory
ERROR: chmod o776-18725303-0 failed: No such file or directory
ERROR: utimes o776-18725303-0 failed: No such file or directory
ERROR: cannot open /mnt/mymount//VirtualBox_brad_latest/o777-18725303-0: No such file or directory
ERROR: chown o777-18725303-0 failed: No such file or directory
ERROR: chmod o777-18725303-0 failed: No such file or directory
ERROR: utimes o777-18725303-0 failed: No such file or directory

marcosps commented 3 years ago

There is a patch currently on btrfs/misc-next that can help to fix this issue on the kernel side: https://patchwork.kernel.org/project/linux-btrfs/patch/900493c40f7edbd42fe861ccd9a68851ea952499.1610363502.git.fdmanana@suse.com/

As this commit is tagged for stable, maybe @kdave will queue it for the next rc?

kdave commented 3 years ago

That patch has been merged to Linus' tree as 518837e65068c385dddc0 and released in stable 5.10.11

marcosps commented 3 years ago

@kdave oops, my bad. @bronger would you mind testing the same send+receive process in kernel 5.10.11?

bronger commented 3 years ago

If the errors occur again, I will give it a try. I’ve never compiled a kernel, though.

marcosps commented 3 years ago

@bronger kernel 5.4.93 (current stable of 5.4) also has the fix. If your distribution updates their kernel to the latest stable, it also could be there... So apparently no need to compile a kernel :)
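
To check a running kernel against the two stable releases named in this thread (5.10.11 and 5.4.93), something like the following could be used. It is a rough sketch: `version_ge` and `has_send_fix` are made-up helper names, `sort -V` is assumed to be available (GNU coreutils), and only the 5.4 and 5.10 series are covered because those are the only ones mentioned here. Distribution kernels may backport the patch under a lower version number, which this cannot detect.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B (uses `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_send_fix KERNEL_RELEASE
# Exit 0: at or above the fixed release of its series (per this thread).
# Exit 1: older than the fixed release of its series.
# Exit 2: series not mentioned in this thread, so unknown.
has_send_fix() {
    v=${1%%-*}                       # drop a suffix such as "-generic"
    case "$v" in
        5.4.*)  version_ge "$v" 5.4.93 ;;
        5.10.*) version_ge "$v" 5.10.11 ;;
        *)      return 2 ;;
    esac
}
```

Typical use would be `has_send_fix "$(uname -r)"; echo $?`. A result of 0 only says the patch should be present. It does not guarantee the errors are gone.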

cphuntington97 commented 3 years ago

> I get the same "Error: utimes ...." on a particular file system when using btrfs send and receive (with the -p or -c options) to move snapshots to another filesystem (on another disk). I even tried reformatting the filesystems on both the source and destination disks, but that doesn't solve the problem.
>
> It only happens when using the -p or -c options; if I send the full subvolume (without -p or -c), it works without any problem.
>
> Also, if I use the oldest snapshot as the parent/clone source (with -p or -c), it works without problem. It only happens if I want to use a more recent snapshot as the parent/clone source.
>
> Also, if I delete all snapshots older than the snapshot I want to use as the parent/clone source (i.e. make that snapshot the oldest remaining one), it works without problem.

I have been experiencing exactly this since June 2021 (new system build). It's still happening with linux kernel 5.13.19 and btrfs-progs v5.14.1

As a work-around, I have been using the oldest snapshot as the parent for each differential backup as mentioned above.
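
A sketch of that work-around as a dry run, which prints the pipeline without executing it. It assumes all snapshots sit in one directory and that their names sort by age (for example date-stamped names). `oldest_parent_cmd` and the paths in the usage line are made up for illustration. The parent snapshot must already have been received on the destination for `btrfs send -p` to be usable.

```shell
#!/bin/sh
# oldest_parent_cmd SNAPDIR TARGET DEST
# Print (do not run) a send/receive pipeline that uses the oldest snapshot
# in SNAPDIR as the parent of TARGET. Assumes snapshot names sort by age.
oldest_parent_cmd() {
    snapdir=$1; target=$2; dest=$3
    set -- "$snapdir"/*              # pathname expansion is sorted by name
    [ -e "$1" ] || return 1          # no snapshots found
    oldest=$1
    if [ "$oldest" = "$snapdir/$target" ]; then
        # The target is itself the oldest snapshot: full send, no parent.
        printf 'btrfs send %s | btrfs receive %s\n' "$oldest" "$dest"
    else
        printf 'btrfs send -p %s %s | btrfs receive %s\n' \
            "$oldest" "$snapdir/$target" "$dest"
    fi
}
```

For example, `oldest_parent_cmd /snapshots 2021-08-01 /backup` prints the command to review before running it by hand. Since every differential is taken against the same old parent, each stream will tend to be larger than with a recent parent.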

@SudoerReodus has the issue been fixed for you?

alialipr commented 3 years ago

@cphuntington97 I created a new filesystem and used rsync to transfer files from the old one to new one and then got rid of the old one, which so far has solved the issue for me.