koverstreet / bcachefs

compression leads to filesystem hangs #16

Closed hyperfekt closed 5 years ago

hyperfekt commented 5 years ago

When I create a filesystem with bcachefs format --compression=zstd --background_compression=zstd --foreground_target=/dev/sda1 --background_target=/dev/sdb1 --promote_target=/dev/sda1 --encrypted --label=nixos --force /dev/sdb1 --discard /dev/sda1 and attempt to install NixOS on it, the install hangs on an fdatasync syscall, with 100% CPU usage on one core that top does not attribute to any process. Here is the strace -f log; everything from line 76676 onwards was triggered by sending SIGINT, which does not stop the hang.
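For readability, the format invocation above can be laid out one option per line. The sketch below only prints the command (formatting is destructive, so nothing is executed); the function name `print_format_cmd` and its two device parameters are hypothetical, while the options and their order are copied from the report.

```shell
# Hypothetical helper: print the reported `bcachefs format` invocation,
# one option per line, without running it.
#   $1 = device used for --foreground_target/--promote_target (/dev/sda1 in the report)
#   $2 = device used for --background_target (/dev/sdb1 in the report)
print_format_cmd() {
    local fg_dev bg_dev
    fg_dev=$1
    bg_dev=$2
    # Every line except the last ends in a shell line continuation.
    printf '%s \\\n' \
        "bcachefs format" \
        "    --compression=zstd" \
        "    --background_compression=zstd" \
        "    --foreground_target=$fg_dev" \
        "    --background_target=$bg_dev" \
        "    --promote_target=$fg_dev" \
        "    --encrypted" \
        "    --label=nixos" \
        "    --force $bg_dev"
    printf '    --discard %s\n' "$fg_dev"
}
```

Calling `print_format_cmd /dev/sda1 /dev/sdb1` reproduces the command from the report; review the output before running it against real devices.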

hyperfekt commented 5 years ago

I'm also getting hangs when trying to copy data off the filesystem or run git status afterwards. It seems that attempting to install NixOS makes the entire filesystem unusable.

Note that these hangs happen both with the reverted commits and with the current head.

koverstreet commented 5 years ago

Which commits were you reverting?

I'm not seeing anything with xfstests, and I wouldn't expect a hang in fdatasync to have anything to do with the format options, so something weird is going on...

Can you try without zstd?

hyperfekt commented 5 years ago

Sorry, I should have used the term 'amended'. It seems you made this change at some point, and I meant that the hang happens even with it applied:

diff --git a/fs/bcachefs/btree_gc.h b/fs/bcachefs/btree_gc.h
index 1905acfa028a..8af5f841a537 100644
--- a/fs/bcachefs/btree_gc.h
+++ b/fs/bcachefs/btree_gc.h
@@ -109,7 +109,7 @@ static inline bool gc_visited(struct bch_fs *c, struct gc_pos pos)

        do {
                seq = read_seqcount_begin(&c->gc_pos_lock);
-               ret = gc_pos_cmp(pos, c->gc_pos) < 0;
+               ret = gc_pos_cmp(pos, c->gc_pos) <= 0;
        } while (read_seqcount_retry(&c->gc_pos_lock, seq));

        return ret;

I'm still installing, but by omitting zstd I got past the point where it usually hangs, so it looks like that is where the problem lies.

hyperfekt commented 5 years ago

The issue seems to also occur with lz4, but I can confirm it does not happen without compression.

koverstreet commented 5 years ago

Oh, it might be because of the background_compression option. Can you try without that?

hyperfekt commented 5 years ago

Happens with only --compression, too.

koverstreet commented 5 years ago

Can you get me some backtraces from when it happens? perf top output?

cat /proc/pid/stack is useful for getting backtraces.
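As a rough sketch of how those backtraces could be collected in one go, the helper below prints the kernel stack of every process that is in uninterruptible sleep (state `D`), which is where a task stuck in fdatasync would normally sit. The function name `dump_blocked_stacks` and the optional proc-root argument are assumptions made for illustration; reading `/proc/<pid>/stack` generally requires root.

```shell
# Hypothetical helper: print /proc/<pid>/stack for every process whose
# "State:" line in /proc/<pid>/status reports D (uninterruptible sleep).
#   $1 = proc root to scan (defaults to /proc; overridable for testing)
dump_blocked_stacks() {
    local proc_root dir state
    proc_root=${1:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        echo "== pid ${dir##*/} =="
        cat "$dir/stack" 2>/dev/null || echo "(stack not readable; try as root)"
    done
}
```

Run as root with no arguments while the hang is in progress. This only looks at the main thread of each process; a hung secondary thread would need the same treatment under `/proc/<pid>/task/`.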

I've got a torture test running with lz4 - copying some data to a filesystem, doing a sync (which hits the same paths as fdatasync), deleting and starting over - but I'm not seeing anything so far.
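A loop of the kind described above might look roughly like the following. This is only a sketch of the copy, sync, delete cycle, not the actual test that was run; the function name `sync_torture` and its arguments are hypothetical.

```shell
# Hypothetical copy/sync/delete loop.
#   $1 = source file or directory to copy
#   $2 = existing directory on the filesystem under test
#   $3 = number of rounds (default 10)
sync_torture() {
    local src dst rounds i
    src=$1
    dst=$2
    rounds=${3:-10}
    [ -d "$dst" ] || { echo "not a directory: $dst" >&2; return 1; }
    i=1
    while [ "$i" -le "$rounds" ]; do
        cp -a -- "$src" "$dst/round" || return 1
        # Flush all dirty data to disk before deleting it again.
        sync || return 1
        rm -rf -- "$dst/round" || return 1
        echo "round $i ok"
        i=$((i + 1))
    done
}
```

For example, `sync_torture /usr/share /mnt/test 100` would repeatedly copy `/usr/share` onto a filesystem mounted at `/mnt/test`; both paths are placeholders.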

Can you tell me anything more about how to repro it?

hyperfekt commented 5 years ago

The perf top output does not show whatever is maxing out the core. Backtrace:

[<0>] io_schedule+0x12/0x40
[<0>] wait_on_page_bit_common+0xfe/0x210
[<0>] __filemap_fdatawait_range+0xd5/0x130
[<0>] file_write_and_wait_range+0x68/0x90
[<0>] bch2_fsync+0x28/0xa0 [bcachefs]
[<0>] do_fsync+0x38/0x60
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x4e/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

To reproduce, you first need a NixOS ISO with bcachefs support. It can be created by installing nix, cloning https://github.com/hyperfekt/nix-exp, running nix-channel --add http://nixos.org/channels/nixos-unstable nixos-unstable and nix-channel --add http://nixos.org/channels/nixos-18.09 nixos, and then running nix-build '<nixpkgs/nixos>' -A config.system.build.isoImage -I nixos-config=nix-exp/installation-cd-bcachefs-graphical-kde-new-kernel-git.nix. The ISO will be located in a directory named result/iso.

After booting the NixOS ISO, create the filesystem, mount it under /mnt, run nixos-generate-config --root /mnt, copy this file to /mnt/etc/nixos, and add it to the imports list in /mnt/etc/nixos/configuration.nix. Then run nix-channel --add http://nixos.org/channels/nixos-unstable nixos-unstable followed by nixos-install --root /mnt. The installation should hang while printing the output of querying info about missing paths. Make sure networking is available.

It might be possible to get the nixos-install tool without the ISO and test the whole thing without NixOS, but I couldn't quickly figure out how.
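The reproduction steps can be collected into a small script, sketched below. The commands and URLs are the ones given in this comment; the function names and the `DRY_RUN` switch are hypothetical additions. Copying the extra configuration file and editing the imports list remain manual steps, and whether a `nix-channel --update` is needed after adding channels is not stated in the thread.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# On a build host with nix installed: build the ISO (ends up in result/iso).
build_iso() {
    run git clone https://github.com/hyperfekt/nix-exp || return 1
    run nix-channel --add http://nixos.org/channels/nixos-unstable nixos-unstable || return 1
    run nix-channel --add http://nixos.org/channels/nixos-18.09 nixos || return 1
    run nix-build '<nixpkgs/nixos>' -A config.system.build.isoImage \
        -I nixos-config=nix-exp/installation-cd-bcachefs-graphical-kde-new-kernel-git.nix
}

# On the booted ISO, after creating the filesystem and mounting it at $1.
generate_config() {
    run nixos-generate-config --root "${1:-/mnt}"
    # Manual step: copy the extra config file into <root>/etc/nixos and add
    # it to the imports list of <root>/etc/nixos/configuration.nix.
}

# On the booted ISO, after the manual step; the hang is expected here.
install_nixos() {
    run nix-channel --add http://nixos.org/channels/nixos-unstable nixos-unstable || return 1
    run nixos-install --root "${1:-/mnt}"
}
```

Setting `DRY_RUN=1` before calling any of these functions prints the commands for review instead of executing them.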

koverstreet commented 5 years ago

I managed to repro it - working on it now.

koverstreet commented 5 years ago

I think I've got it fixed - can you test it?

hyperfekt commented 5 years ago

The install works fine now, thanks for the quick fix!