Closed Dikay900 closed 11 months ago
It seems i have data on non existing devices now based on the metadata.
available devices are currently ids: 0 1 2 3 4 8 9 11
It seems as device 5 is getting checked although I do not see any data on this id.
replicas_v0:
user: 2 [2 11] user: 2 [0 2] cached: 1 [0] user: 1 [1] user: 2 [1 4] user: 2 [4 11] cached: 1 [4] btree: 2 [0 1] user: 1 [8] user: 2 [0 9] user: 2 [2 3] user: 2 [3 11] user: 2 [8 11] cached: 1 [2] cached: 1 [9] journal: 2 [4 9] btree: 2 [0 3] user: 1 [3] user: 1 [11] user: 2 [0 4] user: 2 [1 2] user: 2 [1 9] user: 2 [2 8] user: 2 [3 8] user: 2 [4 9] user: 2 [8 9] user: 2 [9 11] cached: 1 [1] cached: 1 [3] cached: 1 [8] cached: 1 [11] journal: 2 [0 2] journal: 2 [8 9] btree: 2 [0 2] user: 1 [0] user: 1 [2] user: 1 [4] user: 1 [9] user: 2 [0 1] user: 2 [0 3] user: 2 [0 8] user: 2 [0 11] user: 2 [1 3] user: 2 [1 8] user: 2 [1 11] user: 2 [2 4] user: 2 [2 9] user: 2 [3 4] user: 2 [3 9] user: 2 [4 8]
With the following trace output added in bch2_have_enough_devs: replicas.c:932
bool bch2_have_enough_devs(struct bch_fs *c, struct bch_devs_mask devs,
unsigned flags, bool print)
{
struct bch_replicas_entry *e;
bool ret = true;
percpu_down_read(&c->mark_lock);
for_each_cpu_replicas_entry(&c->replicas, e) {
unsigned i, nr_online = 0, nr_failed = 0, dflags = 0;
bool metadata = e->data_type < BCH_DATA_user;
if (e->data_type == BCH_DATA_cached)
continue;
for (i = 0; i < e->nr_devs; i++) {
+ bch_err(c, "checking dev index (%u) of replicas (%u) with type (%u)",
+ e->devs[i], e->nr_devs, e->data_type);
struct bch_dev *ca = bch_dev_bkey_exists(c, e->devs[i]);
nr_online += test_bit(e->devs[i], devs.d);
nr_failed += ca->mi.state == BCH_MEMBER_STATE_failed;
}
I am getting the following output:
checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (3) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (5) of replicas (2) with type (2)
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.
looking into bch2_dev_remove I see the following comment, which could actually be the case here
/*
* At this point the device object has been removed in-core, but the
* on-disk journal might still refer to the device index via sb device
* usage entries. Recovery fails if it sees usage information for an
* invalid device. Flush journal pins to push the back of the journal
* past now invalid device index references before we update the
* superblock, but after the device object has been removed so any
* further journal writes elide usage info for the device.
*/
bch2_journal_flush_all_pins(&c->journal);
actually I see that the device 5 was added during the journal replay. Since the checking dev index trace is printed at the beginning of the fsck where no dev 5 is there:
~/bcachefs-tools# bcachefs fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
checking dev index (2) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (1) of replicas (1) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (3)
checking dev index (8) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (3) of replicas (1) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (2)
checking dev index (2) of replicas (2) with type (2)
checking dev index (8) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (0) of replicas (2) with type (3)
checking dev index (2) of replicas (2) with type (3)
checking dev index (0) of replicas (1) with type (4)
checking dev index (2) of replicas (1) with type (4)
checking dev index (4) of replicas (1) with type (4)
checking dev index (9) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,degraded,fsck,fix_errors=yes
recovering from unclean shutdown
starting journal read
journal read done on device 0x55fc71bd15c0g, ret 0
journal read done on device 0x55fc71bca5e0g, ret 0
journal read done on device 0x55fc71bc27f0g, ret 0
journal read done on device 0x55fc71bc3e60g, ret 0
journal read done on device 0x55fc71bd1770g, ret 0
journal read done on device 0x55fc71bdd400g, ret 0
journal read done on device 0x55fc71bca770g, ret 0
journal read done on device 0x55fc71bdd770g, ret 0
journal read done, replaying entries 25734944-25735261
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[..]
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25735261 offset=67279/67421: bad dev, fixing
Journal keys: 27066 read, 18675 after sorting and compacting
checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (3) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (5) of replicas (2) with type (2)
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.
data type 2 is journal so in the beginning there was no journal entry for dev 5 but after the replay there is. Probably the scenario described in the comment above during dev remove.
So now the question would be how can i just ignore the readded device or properly remove it after journal replay or how to skip the journal replay here.
Can you pipe that through scripts/decode_stacktrace.sh?
Ah forgot the stacktrace yesterday while trying to find a way, thats from the userspace tool but the same crash:
(gdb) run fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
Starting program: /root/bcachefs-tools/bcachefs fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7aa4680 (LWP 56759)]
[New Thread 0x7ffff7a9b680 (LWP 56760)]
[New Thread 0x7ffff7a92680 (LWP 56761)]
[New Thread 0x7ffff7a89680 (LWP 56762)]
[New Thread 0x7ffff7a80680 (LWP 56763)]
[New Thread 0x7ffff7a77680 (LWP 56764)]
[New Thread 0x7ffff7a6e680 (LWP 56765)]
[New Thread 0x7ffff7a65680 (LWP 56766)]
[New Thread 0x7ffff79da680 (LWP 56768)]
[New Thread 0x7ffff79d1680 (LWP 56769)]
[New Thread 0x7ffff79c8680 (LWP 56770)]
[New Thread 0x7ffff79bf680 (LWP 56771)]
[New Thread 0x7ffff79b6680 (LWP 56772)]
[New Thread 0x7ffff6f0f680 (LWP 56773)]
mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,degraded,fsck,fix_errors=yes
recovering from unclean shutdown
starting journal read
journal read done on device 0x555555b145e0g, ret 0
journal read done on device 0x555555b0d5c0g, ret 0
journal read done on device 0x555555b057f0g, ret 0
journal read done on device 0x555555b06e60g, ret 0
journal read done on device 0x555555b14e60g, ret 0
journal read done on device 0x555555b205c0g, ret 0
journal read done on device 0x555555b1b720g, ret 0
journal read done on device 0x555555b14f20g, ret 0
journal read done, replaying entries 25734944-25735261
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[...]
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25735261 offset=67279/67421: bad dev, fixing
Journal keys: 27066 read, 18675 after sorting and compacting
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.
Thread 1 "bcachefs" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007ffff7b39d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007ffff7aeaf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7ad5472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007ffff7ad5395 in __assert_fail_base (fmt=0x7ffff7c49a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5555558988b8 "!(idx >= c->sb.nr_devices || !c->devs[idx])",
file=file@entry=0x55555589869f "libbcachefs/sb-members.h", line=line@entry=160, function=function@entry=0x5555558bb2f0 <__PRETTY_FUNCTION__.0> "bch_dev_bkey_exists") at ./assert/assert.c:92
#5 0x00007ffff7ae3e32 in __GI___assert_fail (assertion=assertion@entry=0x5555558988b8 "!(idx >= c->sb.nr_devices || !c->devs[idx])", file=file@entry=0x55555589869f "libbcachefs/sb-members.h", line=line@entry=160,
function=function@entry=0x5555558bb2f0 <__PRETTY_FUNCTION__.0> "bch_dev_bkey_exists") at ./assert/assert.c:101
#6 0x00005555556749e5 in bch_dev_bkey_exists (idx=<optimized out>, c=0x7ffff79dc000) at libbcachefs/sb-members.h:160
#7 bch2_have_enough_devs (c=c@entry=0x7ffff79dc000, devs=..., flags=flags@entry=12, print=print@entry=false) at libbcachefs/replicas.c:949
#8 0x0000555555685fb2 in bch2_write_super (c=c@entry=0x7ffff79dc000) at libbcachefs/super-io.c:985
#9 0x00005555556607d2 in bch2_journal_seq_blacklist_add (c=c@entry=0x7ffff79dc000, start=25735262, end=25735270) at libbcachefs/journal_seq_blacklist.c:115
#10 0x000055555566fc2d in bch2_fs_recovery (c=0x7ffff79dc000) at libbcachefs/recovery.c:839
#11 0x000055555568aa5d in bch2_fs_start (c=c@entry=0x7ffff79dc000) at libbcachefs/super.c:966
#12 0x000055555568d9e3 in bch2_fs_open (devices=devices@entry=0x7fffffffe998, nr_devices=nr_devices@entry=8, opts=...) at libbcachefs/super.c:1956
#13 0x00005555555b1941 in cmd_fsck (argc=8, argv=0x7fffffffe998, argv@entry=0x7fffffffe988) at cmd_fsck.c:94
#14 0x00005555555aa97d in main (argc=<optimized out>, argv=0x7fffffffe988) at bcachefs.c:232
adding output to journal_replay_entry_early
case BCH_JSET_ENTRY_data_usage: {
struct jset_entry_data_usage *u =
container_of(entry, struct jset_entry_data_usage, entry);
ret = bch2_replicas_set_usage(c, &u->r,
le64_to_cpu(u->v));
bch_err(c, "adding data usage");
for (unsigned i = 0; i < u->r.nr_devs; i++) {
bch_err(c, "for device %u", u->r.devs[i]);
}
break;
}
I see that this is adding dev 5 during the replay which then triggers the assert afterwards when journal_replay_early is done.
No dev usage is added when adding a similiar output to the
case BCH_JSET_ENTRY_dev_usage
updated to bcachefs-for-upstream b20b156ea4331e541ca9268d410e8dc48d1f2d13 and tools to 138397d89212cd0b5abdbfdd644dc7702ea00f4a
cat trace.decoded
[Sat Nov 25 11:36:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,fix_errors=yes
[Sat Nov 25 11:36:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Sat Nov 25 11:38:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734945 offset=923/1065: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734946 offset=1216/1358: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734947 offset=956/1098: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734948 offset=714/856: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734949 offset=808/950: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734950 offset=783/925: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734951 offset=1229/1371: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734952 offset=1165/1307: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734953 offset=712/854: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734954 offset=803/945: bad dev, fixing
[Sat Nov 25 11:38:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sat Nov 25 11:38:34 2023] BUG: kernel NULL pointer dereference, address: 00000000000000be
[Sat Nov 25 11:38:34 2023] #PF: supervisor read access in kernel mode
[Sat Nov 25 11:38:34 2023] #PF: error_code(0x0000) - not-present page
[Sat Nov 25 11:38:34 2023] PGD 0 P4D 0
[Sat Nov 25 11:38:34 2023] Oops: 0000 [#1] PREEMPT SMP NOPTI
[Sat Nov 25 11:38:34 2023] CPU: 0 PID: 2404 Comm: bcachefs Not tainted 6.7.0-rc2+ #3
[Sat Nov 25 11:38:34 2023] Hardware name: System manufacturer System Product Name/PRIME B450M-A, BIOS 2409 12/02/2020
[Sat Nov 25 11:38:34 2023] RIP: 0010:bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952)
[Sat Nov 25 11:38:34 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
All code
========
0: 74 7c je 0x7e
2: 31 ff xor %edi,%edi
4: 31 ed xor %ebp,%ebp
6: 31 d2 xor %edx,%edx
8: 89 d0 mov %edx,%eax
a: 0f b6 4c 06 03 movzbl 0x3(%rsi,%rax,1),%ecx
f: 48 89 c8 mov %rcx,%rax
12: 48 83 c1 44 add $0x44,%rcx
16: 49 8b 4c cc 08 mov 0x8(%r12,%rcx,8),%rcx
1b: 48 0f a3 04 24 bt %rax,(%rsp)
20: 0f 92 c0 setb %al
23: 0f b6 c0 movzbl %al,%eax
26: 01 c5 add %eax,%ebp
28: 31 c0 xor %eax,%eax
2a:* 80 b9 be 00 00 00 02 cmpb $0x2,0xbe(%rcx) <-- trapping instruction
31: 0f 94 c0 sete %al
34: 83 c2 01 add $0x1,%edx
37: 01 c7 add %eax,%edi
39: 0f b6 46 01 movzbl 0x1(%rsi),%eax
3d: 39 c2 cmp %eax,%edx
3f: 72 .byte 0x72
Code starting with the faulting instruction
===========================================
0: 80 b9 be 00 00 00 02 cmpb $0x2,0xbe(%rcx)
7: 0f 94 c0 sete %al
a: 83 c2 01 add $0x1,%edx
d: 01 c7 add %eax,%edi
f: 0f b6 46 01 movzbl 0x1(%rsi),%eax
13: 39 c2 cmp %eax,%edx
15: 72 .byte 0x72
[Sat Nov 25 11:38:34 2023] RSP: 0018:ffffacfac9bff758 EFLAGS: 00010246
[Sat Nov 25 11:38:34 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Sat Nov 25 11:38:34 2023] RDX: 0000000000000001 RSI: ffff8c3d4acdc04b RDI: 0000000000000000
[Sat Nov 25 11:38:34 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Sat Nov 25 11:38:34 2023] R10: 0000000000000001 R11: ffff8c3e2a254000 R12: ffff8c3d98680000
[Sat Nov 25 11:38:34 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Sat Nov 25 11:38:34 2023] FS: 00007fd7475e2c80(0000) GS:ffff8c483ea00000(0000) knlGS:0000000000000000
[Sat Nov 25 11:38:34 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be CR3: 000000010acf0000 CR4: 00000000003506f0
[Sat Nov 25 11:38:34 2023] Call Trace:
[Sat Nov 25 11:38:34 2023] <TASK>
[Sat Nov 25 11:38:34 2023] ? __die (/mnt/bcachefs/arch/x86/kernel/dumpstack.c:421 /mnt/bcachefs/arch/x86/kernel/dumpstack.c:434)
[Sat Nov 25 11:38:34 2023] ? page_fault_oops (/mnt/bcachefs/arch/x86/mm/fault.c:707)
[Sat Nov 25 11:38:34 2023] ? __cgroup_account_cputime (/mnt/bcachefs/./arch/x86/include/asm/preempt.h:104 /mnt/bcachefs/kernel/cgroup/rstat.c:395 /mnt/bcachefs/kernel/cgroup/rstat.c:405)
[Sat Nov 25 11:38:34 2023] ? exc_page_fault (/mnt/bcachefs/./arch/x86/include/asm/paravirt.h:695 /mnt/bcachefs/arch/x86/mm/fault.c:1513 /mnt/bcachefs/arch/x86/mm/fault.c:1561)
[Sat Nov 25 11:38:34 2023] ? asm_exc_page_fault (/mnt/bcachefs/./arch/x86/include/asm/idtentry.h:570)
[Sat Nov 25 11:38:34 2023] ? bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952)
[Sat Nov 25 11:38:34 2023] ? bch2_have_enough_devs (/mnt/bcachefs/./arch/x86/include/asm/preempt.h:104 /mnt/bcachefs/./include/linux/percpu-rwsem.h:70 /mnt/bcachefs/fs/bcachefs/replicas.c:940)
[Sat Nov 25 11:38:34 2023] ? schedule (/mnt/bcachefs/./arch/x86/include/asm/bitops.h:206 /mnt/bcachefs/./arch/x86/include/asm/bitops.h:238 /mnt/bcachefs/./include/linux/thread_info.h:184 /mnt/bcachefs/./include/linux/sched.h:2263 /mnt/bcachefs/kernel/sched/core.c:6765 /mnt/bcachefs/kernel/sched/core.c:6778)
[Sat Nov 25 11:38:34 2023] ? __closure_sync (/mnt/bcachefs/lib/closure.c:132)
[Sat Nov 25 11:38:34 2023] bch2_write_super (/mnt/bcachefs/fs/bcachefs/super-io.c:993)
[Sat Nov 25 11:38:34 2023] bch2_journal_seq_blacklist_add (/mnt/bcachefs/fs/bcachefs/journal_seq_blacklist.c:117)
[Sat Nov 25 11:38:34 2023] bch2_fs_recovery (/mnt/bcachefs/fs/bcachefs/recovery.c:802)
[Sat Nov 25 11:38:34 2023] ? bch2_printbuf_exit (/mnt/bcachefs/fs/bcachefs/printbuf.c:110)
[Sat Nov 25 11:38:34 2023] ? print_mount_opts (/mnt/bcachefs/fs/bcachefs/super.c:938)
[Sat Nov 25 11:38:34 2023] bch2_fs_start (/mnt/bcachefs/fs/bcachefs/super.c:971)
[Sat Nov 25 11:38:34 2023] bch2_fs_open (/mnt/bcachefs/fs/bcachefs/super.c:1961)
[Sat Nov 25 11:38:34 2023] ? __kmem_cache_alloc_node (/mnt/bcachefs/mm/slub.c:3480 /mnt/bcachefs/mm/slub.c:3517)
[Sat Nov 25 11:38:34 2023] ? bch2_mount (/mnt/bcachefs/fs/bcachefs/fs.c:1823)
[Sat Nov 25 11:38:34 2023] bch2_mount (/mnt/bcachefs/fs/bcachefs/fs.c:1823)
[Sat Nov 25 11:38:34 2023] legacy_get_tree (/mnt/bcachefs/fs/fs_context.c:664)
[Sat Nov 25 11:38:34 2023] vfs_get_tree (/mnt/bcachefs/fs/super.c:1772)
[Sat Nov 25 11:38:34 2023] path_mount (/mnt/bcachefs/fs/namespace.c:3337 /mnt/bcachefs/fs/namespace.c:3664)
[Sat Nov 25 11:38:34 2023] __x64_sys_mount (/mnt/bcachefs/fs/namespace.c:3678 /mnt/bcachefs/fs/namespace.c:3886 /mnt/bcachefs/fs/namespace.c:3863 /mnt/bcachefs/fs/namespace.c:3863)
[Sat Nov 25 11:38:34 2023] do_syscall_64 (/mnt/bcachefs/arch/x86/entry/common.c:51 /mnt/bcachefs/arch/x86/entry/common.c:82)
[Sat Nov 25 11:38:34 2023] ? vfs_write (/mnt/bcachefs/./include/linux/fs.h:2020 /mnt/bcachefs/fs/read_write.c:491 /mnt/bcachefs/fs/read_write.c:584)
[Sat Nov 25 11:38:34 2023] ? fpregs_assert_state_consistent (/mnt/bcachefs/arch/x86/kernel/fpu/context.h:38 /mnt/bcachefs/arch/x86/kernel/fpu/core.c:822)
[Sat Nov 25 11:38:34 2023] ? exit_to_user_mode_prepare (/mnt/bcachefs/./arch/x86/include/asm/entry-common.h:57 /mnt/bcachefs/kernel/entry/common.c:206)
[Sat Nov 25 11:38:34 2023] ? syscall_exit_to_user_mode (/mnt/bcachefs/kernel/entry/common.c:299)
[Sat Nov 25 11:38:34 2023] ? do_syscall_64 (/mnt/bcachefs/./arch/x86/include/asm/cpufeature.h:171 /mnt/bcachefs/arch/x86/entry/common.c:97)
[Sat Nov 25 11:38:34 2023] entry_SYSCALL_64_after_hwframe (/mnt/bcachefs/arch/x86/entry/entry_64.S:129)
[Sat Nov 25 11:38:34 2023] RIP: 0033:0x7fd7476f5b7a
[Sat Nov 25 11:38:34 2023] Code: 48 8b 0d 89 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 82 0c 00 f7 d8 64 89 01 48
All code
========
0: 48 8b 0d 89 82 0c 00 mov 0xc8289(%rip),%rcx # 0xc8290
7: f7 d8 neg %eax
9: 64 89 01 mov %eax,%fs:(%rcx)
c: 48 83 c8 ff or $0xffffffffffffffff,%rax
10: c3 ret
11: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
18: 00 00 00
1b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
20: 49 89 ca mov %rcx,%r10
23: b8 a5 00 00 00 mov $0xa5,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 ret
33: 48 8b 0d 56 82 0c 00 mov 0xc8256(%rip),%rcx # 0xc8290
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 ret
9: 48 8b 0d 56 82 0c 00 mov 0xc8256(%rip),%rcx # 0xc8266
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
[Sat Nov 25 11:38:34 2023] RSP: 002b:00007ffec9372ed8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[Sat Nov 25 11:38:34 2023] RAX: ffffffffffffffda RBX: 000055bd2ddc1000 RCX: 00007fd7476f5b7a
[Sat Nov 25 11:38:34 2023] RDX: 000055bd2ddcbe00 RSI: 000055bd2ddcb680 RDI: 000055bd2ddc1000
[Sat Nov 25 11:38:34 2023] RBP: 000055bd2ddcbe00 R08: 000055bd2ddd0b00 R09: 000055bd2ddc1fe0
[Sat Nov 25 11:38:34 2023] R10: 0000000000000000 R11: 0000000000000206 R12: 000055bd2ddd0b00
[Sat Nov 25 11:38:34 2023] R13: 0000000000000000 R14: 000055bd2ddc91f0 R15: 0000000000000008
[Sat Nov 25 11:38:34 2023] </TASK>
[Sat Nov 25 11:38:34 2023] Modules linked in: softdog xt_conntrack xt_CHECKSUM nft_chain_nat nf_conntrack_netlink xt_MASQUERADE nf_nat xfrm_user xfrm_algo nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat br_netfilter bridge stp llc nf_tables nfnetlink overlay zstd zram zsmalloc binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass ghash_clmulni_intel evdev sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 eeepc_wmi aesni_intel asus_wmi snd_pcm crypto_simd battery snd_timer ledtrig_audio cryptd sparse_keymap rapl platform_profile snd video sp5100_tco wmi_bmof soundcore acpi_cpufreq watchdog pcspkr k10temp ccp rng_core cfg80211 button rfkill sg nfsd auth_rpcgss nfs_acl lockd grace bcache sunrpc drm dm_mod efi_pstore fuse loop configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 ses enclosure scsi_transport_sas btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear
[Sat Nov 25 11:38:34 2023] md_mod sd_mod uas t10_pi crc64_rocksoft usb_storage crc_t10dif crct10dif_generic ahci xhci_pci libahci crct10dif_pclmul crct10dif_common libata xhci_hcd r8169 atlantic crc32_pclmul realtek mdio_devres crc32c_intel usbcore libphy macsec scsi_mod i2c_piix4 usb_common scsi_common wmi gpio_amdpt gpio_generic
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be
[Sat Nov 25 11:38:34 2023] ---[ end trace 0000000000000000 ]---
[Sat Nov 25 11:38:34 2023] RIP: 0010:bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952)
[Sat Nov 25 11:38:34 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
All code
========
0: 74 7c je 0x7e
2: 31 ff xor %edi,%edi
4: 31 ed xor %ebp,%ebp
6: 31 d2 xor %edx,%edx
8: 89 d0 mov %edx,%eax
a: 0f b6 4c 06 03 movzbl 0x3(%rsi,%rax,1),%ecx
f: 48 89 c8 mov %rcx,%rax
12: 48 83 c1 44 add $0x44,%rcx
16: 49 8b 4c cc 08 mov 0x8(%r12,%rcx,8),%rcx
1b: 48 0f a3 04 24 bt %rax,(%rsp)
20: 0f 92 c0 setb %al
23: 0f b6 c0 movzbl %al,%eax
26: 01 c5 add %eax,%ebp
28: 31 c0 xor %eax,%eax
2a:* 80 b9 be 00 00 00 02 cmpb $0x2,0xbe(%rcx) <-- trapping instruction
31: 0f 94 c0 sete %al
34: 83 c2 01 add $0x1,%edx
37: 01 c7 add %eax,%edi
39: 0f b6 46 01 movzbl 0x1(%rsi),%eax
3d: 39 c2 cmp %eax,%edx
3f: 72 .byte 0x72
Code starting with the faulting instruction
===========================================
0: 80 b9 be 00 00 00 02 cmpb $0x2,0xbe(%rcx)
7: 0f 94 c0 sete %al
a: 83 c2 01 add $0x1,%edx
d: 01 c7 add %eax,%edi
f: 0f b6 46 01 movzbl 0x1(%rsi),%eax
13: 39 c2 cmp %eax,%edx
15: 72 .byte 0x72
[Sat Nov 25 11:38:34 2023] RSP: 0018:ffffacfac9bff758 EFLAGS: 00010246
[Sat Nov 25 11:38:34 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Sat Nov 25 11:38:34 2023] RDX: 0000000000000001 RSI: ffff8c3d4acdc04b RDI: 0000000000000000
[Sat Nov 25 11:38:34 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Sat Nov 25 11:38:34 2023] R10: 0000000000000001 R11: ffff8c3e2a254000 R12: ffff8c3d98680000
[Sat Nov 25 11:38:34 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Sat Nov 25 11:38:34 2023] FS: 00007fd7475e2c80(0000) GS:ffff8c483ea00000(0000) knlGS:0000000000000000
[Sat Nov 25 11:38:34 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be CR3: 000000010acf0000 CR4: 00000000003506f0
[Sat Nov 25 11:38:34 2023] note: bcachefs[2404] exited with irqs disabled
So we seem to have a replicas entry that points to an invalid device. Which is odd - it doesn't look like the bad entry was in the superblock.
Could you try with the replicas_debug branch? I've added an extra assertion in the path that adds new cpu replicas entries.
When you run that branch you'll need to run it under gdb to get a backtrace.
There you go:
bcachefs: libbcachefs/replicas.c:178: cpu_replicas_add_entry: Assertion `!(!bch2_dev_exists2(c, new_entry->devs[i]))' failed.
Thread 1 "bcachefs" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007ffff7b39d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007ffff7aeaf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7ad5472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007ffff7ad5395 in __assert_fail_base (fmt=0x7ffff7c49a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x5555558bf440 "!(!bch2_dev_exists2(c, new_entry->devs[i]))", file=file@entry=0x5555558bf006 "libbcachefs/replicas.c", line=line@entry=178,
function=function@entry=0x5555558bf710 <__PRETTY_FUNCTION__.7> "cpu_replicas_add_entry") at ./assert/assert.c:92
#5 0x00007ffff7ae3e32 in __GI___assert_fail (assertion=assertion@entry=0x5555558bf440 "!(!bch2_dev_exists2(c, new_entry->devs[i]))",
file=file@entry=0x5555558bf006 "libbcachefs/replicas.c", line=line@entry=178, function=function@entry=0x5555558bf710 <__PRETTY_FUNCTION__.7> "cpu_replicas_add_entry")
at ./assert/assert.c:101
#6 0x000055555567594f in cpu_replicas_add_entry (c=c@entry=0x7ffff79dc000, old=old@entry=0x7ffff79dc420, new_entry=new_entry@entry=0x7fffec0cf6b0) at libbcachefs/replicas.c:178
#7 0x00005555556773fa in bch2_replicas_set_usage (c=c@entry=0x7ffff79dc000, r=r@entry=0x7fffec0cf6b0, sectors=0) at libbcachefs/replicas.c:605
#8 0x000055555567119b in journal_replay_entry_early (c=c@entry=0x7ffff79dc000, entry=entry@entry=0x7fffec0cf6a0) at libbcachefs/recovery.c:306
#9 0x0000555555672c5d in journal_replay_early (clean=<optimized out>, c=0x7ffff79dc000) at libbcachefs/recovery.c:383
#10 bch2_fs_recovery (c=0x7ffff79dc000) at libbcachefs/recovery.c:839
#11 0x000055555568e10d in bch2_fs_start (c=c@entry=0x7ffff79dc000) at libbcachefs/super.c:969
#12 0x000055555569104b in bch2_fs_open (devices=devices@entry=0x7fffffffe9d8, nr_devices=nr_devices@entry=8, opts=...) at libbcachefs/super.c:1959
#13 0x00005555555b1a31 in cmd_fsck (argc=8, argv=0x7fffffffe9d8, argv@entry=0x7fffffffe9c8) at cmd_fsck.c:94
#14 0x00005555555aa9a7 in main (argc=<optimized out>, argv=0x7fffffffe9c8) at bcachefs.c:234
yep, that found the missing validation
the replicas_debug branch has been updated, tell me if that works
Thank you that works:
[Sun Nov 26 09:08:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,fix_errors=yes
[Sun Nov 26 09:08:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734944 offset=1728/2151: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734945 offset=642/1065: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734945 offset=923/1065: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734946 offset=935/1358: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734946 offset=1216/1358: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734947 offset=675/1098: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734947 offset=956/1098: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734948 offset=433/856: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734948 offset=714/856: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734949 offset=527/950: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): alloc_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): stripes_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): snapshots_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_allocations... done
[Sun Nov 26 09:24:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal_replay...
[Sun Nov 26 09:24:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): going read-write done
[Sun Nov 26 09:24:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_alloc_info... done
[Sun Nov 26 09:25:16 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_lrus... done
[Sun Nov 26 09:25:22 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_btree_backpointers...
[Sun Nov 26 09:36:04 2023] done
[Sun Nov 26 09:36:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_backpointers_to_extents... done
[Sun Nov 26 10:05:24 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_extents_to_backpointers... done
[Sun Nov 26 10:17:30 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_alloc_to_lru_refs... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_snapshot_trees... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_snapshots... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_subvols... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): delete_dead_snapshots... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): resume_logged_ops... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_inodes... done
[Sun Nov 26 10:18:06 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_extents...
[Sun Nov 26 10:18:55 2023] inode 141010656:4294967295 has incorrect i_sectors: got 2597304, should be 2597200, fixing
[Sun Nov 26 10:18:55 2023] inode 141011051:4294967295 has incorrect i_sectors: got 2534976, should be 2534920, fixing
[Sun Nov 26 10:18:55 2023] inode 141011067:4294967295 has incorrect i_sectors: got 4375512, should be 4375464, fixing
[Sun Nov 26 10:18:56 2023] inode 141011319:4294967295 has incorrect i_sectors: got 3558888, should be 3558840, fixing
[Sun Nov 26 10:18:56 2023] inode 141011524:4294967295 has incorrect i_sectors: got 22248072, should be 22247960, fixing
[Sun Nov 26 10:18:57 2023] inode 141011550:4294967295 has incorrect i_sectors: got 26808424, should be 26808416, fixing
[Sun Nov 26 10:18:57 2023] inode 141011594:4294967295 has incorrect i_sectors: got 753640, should be 753600, fixing
[Sun Nov 26 10:18:57 2023] inode 141011630:4294967295 has incorrect i_sectors: got 3679112, should be 3679008, fixing
[Sun Nov 26 10:18:57 2023] inode 141011631:4294967295 has incorrect i_sectors: got 3679168, should be 3679128, fixing
[Sun Nov 26 10:18:57 2023] inode 141011661:4294967295 has incorrect i_sectors: got 3889336, should be 3889296, fixing
[Sun Nov 26 10:18:57 2023] inode 141011837:4294967295 has incorrect i_sectors: got 3793104, should be 3792992, fixing
[Sun Nov 26 10:18:57 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sun Nov 26 10:19:45 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_indirect_extents... done
[Sun Nov 26 10:19:46 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_dirents... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_xattrs... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_root... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_directory_structure... done
[Sun Nov 26 10:20:12 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_nlinks... done
[Sun Nov 26 10:20:28 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): delete_dead_inodes... done
Thank you very much for your work!
Version
bcachefs master e2cd10f254d
Generic info
I am currently tearing down my old array and moving the data to a new one, removing some devices from the existing array. After the latest removal I cannot mount the array anymore. Trying to fsck results into a crash see below: Are there more options available i can get into read only mode of this array without replaying the journal or dropping the journal keys? Since I am reading only from this array anyway I do not need write access anymore.
fsck with debug usertools gives me
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion '!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.