koverstreet / bcachefs-tools

http://bcachefs.org
GNU General Public License v2.0
123 stars 87 forks source link

bcachefs goes into read only mode after first read from system idle #105

Open demizer opened 2 years ago

demizer commented 2 years ago

I am using freshly built (as of Jan 8) bcachefs-tools and bcachefs kernel.

The filesystem is going into read only mode when I read from it after the system being idle over night. I believe one of the hard disks is going to sleep, at least, I see the "zzz" icon next to one of my disks in the gnome program "Disks".

Jan 09 11:40:47 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x10000000 SErr 0x400000 action 0x6 frozen
Jan 09 11:40:47 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 11:40:47 lithium kernel: ata2: SError: { Handshk }
Jan 09 11:40:47 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 11:40:47 lithium kernel: ata2.00: cmd 61/08:e0:88:51:34/00:00:8d:02:00/40 tag 28 ncq dma 4096 out
                                         res 40/00:e0:88:51:34/00:00:8d:02:00/40 Emask 0x10 (ATA bus error)
Jan 09 11:40:47 lithium kernel: ata2.00: status: { DRDY }
Jan 09 11:40:47 lithium kernel: ata2: hard resetting link
Jan 09 11:40:48 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 11:40:48 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 11:40:48 lithium kernel: ata2: EH complete
Jan 09 11:40:48 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x100 SErr 0x400000 action 0x6 frozen
Jan 09 11:40:48 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 11:40:48 lithium kernel: ata2: SError: { Handshk }
Jan 09 11:40:48 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 11:40:48 lithium kernel: ata2.00: cmd 61/08:40:88:51:34/00:00:8d:02:00/40 tag 8 ncq dma 4096 out
                                         res 40/00:40:88:51:34/00:00:8d:02:00/40 Emask 0x10 (ATA bus error)
Jan 09 11:40:48 lithium kernel: ata2.00: status: { DRDY }
Jan 09 11:40:48 lithium kernel: ata2: hard resetting link
Jan 09 11:40:48 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 11:40:48 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 11:40:48 lithium kernel: ata2: EH complete
Jan 09 11:40:48 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x400000 SErr 0x400000 action 0x6 frozen
Jan 09 11:40:48 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 11:40:48 lithium kernel: ata2: SError: { Handshk }
Jan 09 11:40:48 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 11:40:48 lithium kernel: ata2.00: cmd 61/08:b0:88:51:34/00:00:8d:02:00/40 tag 22 ncq dma 4096 out
                                         res 40/00:b0:88:51:34/00:00:8d:02:00/40 Emask 0x10 (ATA bus error)
Jan 09 11:40:48 lithium kernel: ata2.00: status: { DRDY }
Jan 09 11:40:48 lithium kernel: ata2: hard resetting link
Jan 09 11:40:49 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 11:40:49 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 11:40:49 lithium kernel: ata2: EH complete
Jan 09 11:40:49 lithium kernel: ata2: limiting SATA link speed to 1.5 Gbps
Jan 09 11:40:49 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x1000000 SErr 0x400000 action 0x6 frozen
Jan 09 11:40:49 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 11:40:49 lithium kernel: ata2: SError: { Handshk }
Jan 09 11:40:49 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 11:40:49 lithium kernel: ata2.00: cmd 61/08:c0:88:51:34/00:00:8d:02:00/40 tag 24 ncq dma 4096 out
                                         res 40/00:c0:88:51:34/00:00:8d:02:00/40 Emask 0x10 (ATA bus error)
Jan 09 11:40:49 lithium kernel: ata2.00: status: { DRDY }
Jan 09 11:40:49 lithium kernel: ata2: hard resetting link
Jan 09 11:40:51 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 11:40:51 lithium kernel: ata2: hard resetting link
Jan 09 11:40:53 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 11:40:53 lithium kernel: ata2: hard resetting link
Jan 09 11:40:55 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 11:40:55 lithium kernel: ata2.00: disabled
Jan 09 11:40:55 lithium kernel: sd 1:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
Jan 09 11:40:55 lithium kernel: sd 1:0:0:0: [sdb] tag#24 Sense Key : Not Ready [current]
Jan 09 11:40:55 lithium kernel: sd 1:0:0:0: [sdb] tag#24 Add. Sense: Logical unit not ready, hard reset required
Jan 09 11:40:55 lithium kernel: sd 1:0:0:0: [sdb] tag#24 CDB: Write(16) 8a 00 00 00 00 02 8d 34 51 88 00 00 00 08 00 00
Jan 09 11:40:55 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10958950792 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:55 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:55 lithium kernel: ata2: EH complete
Jan 09 11:40:55 lithium kernel: sd 1:0:0:0: rejecting I/O to offline device
Jan 09 11:40:55 lithium kernel: blk_update_request: I/O error, dev sdb, sector 0 op 0x2:(FLUSH) flags 0x0 phys_seg 0 prio class 0
Jan 09 11:40:55 lithium kernel: bcachefs (sdb): error writing journal entry 681298: I/O
Jan 09 11:40:55 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10959058624 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:55 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:55 lithium kernel: ata2.00: detaching (SCSI 1:0:0:0)
Jan 09 11:40:56 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:56 lithium kernel: sd 1:0:0:0: [sdb] Stopping disk
Jan 09 11:40:56 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 11:40:56 lithium kernel: ata2.00: ATA-10: ST6000VN001-2BB186, SC60, max UDMA/133
Jan 09 11:40:56 lithium kernel: ata2.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 32), AA
Jan 09 11:40:56 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 11:40:56 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10959926512 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:56 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:56 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10960288440 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:56 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:56 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10959254488 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:56 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:56 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10959692056 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:56 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:56 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10960237528 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:57 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:57 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10960243000 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:57 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:57 lithium kernel: blk_update_request: I/O error, dev sdb, sector 10960231872 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
Jan 09 11:40:57 lithium kernel: bcachefs (sdb): btree write error: I/O
Jan 09 11:40:57 lithium kernel: bcachefs (sdb): error writing journal entry 681299: I/O
Jan 09 11:40:57 lithium kernel: scsi 1:0:0:0: Direct-Access     ATA      ST6000VN001-2BB1 SC60 PQ: 0 ANSI: 5
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: [sdg] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: [sdg] 4096-byte physical blocks
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: [sdg] Write Protect is off
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: [sdg] Mode Sense: 00 3a 00 00
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 09 11:40:57 lithium kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jan 09 11:40:57 lithium dbus-daemon[811]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service' requested by ':1.253' (uid=971 pid=234529 comm="/usr/lib/colord-sane ")
Jan 09 11:40:57 lithium dbus-daemon[811]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service not found.
Jan 09 11:40:59 lithium kernel: bcachefs (sdb): error writing journal entry 681300: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb): error writing journal entry 681301: I/O
Jan 09 11:41:00 lithium kernel: sd 1:0:0:0: [sdg] Attached SCSI disk
Jan 09 11:41:00 lithium kernel: bcachefs (sdb): error writing journal entry 681302: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 1024): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 1016): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 1576): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 3064): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 3624): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 5104): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 5096): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 5672): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 7152): data write error: I/O
Jan 09 11:41:00 lithium kernel: bcachefs (sdb inum 83899478 offset 7136): data write error: I/O
Jan 09 11:41:01 lithium kernel: bcachefs (sdb): superblock write error: I/O
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f): Unable to write superblock to sufficient devices (from bch2_mark_replicas_slowpath [bcachefs])
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f): emergency read only
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
Jan 09 11:41:01 lithium kernel: bcachefs (000000003b19194f inum 83899478): write error -30 from btree update
demizer commented 2 years ago

More syslog output on unmount:

Jan 09 11:55:20 lithium kernel: ------------[ cut here ]------------
Jan 09 11:55:20 lithium kernel: kernfs: can not remove 'bcachefs', no directory
Jan 09 11:55:20 lithium kernel: WARNING: CPU: 0 PID: 247004 at fs/kernfs/dir.c:1535 kernfs_remove_by_name_ns+0x89/0xa0
Jan 09 11:55:20 lithium kernel: Modules linked in: bcachefs crc64 hid_logitech_hidpp hid_steam hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc mousedev joydev wacom btusb btrtl btbcm btintel vfat bluetooth fat usbhid ecdh_generic crc16 snd_hda_codec_hdmi snd_hda_intel iwlmvm snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi intel_rapl_common amd64_edac snd_hda_codec edac_mce_amd mac80211 snd_hda_core kvm_amd ucsi_ccg libarc4 snd_hwdep kvm typec_ucsi typec irqbypass iwlwifi crct10dif_pclmul roles gigabyte_wmi wmi_bmof mxm_wmi crc32_pclmul snd_pcm ghash_clmulni_intel aesni_intel snd_timer crypto_simd cryptd rapl cfg80211 igb snd pcspkr ccp soundcore sp5100_tco i2c_nvidia_gpu k10temp i2c_piix4 rng_core rfkill dca wmi pinctrl_amd gpio_amdpt gpio_generic mac_hid acpi_cpufreq sg crypto_user fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq xhci_pci xhci_pci_renesas nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE)
Jan 09 11:55:20 lithium kernel:  nvidia(POE)
Jan 09 11:55:20 lithium kernel: CPU: 0 PID: 247004 Comm: umount Tainted: P           OE     5.15.12-arch1-1-bcachefs-git-384152-ga419efdd5697 #1 af4c7d38044bd5a31fce62e77ca1e3b236a5a583
Jan 09 11:55:20 lithium kernel: Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
Jan 09 11:55:20 lithium kernel: RIP: 0010:kernfs_remove_by_name_ns+0x89/0xa0
Jan 09 11:55:20 lithium kernel: Code: 41 5c 41 5d 31 d2 89 d6 89 d7 c3 48 c7 c7 60 82 32 a0 e8 aa 97 d0 ff b8 fe ff ff ff eb e1 48 c7 c7 c8 3d aa 9f e8 a9 90 75 00 <0f> 0b b8 fe ff ff ff eb cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
Jan 09 11:55:20 lithium kernel: RSP: 0018:ffffbc12c3447e20 EFLAGS: 00010246
Jan 09 11:55:20 lithium kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
Jan 09 11:55:20 lithium kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 09 11:55:20 lithium kernel: RBP: ffffa0f2ea3c0000 R08: 0000000000000000 R09: 0000000000000000
Jan 09 11:55:20 lithium kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc36b01b4
Jan 09 11:55:20 lithium kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 09 11:55:20 lithium kernel: FS:  00007f77b4016580(0000) GS:ffffa1013d200000(0000) knlGS:0000000000000000
Jan 09 11:55:20 lithium kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 09 11:55:20 lithium kernel: CR2: 00007f7561982698 CR3: 0000000959c34000 CR4: 00000000003506f0
Jan 09 11:55:20 lithium kernel: Call Trace:
Jan 09 11:55:20 lithium kernel:  <TASK>
Jan 09 11:55:20 lithium kernel:  __bch2_fs_stop+0xc2/0x290 [bcachefs 1b62109d01a8d4571ae2e6e5fbed6611c6e63892]
Jan 09 11:55:20 lithium kernel:  generic_shutdown_super+0x74/0x110
Jan 09 11:55:20 lithium kernel:  bch2_kill_sb+0x12/0x20 [bcachefs 1b62109d01a8d4571ae2e6e5fbed6611c6e63892]
Jan 09 11:55:20 lithium kernel:  deactivate_locked_super+0x36/0xa0
Jan 09 11:55:20 lithium kernel:  cleanup_mnt+0x131/0x190
Jan 09 11:55:20 lithium kernel:  task_work_run+0x5c/0x90
Jan 09 11:55:20 lithium kernel:  exit_to_user_mode_prepare+0x17e/0x180
Jan 09 11:55:20 lithium kernel:  syscall_exit_to_user_mode+0x23/0x50
Jan 09 11:55:20 lithium kernel:  do_syscall_64+0x69/0x90
Jan 09 11:55:20 lithium kernel:  ? exc_page_fault+0x72/0x180
Jan 09 11:55:20 lithium kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan 09 11:55:20 lithium kernel: RIP: 0033:0x7f77b417961b
Jan 09 11:55:20 lithium kernel: Code: 18 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 21 18 0c 00 f7 d8
Jan 09 11:55:20 lithium kernel: RSP: 002b:00007ffde55c81d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
Jan 09 11:55:20 lithium kernel: RAX: 0000000000000000 RBX: 000055fba839e4c0 RCX: 00007f77b417961b
Jan 09 11:55:20 lithium kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055fba839e510
Jan 09 11:55:20 lithium kernel: RBP: 000055fba8399630 R08: 0000000000000000 R09: 00007ffde55c6f60
Jan 09 11:55:20 lithium kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jan 09 11:55:20 lithium kernel: R13: 000055fba839e510 R14: 000055fba8399740 R15: 000055fba8399630
Jan 09 11:55:20 lithium kernel:  </TASK>
Jan 09 11:55:20 lithium kernel: ---[ end trace 6276d34c2dc4cf05 ]---
demizer commented 2 years ago

Well, got this attempting to remount:

Jan 09 11:57:19 lithium kernel: bcachefs (0000000069289adf): recovering from unclean shutdown
Jan 09 11:57:22 lithium kernel: bcachefs (0000000069289adf): sdg sector 8005160 seq 643647: journal checksum bad, exiting
Jan 09 11:57:22 lithium kernel: bcachefs (0000000069289adf): Unable to continue, halting
Jan 09 11:58:27 lithium kernel: bcachefs (0000000069289adf): Error in recovery: cannot allocate memory (1)
Jan 09 11:58:27 lithium kernel: bcachefs (0000000069289adf): filesystem contains errors: please report this to the developers
Jan 09 11:58:27 lithium kernel: mount with -o fix_errors to repair

I have 64gb of ECC memory, and only 20% in use at the time of error.

demizer commented 2 years ago

And here is output when trying to remount with "-o fix_errors":

Jan 09 12:01:30 lithium kernel: bcachefs (000000001aafd54d): recovering from unclean shutdown
Jan 09 12:02:38 lithium kernel: bcachefs (000000001aafd54d): journal read done, 345 keys in 4 entries, seq 681303
Jan 09 12:02:38 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x1 SErr 0x400000 action 0x6 frozen
Jan 09 12:02:38 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 12:02:38 lithium kernel: ata2: SError: { Handshk }
Jan 09 12:02:38 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 12:02:38 lithium kernel: ata2.00: cmd 61/8a:00:08:00:00/00:00:00:00:00/40 tag 0 ncq dma 70656 out
                                         res 40/00:00:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Jan 09 12:02:38 lithium kernel: ata2.00: status: { DRDY }
Jan 09 12:02:38 lithium kernel: ata2: hard resetting link
Jan 09 12:02:39 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 12:02:39 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 12:02:39 lithium kernel: ata2: EH complete
Jan 09 12:02:39 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x10000000 SErr 0x400000 action 0x6 frozen
Jan 09 12:02:39 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 12:02:39 lithium kernel: ata2: SError: { Handshk }
Jan 09 12:02:39 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 12:02:39 lithium kernel: ata2.00: cmd 61/8a:e0:08:00:00/00:00:00:00:00/40 tag 28 ncq dma 70656 out
                                         res 40/00:e0:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Jan 09 12:02:39 lithium kernel: ata2.00: status: { DRDY }
Jan 09 12:02:39 lithium kernel: ata2: hard resetting link
Jan 09 12:02:39 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 12:02:39 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 12:02:39 lithium kernel: ata2: EH complete
Jan 09 12:02:39 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x400000 SErr 0x400000 action 0x6 frozen
Jan 09 12:02:39 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 12:02:39 lithium kernel: ata2: SError: { Handshk }
Jan 09 12:02:39 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 12:02:39 lithium kernel: ata2.00: cmd 61/8a:b0:08:00:00/00:00:00:00:00/40 tag 22 ncq dma 70656 out
                                         res 40/00:b0:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Jan 09 12:02:39 lithium kernel: ata2.00: status: { DRDY }
Jan 09 12:02:39 lithium kernel: ata2: hard resetting link
Jan 09 12:02:40 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 12:02:40 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 12:02:40 lithium kernel: ata2: EH complete
Jan 09 12:02:40 lithium kernel: ata2: limiting SATA link speed to 1.5 Gbps
Jan 09 12:02:40 lithium kernel: ata2.00: exception Emask 0x11 SAct 0x80000000 SErr 0x400000 action 0x6 frozen
Jan 09 12:02:40 lithium kernel: ata2.00: irq_stat 0x48000008, interface fatal error
Jan 09 12:02:40 lithium kernel: ata2: SError: { Handshk }
Jan 09 12:02:40 lithium kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jan 09 12:02:40 lithium kernel: ata2.00: cmd 61/8a:f8:08:00:00/00:00:00:00:00/40 tag 31 ncq dma 70656 out
                                         res 40/00:f8:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Jan 09 12:02:40 lithium kernel: ata2.00: status: { DRDY }
Jan 09 12:02:40 lithium kernel: ata2: hard resetting link
Jan 09 12:02:42 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 12:02:42 lithium kernel: ata2: hard resetting link
Jan 09 12:02:44 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 12:02:44 lithium kernel: ata2: hard resetting link
Jan 09 12:02:47 lithium org.gnome.DiskUtility[242711]: Error calling Drive.Ata.PmGetState: GDBus.Error:org.freedesktop.UDisks2.Error.Failed: Error opening device file /dev/sdg while getting PM state: No such device or address (udisks-error-quark, 0)
Jan 09 12:02:47 lithium kernel: ata2: SATA link down (SStatus 1 SControl 310)
Jan 09 12:02:47 lithium kernel: ata2.00: disabled
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: [sdg] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: [sdg] tag#31 Sense Key : Not Ready [current]
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: [sdg] tag#31 Add. Sense: Logical unit not ready, hard reset required
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: [sdg] tag#31 CDB: Write(16) 8a 00 00 00 00 00 00 00 00 08 00 00 00 8a 00 00
Jan 09 12:02:47 lithium kernel: print_req_error: 1 callbacks suppressed
Jan 09 12:02:47 lithium kernel: blk_update_request: I/O error, dev sdg, sector 8 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 0
Jan 09 12:02:47 lithium kernel: bcachefs (sdg): superblock write error: I/O
Jan 09 12:02:47 lithium kernel: ata2: EH complete
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: rejecting I/O to offline device
Jan 09 12:02:47 lithium kernel: ata2.00: detaching (SCSI 1:0:0:0)
Jan 09 12:02:47 lithium kernel: bcachefs (000000001aafd54d): Unable to write superblock to sufficient devices (from bch2_journal_seq_blacklist_add [bcachefs])
Jan 09 12:02:47 lithium kernel: bcachefs (000000001aafd54d): emergency read only
Jan 09 12:02:47 lithium kernel: bcachefs (000000001aafd54d): error creating new journal seq blacklist entry
Jan 09 12:02:47 lithium kernel: bcachefs (000000001aafd54d): Error in recovery: cannot allocate memory (-1)
Jan 09 12:02:47 lithium kernel: ------------[ cut here ]------------
Jan 09 12:02:47 lithium kernel: kernfs: can not remove 'bcachefs', no directory
Jan 09 12:02:47 lithium kernel: WARNING: CPU: 31 PID: 253983 at fs/kernfs/dir.c:1535 kernfs_remove_by_name_ns+0x89/0xa0
Jan 09 12:02:47 lithium kernel: Modules linked in: bcachefs crc64 hid_logitech_hidpp hid_steam hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc mousedev joydev wacom btusb btrtl btbcm btintel vfat bluetooth fat usbhid ecdh_generic crc16 snd_hda_codec_hdmi snd_hda_intel iwlmvm snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi intel_rapl_common amd64_edac snd_hda_codec edac_mce_amd mac80211 snd_hda_core kvm_amd ucsi_ccg libarc4 snd_hwdep kvm typec_ucsi typec irqbypass iwlwifi crct10dif_pclmul roles gigabyte_wmi wmi_bmof mxm_wmi crc32_pclmul snd_pcm ghash_clmulni_intel aesni_intel snd_timer crypto_simd cryptd rapl cfg80211 igb snd pcspkr ccp soundcore sp5100_tco i2c_nvidia_gpu k10temp i2c_piix4 rng_core rfkill dca wmi pinctrl_amd gpio_amdpt gpio_generic mac_hid acpi_cpufreq sg crypto_user fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq xhci_pci xhci_pci_renesas nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE)
Jan 09 12:02:47 lithium kernel:  nvidia(POE)
Jan 09 12:02:47 lithium kernel: CPU: 31 PID: 253983 Comm: mount Tainted: P        W  OE     5.15.12-arch1-1-bcachefs-git-384152-ga419efdd5697 #1 af4c7d38044bd5a31fce62e77ca1e3b236a5a583
Jan 09 12:02:47 lithium kernel: Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
Jan 09 12:02:47 lithium kernel: RIP: 0010:kernfs_remove_by_name_ns+0x89/0xa0
Jan 09 12:02:47 lithium kernel: Code: 41 5c 41 5d 31 d2 89 d6 89 d7 c3 48 c7 c7 60 82 32 a0 e8 aa 97 d0 ff b8 fe ff ff ff eb e1 48 c7 c7 c8 3d aa 9f e8 a9 90 75 00 <0f> 0b b8 fe ff ff ff eb cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
Jan 09 12:02:47 lithium kernel: RSP: 0018:ffffbc1207f07aa0 EFLAGS: 00010246
Jan 09 12:02:47 lithium kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
Jan 09 12:02:47 lithium kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jan 09 12:02:47 lithium kernel: RBP: ffffa0f778700000 R08: 0000000000000000 R09: 0000000000000000
Jan 09 12:02:47 lithium kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc36b01b4
Jan 09 12:02:47 lithium kernel: R13: ffffa0f216c1c200 R14: ffffa0f779dd6000 R15: ffffa0f778700000
Jan 09 12:02:47 lithium kernel: FS:  00007f4580060580(0000) GS:ffffa1013d9c0000(0000) knlGS:0000000000000000
Jan 09 12:02:47 lithium kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 09 12:02:47 lithium kernel: CR2: 00003e76677f4000 CR3: 000000088b7b0000 CR4: 00000000003506e0
Jan 09 12:02:47 lithium kernel: Call Trace:
Jan 09 12:02:47 lithium kernel:  <TASK>
Jan 09 12:02:47 lithium kernel:  __bch2_fs_stop+0xc2/0x290 [bcachefs 1b62109d01a8d4571ae2e6e5fbed6611c6e63892]
Jan 09 12:02:47 lithium kernel:  bch2_fs_open+0x7eb/0x1480 [bcachefs 1b62109d01a8d4571ae2e6e5fbed6611c6e63892]
Jan 09 12:02:47 lithium kernel:  bch2_mount+0x567/0x6f0 [bcachefs 1b62109d01a8d4571ae2e6e5fbed6611c6e63892]
Jan 09 12:02:47 lithium kernel:  legacy_get_tree+0x27/0x50
Jan 09 12:02:47 lithium kernel:  vfs_get_tree+0x25/0xc0
Jan 09 12:02:47 lithium kernel:  path_mount+0x45a/0xab0
Jan 09 12:02:47 lithium kernel:  __x64_sys_mount+0x11f/0x160
Jan 09 12:02:47 lithium kernel:  do_syscall_64+0x5c/0x90
Jan 09 12:02:47 lithium kernel:  ? syscall_exit_to_user_mode+0x23/0x50
Jan 09 12:02:47 lithium kernel:  ? do_syscall_64+0x69/0x90
Jan 09 12:02:47 lithium kernel:  ? do_syscall_64+0x69/0x90
Jan 09 12:02:47 lithium kernel:  ? exc_page_fault+0x72/0x180
Jan 09 12:02:47 lithium kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan 09 12:02:47 lithium kernel: RIP: 0033:0x7f45801c444e
Jan 09 12:02:47 lithium kernel: Code: 48 8b 0d 25 0a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 09 0c 00 f7 d8 64 89 01 48
Jan 09 12:02:47 lithium kernel: RSP: 002b:00007fff465fa5f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Jan 09 12:02:47 lithium kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f45801c444e
Jan 09 12:02:47 lithium kernel: RDX: 000055bc461b2870 RSI: 000055bc461b2a30 RDI: 000055bc461b28d0
Jan 09 12:02:47 lithium kernel: RBP: 000055bc461b2640 R08: 000055bc461b2890 R09: 00007f4580285a60
Jan 09 12:02:47 lithium kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jan 09 12:02:47 lithium kernel: R13: 000055bc461b2870 R14: 000055bc461b28d0 R15: 000055bc461b2640
Jan 09 12:02:47 lithium kernel:  </TASK>
Jan 09 12:02:47 lithium kernel: ---[ end trace 6276d34c2dc4cf06 ]---
Jan 09 12:02:47 lithium kernel: sd 1:0:0:0: [sdg] Stopping disk
Jan 09 12:02:47 lithium sudo[253982]: pam_unix(sudo:session): session closed for user root
Jan 09 12:02:47 lithium audit[253982]: USER_END pid=253982 uid=1000 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/7 res=success'
Jan 09 12:02:47 lithium audit[253982]: CRED_DISP pid=253982 uid=1000 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/7 res=success'
Jan 09 12:02:47 lithium kernel: audit: type=1106 audit(1641758567.446:204): pid=253982 uid=1000 auid=1000 ses=4 msg='op=PAM:session_close grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/7 res=success'
Jan 09 12:02:47 lithium kernel: audit: type=1104 audit(1641758567.446:205): pid=253982 uid=1000 auid=1000 ses=4 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/7 res=success'
Jan 09 12:02:47 lithium kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 09 12:02:47 lithium kernel: ata2.00: ATA-10: ST6000VN001-2BB186, SC60, max UDMA/133
Jan 09 12:02:47 lithium kernel: ata2.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 32), AA
Jan 09 12:02:47 lithium kernel: ata2.00: configured for UDMA/133
Jan 09 12:02:48 lithium kernel: scsi 1:0:0:0: Direct-Access     ATA      ST6000VN001-2BB1 SC60 PQ: 0 ANSI: 5
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: [sdb] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: [sdb] 4096-byte physical blocks
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jan 09 12:02:48 lithium kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 09 12:02:48 lithium dbus-daemon[811]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service' requested by ':1.276' (uid=971 pid=254692 comm="/usr/lib/colord-sane ")
Jan 09 12:02:48 lithium dbus-daemon[811]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service not found.
Jan 09 12:02:51 lithium kernel: sd 1:0:0:0: [sdb] Attached SCSI disk

I am not 100% sure the controller on the motherboard is not shit.

demizer commented 2 years ago

After reboot, filesystem mounts:

Jan 09 12:10:48 lithium kernel: bcachefs (000000005b545b3a): recovering from unclean shutdown
Jan 09 12:11:56 lithium kernel: bcachefs (000000005b545b3a): journal read done, 345 keys in 4 entries, seq 681303
Jan 09 12:12:07 lithium kernel: bcachefs (000000005b545b3a): going read-write
Jan 09 12:12:07 lithium kernel: bcachefs (000000005b545b3a): journal replay done
Jan 09 12:12:08 lithium sudo[19864]: pam_unix(sudo:session): session closed for user root
Jan 09 12:12:08 lithium kernel: bcachefs (000000005b545b3a): mounted with opts: metadata_replicas=2,data_replicas=2,compression=zstd,foreground_target=ssds,background_target=hdds,promote_target=ssds,fix_errors,nojournal_transaction_names
demizer commented 2 years ago

I did have some data loss after the event. I will improve my backups.

demizer commented 2 years ago

Filesystem was un-usable. Problem kept occurring every time I tried to read from it while using Krita after a period of idle. Switched to LSI SAS 9210-8i and I am no longer having issues. It seems like the MB controller is putting a disk to sleep, or something, and then all hell breaks loose.

YellowOnion commented 2 years ago

Yeah all those ata errors hint the problem is lower level. Maybe double check to see if the controller driver has bugs.