Closed cyclingwithelephants closed 2 weeks ago
Resuming again produced different dmesg output. It looks like 2 of my disks have failed during this rebuild, and I was hoping someone could confirm this. Interestingly, my install of scrutiny didn't alert me to it.
smartctl 7.4 2023-08-01 r5530 [aarch64-linux-6.8.12-edge-rockchip64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Ultrastar (He10/12)
Device Model: WDC WD80EMAZ-00WJTA0
Serial Number: 1EGE740Z
LU WWN Device Id: 5 000cca 27ec60396
Firmware Version: 83.H0A83
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jun 14 22:37:56 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 93) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1069) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0004 127 127 054 Old_age Offline - 120
3 Spin_Up_Time 0x0007 188 188 024 Pre-fail Always - 361 (Average 337)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 486
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000a 100 100 067 Old_age Always - 0
8 Seek_Time_Performance 0x0004 128 128 020 Old_age Offline - 18
9 Power_On_Hours 0x0012 097 097 000 Old_age Always - 24750
10 Spin_Retry_Count 0x0012 100 100 060 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 296
22 Helium_Level 0x0023 100 100 025 Pre-fail Always - 100
192 Power-Off_Retract_Count 0x0032 089 089 000 Old_age Always - 13476
193 Load_Cycle_Count 0x0012 089 089 000 Old_age Always - 13476
194 Temperature_Celsius 0x0002 224 224 000 Old_age Always - 29 (Min/Max 13/38)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 3
SMART Error Log Version: 1
ATA Error Count: 3
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 3 occurred at disk power-on lifetime: 24630 hours (1026 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 78 20 38 0a 86 40 08 02:09:41.558 READ FPDMA QUEUED
61 50 70 88 ea f7 40 08 02:09:41.523 WRITE FPDMA QUEUED
60 80 68 38 1e 86 40 08 02:09:41.521 READ FPDMA QUEUED
60 40 60 c8 18 86 40 08 02:09:41.520 READ FPDMA QUEUED
60 80 58 48 18 86 40 08 02:09:41.520 READ FPDMA QUEUED
Error 2 occurred at disk power-on lifetime: 24561 hours (1023 days + 9 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 80 08 50 59 01 40 08 1d+20:15:38.609 READ FPDMA QUEUED
60 80 28 80 3a 01 40 08 1d+20:15:38.579 READ FPDMA QUEUED
60 80 20 00 3a 01 40 08 1d+20:15:38.578 READ FPDMA QUEUED
60 20 18 80 5a 01 40 08 1d+20:15:38.578 READ FPDMA QUEUED
60 30 10 d0 59 01 40 08 1d+20:15:38.576 READ FPDMA QUEUED
Error 1 occurred at disk power-on lifetime: 24534 hours (1022 days + 6 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 80 00 a8 fe a4 40 08 17:13:01.042 READ FPDMA QUEUED
61 18 20 30 30 ca 40 08 17:13:01.005 WRITE FPDMA QUEUED
61 10 18 20 30 ca 40 08 17:13:01.004 WRITE FPDMA QUEUED
61 20 10 00 30 ca 40 08 17:13:01.004 WRITE FPDMA QUEUED
61 60 08 a0 d1 c3 40 08 17:13:01.004 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 12302 -
# 2 Short offline Completed without error 00% 12233 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The above only provides legacy SMART information - try 'smartctl -x' for more
<truncated>
[ 6572.220573] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790491136): __bch2_write(): error: insufficient_devices
[ 6572.220603] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790601728): __bch2_write(): error: insufficient_devices
[ 6572.220617] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790630400): __bch2_write(): error: insufficient_devices
[ 6572.220632] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790695936): __bch2_write(): error: insufficient_devices
[ 6572.220643] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790716416): __bch2_write(): error: insufficient_devices
[ 6572.223383] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 790781952): __bch2_write(): error: insufficient_devices
[ 6572.223413] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 791220224): __bch2_write(): error: insufficient_devices
[ 6572.223426] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 791240704): __bch2_write(): error: insufficient_devices
[ 6572.223435] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 791244800): __bch2_write(): error: insufficient_devices
[ 6572.223450] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 791310336): __bch2_write(): error: insufficient_devices
[ 6577.228043] __bch2_write: 13794 callbacks suppressed
[ 6577.228054] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10010624): __bch2_write(): error: insufficient_devices
[ 6577.228081] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10022912): __bch2_write(): error: insufficient_devices
[ 6577.228097] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10076160): __bch2_write(): error: insufficient_devices
[ 6577.228110] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10448896): __bch2_write(): error: insufficient_devices
[ 6577.228123] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10485760): __bch2_write(): error: insufficient_devices
[ 6577.228137] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10534912): __bch2_write(): error: insufficient_devices
[ 6577.228147] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10711040): __bch2_write(): error: insufficient_devices
[ 6577.228161] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10756096): __bch2_write(): error: insufficient_devices
[ 6577.228173] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10797056): __bch2_write(): error: insufficient_devices
[ 6577.228184] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10809344): __bch2_write(): error: insufficient_devices
[ 6582.244465] __bch2_write: 15954 callbacks suppressed
[ 6582.244477] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 211378176): __bch2_write(): error: insufficient_devices
[ 6582.244590] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 211398656): __bch2_write(): error: insufficient_devices
[ 6582.244701] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 211423232): __bch2_write(): error: insufficient_devices
[ 6582.244745] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 211439616): __bch2_write(): error: insufficient_devices
[ 6582.244759] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 212959232): __bch2_write(): error: insufficient_devices
[ 6582.244776] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 212996096): __bch2_write(): error: insufficient_devices
[ 6582.244792] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 213061632): __bch2_write(): error: insufficient_devices
[ 6582.244826] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 213127168): __bch2_write(): error: insufficient_devices
[ 6582.244836] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 213131264): __bch2_write(): error: insufficient_devices
[ 6582.244858] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763546 offset 213164032): __bch2_write(): error: insufficient_devices
[ 6587.252207] __bch2_write: 14795 callbacks suppressed
[ 6587.252221] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540120576): __bch2_write(): error: insufficient_devices
[ 6587.252274] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540186112): __bch2_write(): error: insufficient_devices
[ 6587.252293] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540247552): __bch2_write(): error: insufficient_devices
[ 6587.252313] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540313088): __bch2_write(): error: insufficient_devices
[ 6587.252330] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540378624): __bch2_write(): error: insufficient_devices
[ 6587.252341] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540382720): __bch2_write(): error: insufficient_devices
[ 6587.252359] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540448256): __bch2_write(): error: insufficient_devices
[ 6587.252374] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540481024): __bch2_write(): error: insufficient_devices
[ 6587.252392] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540509696): __bch2_write(): error: insufficient_devices
[ 6587.252410] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763550 offset 1540575232): __bch2_write(): error: insufficient_devices
[ 6592.274324] __bch2_write: 13787 callbacks suppressed
[ 6592.274338] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200269824): __bch2_write(): error: insufficient_devices
[ 6592.274404] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200314880): __bch2_write(): error: insufficient_devices
[ 6592.274418] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200486912): __bch2_write(): error: insufficient_devices
[ 6592.274430] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200495104): __bch2_write(): error: insufficient_devices
[ 6592.274440] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200503296): __bch2_write(): error: insufficient_devices
[ 6592.274466] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200568832): __bch2_write(): error: insufficient_devices
[ 6592.274478] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200589312): __bch2_write(): error: insufficient_devices
[ 6592.274492] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200634368): __bch2_write(): error: insufficient_devices
[ 6592.274504] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200654848): __bch2_write(): error: insufficient_devices
[ 6592.274518] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763936 offset 200720384): __bch2_write(): error: insufficient_devices
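With tens of thousands of these lines being rate-limited, it can help to collapse the log into a per-inode count before reading it. Below is a minimal sketch, assuming the message format stays exactly as in the lines above. The function name `summarize` and the regexes are my own and are not part of any bcachefs tooling.

```python
import re
from collections import Counter

# Pattern written against the dmesg lines pasted in this thread (an assumption:
# other kernel versions may format these messages differently).
WRITE_ERR = re.compile(r"inum (\d+) offset (\d+)\): __bch2_write\(\): error: (\w+)")
SUPPRESSED = re.compile(r"__bch2_write: (\d+) callbacks suppressed")


def summarize(dmesg_text):
    """Count logged write errors per (inode, error name) and total suppressed messages."""
    per_inode = Counter()
    suppressed = 0
    for line in dmesg_text.splitlines():
        m = WRITE_ERR.search(line)
        if m:
            per_inode[(int(m.group(1)), m.group(3))] += 1
            continue
        m = SUPPRESSED.search(line)
        if m:
            suppressed += int(m.group(1))
    return per_inode, suppressed


# Three lines copied from the log above.
SAMPLE = """\
[ 6572.223450] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763141 offset 791310336): __bch2_write(): error: insufficient_devices
[ 6577.228043] __bch2_write: 13794 callbacks suppressed
[ 6577.228054] bcachefs (0eee462f-4912-4c9a-8c5c-b988a4bf0f42 inum 1073763169 offset 10010624): __bch2_write(): error: insufficient_devices
"""

if __name__ == "__main__":
    counts, hidden = summarize(SAMPLE)
    for (inum, err), n in counts.most_common():
        print(f"inum {inum}: {n} x {err}")
    print(f"suppressed by rate limiting: {hidden}")
```

To run it against a real log, replace `SAMPLE` with the saved output of `dmesg`. The suppressed total is only a lower bound on how many writes failed, since it counts messages the kernel dropped rather than individual errors.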
Looks like it was a loose SATA cable 🫠
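That is consistent with the SMART output above: the overall health is PASSED, the reallocated and pending sector counts are 0, and the only non-zero error indicators are UDMA_CRC_Error_Count (3) and the three ICRC entries in the error log, which record CRC errors on the SATA link rather than on the platters. A rough sketch of that check follows, assuming the legacy attribute table layout shown above. The `classify` helper and its "media" / "link" / "clean" labels are hypothetical and are not something smartctl or scrutiny provides.

```python
import re

# One row of the attribute table: ID, name, flag, value, worst, thresh,
# type, updated, when_failed, then the leading integer of the raw value.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Attributes whose non-zero raw value usually points at the disk surface.
MEDIA_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(smartctl_text):
    """Return {attribute_name: raw_value} for every attribute row found."""
    attrs = {}
    for line in smartctl_text.splitlines():
        m = ATTR_ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(3))
    return attrs


def classify(smartctl_text):
    """Rough triage: 'media' for sector problems, 'link' for CRC-only errors, else 'clean'."""
    attrs = parse_attributes(smartctl_text)
    if any(attrs.get(name, 0) > 0 for name in MEDIA_ATTRS):
        return "media"
    if attrs.get("UDMA_CRC_Error_Count", 0) > 0 or "Error: ICRC" in smartctl_text:
        return "link"
    return "clean"


# Rows copied from the smartctl output earlier in this thread.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 3
84 43 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
"""

if __name__ == "__main__":
    print(classify(SAMPLE))
```

A "link" result is only a hint to reseat or swap the cable first. It does not prove the disk is healthy, and a long self-test is still worth running afterwards.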
summary
After replacing an SSD with a new one, my machine cannot run
bcachefs data job migrate
successfully. Happy to provide any more info I can.
update: it looks like my disks have now failed during either this or a subsequent rebuild, so I'm not sure if the original issue is still valid
hardware + kernel
3x8TB HDD /dev/sd{a b c}
1x256GB SSD /dev/sdd
1x4TB SSD /dev/sde
machine:
6.8.12-edge-rockchip64
(this is running armbian)
The RAM might simply be the issue. I've been trying to get this machine working in some fashion before throwing in the towel and getting something a little more capable. If this is the problem, I'd appreciate some guidance on minimum specs (at least RAM-wise) for general-purpose usage.
bcachefs show-super
bcachefs fs usage -h
dmesg