Closed killmasta93 closed 3 years ago
Some SMART or other ATA command to the disk failed
This is usually a disk-related condition rather than a bug. To verify, run:
smartctl -x /dev/sda; echo $?
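The `echo $?` part prints smartctl's exit status, which `man smartctl` documents as a bitmask. Below is a minimal sketch of how that number could be decoded. The function name `decode_smartctl_status` is made up for illustration, and the descriptions are short paraphrases of the man page, not its exact wording.

```shell
#!/bin/sh
# Hypothetical helper: print which bits are set in a smartctl exit status.
# The bit meanings are paraphrased from the EXIT STATUS section of man smartctl.
decode_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test whether this bit is set in the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) meaning="command line did not parse" ;;
                1) meaning="device open failed" ;;
                2) meaning="a SMART or other ATA command to the disk failed" ;;
                3) meaning="SMART status check returned DISK FAILING" ;;
                4) meaning="prefail attributes at or below threshold" ;;
                5) meaning="attributes were at or below threshold in the past" ;;
                6) meaning="device error log contains records of errors" ;;
                7) meaning="device self-test log contains records of errors" ;;
            esac
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}

# Example: a status of 4 has only bit 2 set; 68 (64 + 4) has bits 2 and 6 set.
decode_smartctl_status 4
decode_smartctl_status 68
```

A status of 0 prints nothing, which matches a run where no bit was set.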
Thanks for the reply. This is the output I got. When it says status error, what does that mean?
smartctl -x /dev/sda; echo $?
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST91000640SS
Revision: 0002
Compliance: SPC-4
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x5000c50041ae521b
Serial number: 9XG1MJTN00009241K7QJ
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sun Nov 22 18:19:28 2020 -05
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 36 C
Drive Trip Temperature: 68 C
Manufactured in week 18 of year 2012
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 134
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 135
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 360877201
Blocks received from initiator = 2794696420
Blocks read from cache and sent to initiator = 378445983
Number of read and write commands whose size <= segment size = 70025532
Number of read and write commands whose size > segment size = 42
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 51959.90
number of minutes until next internal SMART test = 58
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2498404395        0         0  2498404395          0       8980.862           0
write:           0        0         0           0          0       3651.434           0
Non-medium error count: 0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged
Background scan results log
Status: waiting until BMS interval timer expires
Accumulated power on time, hours:minutes 51959:54 [3117594 minutes]
Number of background scans performed: 377, scan progress: 0.00%
Number of background medium scans performed: 377
# when lba(hex) [sk,asc,ascq] reassign_status
1 8:50 00000000006cd99c [1,17,1] Recovered via rewrite in-place
2 10:55 00000000287e1e0c [1,17,1] Recovered via rewrite in-place
3 24:30 000000002fe36400 [1,17,1] Recovered via rewrite in-place
4 24:52 000000003ca41600 [1,17,1] Recovered via rewrite in-place
5 3053:13 0000000000000a7d [1,18,7] Recovered via rewrite in-place
6 5194:26 00000000258ae919 [1,18,7] Recovered via rewrite in-place
7 8170:50 000000000be6faba [1,18,7] Recovered via rewrite in-place
8 11943:20 000000006cf08d76 [1,17,1] Recovered via rewrite in-place
9 12914:06 0000000017e30644 [1,17,1] Recovered via rewrite in-place
10 13970:35 00000000261cff10 [1,18,7] Recovered via rewrite in-place
11 13975:55 0000000027c3b4a8 [1,18,7] Recovered via rewrite in-place
12 14949:08 0000000015f2c800 [1,17,1] Recovered via rewrite in-place
13 14965:12 000000000d244e5b [1,17,1] Recovered via rewrite in-place
14 15013:03 000000000d5e6400 [1,17,1] Recovered via rewrite in-place
15 15232:16 0000000016bfb400 [1,17,1] Recovered via rewrite in-place
16 15706:39 000000000e85fa00 [1,17,1] Recovered via rewrite in-place
17 17264:29 0000000028cffa15 [1,17,1] Recovered via rewrite in-place
18 17283:53 000000000be6fab9 [1,18,7] Recovered via rewrite in-place
19 17335:16 000000002a95d5e3 [1,17,1] Recovered via rewrite in-place
20 18302:42 0000000070fc90cd [1,17,1] Recovered via rewrite in-place
21 18835:41 0000000016670400 [1,17,1] Recovered via rewrite in-place
22 19103:50 0000000016dea000 [1,17,1] Recovered via rewrite in-place
23 21621:55 00000000732428a5 [1,17,1] Recovered via rewrite in-place
24 24443:10 00000000170a6440 [1,18,7] Recovered via rewrite in-place
25 25773:17 000000000be6faba [1,18,7] Recovered via rewrite in-place
26 29539:55 0000000000000a7d [1,18,7] Recovered via rewrite in-place
27 31899:54 0000000004936321 [1,17,1] Recovered via rewrite in-place
28 33537:48 000000000be6fabb [1,18,7] Recovered via rewrite in-place
29 33571:15 00000000268da000 [1,17,1] Recovered via rewrite in-place
30 35630:44 0000000008931e1c [1,17,1] Recovered via rewrite in-place
31 36962:09 000000001e24405e [1,17,1] Recovered via rewrite in-place
32 38722:58 0000000006a5479e [1,18,7] Recovered via rewrite in-place
33 39932:13 000000001323a008 [1,18,7] Recovered via rewrite in-place
34 40127:22 00000000287429bf [1,18,7] Recovered via rewrite in-place
35 42273:45 000000000da64e17 [1,18,7] Recovered via rewrite in-place
36 43306:20 0000000000000a7d [1,18,7] Recovered via rewrite in-place
37 43591:06 000000000da64e17 [1,18,7] Recovered via rewrite in-place
38 43948:58 0000000035f0a391 [1,17,1] Recovered via rewrite in-place
39 45119:54 000000000d9b7400 [1,17,1] Recovered via rewrite in-place
40 45363:48 000000000ccaf602 [1,17,1] Recovered via rewrite in-place
41 45669:58 00000000667e655f [1,17,1] Recovered via rewrite in-place
42 45671:50 0000000005370c19 [1,17,1] Recovered via rewrite in-place
43 46201:56 00000000071fba1e [1,17,1] Recovered via rewrite in-place
44 49123:07 0000000057640932 [1,17,1] Recovered via rewrite in-place
45 49137:19 00000000092eb600 [1,17,1] Recovered via rewrite in-place
46 49534:49 000000000b2c5600 [1,17,1] Recovered via rewrite in-place
47 49706:53 000000002e903a00 [1,17,1] Recovered via rewrite in-place
48 49891:33 000000000d198800 [1,17,1] Recovered via rewrite in-place
49 50176:44 000000006d8d6bf4 [1,17,1] Recovered via rewrite in-place
50 50211:59 0000000016833400 [1,17,1] Recovered via rewrite in-place
51 50228:59 000000003c10c200 [1,17,1] Recovered via rewrite in-place
52 50326:10 00000000261d9600 [1,17,1] Recovered via rewrite in-place
53 50473:20 0000000008379397 [1,17,1] Recovered via rewrite in-place
54 50759:46 000000001eee4600 [1,17,1] Recovered via rewrite in-place
55 50979:45 0000000036405a00 [1,17,1] Recovered via rewrite in-place
56 50979:48 0000000036405a00 [1,17,1] Recovered via rewrite in-place
57 51146:09 0000000036918a00 [1,17,1] Recovered via rewrite in-place
58 51161:25 0000000036bb2000 [1,17,1] Recovered via rewrite in-place
59 51231:12 000000006aa774a8 [1,17,1] Recovered via rewrite in-place
60 51784:04 000000003c3f5400 [1,17,1] Recovered via rewrite in-place
61 51813:44 000000002e189800 [1,17,1] Recovered via rewrite in-place
62 51814:09 000000002e406200 [1,17,1] Recovered via rewrite in-place
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 4
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: unknown
reason: loss of dword synchronization
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000c50041ae5219
attached SAS address = 0x5782bcb052622204
attached phy identifier = 7
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 10
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 10
Phy reset problem count: 0
relative target port id = 2
generation code = 4
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: unknown
negotiated logical link rate: phy enabled; 1.5 Gbps
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000c50041ae521a
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
0
Hm. Wrong disk. Try:
smartctl -x /dev/sdb; echo $?
Thanks for the quick reply. This is the output:
smartctl -x /dev/sdb; echo $?
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST91000640SS
Revision: 0001
Compliance: SPC-3
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x5000c50033e0be0f
Serial number: 9XG0510S00009129VN7B
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sun Nov 22 18:23:35 2020 -05
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C
Elements in grown defect list: 0
Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
4
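The last three lines are your answer: the device reports that it does not support self-test logging or background scan results logging, and smartctl exits with status 4 (bit 2: some SMART or other ATA command to the disk failed). You could acknowledge the problem or disable the trigger for this disk (I recommend identifying the disk by serial number before doing that).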
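On serial-based identification: names like `/dev/sdb` can change between boots, while the serial number stays with the drive. A minimal sketch of pulling the serial out of smartctl output follows. The function name `disk_serial` is hypothetical, and it assumes the output contains a `Serial number:` line like the ones pasted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl output on stdin and print the serial number.
# The label is matched case-insensitively on the assumption that its
# capitalisation can differ between drive types.
disk_serial() {
    awk -F': *' 'tolower($1) == "serial number" { print $2 }'
}

# Example with two lines copied from the /dev/sdb output in this thread.
disk_serial <<'EOF'
Logical Unit id: 0x5000c50033e0be0f
Serial number: 9XG0510S00009129VN7B
EOF
```

Against a live disk this would be used as `smartctl -i /dev/sdb | disk_serial`.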
Thanks for the reply. So how can I know that a disk is about to fail? Would I look at the Elements in grown defect list?
Thank you
That's correct. Also look at the Non-medium error count and the disk log (separate trigger).
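As a rough sketch of how those two counters could be pulled out of `smartctl -x` output: the function name `scsi_health_counters` and the output keys are made up for illustration, and it assumes the field labels appear exactly as in the output pasted above.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl -x output on stdin and print the two
# counters mentioned above as key=value pairs. A counter that the disk does
# not report is simply omitted.
scsi_health_counters() {
    awk -F': *' '
        /Elements in grown defect list:/ { print "grown_defects=" ($2 + 0) }
        /Non-medium error count:/        { print "non_medium_errors=" ($2 + 0) }
    '
}

# Example with lines copied from the /dev/sda output in this thread.
scsi_health_counters <<'EOF'
Elements in grown defect list: 0
Non-medium error count: 0
EOF
```

Against a live disk this would be used as `smartctl -x /dev/sda | scsi_health_counters`.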
Thanks for the reply. I do have a few other servers with Non-medium error counts.
I read that this may be a false alarm, but I'm not sure. What have you seen in your experience?
As for the disk log, I saw this once:
sdb: The device error log contains records of errors
man smartctl:
Bit 6: The device error log contains records of errors.
I checked with crystal disk, but it's SATA and not SAS.
A device can operate for years with errors in its log, or it could fail tomorrow. The most concerning parameter is Reallocated Sectors: if it rises sharply, the device will most likely fail.
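One way to watch for that kind of rise is to compare the current count with a previously recorded one. The sketch below is only an illustration: the function name `defect_growth` and the default threshold of 10 are assumptions, not values from smartctl or from this thread, and recording the earlier count is left to whatever monitoring is already in place.

```shell
#!/bin/sh
# Hypothetical helper. Usage: defect_growth PREVIOUS CURRENT [THRESHOLD]
# Prints a warning and returns 1 when the count rose by THRESHOLD or more.
defect_growth() {
    prev=$1
    cur=$2
    threshold=${3:-10}   # assumed default, pick a value that suits your disks
    delta=$((cur - prev))
    if [ "$delta" -ge "$threshold" ]; then
        echo "WARN: count rose by $delta (from $prev to $cur)"
        return 1
    fi
    echo "OK: count rose by $delta"
    return 0
}

# Example: a small increase stays under the assumed threshold.
defect_growth 0 3
```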
Describe the bug: Hi, I'm currently checking the data and saw that the status of all the disks shows error code 4:
ERR_CODE_4
Screenshots: ![image](https://user-images.githubusercontent.com/13953629/99919390-21be7a00-2ceb-11eb-9051-530ba31a68a5.png)
This is the information I get.
Thank you