rstrlcpy opened this issue 5 years ago
Hi Roman, I might have another reproducer for a similar issue. Can you add the ereport?
ereport? What is it?
An FMA ereport. On illumos: `fmdump -eV`; on ZoL: `zpool events -v`.
```
# zpool status rtest 1
  pool: rtest
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 0h0m with 0 errors on Tue Apr 30 19:29:46 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rtest       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ram0    ONLINE       0     0     0
            ram1    ONLINE       0     0    47

errors: No known data errors
```
There are no errors in the report:
```
TIME                           CLASS
Apr 30 2019 19:29:42.878638170 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a226 0x345ef45a  eid = 0x205
Apr 30 2019 19:29:42.878638170 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2473"  history_internal_name = "scan setup"  history_txg = 0x9a9  history_time = 0x5cc8a226  time = 0x5cc8a226 0x345ef45a  eid = 0x206
Apr 30 2019 19:29:43.038657548 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9aa  history_time = 0x5cc8a227  time = 0x5cc8a227 0x24dde0c  eid = 0x207
Apr 30 2019 19:29:43.038657548 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x24dde0c  eid = 0x208
Apr 30 2019 19:29:43.090663852 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x5676bac  eid = 0x209
Apr 30 2019 19:29:43.090663852 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2476"  history_internal_name = "scan setup"  history_txg = 0x9ac  history_time = 0x5cc8a227  time = 0x5cc8a227 0x5676bac  eid = 0x20a
Apr 30 2019 19:29:43.270685676 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9ad  history_time = 0x5cc8a227  time = 0x5cc8a227 0x102255ec  eid = 0x20b
Apr 30 2019 19:29:43.270685676 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x102255ec  eid = 0x20c
Apr 30 2019 19:29:43.322691981 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x133be38d  eid = 0x20d
Apr 30 2019 19:29:43.322691981 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2479"  history_internal_name = "scan setup"  history_txg = 0x9af  history_time = 0x5cc8a227  time = 0x5cc8a227 0x133be38d  eid = 0x20e
Apr 30 2019 19:29:43.490712349 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9b0  history_time = 0x5cc8a227  time = 0x5cc8a227 0x1d3fad1d  eid = 0x20f
Apr 30 2019 19:29:43.490712349 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x1d3fad1d  eid = 0x210
Apr 30 2019 19:29:43.554720109 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x21105b6d  eid = 0x211
Apr 30 2019 19:29:43.562721079 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2486"  history_internal_name = "scan setup"  history_txg = 0x9b6  history_time = 0x5cc8a227  time = 0x5cc8a227 0x218a7137  eid = 0x212
Apr 30 2019 19:29:43.742742902 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9b7  history_time = 0x5cc8a227  time = 0x5cc8a227 0x2c455b76  eid = 0x213
Apr 30 2019 19:29:43.742742902 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x2c455b76  eid = 0x214
Apr 30 2019 19:29:43.766745812 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x2db39cd4  eid = 0x215
Apr 30 2019 19:29:43.766745812 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2489"  history_internal_name = "scan setup"  history_txg = 0x9b9  history_time = 0x5cc8a227  time = 0x5cc8a227 0x2db39cd4  eid = 0x216
Apr 30 2019 19:29:43.958769091 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9ba  history_time = 0x5cc8a227  time = 0x5cc8a227 0x3925a7c3  eid = 0x217
Apr 30 2019 19:29:43.958769091 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a227 0x3925a7c3  eid = 0x218
Apr 30 2019 19:29:44.002774424 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x2a5598  eid = 0x219
Apr 30 2019 19:29:44.002774424 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2492"  history_internal_name = "scan setup"  history_txg = 0x9bc  history_time = 0x5cc8a228  time = 0x5cc8a228 0x2a5598  eid = 0x21a
Apr 30 2019 19:29:44.222801136 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9bd  history_time = 0x5cc8a228  time = 0x5cc8a228 0xd47acf0  eid = 0x21b
Apr 30 2019 19:29:44.222801136 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0xd47acf0  eid = 0x21c
Apr 30 2019 19:29:44.250804535 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0xef2f937  eid = 0x21d
Apr 30 2019 19:29:44.250804535 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2495"  history_internal_name = "scan setup"  history_txg = 0x9bf  history_time = 0x5cc8a228  time = 0x5cc8a228 0xef2f937  eid = 0x21e
Apr 30 2019 19:29:44.438827360 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9c0  history_time = 0x5cc8a228  time = 0x5cc8a228 0x1a27f960  eid = 0x21f
Apr 30 2019 19:29:44.438827360 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x1a27f960  eid = 0x220
Apr 30 2019 19:29:44.474831731 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x1c4d5b73  eid = 0x221
Apr 30 2019 19:29:44.474831731 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2498"  history_internal_name = "scan setup"  history_txg = 0x9c2  history_time = 0x5cc8a228  time = 0x5cc8a228 0x1c4d5b73  eid = 0x222
Apr 30 2019 19:29:44.662854556 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9c3  history_time = 0x5cc8a228  time = 0x5cc8a228 0x27825b9c  eid = 0x223
Apr 30 2019 19:29:44.662854556 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x27825b9c  eid = 0x224
Apr 30 2019 19:29:44.690857955 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x292da7e3  eid = 0x225
Apr 30 2019 19:29:44.690857955 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2501"  history_internal_name = "scan setup"  history_txg = 0x9c5  history_time = 0x5cc8a228  time = 0x5cc8a228 0x292da7e3  eid = 0x226
Apr 30 2019 19:29:44.874880295 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9c6  history_time = 0x5cc8a228  time = 0x5cc8a228 0x34259d27  eid = 0x227
Apr 30 2019 19:29:44.874880295 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x34259d27  eid = 0x228
Apr 30 2019 19:29:44.930887093 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a228 0x377c35b5  eid = 0x229
Apr 30 2019 19:29:44.930887093 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2506"  history_internal_name = "scan setup"  history_txg = 0x9ca  history_time = 0x5cc8a228  time = 0x5cc8a228 0x377c35b5  eid = 0x22a
Apr 30 2019 19:29:45.122910444 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9cb  history_time = 0x5cc8a229  time = 0x5cc8a229 0x75376ec  eid = 0x22b
Apr 30 2019 19:29:45.122910444 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x75376ec  eid = 0x22c
Apr 30 2019 19:29:45.170916288 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0xa2ff9c0  eid = 0x22d
Apr 30 2019 19:29:45.170916288 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2509"  history_internal_name = "scan setup"  history_txg = 0x9cd  history_time = 0x5cc8a229  time = 0x5cc8a229 0xa2ff9c0  eid = 0x22e
Apr 30 2019 19:29:45.390943072 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9ce  history_time = 0x5cc8a229  time = 0x5cc8a229 0x174d5160  eid = 0x22f
Apr 30 2019 19:29:45.390943072 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x174d5160  eid = 0x230
Apr 30 2019 19:29:45.470952811 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x1c122b6b  eid = 0x231
Apr 30 2019 19:29:45.470952811 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2516"  history_internal_name = "scan setup"  history_txg = 0x9d4  history_time = 0x5cc8a229  time = 0x5cc8a229 0x1c122b6b  eid = 0x232
Apr 30 2019 19:29:45.670977160 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9d5  history_time = 0x5cc8a229  time = 0x5cc8a229 0x27fe4c88  eid = 0x233
Apr 30 2019 19:29:45.670977160 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x27fe4c88  eid = 0x234
Apr 30 2019 19:29:45.694980082 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x296c8df2  eid = 0x235
Apr 30 2019 19:29:45.694980082 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2519"  history_internal_name = "scan setup"  history_txg = 0x9d7  history_time = 0x5cc8a229  time = 0x5cc8a229 0x296c8df2  eid = 0x236
Apr 30 2019 19:29:45.883002970 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9d8  history_time = 0x5cc8a229  time = 0x5cc8a229 0x34a18e5a  eid = 0x237
Apr 30 2019 19:29:45.883002970 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x34a18e5a  eid = 0x238
Apr 30 2019 19:29:45.939009788 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a229 0x37f826fc  eid = 0x239
Apr 30 2019 19:29:45.939009788 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2525"  history_internal_name = "scan setup"  history_txg = 0x9dd  history_time = 0x5cc8a229  time = 0x5cc8a229 0x37f826fc  eid = 0x23a
Apr 30 2019 19:29:46.123032209 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9de  history_time = 0x5cc8a22a  time = 0x5cc8a22a 0x7555291  eid = 0x23b
Apr 30 2019 19:29:46.123032209 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a22a 0x7555291  eid = 0x23c
Apr 30 2019 19:29:46.151035622 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a22a 0x9009ee6  eid = 0x23d
Apr 30 2019 19:29:46.151035622 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2528"  history_internal_name = "scan setup"  history_txg = 0x9e0  history_time = 0x5cc8a22a  time = 0x5cc8a22a 0x9009ee6  eid = 0x23e
Apr 30 2019 19:29:46.335058054 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9e1  history_time = 0x5cc8a22a  time = 0x5cc8a22a 0x13f89486  eid = 0x23f
Apr 30 2019 19:29:46.335058054 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a22a 0x13f89486  eid = 0x240
Apr 30 2019 19:29:46.371062443 sysevent.fs.zfs.scrub_start
        version = 0x0  class = "sysevent.fs.zfs.scrub_start"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a22a 0x161df6ab  eid = 0x241
Apr 30 2019 19:29:46.371062443 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "func=1 mintxg=0 maxtxg=2531"  history_internal_name = "scan setup"  history_txg = 0x9e3  history_time = 0x5cc8a22a  time = 0x5cc8a22a 0x161df6ab  eid = 0x242
Apr 30 2019 19:29:46.563085850 sysevent.fs.zfs.history_event
        version = 0x0  class = "sysevent.fs.zfs.history_event"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  history_hostname = "zol-devel"  history_internal_str = "errors=0"  history_internal_name = "scan done"  history_txg = 0x9e4  history_time = 0x5cc8a22a  time = 0x5cc8a22a 0x2190021a  eid = 0x243
Apr 30 2019 19:29:46.563085850 sysevent.fs.zfs.scrub_finish
        version = 0x0  class = "sysevent.fs.zfs.scrub_finish"  pool = "rtest"  pool_guid = 0x4888d3300934ecba  pool_state = 0x0  pool_context = 0x0  time = 0x5cc8a22a 0x2190021a  eid = 0x244
```
!?! Every checksum error should also generate an ereport. Does your build include 2bbec1c91?
It seems the change is part of the 0.8.x release, but the test system is on 0.7.5. I will try to build a 0.8.x release and repeat the test.
I've built 0.8.0-rc4. Same issue, and there are no related events.
I can reproduce @Ramzec's results with 0.8.0-rc4 on an Ubuntu 18 VM. I also didn't see any checksum events:
```
hutter@ubuntu18:~/zfs$ ./cmd/zpool/zpool status
  pool: rtest
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu Feb 21 20:18:09 2019
        87.3M scanned at 87.3M/s, 22.2M issued at 22.2M/s, 88.5M total
        0B repaired, 25.02% done, 0 days 00:00:02 to go
config:

        NAME        STATE     READ WRITE CKSUM
        rtest       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ram0    ONLINE       0     0   162
            ram1    ONLINE       0     0    54

errors: No known data errors
hutter@ubuntu18:~/zfs$ ./cmd/zpool/zpool events
TIME                           CLASS
cannot get event: permission denied
hutter@ubuntu18:~/zfs$ sudo ./cmd/zpool/zpool events
[sudo] password for hutter:
TIME                           CLASS
dropped 285 events
Feb 21 2019 20:18:22.900000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:22.900000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:22.924000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:22.924000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.036000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.036000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.056000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.056000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.116000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.116000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.140000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.140000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.244000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.244000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.288000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.292000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.428000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.428000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.468000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.468000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.572000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.572000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.600000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.600000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.720000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.720000000 sysevent.fs.zfs.scrub_finish
Feb 21 2019 20:18:23.764000000 sysevent.fs.zfs.scrub_start
Feb 21 2019 20:18:23.764000000 sysevent.fs.zfs.history_event
Feb 21 2019 20:18:23.908000000 sysevent.fs.zfs.history_event
...
```
Here is what is known so far: a scrub may issue a ZIO to a disk that was attached only moments ago, which causes the described CKSUM errors. To address this I implemented the following fix; with it, ZFS will not issue a scrub ZIO to a vdev that has missing data.
```diff
diff --git a/usr/src/uts/common/fs/zfs/vdev_mirror.c b/usr/src/uts/common/fs/zfs/vdev_mirror.c
index 4a869c9ea1..4843d2f7c8 100644
--- a/usr/src/uts/common/fs/zfs/vdev_mirror.c
+++ b/usr/src/uts/common/fs/zfs/vdev_mirror.c
@@ -445,6 +445,14 @@ vdev_mirror_io_start(zio_t *zio)
 			continue;
 		}
 
+		/*
+		 * Possibly the vdev was attached in the recent past
+		 * and resilvering is not finished for it yet, so we
+		 * need to skip it to avoid CKSUM errors.
+		 */
+		if (vdev_dtl_contains(mc->mc_vd, DTL_MISSING, zio->io_txg, 1))
+			continue;
+
 		if (spec_case && c != target)
 			continue;
```
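The intent of that vdev_mirror change is easier to see in a toy model. The following is an illustrative Python sketch only, not ZFS code; the names (`dtl_contains`, `scrub_read_children`) and the range-list representation of a DTL are invented for the example:

```python
# Toy model of the vdev_mirror_io_start() fix: a scrub read normally fans
# out to every mirror child, but a child whose DTL (dirty time log) still
# covers the block's txg has no valid copy yet, so reading it back would
# "verify" garbage and be counted as a CKSUM error.

def dtl_contains(dtl, txg):
    """DTL modeled as a list of (start, end) txg ranges of missing data."""
    return any(start <= txg <= end for start, end in dtl)

def scrub_read_children(children, block_txg):
    """Return the mirror children a scrub read should be issued to."""
    return [
        name for name, dtl in children
        # Skip a recently attached child that has not resilvered this txg.
        if not dtl_contains(dtl, block_txg)
    ]

children = [
    ("ram0", []),           # fully resilvered child: empty DTL
    ("ram1", [(0, 2500)]),  # just-attached child: txgs 0-2500 still missing
]
print(scrub_read_children(children, block_txg=1200))  # -> ['ram0']
print(scrub_read_children(children, block_txg=3000))  # -> ['ram0', 'ram1']
```

With the skip in place, the just-attached child is simply left out of the scrub read until its DTL no longer covers the block's txg.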
I also found two code blocks where ZFS checks whether any vdev requires resilvering. The first one switches the code path from SCRUB to RESILVER if a vdev requires resilvering; the other one does not do that switch. The following change fixes the issue:
```diff
diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index f39afbb4cb..6950822846 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -470,6 +474,7 @@ dsl_scan_setup_sync(void *arg, dmu_tx_t *tx)
 	    &scn->scn_phys.scn_min_txg, &scn->scn_phys.scn_max_txg)) {
 		spa_event_notify(spa, NULL, NULL,
 		    ESC_ZFS_RESILVER_START);
+		scn->scn_phys.scn_func = POOL_SCAN_RESILVER;
 	} else {
 		spa_event_notify(spa, NULL, NULL, ESC_ZFS_SCRUB_START);
 	}
@@ -506,7 +511,8 @@ dsl_scan_setup_sync(void *arg, dmu_tx_t *tx)
 	spa_history_log_internal(spa, "scan setup", tx,
 	    "func=%u mintxg=%llu maxtxg=%llu",
-	    *funcp, scn->scn_phys.scn_min_txg, scn->scn_phys.scn_max_txg);
+	    scn->scn_phys.scn_func, scn->scn_phys.scn_min_txg,
+	    scn->scn_phys.scn_max_txg);
 }

 /* ARGSUSED */
```
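What that one-line `scn_func` change buys can be sketched in a toy model (Python, invented names, not ZFS code; the enum values mirror pool_scan_func_t, where scrub is 1 and resilver is 2): without it, the "scan setup" history line reports the user-requested function even when the scan actually runs as a resilver.

```python
# Toy model of the dsl_scan_setup_sync() logging fix: when setup detects a
# vdev that needs resilvering, the scan runs as a resilver, but the history
# line used to log the *requested* function (func=1, i.e. scrub).
POOL_SCAN_SCRUB, POOL_SCAN_RESILVER = 1, 2

def scan_setup(requested_func, vdev_needs_resilver, patched):
    scn_func = requested_func
    if vdev_needs_resilver and patched:
        # The added line in the diff: record what the scan really does.
        scn_func = POOL_SCAN_RESILVER
    return f"scan setup func={scn_func}"

print(scan_setup(POOL_SCAN_SCRUB, True, patched=False))  # -> scan setup func=1
print(scan_setup(POOL_SCAN_SCRUB, True, patched=True))   # -> scan setup func=2
```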
Unfortunately the changes didn't help. I still saw CKSUM errors, but on another vdev. Later I found that all CKSUM errors are related to blocks created in the same TXG that is currently syncing.
Finally I implemented the following solution:
```diff
diff --git a/usr/src/uts/common/fs/zfs/dsl_scan.c b/usr/src/uts/common/fs/zfs/dsl_scan.c
index f39afbb4cb..6950822846 100644
--- a/usr/src/uts/common/fs/zfs/dsl_scan.c
+++ b/usr/src/uts/common/fs/zfs/dsl_scan.c
@@ -428,6 +428,10 @@ dsl_scan_setup_check(void *arg, dmu_tx_t *tx)
 	if (dsl_scan_is_running(scn))
 		return (SET_ERROR(EBUSY));
 
+	if (dmu_tx_is_syncing(tx) && scn->scn_restart_txg != 0 &&
+	    scn->scn_restart_txg > tx->tx_txg)
+		return (SET_ERROR(EAGAIN));
+
 	return (0);
 }
```
With this change, a scrub started by a user will be rescheduled if a vdev requires resilvering. With the last change, my test VM ran for more than two days without any CKSUM errors.
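The last change can be summarized in a small model (Python, illustrative only; the real check additionally requires `dmu_tx_is_syncing(tx)`, which is folded into a boolean here):

```python
# Toy model of the dsl_scan_setup_check() change: if a resilver restart is
# pending for a txg after the one now syncing, a user-requested scrub setup
# returns EAGAIN, so the resilver runs first and the scrub is retried later.
import errno

def dsl_scan_setup_check(scan_running, restart_txg, syncing_txg):
    if scan_running:
        return errno.EBUSY           # a scan is already in progress
    if restart_txg != 0 and restart_txg > syncing_txg:
        return errno.EAGAIN          # defer the scrub; resilver is pending
    return 0

assert dsl_scan_setup_check(False, restart_txg=0,   syncing_txg=100) == 0
assert dsl_scan_setup_check(False, restart_txg=150, syncing_txg=100) == errno.EAGAIN
assert dsl_scan_setup_check(True,  restart_txg=0,   syncing_txg=100) == errno.EBUSY
```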
Any ideas, thoughts?
@tonyhutter pointed out that I may have had a similar problem, first described on the mailing list. I am using CentOS 7.6 with kernel 3.10.0-957.12.1.el7.x86_64 and ZFS/SPL version 0.7.13-1.
Yesterday I replaced a failed disk with path pci-0000:02:00.1-ata-2.0. As this chassis is not hot-swap capable, I powered off the machine and replaced the disk. On server restart, with autoreplace=on, zed immediately replaced the disk and a resilver started.
A zpool status issued just after booting showed 1.15K CKSUM errors on the replaced disk, pci-0000:02:00.1-ata-2.0, which was marked as DEGRADED. This surprised me: the disk was brand new, with absolutely no ZFS labels, so how/why did ZFS try to access it?
A subsequent scrub found no errors, but the CKSUM count remained at 1.15K until I issued a "zpool clear tank". It seems as if ZFS tried to read from the replaced disk (during pool import?), found it "corrupted" (i.e., it contains no valid data) and marked it as degraded.
I failed to replicate the issue in a VM, though; maybe it is timing-sensitive (i.e., there is a race condition)? Thanks.
I'm seeing very similar behavior since upgrading to 0.8.2-1. One of my zpools, with 10TB drives less than a year old, reported checksum errors on every drive right after the upgrade. (No errors ever before 0.8.2-1.) I was also experiencing the restarting-resilver issue. Doing some digging, I saw a tip about stopping zed; once I stopped zed, all my problems went away.
```
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Oct 25 21:28:17 2019
        72.8G scanned at 9.10G/s, 64.9G issued at 8.11G/s, 29.1T total
        322M resilvered, 0.22% done, 0 days 01:01:13 to go
config:

        NAME                                   STATE     READ WRITE CKSUM
        tank                                   DEGRADED     0     0     0
          raidz2-0                             DEGRADED     0     0     0
            ata-ST10000NM0086-2AA101_ZA28MD7E  DEGRADED     0     0   415  too many errors
            ata-ST10000NM0086-2AA101_ZA28MCNG  DEGRADED     0     0    64  too many errors
            ata-ST10000NM0086-2AA101_ZA25NSFA  DEGRADED     0     0   271  too many errors
            ata-ST10000NM0086-2AA101_ZA28MD0H  DEGRADED     0     0   704  too many errors  (resilvering)
            ata-ST10000NM0086-2AA101_ZA26F2MG  DEGRADED     0     0    34  too many errors  (resilvering)
            ata-ST10000NM0086-2AA101_ZA26E1X0  DEGRADED     0     0    52  too many errors  (resilvering)
            ata-ST10000NM0086-2AA101_ZA26E1SS  DEGRADED     0     0    57  too many errors
            ata-ST10000NM0086-2AA101_ZA28M62Z  ONLINE       0     0     4
            ata-ST10000NM0086-2AA101_ZA26DN39  DEGRADED     0     0    59  too many errors  (resilvering)
            ata-ST10000NM0086-2AA101_ZA26E2M3  DEGRADED     0     0    44  too many errors
```
CentOS 7.7, kernel 3.10.0-1062.4.1.el7.x86_64.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Hi everyone.
I'm on Proxmox VE 6.3 with ZFS 0.8.5 and kernel 5.4.73, and I think I've hit the same or a similar bug.
First I thought the RAM or the (brand-new) disks were bad, so I tested each component separately without finding any issue. I then decided to test on a different server to eliminate possible errors related to the storage components, but I still got errors. So I ran more tests on both servers with HDDs, SSDs, and RAM disks, and all show the same problem.
```
~ # modprobe brd rd_nr=2 rd_size=10485760
~ # zpool create test2 mirror /dev/ram0 /dev/ram1
~ # zpool status
  pool: test2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ram0    ONLINE       0     0     0
            ram1    ONLINE       0     0     0

errors: No known data errors
~ # zfs create -V 5G -o volblocksize=8k test2/testvol
~ # dd if=/dev/urandom of=/dev/zvol/test2/testvol bs=1M status=progress
~ # zpool status
  pool: test2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test2       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ram0    ONLINE       0     0     0
            ram1    ONLINE       0     0     0

errors: No known data errors
~ # zpool scrub test2
~ # zpool detach test2 ram1
~ # zpool add test2 ram1
~ # zpool remove test2 ram0
~ # zpool attach test2 ram1 ram0
~ # zpool status
  pool: test2
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 5.11G in 0 days 00:00:15 with 0 errors on Fri Nov 27 07:57:58 2020
remove: Removal of vdev 0 copied 5.05G in 0h0m, completed on Fri Nov 27 07:57:32 2020
        10.2K memory used for removed device mappings
config:

        NAME        STATE     READ WRITE CKSUM
        test2       DEGRADED     0     0     0
          mirror-1  DEGRADED     0     0     0
            ram1    ONLINE       0     0     0
            ram0    DEGRADED     0     0    18  too many errors

errors: No known data errors
```
Here are the zpool events from a third test, where it happened on the second try or so (not sure anymore :)): test2_ereport.txt
I may have seen the same issue under Rocky Linux 8.4 and ZFS 2.1.0:
```
[root@zfsscratch ~]# zpool replace scratch 36000402009e01d5d415ae51700000000 36000402002e022454a69ead900000000
cannot open '/dev/mapper/36000402002e022454a69ead900000000': Device or resource busy

        mirror-40                              ONLINE       0     0     0
          36000402002e022454a69908c00000000    ONLINE       0     0     0
          spare-1                              ONLINE       0     0     0
            36000402009e01d5d415ae51700000000  ONLINE       2     0     1
            36000402002e022454a69ead900000000  ONLINE       0     0    97
...
        spares
...
          36000402002e022454a69ead900000000    INUSE     currently in use
```
The resilver finished and I am now running a zpool clear && zpool scrub.
Hi,
I experienced the same checksum error report after attaching a brand new disk to an existing pool consisting of a single disk. The system runs Debian 11 with ZFS 2.0.6.
An automatic scrub was running when I zpool attach'ed the device. I did not stop it manually, since the zpool-attach(8)
manual says:
new_device begins to resilver immediately and any running scrub is cancelled.
The brand new disk is ata-HGST_HUS728T8TALE6L4_VRH4JTEK
. Here is the command I ran (from zpool history
):
2021-11-14.02:26:53 zpool attach tank ata-TOSHIBA_HDWG480_71Q0A07SFR0H /dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK
I checked the status a few minutes after attaching the device; it was already reporting checksum errors:
# zpool status tank
pool: tank
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Nov 14 02:26:42 2021
1.25T scanned at 5.17G/s, 33.9G issued at 140M/s, 2.43T total
33.9G resilvered, 1.36% done, 04:59:16 to go
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 2.72K (resilvering)
errors: No known data errors
The resilvering has now finished; the checksum errors did not change:
# zpool status
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: resilvered 2.44T in 04:17:38 with 0 errors on Sun Nov 14 06:44:20 2021
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 2.72K
errors: No known data errors
The events:
Nov 14 2021 00:24:01.940729556 sysevent.fs.zfs.scrub_start
version = 0x0
class = "sysevent.fs.zfs.scrub_start"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
time = 0x61904911 0x381264d4
eid = 0xf
Nov 14 2021 00:24:01.940729556 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "func=1 mintxg=0 maxtxg=307454"
history_internal_name = "scan setup"
history_txg = 0x4b0fe
history_time = 0x61904911
time = 0x61904911 0x381264d4
eid = 0x10
Nov 14 2021 02:26:35.458148864 sysevent.fs.zfs.vdev_attach
version = 0x0
class = "sysevent.fs.zfs.vdev_attach"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
vdev_guid = 0xa35d7634b2634316
vdev_state = "ONLINE" (0x7)
vdev_path = "/dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1"
vdev_devid = "ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1"
time = 0x619065cb 0x1b4ecc00
eid = 0x11
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "scan aborted, restarting"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x12
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "starting deferred resilver"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x13
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.resilver_start
version = 0x0
class = "sysevent.fs.zfs.resilver_start"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
resilver_type = "healing"
time = 0x619065d2 0x3328d57c
eid = 0x14
Nov 14 2021 02:26:42.858314108 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "func=2 mintxg=3 maxtxg=308804"
history_internal_name = "scan setup"
history_txg = 0x4b644
history_time = 0x619065d2
time = 0x619065d2 0x3328d57c
eid = 0x15
Nov 14 2021 02:26:46.318391389 sysevent.fs.zfs.config_sync
version = 0x0
class = "sysevent.fs.zfs.config_sync"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
time = 0x619065d6 0x12fa445d
eid = 0x16
Nov 14 2021 02:26:53.542552777 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "attach vdev=/dev/disk/by-id/ata-HGST_HUS728T8TALE6L4_VRH4JTEK-part1 to vdev=/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A07SFR0H-part1"
history_internal_name = "vdev attach"
history_txg = 0x4b646
history_time = 0x619065dd
time = 0x619065dd 0x2056b2c9
eid = 0x17
Nov 14 2021 06:44:20.973882171 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
history_hostname = "larco"
history_internal_str = "errors=0"
history_internal_name = "scan done"
history_txg = 0x4c11f
history_time = 0x6190a234
time = 0x6190a234 0x3a0c433b
eid = 0x18
Nov 14 2021 06:44:20.973882171 sysevent.fs.zfs.resilver_finish
version = 0x0
class = "sysevent.fs.zfs.resilver_finish"
pool = "tank"
pool_guid = 0xbb90af7e6f5976c2
pool_state = 0x0
pool_context = 0x0
resilver_type = "healing"
time = 0x6190a234 0x3a0c433b
eid = 0x19
For the record, I clear'ed then scrub'ed my pool. No errors were reported:
# zpool status
pool: tank
state: ONLINE
scan: scrub repaired 0B in 04:02:52 with 0 errors on Mon Nov 15 04:13:53 2021
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-TOSHIBA_HDWG480_71Q0A07SFR0H ONLINE 0 0 0
ata-HGST_HUS728T8TALE6L4_VRH4JTEK ONLINE 0 0 0
errors: No known data errors
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I too am experiencing this error. A pool containing one disk had another disk attached to it (forming a mirror), and both resilvering and scrubbing cause checksum errors on the new disk. I can repeat the scrub numerous times and it finds new checksum errors on every scan.
I've reseated the drive onto the backplane a few times with no change. SMART data shows absolutely nothing wrong with either disk. Previous disks connected to this backplane slot have never caused errors. Running Debian 11, ZFS 2.1.4.
pool: tank
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 16K in 00:22:01 with 0 errors on Fri Jan 6 00:07:10 2023
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
wwn-0x55cd2e404c47175f-part3 ONLINE 0 0 0
wwn-0x55cd2e404c471471-part3 DEGRADED 0 0 166 too many errors
errors: No known data errors
@c89williams rather than re-seating the disk, try zpool clear tank; zpool scrub tank
. If, after the scrub, no errors are shown, then your disk is OK.
@shodanshok I have tried as you suggested, clearing the error counters and rescrubbing; however, checksum errors continue to occur.
@c89williams in this case, I don't think you have the same issue as described here. Have you tried rebooting your machine and re-scrubbing the pool?
System information
Describe the problem you're observing
The issue was initially found on a Nexenta system, but it can be successfully reproduced on Linux. We thought the issue was a regression introduced by the integration of sequential resilvering, but it can also be reproduced with sequential resilvering disabled, just not as quickly. So this looks like an old, general issue.
There is code that handles unfinished resilvering, but it seems it does not fully cover this case, so a
scrub
can issue READ
zios to a just-attached disk, which causes the CKSUM errors.
Describe how to reproduce the problem
Create 2 RAM disks (the size of each disk is 200MB)
Create a mirrored pool on the disks
Run the following 3 scripts in parallel.
And monitor the status of the pool
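A minimal sketch of the setup for the first two steps, assuming the Linux brd module is used for the RAM disks (the pool name and disk count are illustrative; the three stress scripts referenced above are not reproduced here):

```shell
# Create 2 RAM disks of 200 MB each (brd's rd_size is in KiB: 204800 KiB = 200 MB).
modprobe brd rd_nr=2 rd_size=204800

# Create a mirrored pool on the two RAM disks.
zpool create rtest mirror /dev/ram0 /dev/ram1

# While the three scripts run in parallel, watch for CKSUM errors appearing.
watch -n 1 zpool status rtest
```

RAM disks make the reproducer fast because resilver and scrub complete in seconds, so the race between a cancelled scrub and the newly started resilver is hit more often.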
Include any warning/errors/backtraces from the system logs