robbat2 opened this issue 2 months ago
The panic seems to come from spa_log_summary_dirty_flushed_metaslab() being unable to find a log summary entry for the transaction group the metaslab belongs to. The messages about missing/duplicate segments make me think there may be some mess in which spacemap logs are replayed and which are not, but it has been a while since I last looked there, so I have no idea what could cause it other than the mentioned non-ECC RAM, etc.
If rolling back by a few transactions does not help, then you may have to import the pool read-only to evacuate the data, since there is no other way to recover the spacemaps.
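For reference, a minimal sketch of both approaches, assuming a pool named `tank` (the pool name is a placeholder; the flags are standard `zpool import` options):

```sh
# Rewind import: discards the last few txgs, which may lose the
# most recent writes but can step back to before the damage.
zpool import -F tank

# If the rewind does not help, import read-only so the damaged
# spacemaps are never written to, and evacuate the data.
zpool import -o readonly=on tank
```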
Thanks. I did try the rollback, unsuccessfully, going back through the entries in the uberblock.
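For anyone following along, a sketch of what inspecting those entries and attempting the rollback looks like (the device path and txg number are placeholders, not the actual values from this pool):

```sh
# List the uberblocks stored in the vdev labels; each one records
# a txg that can serve as a rollback target.
zdb -ul /dev/disk/by-id/example-disk-part1

# Attempt an import rolled back to a specific txg.
# -T implies -FX and is hazardous: everything after that txg is discarded.
zpool import -T 1234567 tank
```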
1. Is there some other data you'd like for bug analysis?
2. Assuming I can scrounge up another system with enough RAM, will a `zfs send | zfs recv` combination also be affected by the spacemap problem? I'd like to keep the snapshots.
This pool normally runs on my desktop with 128 GB of RAM and makes good use of dedup; I tried to recv on another handy system with 8 GB of RAM, and it just ground to a halt and OOM'd.
> Assuming I can scrounge up another system with enough RAM, will a `zfs send | zfs recv` combination also be affected by the spacemap problem? I'd like to keep the snapshots.
You won't be able to create new snapshots on a read-only pool, but the existing ones should replicate just fine.
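For example, a sketch of replicating the existing snapshots off the read-only pool (the pool, dataset, and snapshot names are placeholders):

```sh
# -R builds a replication stream containing the dataset, its
# descendants, and all snapshots up to the one named; nothing
# new has to be created on the read-only source pool.
zfs send -R tank/data@latest | zfs recv -u newpool/data
```

`zfs recv -u` just keeps the received datasets from being mounted automatically.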
@robbat2 if your dedup tables can fit into 60 GB of RAM, run two VMs and send/recv between them
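If it helps with sizing, a rough way to check whether the tables fit (the pool name is a placeholder; the estimate is the usual entries-times-in-core-size arithmetic, not a measurement from this pool):

```sh
# Print dedup table (DDT) statistics, including the total number
# of entries and the per-entry size on disk and in core.
zpool status -D tank

# Approximate RAM needed = total DDT entries x in-core entry size.
```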
The data was recovered; I did `zfs send` to (two) external devices, and then did `zfs recv` from one of them.
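For anyone in the same spot, a sketch of that evacuation with the intermediate copy stored as a file on an external device (pool names and paths are illustrative only):

```sh
# Evacuate: serialize a full replication stream to a file on each
# external device while the damaged pool is imported read-only.
zfs send -R tank@final > /mnt/ext1/tank.zstream

# Restore: replay the stream from one of the copies into a new pool.
zfs recv -F newtank < /mnt/ext1/tank.zstream
```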
Last call if the devs want any reproduction data from the buggy pool before I delete it outright (I will do the deletion in a few days).
Isn't further analysis possible, so that the pool could be brought back into a consistent state and imported correctly, and so that any underlying bugs could be fixed?
The buggy pool has been deleted now because I needed that 3+TB back.
System information
Also reproduced on:
zfs-2.2.99-684_g73866cf34
zfs-kmod-2.2.99-683_g6be8bf555
Describe the problem you're observing
`zpool import $POOL` fails; it includes a backtrace (to follow in the next paste):
`BUG: kernel NULL pointer dereference, address: 0000000000000020` during `zpool import`
Describe how to reproduce the problem
I don't have a reproduction case for another system.
Include any warning/errors/backtraces from the system logs
Other traces:
With the zpool imported read-only: