jthornber / thin-provisioning-tools

GNU General Public License v3.0

Requiring some help recovering my pool #252

Closed deetwelve closed 1 year ago

deetwelve commented 1 year ago

Recently I had an issue where my LVM volumes would not mount. After some long days researching this topic, I eventually stumbled upon your tools via someone's blog.

I initially ran:

lvconvert --repair pve/data

All I ever get is a repeated "Manual repair required!" message, and I don't think it ever does anything. However, whenever I reboot the server and it runs the repair, I am greeted with the same "Manual repair required!" message, and for about 5 seconds before Proxmox loads I see the following below it, specifically the part that reports the clean blocks:

Found volume group "pve" using metadata type lvm2
Check of pool pve/data failed (status:1). Manual repair required!
2 logical volume(s) in volume group "pve" now active
/dev/mapper/pve-root: clean, 58394/6291456 files, 5463211/25165824 blocks

I tried to make sense of this and am now stuck here.

I activated the metadata in read-only mode with lvchange -ay pve/data_tmeta and a check produced the following:

TRANSACTION_ID=10
METADATA_FREE_BLOCKS=3918308
Checking thin metadata
device details tree
device details tree: node error: checksum error AgTL5iMA, effecting keys [..]

Additional information if any of this is helpful:

fdisk -l

Disk /dev/sda: 1.36 TiB, 1499832039424 bytes, 2929359452 sectors
Disk model: LOGICAL VOLUME  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 1310720 bytes
Disklabel type: gpt
Disk identifier: A3A2F44C-5D01-4566-9F1D-A0474AA28060

Device       Start        End    Sectors   Size Type
/dev/sda1     2560       5119       2560   1.3M BIOS boot
/dev/sda2     5120    1054719    1049600 512.5M EFI System
/dev/sda3  1054720 2929357454 2928302735   1.4T Linux filesystem

Disk /dev/sdb: 16.37 TiB, 18003355459584 bytes, 35162803632 sectors
Disk model: LOGICAL VOLUME  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 1572864 bytes
Disklabel type: gpt
Disk identifier: 6B0CC303-8D66-4D65-90DE-196D9B7C90AA

Device       Start         End     Sectors  Size Type
/dev/sdb1       34        2047        2014 1007K BIOS boot
/dev/sdb2     2048     1050623     1048576  512M EFI System
/dev/sdb3  1050624 35162803598 35161752975 16.4T Linux LVM

Disk /dev/loop0: 673.75 MiB, 706473984 bytes, 1379832 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

pvs -a

PV         VG  Fmt  Attr PSize  PFree
  /dev/sda2           ---      0       0
  /dev/sda3           ---      0       0
  /dev/sdb2           ---      0       0
  /dev/sdb3  pve lvm2 a--  16.37t <16.38g

vgs -a

VG  #PV #LV #SN Attr   VSize  VFree
pve   1   9   0 wz--n- 16.37t <16.38g

lvs -a


  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data            pve twi---tz-- <16.23t
  [data_tdata]    pve Twi------- <16.23t
  [data_tmeta]    pve ewi-------  15.81g
  [lvol0_pmspare] pve ewi-------  15.81g
  root            pve -wi-a-----  96.00g
  swap            pve -wi-a-----   8.00g
  vm-100-disk-0   pve Vwi---tz-- 500.00g data
  vm-101-disk-0   pve Vwi---tz--   5.00t data
  vm-102-disk-0   pve Vwi---tz-- 500.00g data
  vm-103-disk-0   pve Vwi---tz--   5.00t data
  vm-104-disk-0   pve Vwi---tz-- 100.00g data
  vm-105-disk-0   pve Vwi---tz--  16.00t data

pvscan

PV /dev/sdb3   VG pve             lvm2 [16.37 TiB / <16.38 GiB free]
  Total: 1 [16.37 TiB] / in use: 1 [16.37 TiB] / in no VG: 0 [0   ]

lvscan

  inactive          '/dev/pve/data' [<16.23 TiB] inherit
  ACTIVE            '/dev/pve/swap' [8.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  inactive          '/dev/pve/vm-100-disk-0' [500.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-0' [5.00 TiB] inherit
  inactive          '/dev/pve/vm-102-disk-0' [500.00 GiB] inherit
  inactive          '/dev/pve/vm-103-disk-0' [5.00 TiB] inherit
  inactive          '/dev/pve/vm-104-disk-0' [100.00 GiB] inherit
  inactive          '/dev/pve/vm-105-disk-0' [16.00 TiB] inherit

vgdisplay

--- Volume group ---
  VG Name               pve
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  31
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                9
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               16.37 TiB
  PE Size               4.00 MiB
  Total PE              4292205
  Alloc PE / Size       4288013 / <16.36 TiB
  Free  PE / Size       4192 / <16.38 GiB
  VG UUID               FwXD4j-Y1q4-DyV0-i34b-qGSw-zwAO-2aHh2f

Thank you very much for your time just reading this and also any help. It's much appreciated.

jthornber commented 1 year ago

If you pack the metadata with thin_metadata_pack from the latest thinp tools I'll take a look.


deetwelve commented 1 year ago

I have the latest tools from here, compiled as of 30 minutes ago. I am unsure what I would type with the thin_metadata_pack command, and should I reactivate the metadata in read-only mode when I run it?

jthornber commented 1 year ago

lvchange --activationmode partial /dev//


deetwelve commented 1 year ago

When I run that command, I get the following message:

root@pve:~# lvchange --activationmode partial /dev/pve/data
  No command with matching syntax recognised.  Run 'lvchange --help' for more information.
mingnus commented 1 year ago

I believe you've already activated the pool metadata via lvchange -ay pve/data_tmeta, so just move forward to thin_metadata_pack:

thin_metadata_pack -i /dev/pve/data_tmeta -o <your_backup_directory>/tmeta.pack
deetwelve commented 1 year ago

Is this file supposed to be huge? It was growing to over a gig, so I stopped it.

mingnus commented 1 year ago

Possibly. You have terabytes of data in the pool, so it consumes a lot of metadata space.

deetwelve commented 1 year ago

Sorry, it wasn't over a gig; I was looking at the wrong file. It's about 224 MB. I am unsure if you want this file uploaded anywhere specific, but I put it on Google Drive if that suffices. If you'd rather have it somewhere else, let me know and I can reupload it.

https://drive.google.com/file/d/1McNp5qoV_ZPSULnl854WkI4zgE6lsW-t/view?usp=sharing

Thank you again for taking your time to help.

jthornber commented 1 year ago

The latest version (v1.0.2) of thin_repair completes successfully on your metadata, finding these volumes:

$ thin_ls metadata-repaired.bin
DEV MAPPED  CREATE_TIME SNAP_TIME
  1 42GiB   0           0
  2 1329GiB 0           0
  3 45GiB   0           1
  4 1457GiB 0           0
  5 7031MiB 1           1
  6 1572GiB 1           1

So I would make sure you have the latest tools installed, and follow the instructions in 'man lvmthin'.
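As a quick sanity check, the MAPPED column in thin_ls output like the above can be totalled to see whether the repaired metadata accounts for roughly the amount of data expected in the pool. A small awk sketch (sum_mapped is a hypothetical helper, not part of thin-provisioning-tools):

```shell
# Sum the MAPPED column of thin_ls output (GiB/MiB suffixes): a rough
# sanity check that the repaired metadata covers the expected data.
sum_mapped() {
    awk 'NR > 1 {
        n = $2
        if (sub(/GiB$/, "", n))      total += n
        else if (sub(/MiB$/, "", n)) total += n / 1024
    } END { printf "%.1f GiB\n", total }'
}

# Example with the figures from the comment above:
printf '%s\n' \
    'DEV MAPPED CREATE_TIME SNAP_TIME' \
    '1 42GiB 0 0' '2 1329GiB 0 0' '3 45GiB 0 1' \
    '4 1457GiB 0 0' '5 7031MiB 1 1' '6 1572GiB 1 1' | sum_mapped
# prints: 4451.9 GiB
```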


deetwelve commented 1 year ago

I had a read over this, but I am unsure I really understand it. I won't be creating a new volume or anything, correct? I will just be repairing what I have.

My actual system's thin_repair is version 0.9.0, but the thin_repair in your tools is 1.0.2. I am assuming I should use your tools' thin_repair.

The help documentation for your tools states:

thin_repair [OPTIONS] --input <FILE> --output <FILE>

I am unsure what to use as the input.

mingnus commented 1 year ago

The easiest way would be to upgrade the thin_repair binary on your actual system (replace /sbin/pdata_tools or /usr/sbin/pdata_tools with your own build), then let lvconvert --repair handle the rest of the work.

If you don't want to upgrade thin_repair, here are the alternative steps:

  1. Create a new metadata volume

    lvcreate pve --name data_meta0 --size 15g
  2. Repair the original metadata onto the new volume (which is the output device for thin_repair)

    thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0
  3. Swap in the repaired metadata

    lvconvert --thinpool pve/data --poolmetadata pve/data_meta0

The original metadata will be 'renamed' to data_meta0 after running lvconvert --poolmetadata. It's suggested to keep it as a backup.
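The three steps can also be collected into one script. This is only a sketch: the volume names follow this thread, the size is bumped to 16g since data_tmeta is 15.81g (an assumption on my part, not from the thread), and the DRY_RUN guard (on by default) prints the commands instead of executing them so the sequence can be reviewed first:

```shell
#!/bin/sh
# Sketch of the manual thin-pool metadata repair sequence above.
# Volume names (pve, data, data_tmeta, data_meta0) follow this thread.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
set -eu
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# 1. Create a new metadata volume (at least as large as data_tmeta, 15.81g here)
run lvcreate pve --name data_meta0 --size 16g
# 2. Repair the original metadata onto the new volume
run thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0
# 3. Swap the repaired metadata into the pool
run lvconvert --thinpool pve/data --poolmetadata pve/data_meta0
```

Run it once as-is to review the commands, then again with DRY_RUN=0 to execute them.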

deetwelve commented 1 year ago

I tried your initial suggestion of replacing /usr/sbin/pdata_tools with the one I built. When I run thin_repair --version I can see the 1.0.2 version now.

The issue I have, however, is that when I run

lvconvert --repair pve/data

I get the following error:

error: Found argument '' which wasn't expected, or isn't valid in this context

USAGE:
    thin_repair [OPTIONS] --input <FILE> --output <FILE>

For more information try --help
  Repair of thin metadata volume of thin pool pve/data failed (status:2). Manual repair required!

I then tried to do the manual steps, but when I run the thin_repair command I am met with that same error.

mingnus commented 1 year ago

Odd. It looks like the program is receiving an empty string in its argument list, or the clap parser isn't working. Could you please try running the program directly, without the symlink:

/usr/sbin/pdata_tools thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0
deetwelve commented 1 year ago

Activating the tmeta in read-only mode:

root@pve:~# lvchange -ay pve/data_tmeta
Do you want to activate component LV in read-only mode? [y/n]: y
  Allowing activation of component LV.

Then running the above command:

root@pve:~# /usr/sbin/pdata_tools thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0
stat failed
jthornber commented 1 year ago

Is data_meta0 activated? If so could you send us the output of:

strace /usr/sbin/pdata_tools thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0


deetwelve commented 1 year ago

You know what, I am sorry. I didn't even create the new metadata volume; I was thinking I was doing the automatic repair rather than the manual one. I have now created the volume, reactivated the metadata read-only, and am running the manual thin_repair with the new 1.0.2. It seems to be doing something now.

Will follow up with the outcome here soon as it's completed.

jthornber commented 1 year ago

we will improve the error message :)


deetwelve commented 1 year ago

You both are miracle workers. That did it and fixed my issue; I am now seeing my old data and volumes come up. Thank you so much. You don't know how much sleep I lost over this and how bummed I was.

mingnus commented 1 year ago

I would like to know why thin_repair doesn't work under the symlink. Is /usr/sbin/thin_repair a symlink to pdata_tools? Also, could you please provide the strace logs?

strace thin_repair -i foo -o bar

(I expect the issue could be reproduced with any filename, so just use "foo" and "bar")
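For background on why the symlink matters: pdata_tools is a multi-call binary that picks the sub-tool either from the name it was invoked as (via symlink) or from its first argument. A minimal shell sketch of that dispatch pattern (illustrative only, not the actual pdata_tools logic):

```shell
# Multi-call dispatch, busybox style: the tool name comes from argv[0]
# when invoked through a symlink, otherwise from the first argument.
# Illustrative sketch only; not the real pdata_tools implementation.
dispatch() {
    prog=$(basename "$1")
    case "$prog" in
        thin_repair|thin_check|thin_ls|thin_dump)
            echo "$prog"; return 0 ;;
    esac
    case "${2:-}" in
        thin_repair|thin_check|thin_ls|thin_dump)
            echo "$2"; return 0 ;;
    esac
    echo "unknown tool" >&2; return 1
}

dispatch /usr/sbin/thin_repair -i foo -o bar   # prints: thin_repair
dispatch pdata_tools thin_check /dev/foo       # prints: thin_check
```

If the invoked name or argument handling differs between how lvconvert calls the tool and how it is called by hand, that would explain seeing the error only in one of the two cases.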

deetwelve commented 1 year ago

Yes, I have symlinks to pdata_tools. Here are the logs for both:

strace /usr/sbin/pdata_tools thin_repair -i /dev/pve/data_tmeta -o /dev/pve/data_meta0 https://termbin.com/h8e9f

strace thin_repair -i foo -o bar https://termbin.com/qqa78

mingnus commented 1 year ago

Okay, I found the compatibility issue between lvconvert and the new thin_repair. Will fix it in the next release. Thanks for your feedback.

deetwelve commented 1 year ago

You're welcome, and thank you for all your help and time as well, jthornber. It's very much appreciated.