stratis-storage / stratisd

Easy to use local storage management for Linux.
https://stratis-storage.github.io
Mozilla Public License 2.0

Problem w/ Stratis maintained xfs file system #1789

Closed: mulkieran closed this issue 4 years ago

mulkieran commented 4 years ago

Clone of https://github.com/stratis-storage/stratisd/issues/1495#issuecomment-576024509.

On CentOS 8, one filesystem reached 10 TB used and then failed. I am unable to mount it and unable to repair it.
ddrescue produces an image of only 1 TB.
Stratis has also failed to create a new filesystem.
dmesg is full of "device-mapper: thin: Metadata device dm-3 is larger than 33292800 sectors: excess space will not be used" messages.
I suspect the LVM thin metadata has filled up.
Please tell me what kind of information you need and how to get it.
Thanks
mulkieran commented 4 years ago

Can you show the output of lsblk?

gmat commented 4 years ago

Here is the output of lsblk:

# lsblk 
NAME                                                                                        MAJ:MIN RM   SIZE RO TYPE    MOUNTPOINT
sda                                                                                           8:0    0 106.9T  0 disk    
`-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-physical-originsub                     253:2    0 106.9T  0 stratis 
  |-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-thinmeta                        253:3    0 109.5G  0 stratis 
  | `-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-thinpool-pool                      253:6    0 106.8T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c 253:8    0     1T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 253:9    0     1T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8 253:10   0     1T  0 stratis 
  |   `-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910 253:11   0     4T  0 stratis /mnt/snap
  |-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-thindata                        253:5    0 106.8T  0 stratis 
  | `-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-thinpool-pool                      253:6    0 106.8T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c 253:8    0     1T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 253:9    0     1T  0 stratis 
  |   |-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8 253:10   0     1T  0 stratis 
  |   `-stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910 253:11   0     4T  0 stratis /mnt/snap
  `-stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-mdv                             253:7    0    16M  0 stratis 

and here is the output of stratis filesystem list, in which you can see the real sizes.

# stratis filesystem list ead1-pool 
Pool Name  Name                  Used        Created            Device                                   UUID                            
ead1-pool  home_pers_local       921.83 GiB  Oct 08 2019 16:43  /stratis/ead1-pool/home_pers_local       196400d7d4134017bd66a384c87b944c
ead1-pool  datashare             10.10 TiB   Oct 09 2019 17:46  /stratis/ead1-pool/datashare             6286fd5a7c084eddbe33e4ae996a2d84
ead1-pool  datashare-snap200118  10.10 TiB   Jan 18 2020 21:45  /stratis/ead1-pool/datashare-snap200118  2039f1ee6a684ada9fee5f33155934a8
ead1-pool  myfs                  2.00 TiB    Jan 19 2020 16:15  /stratis/ead1-pool/myfs                  8648cd7e9c5040cdbae6c21b18608910

The UUID of the failed filesystem is 6286fd5a7c084eddbe33e4ae996a2d84.

gmat commented 4 years ago

Is it possible to repair this ?

mulkieran commented 4 years ago

Please also include output of "stratis pool list" and "stratis blockdev list". Thanks.

gmat commented 4 years ago

OK, here they are. Thanks.

# stratis pool list
Name       Total Physical Size  Total Physical Used
ead1-pool           106.92 TiB            13.10 TiB
# stratis blockdev list
Pool Name  Device Node  Physical Size  State  Tier
ead1-pool  /dev/sda        106.92 TiB  InUse  Data
bgurney-rh commented 4 years ago

Can you run this command?

dmsetup status --target=thin-pool
drckeefe commented 4 years ago

What is the error message when trying to mount filesystem? Also are there any xfs error messages in the logs?

mulkieran commented 4 years ago

This may be a report of a related problem: https://github.com/stratis-storage/stratisd/issues/1495#issuecomment-577607765.

gmat commented 4 years ago

@bgurney-rh here is the output:

# dmsetup status --target=thin-pool
stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-thinpool-pool: 0 229370748928 thin-pool 0 118769/4161600 13628563/111997436 - rw no_discard_passdown error_if_no_space - 1024
gmat commented 4 years ago

> What is the error message when trying to mount filesystem? Also are there any xfs error messages in the logs?

Here is the output of mount:

# mount /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 /mnt/datashare/
mount: /mnt/datashare: can't read superblock on /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84.

and in dmesg there is:

[ 4516.007291] dm-9: rw=4096, want=34359738368, limit=2147483648
[ 4516.007296] XFS (dm-9): last sector read failed
mulkieran commented 4 years ago

We have excluded any current problem with the thinpool metadata device by looking at dmsetup status.

But when mounting the device, xfs reports that it cannot read the last sector that, according to its own metadata, it should be able to access. That last sector is exactly the one at address 16 TiB.

Meanwhile lsblk shows the corresponding logical device as having a size of just 1 TiB (and ddrescue seems to be getting the same information).
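
To spell out the numbers: in the thin-pool status line above, 118769/4161600 is used/total metadata blocks, so the pool metadata is only a few percent used, and the dmesg "want"/"limit" values are counts of 512-byte sectors. A quick sanity check of the arithmetic (a sketch, assuming the standard device-mapper status formats):

# echo $(( 118769 * 100 / 4161600 ))
2
# echo $(( 34359738368 * 512 ))
17592186044416
# echo $(( 2147483648 * 512 ))
1099511627776

17592186044416 bytes is exactly 16 TiB, and 1099511627776 bytes is exactly 1 TiB.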

Please let us know the version of stratisd and stratis-cli that you are running and also the output of dmsetup status --target=thin.

Have you at some point restarted stratisd, either directly or, for example, by rebooting your machine?

Thanks!

gmat commented 4 years ago

We use the stratis version from the CentOS project.

# dnf list stratis*
Last metadata expiration check: 0:30:23 ago on Thu 23 Jan 2020 05:12:13 PM CET.
Installed Packages
stratis-cli.noarch                                  1.0.4-2.el8                                  @AppStream
stratisd.x86_64                                     1.0.4-2.el8                                  @AppStream

Here is the output of the dmsetup command:

# dmsetup status --target=thin
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910: 0 8589934592 thin 4296282112 8589871103
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8: 0 2147483648 thin 21680676864 34359738367
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c: 0 2147483648 thin 1933221888 2147483647
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84: 0 2147483648 thin 21680676864 34359738367

This server has been rebooted many times since it failed. It was first rebooted because of the failure, when there was no access to the NFS share. I'm not sure I'm answering your question well, sorry. If this is not what you expected, could you be more specific? Thanks.

mulkieran commented 4 years ago

OK, thanks. The stratisd version information rules out a current bug regarding filesystem metadata, because that bug was fixed in version 1.0.4. Still working on it...

bgurney-rh commented 4 years ago

If possible, can you run "xfs_info" on the filesystem device that is normally mounted?

gmat commented 4 years ago

Here is the xfs_info output for the still-working filesystem device:

# xfs_info /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c
meta-data=/dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c isize=512    agcount=32, agsize=8388544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=268433408, imaxpct=25
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bgurney-rh commented 4 years ago

Actually, I need you to run that command for the filesystem device that is not mountable.

Then, run "blockdev --getsz" on the device, to get the size in 512-byte sectors. We can then compare the actual size of the device with the size reported by the XFS metadata.
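
For example, blockdev can also report the size directly in bytes, which may make the comparison with the XFS metadata easier (a sketch, using the unmountable device's path from the lsblk output above):

# blockdev --getsize64 /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84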

mulkieran commented 4 years ago

Other numbers and their meanings, all exact:

dmesg output:
limit = 1 TiB
want = 16 TiB

dmsetup for the problematic thin device:

start = 0
length = 1 TiB
number of mapped sectors = 21680676864 (= 10586268 MiB of mapped data)
highest mapped sector = 16 TiB - 512B
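
A sketch of where the mapped-data figure comes from, assuming the standard status format for the thin target (start, length, "thin", nr-mapped-sectors, highest-mapped-sector, all in 512-byte sectors):

# echo $(( 21680676864 * 512 / 1048576 ))
10586268

That is, roughly 10.1 TiB of data is actually mapped in the thin device, even though its logical length is only 1 TiB.
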
gmat commented 4 years ago

@bgurney-rh OK, so here is the output:

# xfs_info /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 
xfs_info: error - read only 0 of 512 bytes
xfs_info: error - read only 0 of 512 bytes
xfs_info: cannot init perag data (5). Continuing anyway.
meta-data=/dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 isize=512    agcount=513, agsize=8388544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=4294967296, imaxpct=25
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# blockdev --getsz /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84 
2147483648
bgurney-rh commented 4 years ago

Okay, so it looks like the XFS metadata reports a size of 16 TB, but the device is 1 TB (confirming what was discussed earlier).
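
The arithmetic behind that comparison, using the outputs above (data blocks times block size from xfs_info, versus 512-byte sectors from blockdev):

# echo $(( 4294967296 * 4096 ))
17592186044416
# echo $(( 2147483648 * 512 ))
1099511627776

That is 16 TiB according to the XFS metadata versus 1 TiB for the underlying device.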

Could you run "dmsetup table --target=thin"?

gmat commented 4 years ago

@bgurney-rh

# dmsetup table --target=thin
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910: 0 8589934592 thin 253:6 6
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8: 0 2147483648 thin 253:6 3
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c: 0 2147483648 thin 253:6 1
stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-6286fd5a7c084eddbe33e4ae996a2d84: 0 2147483648 thin 253:6 2
bgurney-rh commented 4 years ago

Just to double-check, can you run xfs_info on the other three fs devices?

- stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c
- stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8
- stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910

gmat commented 4 years ago

# xfs_info /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c
meta-data=/dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-196400d7d4134017bd66a384c87b944c isize=512    agcount=32, agsize=8388544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=268433408, imaxpct=25
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# xfs_info /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8 
xfs_info: error - read only 0 of 512 bytes
xfs_info: error - read only 0 of 512 bytes
xfs_info: cannot init perag data (5). Continuing anyway.
meta-data=/dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-2039f1ee6a684ada9fee5f33155934a8 isize=512    agcount=513, agsize=8388544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=4294967296, imaxpct=25
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

This one is consistent:

# xfs_info /dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910 
meta-data=/dev/mapper/stratis-1-ebccd112f00146b284a5e9e63491f9e9-thin-fs-8648cd7e9c5040cdbae6c21b18608910 isize=512    agcount=129, agsize=8388544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=1073741824, imaxpct=25
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bgurney-rh commented 4 years ago

Okay, so to summarize:

- 6286fd5a7c084eddbe33e4ae996a2d84: block device is 1 TB, but has XFS metadata for 16 TB
- 196400d7d4134017bd66a384c87b944c: block device is 1 TB, XFS metadata says 1 TB
- 2039f1ee6a684ada9fee5f33155934a8: block device is 1 TB, but has XFS metadata for 16 TB
- 8648cd7e9c5040cdbae6c21b18608910: block device is 4 TB; XFS metadata says 4 TB

gmat commented 4 years ago

Yes. But just as a reminder, 2039f1ee6a684ada9fee5f33155934a8 is a snapshot of 6286fd5a7c084eddbe33e4ae996a2d84, created just as a "test" after the outage. It can be deleted if that is easier.

bgurney-rh commented 4 years ago

Ah, okay. I think it's best to leave that device alone for now.

So for this next part, we're going to take a look at the filesystem flexmdv characteristics.

You have a device called "/dev/mapper/stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-mdv". Please perform the following:

  1. Run "mkdir /mnt/flexmdv" to create a directory called "flexmdv", which will be a staging directory for the flexmdv.

  2. Run "mount /dev/mapper/stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-mdv /mnt/flexmdv" to mount the flexmdv.

  3. Run "ls -al /mnt/flexmdv/filesystems", which will display all of the metadata files (which will have the extension ".json".)

  4. To print out the metadata strings of each file, run "grep -H . /mnt/flexmdv/filesystems/*.json"

When I try the above command on my system, I see something like this:

/mnt/flexmdv/filesystems/fb3e1dadf20046cdad1b35ae2912f2e3.json:{"name":"sfs1","uuid":"fb3e1dad-f200-46cd-ad1b-35ae2912f2e3","thin_id":0,"size":34359738368,"created":1580395183}
  5. When finished with the above steps, be sure to unmount the flex-mdv device, by running "umount /mnt/flexmdv".

With the above filesystems in this pool, I expect to see 4 json files in the "filesystems" subdirectory.
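
For convenience, the whole read-only inspection can be done in one go; a sketch using the same paths as above, adding a read-only mount option as an extra precaution (not strictly required):

# mkdir /mnt/flexmdv
# mount -o ro /dev/mapper/stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-mdv /mnt/flexmdv
# ls -al /mnt/flexmdv/filesystems
# grep -H . /mnt/flexmdv/filesystems/*.json
# umount /mnt/flexmdv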

gmat commented 4 years ago

OK, thanks. Here are the 4 files:

# grep -H . /mnt/flexmdv/filesystems/*.json
/mnt/flexmdv/filesystems/196400d7d4134017bd66a384c87b944c.json:{"name":"home_pers_local","uuid":"196400d7-d413-4017-bd66-a384c87b944c","thin_id":1,"size":2147483648,"created":1570545813}
/mnt/flexmdv/filesystems/2039f1ee6a684ada9fee5f33155934a8.json:{"name":"datashare-snap200118","uuid":"2039f1ee-6a68-4ada-9fee-5f33155934a8","thin_id":3,"size":2147483648,"created":1579380346}
/mnt/flexmdv/filesystems/6286fd5a7c084eddbe33e4ae996a2d84.json:{"name":"datashare","uuid":"6286fd5a-7c08-4edd-be33-e4ae996a2d84","thin_id":2,"size":2147483648,"created":1570635974}
/mnt/flexmdv/filesystems/8648cd7e9c5040cdbae6c21b18608910.json:{"name":"myfs","uuid":"8648cd7e-9c50-40cd-bae6-c21b18608910","thin_id":6,"size":8589934592,"created":1579446941}
bgurney-rh commented 4 years ago

Okay, so here's how to enlarge the device for the "datashare" fs (with UUID starting with "6286") to the size that the XFS metadata is reporting:

This is the current content of the "filesystems/6286fd5a7c084eddbe33e4ae996a2d84.json" file (with no end-of-line character, I believe):

{"name":"datashare","uuid":"6286fd5a-7c08-4 dd-be33-e4ae996a2d84","thin_id":2,"size":2147483648,"created":1570635974}

Change the "size" to 34359738368, to match the number of sectors corresponding to the XFS reported size (16 TB):

{"name":"datashare","uuid":"6286fd5a-7c08-4 dd-be33-e4ae996a2d84","thin_id":2,"size":34359738368,"created":1570635974}

After this, unmount the flexmdv (in the path "/mnt/flexmdv", if you're using the same path as before), and reboot the system. Check to see if the filesystem has mounted, and/or if xfs has reported anything in the kernel log.
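
For reference, the 34359738368 figure is just the XFS-reported data size converted to 512-byte sectors (blocks times bsize divided by 512, from the xfs_info output earlier), and it matches the "want" value from the dmesg error:

# echo $(( 4294967296 * 4096 / 512 ))
34359738368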

gmat commented 4 years ago

:tada: Great! That works fine.

There are still some warning messages:

Jan 31 08:08:41 srv-ead1 kernel: device-mapper: thin: Metadata device dm-3 is larger than 33292800 sectors: excess space will not be used.
Jan 31 08:08:41 srv-ead1 kernel: XFS (dm-8): Mounting V5 Filesystem
Jan 31 08:08:41 srv-ead1 stratisd[1623]: INFO libstratis::engine::strat_engine::thinpool::thinpool: Attempting to extend thinpool metadata sub-device belonging to pool ebccd112f00146b284a5e9e63491f9e9 by 196307584 sectors
Jan 31 08:08:41 srv-ead1 stratisd[1623]: WARN libstratis::engine::strat_engine::thinpool::thinpool: Insufficient free space available in backstore; extended thinpool metadata sub-device belonging to pool with uuid ebccd112f00146b284a5e9e63491f9e9 by 0 sectors, request was 196307584 sectors

dm-3 is the /dev/mapper/stratis-1-private-ebccd112f00146b284a5e9e63491f9e9-flex-thinmeta

What should I do with the snapshot? Remove it directly with the stratis CLI? Or change its size in the json first?

Many thanks :1st_place_medal:

bgurney-rh commented 4 years ago

That's good to hear!

The "excess space will not be used" message from device-mapper can be ignored; we already know about that.

The messages from libstratis are interesting, though; stratisd appears to have tried to extend the thinpool metadata sub-device. I don't think it's a problem right now, though I'll make sure my colleagues know about it.

Out of an extreme abundance of caution, can you mount the flex-mdv again (using the same process as above), and check to see that the size field in "filesystems/6286fd5a7c084eddbe33e4ae996a2d84.json" still says "34359738368"?

bgurney-rh commented 4 years ago

As for the snapshot, yes, I would go ahead and remove it with the stratis CLI.

(Obviously, be sure to remove the correct one, which would be "datashare-snap200118", based on the comments above.)
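
If it helps, the removal would look something like this with the CLI (double-check the pool and filesystem names first):

# stratis filesystem destroy ead1-pool datashare-snap200118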

gmat commented 4 years ago

OK, so I removed the snapshot with the stratis destroy command. I checked the json file on the flex-mdv partition; the size is still 34359738368.

There is quite a lot of log output after this one:

Jan 31 20:18:29 srv-ead1 stratisd[1623]: INFO stratisd: Dump timer expired, dumping state
Jan 31 20:18:29 srv-ead1 stratisd[1623]: DEBUG stratisd: Engine state:

It is long and verbose, and I don't see any error or warning in it.

About this kind of issue, do you know what could have caused it? A bug in stratisd or dmsetup, or something else such as a hardware problem?

bgurney-rh commented 4 years ago

Okay; good to hear that the size is still 34359738368. (Don't forget to unmount the flex-mdv now.)

So what you found is a periodic dump of the stratis engine state, once every 10 minutes. This is normal, and doesn't indicate any problems; however, in the future we anticipate tidying up this messaging into something less verbose.

drckeefe commented 4 years ago

@gmat, we have been working on trying to come up with a way to reproduce this issue consistently. Do you have any thoughts on how we could reproduce it?

gmat commented 4 years ago

@drckeefe I don't know how to reproduce it.

At first I believed the bug appeared just after a resize, but no: the filesystem had been at 10 TB used for many days before that. It was a CentOS 8.0 system with stratis from this repository. I have done the updates since the beginning of this issue.

drckeefe commented 4 years ago

@gmat Thank you for your response. I want to make sure that I understand your process.

  1. You used stratis from "this repository"; was that from this GitHub repository or from CentOS, and do you know if the version was 1.0.4?
  2. When you say that you updated, did you update the stratis version or the CentOS version?
gmat commented 4 years ago

@drckeefe I'll start with the easiest question:

  1. I've updated both CentOS and Stratis. It is now:
    # more /etc/redhat-release 
    CentOS Linux release 8.1.1911 (Core) 
  2. I use the distribution RPM version, and the issue appeared with an older version of stratis.
    # dnf history info 13 |grep strat
    Upgrade  stratis-cli-1.0.4-2.el8.noarch                                                   @AppStream
    Upgraded stratis-cli-1.0.2-1.el8.noarch                                                   @@System
    Upgrade  stratisd-1.0.4-2.el8.x86_64                                                      @AppStream
    Upgraded stratisd-1.0.3-1.el8.x86_64                                                      @@System

    stratisd was 1.0.3 and the cli was 1.0.2

If I understand correctly, there was a bug in versions before 1.0.4.

drckeefe commented 4 years ago

This makes sense. There was a bug in the older version of stratis, which has since been fixed. I believe it was present in 1.0.3, and now you are running 1.0.4. Unless anyone has other questions or comments, this issue can be closed.

mulkieran commented 4 years ago

I think we can go ahead and close this. We've resolved the problem, and are reasonably confident it was caused by a known and fixed bug.