openstreetmap / operations

OSMF Operations Working Group issue tracking
https://operations.osmfoundation.org/

Poor I/O performance on faffy #1055

Closed tomhughes closed 3 months ago

tomhughes commented 3 months ago

Something is very wrong with faffy - although there is no obvious problem in any of the visible metrics, anything which does I/O is so glacially slow that the machine is essentially unusable.
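
A quick way to confirm that the latency sits at the block-device level rather than in any one application is to watch per-device latencies with iostat from the sysstat package (a suggested check, assuming sysstat is available on the host):

# Extended per-device statistics refreshed every second; -z hides idle devices.
# r_await / w_await are average read/write latencies in milliseconds - values in
# the hundreds while %util is pinned near 100 point at the storage stack itself.
$ iostat -xz 1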

Firefishy commented 3 months ago

I have set IRQBALANCE_ONESHOT=1 and it seems to have improved the abysmal performance. I am not sure why irqbalance would be causing such issues.
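
For reference, on Ubuntu this setting would normally live in /etc/default/irqbalance, which the irqbalance systemd unit reads as its environment file (an assumption about this host's layout, not confirmed above):

# /etc/default/irqbalance
# Balance IRQs once at startup and then exit, instead of rebalancing continuously.
IRQBALANCE_ONESHOT=1

# then restart the service so the option takes effect
$ sudo systemctl restart irqbalance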

Firefishy commented 3 months ago

I think this is now fixed. I will close, but feel free to re-open if the issue returns.

Firefishy commented 3 months ago

The machine still has insanely poor write performance. It seems to be linked to flush / sync operations.

Firefishy commented 3 months ago

$ dd if=/dev/zero of=latency.img bs=512 count=1000 oflag=dsync status=progress
163840 bytes (164 kB, 160 KiB) copied, 59 s, 2.8 kB/s
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 58.7052 s, 8.7 kB/s

oflag=dsync is normally slow, but this is extreme.
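
For context, oflag=dsync forces dd to wait for every 512-byte block to be committed to stable storage before issuing the next write, so the run above works out to roughly 59 ms per synchronous write. A rough equivalent using fio, shown only as an alternative way to measure the same thing (the job name is arbitrary), would be:

# 512-byte sequential writes with an fdatasync after every write;
# fio reports completion and sync latency percentiles.
$ fio --name=synclat --ioengine=psync --rw=write --bs=512 --size=512k --fdatasync=1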

Firefishy commented 3 months ago

And while running test-kitchen via docker:

$ dd if=/dev/zero of=latency.img bs=512 count=1000 oflag=dsync status=progress
429056 bytes (429 kB, 419 KiB) copied, 838 s, 0.5 kB/s
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 838.223 s, 0.6 kB/s

Firefishy commented 3 months ago

/dev/md0:
           Version : 1.2
     Creation Time : Sat Sep 24 22:16:19 2022
        Raid Level : raid6
        Array Size : 22497017856 (20.95 TiB 23.04 TB)
     Used Dev Size : 3749502976 (3.49 TiB 3.84 TB)
      Raid Devices : 8
     Total Devices : 8
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Apr 16 18:37:08 2024
             State : active
    Active Devices : 8
   Working Devices : 8
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 256K

Consistency Policy : bitmap

              Name : ubuntu-server:0
              UUID : 92fef35e:80c247eb:bd3ef6ec:45af481e
            Events : 142669

    Number   Major   Minor   RaidDevice State
      15     259       11        0      active sync   /dev/nvme3n1p2
       8     259        5        1      active sync   /dev/nvme0n1p2
      13     259       18        2      active sync   /dev/nvme7n1p2
       9     259       23        3      active sync   /dev/nvme4n1p2
      12     259       21        4      active sync   /dev/nvme5n1p2
      14     259        8        5      active sync   /dev/nvme1n1p2
      11     259       16        6      active sync   /dev/nvme6n1p2
      10     259        3        7      active sync   /dev/nvme2n1p2

Firefishy commented 3 months ago

$ sudo tune2fs -l /dev/md0
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          63fb01c2-57dd-4097-aacf-20c1d20fd71c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              351516672
Block count:              5624254464
Reserved block count:     56242544
Overhead clusters:        22642289
Free blocks:              3910873047
Free inodes:              323258149
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         2048
Inode blocks per group:   128
RAID stride:              128
RAID stripe width:        768
Flex block group size:    16
Filesystem created:       Sat Sep 24 22:16:23 2022
Last mount time:          Tue Apr 16 11:42:54 2024
Last write time:          Tue Apr 16 11:42:51 2024
Mount count:              8
Maximum mount count:      -1
Last checked:             Tue Apr 16 04:15:24 2024
Check interval:           0 (<none>)
Lifetime writes:          87 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:           256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       50004050
Default directory hash:   half_md4
Directory Hash Seed:      c26d955d-c40b-467c-84de-f16daae24f7e
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xb79863b9

grischard commented 3 months ago

Block size: 4096
Chunk Size: 256K
RAID stride: 128
RAID stripe width: 768

Firefishy commented 3 months ago

As a test I upgraded us to kernel v6.8.7, and it resolves the I/O performance issue!

Since we will upgrade the machine to Ubuntu 24.04 soon, which comes with a v6.8.x kernel, I will close this issue as resolved.
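
For anyone wanting to try the same thing, mainline kernel builds for Ubuntu are published as .deb packages on kernel.ubuntu.com and can be installed with dpkg; a rough sketch, with the exact package file names depending on the version and architecture chosen:

# install the downloaded mainline image and modules packages, then reboot
$ sudo dpkg -i linux-image-*.deb linux-modules-*.deb
$ sudo reboot
# afterwards, confirm the running kernel version
$ uname -r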

grischard commented 3 months ago

In the hope of helping anyone else with the same problem: we presume that the kernel upgrade from 6.2.0 to 6.5.0 in March is what started the problem.

The RAID settings are still not exactly in sync: the ext4 hints (stride 128, stripe width 768) correspond to a 512K chunk, while the array actually uses a 256K chunk across 6 data disks, so writes cause more reads than necessary. Not worth looking into unless we reformat and reuse this machine for another purpose.
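
If the filesystem were ever retuned, the correct hints would follow from the array geometry: stride = chunk / block = 256K / 4K = 64 blocks, and stripe width = stride × data disks = 64 × 6 = 384 blocks. A sketch of applying them with tune2fs, purely illustrative:

# set ext4 RAID hints to match a 256K-chunk, 8-disk RAID6 (6 data disks)
$ sudo tune2fs -E stride=64,stripe_width=384 /dev/md0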

Firefishy commented 2 months ago

I should have posted a follow-up after changing the kernel to a mainline release; here is the dd performance again...

$ dd if=/dev/zero of=latency.img bs=512 count=1000 oflag=dsync status=progress
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.275372 s, 1.9 MB/s

As can be seen, the performance is back up to a perfectly acceptable level.

It would appear that something is fundamentally broken in the official linux-generic-hwe-22.04 (6.5.0) kernel.
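
For anyone checking whether a machine is on the affected series, the running kernel and the installed kernel image packages can be listed with standard tooling (a generic check, nothing host-specific):

# show the running kernel version and installed kernel image packages
$ uname -r
$ dpkg -l 'linux-image-*' | grep ^ii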