
Large kmem_alloc(127104, 0x1000) #15043

Open vherrlein opened 1 year ago

vherrlein commented 1 year ago

System information

Type                  Version/Name
Distribution Name     Proxmox VE (Debian)
Distribution Version  8.0 (bookworm)
Kernel Version        6.2
Architecture          x86_64
OpenZFS Version       2.1.12
> lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
    BIOS Model name:           Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz  CPU @ 2.6GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           2
    Stepping:            4
    CPU(s) scaling MHz:  55%
    CPU max MHz:         3100.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            5199.98
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
                         constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
                         xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow
                         vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    3 MiB (12 instances)
  L3:                    30 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Describe the problem you're observing

As soon as I try to benchmark or upload a large amount of data into a pool, the CPU hangs and errors start appearing in syslog. The following error is repeated once per CPU core (in my case, 24 times). Even if I set sync=disabled to bypass the ZIL (and thus the SLOG device) on the target pool, the error continues.
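
For reference, a minimal sketch of how sync was toggled off (assuming the pool is named tank, as in the outputs below):

> zfs set sync=disabled tank
> zfs get sync tank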

Describe how to reproduce the problem

  1. Make a RAIDZ1 pool with default settings and add a SLOG device
  2. Make a mount point for the new pool
  3. Upload a large file via sftp from a remote machine into that new mount point (see the sketch below)
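
A hedged sketch of those steps (the disk names sdb-sde and the sftp host are placeholders, not taken from this report):

> zpool create tank raidz1 sdb sdc sdd
> zpool add tank log sde
> zfs create -o mountpoint=/mnt/tank-backups tank/backups

Then, from the remote machine:

> sftp root@pve-host
sftp> put large-file.bin /mnt/tank-backups/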

Include any warning/errors/backtraces from the system logs

Large kmem_alloc(127104, 0x1000), please file an issue at: https://github.com/openzfs/zfs/issues/new
CPU: 7 PID: 3053944 Comm: zvol Tainted: P        W  O       6.2.16-3-pve #1
Hardware name: Dell Inc. PowerEdge R720/0HJK12, BIOS 2.9.0 12/06/2019
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x70
 dump_stack+0x10/0x20
 spl_kmem_zalloc+0xfd/0x120 [spl]
 dmu_buf_hold_array_by_dnode+0x8b/0x6c0 [zfs]
 dmu_write_uio_dnode+0x5c/0x1a0 [zfs]
 zvol_write.isra.0+0x1a4/0x470 [zfs]
 zvol_write_task+0x19/0x40 [zfs]
 taskq_thread+0x2af/0x4d0 [spl]
 ? __pfx_default_wake_function+0x10/0x10
 ? __pfx_zvol_write_task+0x10/0x10 [zfs]
 ? __pfx_taskq_thread+0x10/0x10 [spl]
 kthread+0xe9/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
rincebrain commented 1 year ago

You didn't include any of the information from the template you deleted, which would be necessary to do anything useful with what you posted.

vherrlein commented 1 year ago

> You didn't include any of the information from the template you deleted, which would be necessary to do anything useful with what you posted.

I didn't delete the template; I used the link https://github.com/openzfs/zfs/issues/new from the error message, and unfortunately there was no template.

The initial post has been updated accordingly :)

rincebrain commented 1 year ago

Cute, I'll figure out how that needs to change to include that.

rincebrain commented 1 year ago

So, you said uploading a large file to a raidz1 pool with default settings, but the backtrace there is for a zvol, not a filesystem, so there seems to be some additional step missing?

Can you share the zpool list -v output and zpool get all output, and the zfs get all output from the dataset this is on?

vherrlein commented 1 year ago

I shortened the steps; of course you need to create a mount point, format the FS, and so on. In addition, that error also happens from inside a virtual machine (QEMU/KVM).

> zpool list -v
NAME                                              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank                                             2.72T  1.10T  1.62T        -         -    16%    40%  1.00x    ONLINE  -
  raidz1-0                                       2.72T  1.10T  1.62T        -         -    16%  40.6%      -    ONLINE
    ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T336445Z   932G      -      -        -         -      -      -      -    ONLINE
    ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T354524K   932G      -      -        -         -      -      -      -    ONLINE
    ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T336369R   932G      -      -        -         -      -      -      -    ONLINE
logs                                                 -      -      -        -         -      -      -      -  -
  sde                                             932G   632K   928G        -         -     0%  0.00%      -    ONLINE
> zpool get all
NAME  PROPERTY                       VALUE                          SOURCE
tank  size                           2.72T                          -
tank  capacity                       40%                            -
tank  altroot                        -                              default
tank  health                         ONLINE                         -
tank  guid                           14348373411064071793           -
tank  version                        -                              default
tank  bootfs                         -                              default
tank  delegation                     on                             default
tank  autoreplace                    off                            default
tank  cachefile                      -                              default
tank  failmode                       wait                           default
tank  listsnapshots                  off                            default
tank  autoexpand                     off                            default
tank  dedupratio                     1.00x                          -
tank  free                           1.62T                          -
tank  allocated                      1.10T                          -
tank  readonly                       off                            -
tank  ashift                         12                             local
tank  comment                        -                              default
tank  expandsize                     -                              -
tank  freeing                        0                              -
tank  fragmentation                  16%                            -
tank  leaked                         0                              -
tank  multihost                      off                            default
tank  checkpoint                     -                              -
tank  load_guid                      5437561673175508623            -
tank  autotrim                       on                             local
tank  compatibility                  off                            default
tank  feature@async_destroy          enabled                        local
tank  feature@empty_bpobj            active                         local
tank  feature@lz4_compress           active                         local
tank  feature@multi_vdev_crash_dump  enabled                        local
tank  feature@spacemap_histogram     active                         local
tank  feature@enabled_txg            active                         local
tank  feature@hole_birth             active                         local
tank  feature@extensible_dataset     active                         local
tank  feature@embedded_data          active                         local
tank  feature@bookmarks              enabled                        local
tank  feature@filesystem_limits      enabled                        local
tank  feature@large_blocks           enabled                        local
tank  feature@large_dnode            enabled                        local
tank  feature@sha512                 enabled                        local
tank  feature@skein                  enabled                        local
tank  feature@edonr                  enabled                        local
tank  feature@userobj_accounting     active                         local
tank  feature@encryption             enabled                        local
tank  feature@project_quota          active                         local
tank  feature@device_removal         enabled                        local
tank  feature@obsolete_counts        enabled                        local
tank  feature@zpool_checkpoint       enabled                        local
tank  feature@spacemap_v2            active                         local
tank  feature@allocation_classes     enabled                        local
tank  feature@resilver_defer         enabled                        local
tank  feature@bookmark_v2            enabled                        local
tank  feature@redaction_bookmarks    enabled                        local
tank  feature@redacted_datasets      enabled                        local
tank  feature@bookmark_written       enabled                        local
tank  feature@log_spacemap           active                         local
tank  feature@livelist               enabled                        local
tank  feature@device_rebuild         enabled                        local
tank  feature@zstd_compress          enabled                        local
tank  feature@draid                  enabled                        local
> zfs get all
NAME                PROPERTY              VALUE                  SOURCE
tank                type                  filesystem             -
tank                creation              Thu Oct 13 20:40 2022  -
tank                used                  1.72T                  -
tank                available             36.9G                  -
tank                referenced            128K                   -
tank                compressratio         1.12x                  -
tank                mounted               yes                    -
tank                quota                 none                   default
tank                reservation           none                   default
tank                recordsize            128K                   default
tank                mountpoint            /tank                  default
tank                sharenfs              off                    default
tank                checksum              on                     default
tank                compression           on                     local
tank                atime                 on                     default
tank                devices               on                     default
tank                exec                  on                     default
tank                setuid                on                     default
tank                readonly              off                    default
tank                zoned                 off                    default
tank                snapdir               hidden                 default
tank                aclmode               discard                default
tank                aclinherit            restricted             default
tank                createtxg             1                      -
tank                canmount              on                     default
tank                xattr                 on                     default
tank                copies                1                      default
tank                version               5                      -
tank                utf8only              off                    -
tank                normalization         none                   -
tank                casesensitivity       sensitive              -
tank                vscan                 off                    default
tank                nbmand                off                    default
tank                sharesmb              off                    default
tank                refquota              none                   default
tank                refreservation        none                   default
tank                guid                  5638437393970797065    -
tank                primarycache          all                    default
tank                secondarycache        all                    default
tank                usedbysnapshots       0B                     -
tank                usedbydataset         128K                   -
tank                usedbychildren        1.72T                  -
tank                usedbyrefreservation  0B                     -
tank                logbias               latency                default
tank                objsetid              54                     -
tank                dedup                 off                    default
tank                mlslabel              none                   default
tank                sync                  standard               default
tank                dnodesize             legacy                 default
tank                refcompressratio      1.00x                  -
tank                written               128K                   -
tank                logicalused           461G                   -
tank                logicalreferenced     42K                    -
tank                volmode               default                default
tank                filesystem_limit      none                   default
tank                snapshot_limit        none                   default
tank                filesystem_count      none                   default
tank                snapshot_count        none                   default
tank                snapdev               hidden                 default
tank                acltype               off                    default
tank                context               none                   default
tank                fscontext             none                   default
tank                defcontext            none                   default
tank                rootcontext           none                   default
tank                relatime              off                    default
tank                redundant_metadata    all                    default
tank                overlay               on                     default
tank                encryption            off                    default
tank                keylocation           none                   default
tank                keyformat             none                   default
tank                pbkdf2iters           0                      default
tank                special_small_blocks  0                      default
tank/backups        type                  filesystem             -
tank/backups        creation              Fri Feb  3  9:48 2023  -
tank/backups        used                  176K                   -
tank/backups        available             36.9G                  -
tank/backups        referenced            176K                   -
tank/backups        compressratio         1.00x                  -
tank/backups        mounted               yes                    -
tank/backups        quota                 none                   default
tank/backups        reservation           none                   default
tank/backups        recordsize            128K                   default
tank/backups        mountpoint            /mnt/tank-backups      local
tank/backups        sharenfs              off                    default
tank/backups        checksum              on                     default
tank/backups        compression           on                     inherited from tank
tank/backups        atime                 on                     default
tank/backups        devices               on                     default
tank/backups        exec                  on                     default
tank/backups        setuid                on                     default
tank/backups        readonly              off                    default
tank/backups        zoned                 off                    default
tank/backups        snapdir               hidden                 default
tank/backups        aclmode               discard                default
tank/backups        aclinherit            restricted             default
tank/backups        createtxg             5620537                -
tank/backups        canmount              on                     default
tank/backups        xattr                 on                     default
tank/backups        copies                1                      default
tank/backups        version               5                      -
tank/backups        utf8only              off                    -
tank/backups        normalization         none                   -
tank/backups        casesensitivity       sensitive              -
tank/backups        vscan                 off                    default
tank/backups        nbmand                off                    default
tank/backups        sharesmb              off                    default
tank/backups        refquota              none                   default
tank/backups        refreservation        none                   default
tank/backups        guid                  7191816376809622973    -
tank/backups        primarycache          all                    default
tank/backups        secondarycache        all                    default
tank/backups        usedbysnapshots       0B                     -
tank/backups        usedbydataset         176K                   -
tank/backups        usedbychildren        0B                     -
tank/backups        usedbyrefreservation  0B                     -
tank/backups        logbias               latency                default
tank/backups        objsetid              47085                  -
tank/backups        dedup                 off                    default
tank/backups        mlslabel              none                   default
tank/backups        sync                  standard               default
tank/backups        dnodesize             legacy                 default
tank/backups        refcompressratio      1.00x                  -
tank/backups        written               176K                   -
tank/backups        logicalused           59.5K                  -
tank/backups        logicalreferenced     59.5K                  -
tank/backups        volmode               default                default
tank/backups        filesystem_limit      none                   default
tank/backups        snapshot_limit        none                   default
tank/backups        filesystem_count      none                   default
tank/backups        snapshot_count        none                   default
tank/backups        snapdev               hidden                 default
tank/backups        acltype               off                    default
tank/backups        context               none                   default
tank/backups        fscontext             none                   default
tank/backups        defcontext            none                   default
tank/backups        rootcontext           none                   default
tank/backups        relatime              off                    default
tank/backups        redundant_metadata    all                    default
tank/backups        overlay               on                     default
tank/backups        encryption            off                    default
tank/backups        keylocation           none                   default
tank/backups        keyformat             none                   default
tank/backups        pbkdf2iters           0                      default
tank/backups        special_small_blocks  0                      default
rincebrain commented 1 year ago

tank/backups isn't a zvol, so it's not what's triggering that backtrace...

Is there anything else on the pool that's a volume?

vherrlein commented 1 year ago

Only VM volumes, nothing else

> zfs list
NAME                 USED  AVAIL     REFER  MOUNTPOINT
tank                1.72T  36.9G      128K  /tank
tank/backups         176K  36.9G      176K  /mnt/tank-backups
tank/test           13.6G  37.7G     12.9G  -
tank/vm-101-disk-0  43.7G  64.6G     16.0G  -
tank/vm-101-disk-1  3.33M  36.9G      117K  -
tank/vm-105-disk-0  3.33M  36.9G      149K  -
tank/vm-105-disk-1  7.33M  36.9G     90.6K  -
tank/vm-105-disk-2   349G   311G     75.3G  -
tank/vm-106-disk-0  13.2M  36.9G     1.12M  -
tank/vm-106-disk-1  46.7M  37.0G      170K  -
tank/vm-106-disk-2   357G   228G      166G  -
tank/vm-106-disk-3  89.3G  57.4G     68.9G  -
tank/vm-202-disk-0  3.33M  36.9G      192K  -
tank/vm-202-disk-1  7.33M  36.9G     85.2K  -
tank/vm-202-disk-2  87.3G  58.5G     65.8G  -
tank/vm-202-disk-3   819G   508G      347G  -
vherrlein commented 1 year ago

The available ZVOLs:

>ls -Rlah /dev/zvol/
/dev/zvol/:
total 0
drwxr-xr-x  3 root root   60 Jul  8 01:55 .
drwxr-xr-x 21 root root 5.3K Jul  9 17:47 ..
drwxr-xr-x  2 root root  620 Jul  9 17:47 tank

/dev/zvol/tank:
total 0
drwxr-xr-x 2 root root 620 Jul  9 17:47 .
drwxr-xr-x 3 root root  60 Jul  8 01:55 ..
lrwxrwxrwx 1 root root  10 Jul  8 01:55 test -> ../../zd48
lrwxrwxrwx 1 root root  10 Jul  9 17:47 vm-101-disk-0 -> ../../zd80
lrwxrwxrwx 1 root root  12 Jul  9 17:47 vm-101-disk-0-part1 -> ../../zd80p1
lrwxrwxrwx 1 root root  12 Jul  9 17:47 vm-101-disk-0-part2 -> ../../zd80p2
lrwxrwxrwx 1 root root  12 Jul  9 17:47 vm-101-disk-0-part3 -> ../../zd80p3
lrwxrwxrwx 1 root root  11 Jul  8 01:55 vm-101-disk-1 -> ../../zd144
lrwxrwxrwx 1 root root  10 Jul  8 01:55 vm-105-disk-0 -> ../../zd16
lrwxrwxrwx 1 root root  11 Jul  8 01:55 vm-105-disk-1 -> ../../zd128
lrwxrwxrwx 1 root root  11 Jul  8 01:55 vm-105-disk-2 -> ../../zd176
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-105-disk-2-part1 -> ../../zd176p1
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-105-disk-2-part2 -> ../../zd176p2
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-105-disk-2-part3 -> ../../zd176p3
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-105-disk-2-part4 -> ../../zd176p4
lrwxrwxrwx 1 root root  11 Jul  9 17:30 vm-106-disk-0 -> ../../zd160
lrwxrwxrwx 1 root root  10 Jul  9 17:33 vm-106-disk-1 -> ../../zd64
lrwxrwxrwx 1 root root   9 Jul  9 17:30 vm-106-disk-2 -> ../../zd0
lrwxrwxrwx 1 root root  11 Jul  9 17:30 vm-106-disk-2-part1 -> ../../zd0p1
lrwxrwxrwx 1 root root  11 Jul  9 17:30 vm-106-disk-2-part2 -> ../../zd0p2
lrwxrwxrwx 1 root root  11 Jul  9 17:30 vm-106-disk-2-part3 -> ../../zd0p3
lrwxrwxrwx 1 root root  11 Jul  9 17:30 vm-106-disk-3 -> ../../zd208
lrwxrwxrwx 1 root root  13 Jul  9 17:30 vm-106-disk-3-part1 -> ../../zd208p1
lrwxrwxrwx 1 root root  10 Jul  8 01:55 vm-202-disk-0 -> ../../zd96
lrwxrwxrwx 1 root root  11 Jul  8 01:55 vm-202-disk-1 -> ../../zd192
lrwxrwxrwx 1 root root  11 Jul  8 01:55 vm-202-disk-2 -> ../../zd112
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-202-disk-2-part1 -> ../../zd112p1
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-202-disk-2-part2 -> ../../zd112p2
lrwxrwxrwx 1 root root  13 Jul  8 01:55 vm-202-disk-2-part3 -> ../../zd112p3
lrwxrwxrwx 1 root root  10 Jul  8 01:55 vm-202-disk-3 -> ../../zd32
lrwxrwxrwx 1 root root  12 Jul  8 01:55 vm-202-disk-3-part1 -> ../../zd32p1

They all have the same settings as this one:

tank/vm-101-disk-0  type                  volume                 -
tank/vm-101-disk-0  creation              Thu Oct 13 21:08 2022  -
tank/vm-101-disk-0  used                  43.7G                  -
tank/vm-101-disk-0  available             64.6G                  -
tank/vm-101-disk-0  referenced            16.0G                  -
tank/vm-101-disk-0  compressratio         1.28x                  -
tank/vm-101-disk-0  reservation           none                   default
tank/vm-101-disk-0  volsize               32G                    local
tank/vm-101-disk-0  volblocksize          8K                     default
tank/vm-101-disk-0  checksum              on                     default
tank/vm-101-disk-0  compression           on                     inherited from tank
tank/vm-101-disk-0  readonly              off                    default
tank/vm-101-disk-0  createtxg             140                    -
tank/vm-101-disk-0  copies                1                      default
tank/vm-101-disk-0  refreservation        43.7G                  local
tank/vm-101-disk-0  guid                  3032082518710981070    -
tank/vm-101-disk-0  primarycache          all                    default
tank/vm-101-disk-0  secondarycache        all                    default
tank/vm-101-disk-0  usedbysnapshots       0B                     -
tank/vm-101-disk-0  usedbydataset         16.0G                  -
tank/vm-101-disk-0  usedbychildren        0B                     -
tank/vm-101-disk-0  usedbyrefreservation  27.7G                  -
tank/vm-101-disk-0  logbias               latency                default
tank/vm-101-disk-0  objsetid              136                    -
tank/vm-101-disk-0  dedup                 off                    default
tank/vm-101-disk-0  mlslabel              none                   default
tank/vm-101-disk-0  sync                  standard               default
tank/vm-101-disk-0  refcompressratio      1.28x                  -
tank/vm-101-disk-0  written               16.0G                  -
tank/vm-101-disk-0  logicalused           15.3G                  -
tank/vm-101-disk-0  logicalreferenced     15.3G                  -
tank/vm-101-disk-0  volmode               default                default
tank/vm-101-disk-0  snapshot_limit        none                   default
tank/vm-101-disk-0  snapshot_count        none                   default
tank/vm-101-disk-0  snapdev               hidden                 default
tank/vm-101-disk-0  context               none                   default
tank/vm-101-disk-0  fscontext             none                   default
tank/vm-101-disk-0  defcontext            none                   default
tank/vm-101-disk-0  rootcontext           none                   default
tank/vm-101-disk-0  redundant_metadata    all                    default
tank/vm-101-disk-0  encryption            off                    default
tank/vm-101-disk-0  keylocation           none                   default
tank/vm-101-disk-0  keyformat             none                   default
tank/vm-101-disk-0  pbkdf2iters           0                      default
rincebrain commented 1 year ago

Yes, so it's writes to one or more of those that are triggering this.

Which volume(s) are likely getting enough traffic, you could probably say better than me, but that backtrace pretty explicitly says "I am trying to write to a zvol and hit this", not "to a filesystem".

Maybe a sequel to #3684 where something is scaling the maximum it tries to write unexpectedly.
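
For scale, the size in the warning lines up with dmu_buf_hold_array_by_dnode allocating one dmu_buf_t pointer per block touched by a single write:

127104 B / 8 B per pointer  = 15888 blocks
15888 blocks x 512 B/block  = ~7.8 MiB in one write

At the default 8K volblocksize, the same write would need only ~993 pointers (~7.8 KiB), far below the large-allocation warning threshold.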

vherrlein commented 1 year ago

I found a way to reproduce the issue every time. When I checked the differences between a ZFS filesystem and a ZFS zvol, the major difference was the recordsize/volblocksize: 8K as the default from the pool, versus 512B for the zvol.

So to test, create two zvols within your pool and then mount them as follows:

zfs create -V 8GB tank/test
fdisk /dev/zvol/tank/test
 > Create a GPT partition table: g
 > Add a new partition: n
 > Write changes: w
mkfs.ext4 /dev/zvol/tank/test-part1
mkdir /mnt/test
mount /dev/zvol/tank/test-part1 /mnt/test
zfs create -b 512 -V 8GB tank/test2
fdisk /dev/zvol/tank/test2
 > Create a GPT partition table: g
 > Add a new partition: n
 > Write changes: w
mkfs.ext4 /dev/zvol/tank/test2-part1
mkdir /mnt/test2
mount /dev/zvol/tank/test2-part1 /mnt/test2

Then, from a remote machine, upload a large file into each new mountpoint.

Finally, the error shows up every time large sequential writes hit the mountpoint /mnt/test2 :)
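
A quick way to confirm the block-size difference between the two test volumes:

> zfs get volblocksize tank/test tank/test2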

rincebrain commented 1 year ago

Making a 512b zvol is insane.

vherrlein commented 1 year ago

> Making a 512b zvol is insane.

Totally agree; I never checked until now. It's the default when creating a disk for a VM from Proxmox VE :/

rincebrain commented 1 year ago

Proxmox claims their default is 8k.

vherrlein commented 1 year ago

> Proxmox claims their default is 8k.

Indeed, unfortunately my setup was built a couple of years ago with that PVE default of a 512B block size :/

But it's only now that I'm seeing this kind of behavior; at the beginning there wasn't any :/

vherrlein commented 1 year ago

Since I "migrated" the zvols between pools with zfs send / receive, targeting new volumes properly configured with an 8K block size, I don't have the issue anymore.
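
For anyone following along, a rough sketch of one way to do such a migration (names are placeholders; volblocksize is fixed at creation time, so the new volume has to be created with -b 8k explicitly and the data copied over, e.g. with dd):

> zfs create -b 8k -V 32G tank/vm-101-disk-0-new
> dd if=/dev/zvol/tank/vm-101-disk-0 of=/dev/zvol/tank/vm-101-disk-0-new bs=1M conv=sparse status=progress
> zfs rename tank/vm-101-disk-0 tank/vm-101-disk-0-old
> zfs rename tank/vm-101-disk-0-new tank/vm-101-disk-0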

vherrlein commented 1 year ago

For information, for anyone who has this kind of issue: after troubleshooting, it appears that for a QEMU VM you must manually define the logical (4K) and physical (8K) block sizes for each disk backed by a zvol in order to get the best performance, avoiding a read/write translation overhead, since by default virtio presents disks with a 512B block size.
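
As a hedged example of what that looks like on the QEMU command line (paths and IDs are placeholders; logical_block_size and physical_block_size are standard QEMU block-device properties):

> qemu-system-x86_64 ... \
    -drive file=/dev/zvol/tank/vm-101-disk-0,if=none,id=drive0,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive0,logical_block_size=4096,physical_block_size=8192

On Proxmox VE, the same properties can be passed through via the VM's args: line in /etc/pve/qemu-server/<vmid>.conf.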

vherrlein commented 1 year ago

It also seems related to an issue with QEMU under kernel 6.2: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2025591