voutilad closed 3 months ago
Here's full debug output:
root@rpk:/# rpk redpanda tune disk_irq -v
19:17:24.584 DEBUG Looking for interface with '[0.0.0.0 0.0.0.0]' addresses
19:17:24.585 DEBUG Checking 'lo' address '127.0.0.1/8'
19:17:24.585 DEBUG Checking 'eth0' address '10.0.10.45/24'
19:17:24.585 DEBUG Creating disk IRQs tuner with mode 'def', cpu mask 'all', directories '[/var/lib/redpanda/data]' and devices '[]'
19:17:24.585 DEBUG Checking if 'hwloc-calc-redpanda' & 'hwloc-distrib-redpanda' are present...
19:17:24.585 DEBUG Tuner parameters &{Mode: CPUMask:all RebootAllowed:false Disks:[] Directories:[/var/lib/redpanda/data] Nics:[eth0]}
19:17:24.585 DEBUG Collecting info about directory '/var/lib/redpanda/data'
19:17:24.585 DEBUG Getting block device from path '/var/lib/redpanda/data'
19:17:24.585 DEBUG Creating block device from number {8, 16}
19:17:24.585 DEBUG Reading block device details from '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.585 DEBUG Getting physical device from '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.585 DEBUG Checking 'Disks IRQs affinity static'
19:17:24.585 DEBUG Getting 'sdb' IRQs
19:17:24.585 DEBUG Getting block device from path '/dev/sdb'
19:17:24.586 DEBUG Creating block device from number {8, 16}
19:17:24.586 DEBUG Reading block device details from '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.586 DEBUG Getting controller path for '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.586 DEBUG Reading IRQs of '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb', with deviceInfo name pattern 'blkif'
19:17:24.586 DEBUG Reading '/proc/interrupts' file...
19:17:24.586 DEBUG DeviceInfo '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb' IRQs '[]'
19:17:24.586 DEBUG Checking if we are running on i3.metal amazon instance type
19:17:24.594 DEBUG Running on 'No such metadata item' EC2 instance
19:17:24.594 DEBUG Running command 'ps' with arguments '[--no-headers -C irqbalance]'
19:17:24.595 DEBUG Check 'Disks IRQs affinity static' passed, skipping tuning
19:17:24.595 DEBUG Checking 'Disks IRQs affinity set'
19:17:24.595 DEBUG Getting [sdb] IRQs distribution with mode def and CPU mask all
19:17:24.595 DEBUG Running command 'hwloc-calc-redpanda' with arguments '[all]'
19:17:24.606 DEBUG Getting 'sdb' IRQs
19:17:24.606 DEBUG Getting block device from path '/dev/sdb'
19:17:24.606 DEBUG Creating block device from number {8, 16}
19:17:24.606 DEBUG Reading block device details from '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.606 DEBUG Getting controller path for '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb'
19:17:24.606 DEBUG Reading IRQs of '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb', with deviceInfo name pattern 'blkif'
19:17:24.606 DEBUG Reading '/proc/interrupts' file...
19:17:24.606 DEBUG DeviceInfo '/sys/devices/platform/host3/session8/target3:0:0/3:0:0:2/block/sdb' IRQs '[]'
19:17:24.606 DEBUG Checking if we are running on i3.metal amazon instance type
19:17:24.614 DEBUG Running on 'No such metadata item' EC2 instance
19:17:24.614 DEBUG Calculating default mode for Disk IRQs
19:17:24.614 DEBUG Running command 'hwloc-calc-redpanda' with arguments '[--restrict 0x000000ff --number-of core machine:0]'
19:17:24.625 DEBUG Running command 'hwloc-calc-redpanda' with arguments '[--restrict 0x000000ff --number-of PU machine:0]'
19:17:24.635 DEBUG Considering '4' cores and '8' PUs
19:17:24.635 DEBUG Computing IRQ CPU mask for 'sq' mode and input CPU mask '0x000000ff'
19:17:24.635 DEBUG Computing CPU mask for 'sq' mode and input CPU mask '0x000000ff'
19:17:24.635 DEBUG Running command 'hwloc-calc-redpanda' with arguments '[0x000000ff ~PU:0]'
19:17:24.644 DEBUG Computations CPU mask '0x000000fe'
19:17:24.644 DEBUG Running command 'hwloc-calc-redpanda' with arguments '[0x000000ff ~0x000000fe]'
19:17:24.654 DEBUG IRQs CPU mask '0x00000001'
19:17:24.654 DEBUG Running command 'hwloc-distrib-redpanda' with arguments '[0 --single --restrict 0x00000001]'
TUNER     APPLIED  ENABLED  SUPPORTED  ERROR
disk_irq  false    true     true       err=signal: segmentation fault (core dumped), stderr=
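The mask values in the log above can be re-derived by hand, which helps confirm that the arguments passed to the crashing `hwloc-distrib-redpanda` call were sane. This is only a sketch using plain shell integer arithmetic; it assumes that `PU:0` maps to bit 0 (`0x1`), which on a real machine hwloc derives from the topology. The helper names `fmt_mask` and `mask_without` are made up for illustration.

```shell
# Re-derive the CPU masks from the debug log with plain shell arithmetic.
# Assumption: PU:0 corresponds to bit 0 (0x1); hwloc reads this from topology.

# Format an integer as an 8-digit hex mask, the way the log prints it.
fmt_mask() { printf '0x%08x\n' "$1"; }

# Clear the bits of $2 from $1 (the "A ~B" form passed to hwloc-calc).
mask_without() { fmt_mask $(( $1 & ~$2 )); }

input_mask=0x000000ff
pu0_mask=0x00000001

# CPUs left for computations: input mask with PU:0 cleared.
compute_mask=$(mask_without "$input_mask" "$pu0_mask")
# CPUs left for IRQs: input mask with the computation CPUs cleared.
irq_mask=$(mask_without "$input_mask" "$compute_mask")

echo "Computations CPU mask: $compute_mask"
echo "IRQs CPU mask: $irq_mask"
```

Both results match the log (`0x000000fe` and `0x00000001`), so the segfault happens after the masks were computed as expected.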
For additional background, I have an Oracle Block Volume backing the PV mounted to:
/dev/sdb on /var/lib/redpanda/data type xfs (rw,relatime,seclabel,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota)
It looks to be attached via SCSI:
[556929.998574] sd 3:0:0:2: [sdb] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
[556930.001978] sd 3:0:0:2: [sdb] 4096-byte physical blocks
[556930.004672] sd 3:0:0:2: [sdb] Write Protect is off
[556930.006880] sd 3:0:0:2: [sdb] Mode Sense: 2b 00 10 08
[556930.007135] sd 3:0:0:2: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[556930.011066] sd 3:0:0:2: [sdb] Optimal transfer size 1048576 bytes
[556930.019158] sd 3:0:0:2: [sdb] Attached SCSI disk
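The debug log reports an empty IRQ list for `sdb` when looking for the name pattern 'blkif' in `/proc/interrupts`. A rough way to check that by hand is sketched below. It assumes a plain substring match on each line; rpk's actual matching logic may differ, and the function name `irqs_matching` is hypothetical.

```shell
# Print the IRQ numbers of interrupt-table lines that contain a pattern.
# $1: pattern (fixed string), $2: file to read (defaults to /proc/interrupts).
irqs_matching() {
  pattern=$1
  file=${2:-/proc/interrupts}
  # Numbered IRQ lines start with optional spaces, digits, then a colon.
  grep -F -- "$pattern" "$file" | sed -n 's/^ *\([0-9][0-9]*\):.*/\1/p'
}

if [ -r /proc/interrupts ]; then
  echo "IRQs matching 'blkif': $(irqs_matching blkif | tr '\n' ' ')"
fi
```

An empty result here would be consistent with the `IRQs '[]'` lines in the log for this SCSI-attached volume.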
I think this is a dupe of https://github.com/redpanda-data/core-internal/issues/1145
If you run into this again, could you please try running each of these manually:
hwloc-distrib-redpanda 0 --single --restrict 0x00000001
hwloc-distrib 0 --single --restrict 0x00000001
Closing this in favor of the above-mentioned ticket, which has some investigation already.
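When running those two commands by hand, it can help to record how each one ended, since a shell exit status above 128 conventionally means the process was terminated by a signal (139 would be 128 + 11, SIGSEGV on Linux). A minimal sketch, with `describe_status` as a hypothetical helper and the commands only attempted if they are found on `PATH`:

```shell
# Turn a shell exit status into a short description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}

# Run the same invocation rpk used and report how each binary ended.
for bin in hwloc-distrib-redpanda hwloc-distrib; do
  if command -v "$bin" >/dev/null 2>&1; then
    rc=0
    "$bin" 0 --single --restrict 0x00000001 || rc=$?
    echo "$bin: $(describe_status "$rc")"
  else
    echo "$bin: not found on PATH"
  fi
done
```

If only the `-redpanda` variant dies with signal 11, that would point at the bundled build rather than at the arguments.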
Version & Environment
Redpanda version: 23.3.11
Running in Oracle Cloud:
What went wrong?
dmesg output after segfault:
What should have happened instead?
Either error or correctly tuned.
How to reproduce the issue?
rpk redpanda mode prod
rpk redpanda tune disk_irq
JIRA Link: CORE-2376