gdubicki opened this issue 4 months ago
At a minimum, the script should save the settings before the tuning for a manual revert. While the sysctl and disk settings are easy to revert now, the IRQ ones are hard.
@vladzcloudius - thoughts? @gdubicki - I think most of the issues mentioned above are benign or less relevant, but I am interested in the specifics of what was configured by perftune on your setup that caused it to slow down.
Thanks for the quick response, @mykaul!
Our Scylla configuration for the Scylla Operator is as follows:
datacenter: XXX
racks:
  - name: YYY
    scyllaConfig: "scylla-config"
    scyllaAgentConfig: "scylla-agent-config"
    members: 7
    storage:
      storageClassName: local-raid-disks
      capacity: 2200G # this is only the initial size, the actual is 3000G now (see https://github.com/scylladb/scylla-operator/issues/402)
    agentResources:
      # requests and limits here need to be equal to make Scylla have Guaranteed QoS class
      requests:
        cpu: 150m
        memory: 768M
      limits:
        cpu: 150m
        memory: 768M
    resources:
      # requests and limits here need to be equal to make Scylla have Guaranteed QoS class
      limits:
        cpu: 31
        memory: 108Gi
      requests:
        cpu: 31
        memory: 108Gi
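(As a sanity check that the pods really ended up with the Guaranteed QoS class - the pod and namespace names below are placeholders:)

```bash
# Placeholder pod/namespace names; .status.qosClass is reported by Kubernetes itself.
kubectl get pod <scylla-pod> -n <namespace> -o jsonpath='{.status.qosClass}'
# Expected output: Guaranteed
```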
The perftune output, with the changes it applied on one of the nodes, looked like this:
$ kubectl logs perftune-containers-89cc03d2-b076-4c41-9877-4c9e985fbd28-xnh5x -n scylla-operator-node-tuning
irqbalance is not running
No non-NVMe disks to tune
Setting NVMe disks: nvme0n1, nvme0n3, nvme0n2, nvme0n4, nvme0n6, nvme0n5, nvme0n8, nvme0n7...
Setting mask 00000001 in /proc/irq/30/smp_affinity
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n3/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n3/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n2/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n2/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n4/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n4/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n6/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n6/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n5/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n5/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n8/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n8/queue/nomerges
Writing 'none' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n7/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n7/queue/nomerges
Setting a physical interface ens5...
Distributing all IRQs
Setting mask 00000001 in /proc/irq/68/smp_affinity
Setting mask 00000001 in /proc/irq/53/smp_affinity
Setting mask 00000001 in /proc/irq/49/smp_affinity
Setting mask 00000001 in /proc/irq/40/smp_affinity
Setting mask 00000001 in /proc/irq/43/smp_affinity
Setting mask 00000001 in /proc/irq/61/smp_affinity
Setting mask 00000001 in /proc/irq/63/smp_affinity
Setting mask 00000001 in /proc/irq/73/smp_affinity
Setting mask 00000001 in /proc/irq/89/smp_affinity
Setting mask 00000001 in /proc/irq/59/smp_affinity
Setting mask 00000001 in /proc/irq/77/smp_affinity
Setting mask 00000001 in /proc/irq/80/smp_affinity
Setting mask 00000001 in /proc/irq/44/smp_affinity
Setting mask 00000001 in /proc/irq/56/smp_affinity
Setting mask 00000001 in /proc/irq/66/smp_affinity
Setting mask 00000001 in /proc/irq/46/smp_affinity
Setting mask 00000001 in /proc/irq/93/smp_affinity
Setting mask 00000001 in /proc/irq/48/smp_affinity
Setting mask 00000001 in /proc/irq/84/smp_affinity
Setting mask 00000001 in /proc/irq/94/smp_affinity
Setting mask 00000001 in /proc/irq/60/smp_affinity
Setting mask 00000001 in /proc/irq/91/smp_affinity
Setting mask 00000001 in /proc/irq/70/smp_affinity
Setting mask 00000001 in /proc/irq/79/smp_affinity
Setting mask 00000001 in /proc/irq/83/smp_affinity
Setting mask 00000001 in /proc/irq/41/smp_affinity
Setting mask 00000001 in /proc/irq/64/smp_affinity
Setting mask 00000001 in /proc/irq/95/smp_affinity
Setting mask 00000001 in /proc/irq/65/smp_affinity
Setting mask 00000001 in /proc/irq/67/smp_affinity
Setting mask 00000001 in /proc/irq/37/smp_affinity
Setting mask 00000001 in /proc/irq/75/smp_affinity
Setting mask 00000001 in /proc/irq/74/smp_affinity
Setting mask 00000001 in /proc/irq/57/smp_affinity
Setting mask 00000001 in /proc/irq/86/smp_affinity
Setting mask 00000001 in /proc/irq/78/smp_affinity
Setting mask 00000001 in /proc/irq/45/smp_affinity
Setting mask 00000001 in /proc/irq/88/smp_affinity
Setting mask 00000001 in /proc/irq/47/smp_affinity
Setting mask 00000001 in /proc/irq/85/smp_affinity
Setting mask 00000001 in /proc/irq/42/smp_affinity
Setting mask 00000001 in /proc/irq/32/smp_affinity
Setting mask 00000001 in /proc/irq/51/smp_affinity
Setting mask 00000001 in /proc/irq/69/smp_affinity
Setting mask 00000001 in /proc/irq/71/smp_affinity
Setting mask 00000001 in /proc/irq/76/smp_affinity
Setting mask 00000001 in /proc/irq/92/smp_affinity
Setting mask 00000001 in /proc/irq/34/smp_affinity
Setting mask 00000001 in /proc/irq/81/smp_affinity
Setting mask 00000001 in /proc/irq/55/smp_affinity
Setting mask 00000001 in /proc/irq/82/smp_affinity
Setting mask 00000001 in /proc/irq/87/smp_affinity
Setting mask 00000001 in /proc/irq/54/smp_affinity
Setting mask 00000001 in /proc/irq/52/smp_affinity
Setting mask 00000001 in /proc/irq/72/smp_affinity
Setting mask 00000001 in /proc/irq/90/smp_affinity
Setting mask 00000001 in /proc/irq/31/smp_affinity
Setting mask 00000001 in /proc/irq/33/smp_affinity
Setting mask 00000001 in /proc/irq/62/smp_affinity
Setting mask 00000001 in /proc/irq/35/smp_affinity
Setting mask 00000001 in /proc/irq/39/smp_affinity
Setting mask 00000001 in /proc/irq/36/smp_affinity
Setting mask 00000001 in /proc/irq/38/smp_affinity
Setting mask 00000001 in /proc/irq/58/smp_affinity
Setting mask 00000001 in /proc/irq/50/smp_affinity
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-13/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-9/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-31/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-21/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-11/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-7/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-5/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-28/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-18/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-3/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-26/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-16/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-1/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-24/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-14/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-22/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-12/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-8/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-30/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-20/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-10/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-6/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-29/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-19/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-4/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-27/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-17/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-2/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-25/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-15/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-0/rps_cpus
Setting mask fffffffe in /sys/class/net/ens5/queues/rx-23/rps_cpus
Setting net.core.rps_sock_flow_entries to 32768
Setting limit 1024 in /sys/class/net/ens5/queues/rx-13/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-9/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-31/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-21/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-11/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-7/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-5/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-28/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-18/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-3/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-26/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-16/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-1/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-24/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-14/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-22/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-12/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-8/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-30/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-20/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-10/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-6/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-29/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-19/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-4/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-27/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-17/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-2/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-25/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-15/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-0/rps_flow_cnt
Setting limit 1024 in /sys/class/net/ens5/queues/rx-23/rps_flow_cnt
Trying to enable ntuple filtering HW offload for ens5...not supported
Setting mask 00000001 in /sys/class/net/ens5/queues/tx-6/xps_cpus
Setting mask 00010000 in /sys/class/net/ens5/queues/tx-22/xps_cpus
Setting mask 00000002 in /sys/class/net/ens5/queues/tx-12/xps_cpus
Setting mask 00020000 in /sys/class/net/ens5/queues/tx-4/xps_cpus
Setting mask 00000004 in /sys/class/net/ens5/queues/tx-30/xps_cpus
Setting mask 00040000 in /sys/class/net/ens5/queues/tx-20/xps_cpus
Setting mask 00000008 in /sys/class/net/ens5/queues/tx-10/xps_cpus
Setting mask 00080000 in /sys/class/net/ens5/queues/tx-2/xps_cpus
Setting mask 00000010 in /sys/class/net/ens5/queues/tx-29/xps_cpus
Setting mask 00100000 in /sys/class/net/ens5/queues/tx-19/xps_cpus
Setting mask 00000020 in /sys/class/net/ens5/queues/tx-0/xps_cpus
Setting mask 00200000 in /sys/class/net/ens5/queues/tx-27/xps_cpus
Setting mask 00000040 in /sys/class/net/ens5/queues/tx-17/xps_cpus
Setting mask 00400000 in /sys/class/net/ens5/queues/tx-9/xps_cpus
Setting mask 00000080 in /sys/class/net/ens5/queues/tx-25/xps_cpus
Setting mask 00800000 in /sys/class/net/ens5/queues/tx-15/xps_cpus
Setting mask 00000100 in /sys/class/net/ens5/queues/tx-7/xps_cpus
Setting mask 01000000 in /sys/class/net/ens5/queues/tx-23/xps_cpus
Setting mask 00000200 in /sys/class/net/ens5/queues/tx-13/xps_cpus
Setting mask 02000000 in /sys/class/net/ens5/queues/tx-5/xps_cpus
Setting mask 00000400 in /sys/class/net/ens5/queues/tx-31/xps_cpus
Setting mask 04000000 in /sys/class/net/ens5/queues/tx-21/xps_cpus
Setting mask 00000800 in /sys/class/net/ens5/queues/tx-11/xps_cpus
Setting mask 08000000 in /sys/class/net/ens5/queues/tx-3/xps_cpus
Setting mask 00001000 in /sys/class/net/ens5/queues/tx-1/xps_cpus
Setting mask 10000000 in /sys/class/net/ens5/queues/tx-28/xps_cpus
Setting mask 00002000 in /sys/class/net/ens5/queues/tx-18/xps_cpus
Setting mask 20000000 in /sys/class/net/ens5/queues/tx-26/xps_cpus
Setting mask 00004000 in /sys/class/net/ens5/queues/tx-16/xps_cpus
Setting mask 40000000 in /sys/class/net/ens5/queues/tx-8/xps_cpus
Setting mask 00008000 in /sys/class/net/ens5/queues/tx-24/xps_cpus
Setting mask 80000000 in /sys/class/net/ens5/queues/tx-14/xps_cpus
Writing '4096' to /proc/sys/net/core/somaxconn
Writing '4096' to /proc/sys/net/ipv4/tcp_max_syn_backlog
We managed to revert the disk and sysctl settings on all nodes but that alone didn't help.
We reverted all the settings on some nodes by rebooting them.
Then we wanted to revert the masks on the remaining nodes, but we realized that we didn't know what the settings had been before the tuning, since even on the rebooted nodes they are not simply reset to obvious defaults. Ultimately we reverted everything by rebooting all the nodes.
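For illustration, even a rough pre-tuning snapshot like the sketch below (paths taken from the log above; the backup location and format are arbitrary assumptions) would have made a manual revert possible:

```bash
#!/usr/bin/env bash
# Rough pre-tuning backup sketch; adjust the NIC (ens5) and disk names to the node.
backup=/var/tmp/pre-perftune-$(date +%s)
mkdir -p "$backup"

# IRQ affinities
for f in /proc/irq/*/smp_affinity; do
  printf '%s %s\n' "$f" "$(cat "$f")"
done > "$backup/irq_smp_affinity"

# NIC RPS/XPS masks and flow counts
for f in /sys/class/net/ens5/queues/rx-*/rps_cpus \
         /sys/class/net/ens5/queues/rx-*/rps_flow_cnt \
         /sys/class/net/ens5/queues/tx-*/xps_cpus; do
  printf '%s %s\n' "$f" "$(cat "$f")"
done > "$backup/ens5_queues"

# Disk scheduler/nomerges (same queue attributes as the /sys/devices/... paths in the log)
# and the sysctls perftune writes
for f in /sys/block/nvme0n*/queue/scheduler /sys/block/nvme0n*/queue/nomerges; do
  printf '%s %s\n' "$f" "$(cat "$f")"
done > "$backup/disk_settings"
sysctl net.core.rps_sock_flow_entries net.core.somaxconn net.ipv4.tcp_max_syn_backlog \
  > "$backup/sysctls"
```

Restoring would then just be writing each saved value back (the scheduler lines need the bracketed current entry extracted first).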
I think that, especially as there are quite a lot of known issues with this tool (e.g. https://github.com/scylladb/scylladb/issues/14873, https://github.com/scylladb/scylladb/issues/10600, https://github.com/scylladb/seastar/issues/1297, https://github.com/scylladb/seastar/issues/1698, https://github.com/scylladb/seastar/issues/1008, and maybe more), there should be a feature implemented in perftune.py to be able to revert to the defaults.
The only still open GH issue out of the above that is related to perftune.py is a documentation one. perftune.py is supposed to be quite trustworthy, especially if you use the version from the seastar master branch. I'm not aware of any open bug related to perftune.py at the moment.
As to your request to revert the tuning: this would require backing up the configuration of all the values it tunes. This is a nice feature when you play with things. However, in production you should either use perftune.py or not use it. And there is a very easy way to tell Scylla not to apply the perftune.py tweaking, if you are confident this is what you want: set the following fields in /etc/default/scylla-server:
SET_NIC_AND_DISKS=no
SET_CLOCKSOURCE=no
DISABLE_WRITEBACK_CACHE=no
The only still open GH issue out of the above that is related to perftune.py is a documentation one. perftune.py is supposed to be quite trustworthy, especially if you use the version from the seastar master branch. I'm not aware of any open bug related to perftune.py at the moment.
Respectfully, this issue is related to perftune.
It may not be very clearly visible on the screenshot, but our average write times have increased from ~500ms to ~3000ms (about 6x higher), while the 95th percentile has increased from ~2500ms to ~17500ms (about 7x higher)!
The read times have been affected as well, although less painfully.
As to your request to revert the tuning: this would require backing up the configuration of all the values it tunes. This is a nice feature when you play with things. However, in production you should either use perftune.py or not use it. (...)
Well, we did use it and it broke our performance.
Then it was very hard to revert the changes, as with the local SSDs on GKE the node restart caused the Scylla nodes to fall into a restart loop. We had to hack them into thinking they were replacing themselves so that they would start without bootstrapping as new nodes. That didn't work for one node, which did bootstrap, taking more than 10 hours.
Overall we spent 3 days reverting the optimisations, so I think there is a need for a revert feature.
We would be happy to help with this by providing some PRs, but we would probably need some guidance, maybe over Slack?
perftune.py is supposed to be quite trustworthy, especially if you use the version from the seastar master branch.
We used the version bundled with Scylla 5.4.9.
It may not be very clearly visible on the screenshot, but our average write times have increased from ~500ms to ~3000ms (about 6x higher), while the 95th percentile has increased from ~2500ms to ~17500ms (about 7x higher)!
This should get its own issue (in Scylla) and we can look at it there, if we understand what changes were made (which I assume is doable, since you reverted them).
This should get its own issue (in Scylla) and we can look at it there, (...)
Sure, I can open an issue in https://github.com/scylladb/scylladb, if that's a more appropriate place.
(...) if we understand what changes were made (which I assume is doable, since you reverted them).
Well, we know from the perftune logs (above) what the settings were after the change, but we don't exactly know what they were before. That's the point.
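For anyone in the same spot, part of the pre-tuning state can at least be approximated from kernel defaults; a rough sketch, under the assumption that the NIC driver does not install its own RPS/XPS masks:

```bash
# The mask newly requested IRQs inherit by default (usually all CPUs):
cat /proc/irq/default_smp_affinity

# RPS and XPS are disabled out of the box, so an all-zero mask restores the usual
# default behaviour for the NIC queues (ens5 as in the logs above):
for f in /sys/class/net/ens5/queues/rx-*/rps_cpus \
         /sys/class/net/ens5/queues/tx-*/xps_cpus; do
  echo 0 > "$f"
done
```

The per-IRQ affinities that existed before perftune (typically spread by the driver or irqbalance) are still lost, though, which is why a built-in backup would help.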
Respectfully, this issue is related to perftune. It may not be very clearly visible on the screenshot, but our average write times have increased from ~500ms to ~3000ms (about 6x higher), while the 95th percentile has increased from ~2500ms to ~17500ms (about 7x higher)!
@gdubicki you need to keep in mind that you should only (!!) use perftune.py in conjunction with the corresponding Scylla CPU pinning.
Was that all the case?
@gdubicki you need to keep in mind that you should only (!!) use perftune.py in conjunction with the corresponding Scylla CPU pinning.
If you mean using the static CPU manager policy with Guaranteed QoS class, then we did that. See our config in this comment.
But maybe it was wrong to allocate 31 cores for Scylla out of a 32-core machine? Should we leave some cores free here? 🤔
- You should also keep in mind that perftune.py needs to run on the Hypervisor - not from the POD.
We have run perftune via the Scylla Operator (v1.13.0), so it's done in whatever way the operator does it.
- On top of that you need to remember that you must make sure that Scylla PODs CPUs are never allowed to run on the so called IRQ CPUs - the ones perftune.py pins IRQs affinities to: in your case it was CPU0.
Was that all the case?
I don't know, to be frank.
We have just configured Scylla as in this comment on an n2d-standard-32 node and enabled the performance tuning that used perftune.
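For the record, one way to check this after the fact (assuming the Scylla process on the node is simply named scylla):

```bash
# Run on the GKE node, not inside the pod; the process name "scylla" is an assumption.
pid=$(pgrep -xo scylla)
grep Cpus_allowed_list "/proc/$pid/status"
# If IRQs were pinned to CPU0 (mask 00000001 in the log above), the list should not
# include CPU 0, e.g. "Cpus_allowed_list: 1-31".
```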
I think you need to use 'cpuset' to ensure pods get a static CPU assignment.
It might be interesting that this was logged by Scylla when starting on the restarted nodes:
The first measurement looks about right for nodes with 8 local NVMe SSDs in GCP, but all the other results are very bad.
Note that this is from the restarts to disable the perftune optimizations.
...however, the values ultimately written to the config file look roughly like this on all the nodes:
root@...:/# cat /etc/scylla.d/io_properties.yaml
disks:
  - mountpoint: /var/lib/scylla
    read_iops: 721497
    read_bandwidth: 2950537984
    write_iops: 401104
    write_bandwidth: 1623555072
...except one, which has substantially lower values for writes:
root@...-hrjn:/# cat /etc/scylla.d/io_properties.yaml
disks:
  - mountpoint: /var/lib/scylla
    read_iops: 682532
    read_bandwidth: 2951263744
    write_iops: 39928
    write_bandwidth: 759449856
...but I suppose it's a measurement error.
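If you want to rule it out, one option might be to re-run the I/O benchmark on that node and compare; a sketch assuming the scylla_io_setup wrapper bundled with Scylla can be run in this environment and is allowed to rewrite the file:

```bash
# Assumption: scylla_io_setup (the packaged wrapper around iotune) is usable on this node.
cp /etc/scylla.d/io_properties.yaml /etc/scylla.d/io_properties.yaml.bak
scylla_io_setup
diff /etc/scylla.d/io_properties.yaml.bak /etc/scylla.d/io_properties.yaml
```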
I think you need to use 'cpuset' to ensure pods get a static CPU assignment.
According to this doc this is done automatically, and for our nodes it is currently set like this:
root@...:/# cat /etc/scylla.d/cpuset.conf
# DO NO EDIT
# This file should be automatically configure by scylla_cpuset_setup
#
# CPUSET="--cpuset 0 --smp 1"
CPUSET="--cpuset 1-31 "
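As an aside, a small helper sketch for translating the hex affinity masks from the perftune log into CPU lists, to cross-check them against the CPUSET value above:

```bash
# Translate a hex CPU mask (as written to smp_affinity / rps_cpus) into a CPU list.
mask_to_cpus() {
  local cpu
  local -a list=()
  local mask=$((16#${1//,/}))   # strip commas, parse as hex
  for ((cpu = 0; cpu < 64; cpu++)); do
    (( (mask >> cpu) & 1 )) && list+=("$cpu")
  done
  echo "${list[*]}"
}

mask_to_cpus 00000001   # -> 0           (the IRQ CPU everything was pinned to)
mask_to_cpus fffffffe   # -> 1 2 ... 31  (matches CPUSET="--cpuset 1-31")
```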
The differing io_properties.yaml values are interesting. Either it's some issue, or you got a lemon. That happens :-/
What bugs me is the question whether it is right to assign 31 cores on a 32-core machine to the Scylla pod. Shouldn't I leave a bit more free for other workloads? (But note that these are dedicated nodes for Scylla; the only other workloads are other Scylla pods, Datadog and kube-system.)
You are asking the wrong question - the question is how many cores you should dedicate to network IRQ handling vs. Scylla cores. That's a ratio you need to keep reasonable. Scylla can work on fewer cores - it's up to you how many you wish to have. With very few cores, we don't even use dedicated cores for network processing. That's what perftune does (among other things). Specifically, 31 out of 32 doesn't make sense to me. More should go to networking.
What bugs me is the question whether it is right to assign 31 cores on a 32-core machine to the Scylla pod?
This doesn't look correct indeed. Regardless of whether you have HT enabled or disabled, perftune.py was supposed to allocate 2 CPUs for IRQs.
May I see the content of /etc/scylla.d/perftune.yaml?
We have run perftune via the Scylla Operator (v1.13.0), so it's done in whatever way the operator does it.
The page above is a bit unclear about what needs to be run where, but allow me to reiterate:
1) perftune.py must configure the VM resources. I'm not a K8S specialist, but usually you can't change host-level OS configuration from inside the container. Hence you should run perftune.py manually on the host VM itself.
2) If you want to achieve the maximum performance using perftune.py, the container must be pinned to the corresponding host CPUs, as mentioned by @mykaul above, and it should never be allowed to run on "IRQ CPUs". In the configuration above I don't see where the POD is forbidden to run on CPU0 - I only see that you tell it to use 31 CPUs - but which ones will be used in this case? I'm not sure it's safe to assume it will be 1-31. I'd assume it will more likely be 0-30.
3) When running (1) on the host, pay attention to which CPUs are configured as "IRQ CPUs" - you can use the --get-irq-cpu-mask perftune.py parameter to print the corresponding CPU mask. Then make sure to pin your POD away from those CPUs, and use the corresponding value in the cpuset.conf you referenced above.
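A minimal sketch of that check, run on the host VM (only --get-irq-cpu-mask is taken from the comment above; the rest of the invocation is an assumption and may need extra flags):

```bash
# Run on the host VM, not inside the pod; ens5 as in the logs above.
# Prints the mask of CPUs perftune would reserve for IRQ handling.
sudo ./perftune.py --nic ens5 --get-irq-cpu-mask
```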
Let me know if there are any more questions I can help you with, @gdubicki.
Thanks @vladzcloudius!
May I see the content of /etc/scylla.d/perftune.yaml?
I don't have a copy of it from before reverting the tuning, and now this file does not exist on my Scylla nodes.
The page above is a bit unclear what needs to be run where but allow me to re-iterate: (...)
I guess we would need to ask the Scylla Operator team whether it is done this way. cc @tnozicka
Installation details
Scylla version (or git commit hash): 5.4.9
Cluster size: 7 nodes
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1059-gke x86_64)
Hardware details (for performance issues)
Platform (physical/VM/cloud instance type/docker): GKE, v1.29.5-gke.1192000
Hardware: n2d-standard-32, min. CPU platform: AMD Milan
Disks (SSD/HDD, count): 8 x local SSD
We have run perftune.py on our cluster for the first time, and after the changes were applied our Scylla read and write times jumped (the change was applied a bit before 21:00).
So far the only way we have found to completely revert the changes was to restart the Scylla nodes, but it's a long and painful procedure.
I think that, especially as there are quite a lot of known issues with this tool (e.g. https://github.com/scylladb/scylladb/issues/14873, https://github.com/scylladb/scylladb/issues/10600, https://github.com/scylladb/seastar/issues/1297, https://github.com/scylladb/seastar/issues/1698, https://github.com/scylladb/seastar/issues/1008, and maybe more), there should be a feature implemented in perftune.py to be able to revert to the defaults.