Open pdbossman opened 1 year ago
FYI - @vladzcloudius @tarzanek
@pdbossman - I don't think that's the right location for this issue - or the fix (should probably be in Scylla's /dist/common/scripts/scylla_ntp_setup , just like https://github.com/scylladb/scylladb/issues/13344 )?
If I am wrong about where the clocksource is set, I apologize. To support Azure, there should be a recommended clocksource for Azure, and the code that chooses the optimal clocksource should identify it and set it.
I'm quite sure the VM (if you are using our image) is configured properly. See https://github.com/scylladb/scylladb/issues/13363 .
@mykaul this is not about an AMI. In perftune.py, ScyllaDB recommends a clocksource - usually tsc or in Amazon nitro case, kvm-clock.
Here is the PR adding kvm-clock.
The problem here is that neither of the recommended clocksources is available.
I am suggesting that, as for Amazon, we need a recommended Azure clocksource (one that is available on Azure instance types).
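To illustrate the gap described above, here is a minimal sketch of a preference-ordered selection. The function name `pick_clocksource` and the `PREFERRED` tuple are hypothetical and are not taken from perftune.py; only the clocksource names come from this thread.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical preference order for illustration only (not perftune.py's real list).
PREFERRED = ("tsc", "kvm-clock")


def pick_clocksource(available: Iterable[str],
                     preferred: Sequence[str] = PREFERRED) -> Optional[str]:
    """Return the first preferred clocksource that is available, else None."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    return None


if __name__ == "__main__":
    # Clocksources reported on the Azure instance in this issue.
    azure = ["hyperv_clocksource_tsc_page", "acpi_pm"]
    print(pick_clocksource(azure))  # None: nothing preferred is available
    # Assumption: extending the list with the Hyper-V clocksource, as proposed here.
    print(pick_clocksource(azure, PREFERRED + ("hyperv_clocksource_tsc_page",)))
    # hyperv_clocksource_tsc_page
```

With a list that names only tsc and kvm-clock, the Azure instance gets no match, which is the situation this issue reports.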
@mykaul this issue has nothing to do with a wall clock in general or NTP in particular. It's about a system (realtime) clock source used for calculating high resolution (nanoseconds level) timestamps: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/chap-timestamping
And, as Patrick wrote above, none of this has anything to do with any "VM image" - it's simply irrelevant since at least some (most?) people are going to use our packages - not our images. - Even if we are considering scylla users only (which we are not).
And on top of this, this issue goes beyond scylla - this problem is relevant for every seastar usage. scylla is just one of them.
Having said all that, this GH issue is exactly where it's supposed to be - in the seastar project. And the component that needs to be patched (if at all) is perftune.py.
I hope this makes more sense to you now.
I'd be surprised if modern kernel doesn't already pick the 'best' clock source.
It doesn't always, @mykaul. Hence the tweaking in perftune.py
@roydahan - what's the clock source used in your instance in Azure?
We test only our Scylla images for Azure and IIUC this issue is not relevant for the images.
Do we have cases where it isn't picked properly? I see https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1875467 for example, so I think it's working properly these days?
I know Standard_*_v2 images have the following:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hyperv_clocksource_tsc_page acpi_pm
Neither of these is recommended.
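For anyone who wants to check their own instance from Python rather than with cat, the following is a small sketch that reads the same sysfs files. The helper name `read_clocksources` and its `base` parameter are made up for this example; the sysfs paths are the standard Linux ones used above.

```python
from pathlib import Path
from typing import List, Tuple

# Standard Linux sysfs directory for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = SYSFS_DIR) -> Tuple[str, List[str]]:
    """Return (current clocksource, list of available clocksources)."""
    # available_clocksource holds a space-separated list on a single line.
    available = (base / "available_clocksource").read_text().split()
    # current_clocksource holds a single name followed by a newline.
    current = (base / "current_clocksource").read_text().strip()
    return current, available
```

Calling `read_clocksources()` with no arguments reads the live system. The `base` parameter exists only so the function can be pointed at a scratch directory when testing.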
@pdbossman what's wrong with hyperv_clocksource_tsc_page clock source?
So why not add hyperv_clocksource_tsc_page to the list in https://github.com/scylladb/seastar/blob/master/scripts/perftune.py#L1087 ?
It is a one-line change and it should fix this issue; there is no other clock to pick anyway now ... (and it should fix perftune.py)
@tarzanek this is not the correct approach. We don't add "lines" into perftune.py simply because "there is no other clock to pick from".
Nor do we rely on OS defaults if a particular configuration is required, @mykaul. We enforce it. If we don't, then even if the default is "correct" today, tomorrow an OS vendor may change the defaults for whatever reason, and we wouldn't know and would suffer a regression.
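As a rough illustration of "enforce it" rather than trusting the default, a sketch along these lines writes the wanted clocksource to sysfs and verifies the result. This is not perftune.py's actual code: `enforce_clocksource` and its `base` parameter are hypothetical, and writing to the real sysfs file requires root.

```python
from pathlib import Path

# Standard Linux sysfs directory for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def enforce_clocksource(wanted: str, base: Path = SYSFS_DIR) -> bool:
    """Make `wanted` the current clocksource.

    Returns True if a change was made, False if it was already set.
    Raises ValueError if `wanted` is not available on this machine.
    """
    available = (base / "available_clocksource").read_text().split()
    if wanted not in available:
        raise ValueError(f"{wanted!r} not available; have {available}")

    current_file = base / "current_clocksource"
    if current_file.read_text().strip() == wanted:
        return False  # already configured, nothing to do

    current_file.write_text(wanted)  # needs root on a real system
    # Read back to confirm that the setting actually took effect.
    if current_file.read_text().strip() != wanted:
        raise RuntimeError(f"failed to switch clocksource to {wanted!r}")
    return True
```

Because the function checks and sets the value on every run, a changed OS default would be corrected instead of silently becoming a regression.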
@pdbossman what is the recommended clock source for Azure? Please provide a reference to the corresponding documentation.
Hello, it is very difficult to find recommendations on which clocksource to use. The closing line of this kernel article indicates that Hyper-V would be preferred: "Clockevents based on the virtualized PIT and local APIC timer also work, but the Hyper-V synthetic timer is preferred." https://docs.kernel.org/virt/hyperv/clocks.html
And I reiterate: I worked with the RHEL Virt team on this in the past, and I'm sure Microsoft did this work with all the other major OSes. The OS picks the right clocksource these days.
Definitely not all of them and not always: Ubuntu on GCP doesn't pick the recommended kvm-clock and uses tsc by default.
FYI, @mykaul
So why not add hyperv_clocksource_tsc_page to the list in https://github.com/scylladb/seastar/blob/master/scripts/perftune.py#L1087 ?
one line and it should fix this issue, there is no other clock to pick anyways now
Unfortunately there is - see the opening message.
... (and it should fix perftune.py)
I agree. This is the way to go IMO.
Azure Standard_L16s_v2 image is used.
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hyperv_clocksource_tsc_page acpi_pm
Please identify the Azure-recommended clocksource and add it to perftune.py.