Clock skew issues megathread

craigloewen-msft commented 1 year ago

Megathread

Current status: waiting on backport for kernel patch to mitigate issue.

We're creating this megathread to track the clock skew issues in WSL in one place, and will keep this parent comment current with any updates.

Background

Sometimes the WSL clock can become skewed after resume from sleep (specifically S0). See some example related issues for more info: https://github.com/microsoft/WSL/issues/8318 https://github.com/microsoft/WSL/issues/8204 https://github.com/microsoft/WSL/issues/7255

Potential workarounds

Use systemd to force clock sync

See this comment: https://github.com/microsoft/WSL/issues/8204#issuecomment-1338334154

Set the hardware clock via a command

Run sudo hwclock -s. More info here.

Run `ntpdate` on distro start up

Edit /etc/wsl.conf to have this content:

[boot]
command="ntpdate ntp.ubuntu.com"

This will force a clock reset on start up of the distro.

Build a private kernel with this patch

marceloid commented 1 year ago

This workaround using systemd was the best and only solution in my case: https://github.com/microsoft/WSL/issues/8204#issuecomment-1338334154

duaneking commented 1 year ago

The number of open issues that relate to this temporal distortion worry me, because it tells me there isn't a critical security focus on time by the teams involved in all these regressions and supposed fixes; so who is running the show for temporal security at MS? Anybody? Who is the Time Czar in the Security Org?

Or perhaps a better question to ask: Why aren't these consistent temporal anomalies being considered more of a security issue?

Everybody in security knows that time is a critical security component; so if time is not correct on the host system, then it is simply out of security compliance by default, right?

haroldiedema commented 1 year ago

Here's a simple fix that invokes hwclock -s every time your machine wakes up from sleep. Note: This assumes you configured your WSL environment in such a way that it should always be running (for daemons/webservers/etc.).

Assuming you've changed the default user to something other than root, you'll first need to allow passwordless sudo when invoking sudo hwclock -s by updating the sudoers file:

$ sudo visudo

Add the following line:

%sudo   ALL=(ALL) NOPASSWD: /usr/sbin/hwclock

Next, create a batch file somewhere on your machine, e.g.: C:\sync-clock.bat with the following contents:

@echo off
ubuntu.exe run "sudo hwclock -s"
exit

(Change "ubuntu.exe" if you need to)

Lastly, create a Task in the Task Scheduler that runs every time your computer wakes up from sleep.

Open the Task Scheduler and create a new "Basic Task".
Set the trigger to "When a specific event is logged" and click "Next".
Under "Log", select System.
Under "Source", select Kernel-Power.
Under "Event ID", type "507".
Under "action", select "Start a program" and click "next"
Specify the batch file we just created: C:\sync-clock.bat.

If the task is not executed on your machine, it may be because your version of Windows emits a different Event ID. Open up the Event Viewer and check under System for any log entries that have the source "Kernel-Power" that match a timestamp when your machine has woken up from sleep mode. Verify the correct event by reading its description. It should state something along the lines of "The system exited sleep mode". The correct "Event ID" should be listed in the same window.

Hope this helps.
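For anyone who would rather script the Task Scheduler steps above than click through them, here is a rough sketch. It is not from the original comment. It only prints a schtasks.exe command line (it does not run it), using the event-trigger options /SC ONEVENT, /EC and /MO. The task name "WSL-Sync-Clock" is a made-up example, and the provider name Microsoft-Windows-Kernel-Power is assumed to be what the "Kernel-Power" source maps to on your machine; check both in Event Viewer before using the output.

```shell
#!/bin/bash
# Sketch only: prints (does not run) a schtasks.exe command line that creates
# an event-triggered task. The task name below is a hypothetical example.

# Build the XPath event filter for a Kernel-Power event ID in the System log.
kernel_power_query() {
    local event_id="$1"
    printf "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and EventID=%s]]" "$event_id"
}

# Print the full schtasks command for the given event ID and program path.
print_schtasks_command() {
    local event_id="$1" program="$2"
    printf 'schtasks.exe /Create /TN "WSL-Sync-Clock" /TR "%s" /SC ONEVENT /EC System /MO "%s"\n' \
        "$program" "$(kernel_power_query "$event_id")"
}

# Event ID 507 and the batch file path are the ones used in the steps above.
print_schtasks_command 507 'C:\sync-clock.bat'
```

Printing the command first makes it easy to review, and to swap in a different Event ID if your Windows version logs another one on wake.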

ManuInNZ commented 1 year ago

I suppose one could argue that it should rather be a wsl Linux kernel level task to have a hardware clock synch at power up/restore?

For the proposed temporary solution, I have in the past used wsl -l -v to list the running distros and then exec'd into each of them with wsl -d <distro> hwclock -s. It might require the visudo step though; I had issues with some distros.

cheers manu


esumii commented 1 year ago

Open the Task Scheduler and create a new "Basic Task".

Set the trigger to "When a specific event is logged" and click "Next".

Under "Log", select System.

Under "Source", select Kernel-Power.

Under "Event ID", type "507".

I'd add 107 as well.

Under "action", select "Start a program" and click "next"

Specify the batch file we just created: C:\sync-clock.bat.

If the task is not executed on your machine,

Another reason can be (lack of) power:

https://github.com/microsoft/WSL/issues/5324#issuecomment-1396292234

(To run on battery power, "Start the task only if the computer is on AC power" must be unchecked in the Windows Task Scheduler.)

Also see the above link to do without sudo.

gaia commented 1 year ago

Lastly, create a Task in the Task Scheduler that runs every time your computer wakes up from sleep.

Task Scheduler doesn't work to run tasks in WSL2, see https://github.com/microsoft/WSL/issues/9231

dboreham commented 1 year ago

I don't think we need any more workarounds do we? Someone needs to go into the hypervisor code and fix the bug, no? Hypervisor's job is to present correct hardware clock functionality to its host kernels, presumably. Or have we moved on from that being the case?

duaneking commented 1 year ago

I don't think we need any more workarounds do we?

No. An actual fix would be best for security and compliance as systems having the wrong time can create GDPR violations in the worst case, and MSFT is a globally GDPR compliant company, right? So the team has a clear mandate as part of being One Microsoft that they need to fix this, right?

Someone needs to go into the hypervisor code and fix the bug, no?

Yes, if that is where it is.

Hypervisor's job is to present correct hardware clock functionality to its host kernels, presumably. Or have we moved on from that being the case?

I don't believe anybody's done that kind of due diligence. The correct people at that level don't seem to be aware of this issue, or I suspect it would have been resolved if they truly understood how big of an issue this was, so the fact nobody in technical leadership has freaked out and mandated a fix asap tells me that has not happened yet.

Clockwork-Muse commented 1 year ago

if they truly understood how big of an issue this was, so the fact nobody in technical leadership has freaked out and mandated a fix asap tells me that has not happened yet.

I don't think this is as large of an issue as you're trying to make it out to be.

Keep in mind that WSL is primarily intended to be a developer tool, and not something you'd run a production-level server on (also - if your server is allowed to sleep you probably have much larger problems). It would be simpler, easier, and cheaper to just run whatever distro "natively" (either in a dedicated hypervisor like Hyper-V, or on bare metal).

Yes, it's annoying. Yes, there are security issues (although exploiting them still requires stealing a private key, which shouldn't be trivial). If it's the end of the world for you, though, you likely have larger problems.

duaneking commented 1 year ago

I don't think this is as large of an issue as you're trying to make it out to be.

Then respectfully, you do not understand the issue or how this impacts the world at scale.

Keep in mind that WSL is primarily intended to be a developer tool, and not something you'd run a production-level server on

... and that's exactly why this is such a big issue. if you're making the bad assumption that because this is a developer's system that it's not going to be attacked, then sadly I have some bad news for you. We developers get attacked everyday, and even now people are trying to figure out ways to get on our machines. You ever hear of supply chain attacks? Developers are the supply chain.

(also - if your server is allowed to sleep you probably have much larger problems).

I agree but this is workstations, and that's even more heavily audited in some environments. If companies invested half as much in their production system security as they put into their corporate security, a lot of breaches would never happen.

It would be simpler, easier, and cheaper to just run whatever distro "natively" (either in a dedicated hypervisor like Hyper-V, or on bare metal).

Not for everybody. It would not be easier for me. I asked for many GB of ram for a reason: Docker/K8s.

Yes, it's annoying. Yes, there are security issues (although exploiting them still requires stealing a private key, which shouldn't be trivial). If it's the end of the world for you, though, you likely have larger problems.

Because looking in "%SystemDrive%\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys" is hard, right? ;) I get what you're saying, but it's also clear to me that you don't have the same goals I do.

gaia commented 1 year ago

Even apt install/update fails when the clock is off. You can't develop if you can't install/update the tools you need.

This is an important issue. This should be basic to fix. The systemd workaround is enough for me.

esumii commented 1 year ago

Task Scheduler doesn't work to run tasks in WSL2, see #9231

When the user is not logged on to the desktop, I suppose?

asampal commented 1 year ago

I think another point to note is that WSL2 didn't always have this problem. So if it could do the right thing after waking up at one point, how come it's such a problem fixing the issue now?

dboreham commented 1 year ago

I don't think this is as large of an issue as you're trying to make it out to be.

Respectfully disagree. It's a very serious hypervisor bug. Presumably needs to be filed against the right product (not WSL, most likely) to get on the radar of someone who can fix it.

duaneking commented 1 year ago

I think another point to note is that WSL2 didn't always have this problem. So if it could do the right thing after waking up at one point, how come it's such a problem fixing the issue now?

Yes, exactly. This is a regression in a formerly working product.

haroldiedema commented 1 year ago

Task Scheduler doesn't work to run tasks in WSL2, see #9231

When the user is not logged on to the desktop, I suppose?

No, he's right. That's why creating batch files is important. TaskScheduler can run a batch file, which then invokes the WSL commands. That works just fine.

ghost commented 1 year ago

The number of open issues that relate to this temporal distortion worry me, because it tells me there isn't a critical security focus on time by the teams involved in all these regressions and supposed fixes; so who is running the show for temporal security at MS? Anybody? Who is the Time Czar in the Security Org?

Or perhaps a better question to ask: Why aren't these consistent temporal anomalies being considered more of a security issue?

Everybody in security knows that time is a critical security component; so if time is not correct on the host system, then it is simply out of security compliance by default, right?

@troyhunt can you get some traction on this?

duaneking commented 1 year ago

I guarantee you that audit logs that have the wrong timestamp due to the host system times being wrong can create severe problems; mostly because these audit records containing false data are considered immutable proof of legal compliance in these systems.

The end result is that incorrect data is being logged, and every one of these systems that doesn't know the input is bad then treats that data as correct.

I would like to see this issue fixed.

chrisclapham commented 1 year ago

My team and I are also facing this issue. After a system restart or sleep the clock is usually behind by ~40mins. For now sudo hwclock -s seems to be working for some of us.

Eagerly looking forward to an official fix.

0xabu commented 1 year ago

Running hwclock -s gets me much closer to reality, but it's still off by 5 minutes:

$ sudo hwclock -s; date; cmd.exe /c "time /t"
Wed May 10 09:46:17 CEST 2023
09:51

Update: the VM's "hardware clock" appears to be reporting the time off by 5 minutes. This is not a drift calculation in the guest:

cmd.exe /c "echo %time%" ; sudo hwclock -r --verbose
 9:55:21.25
hwclock from util-linux 2.37.2
System Time: 1683705037.069536
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2023/05/10 07:50:18
Hw clock time : 2023/05/10 07:50:18 = 1683705018 seconds since 1969
Time since last adjustment is 1683705018 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2023-05-10 09:50:16.900289+02:00

lewissbaker commented 1 year ago

The sudo hwclock -s command sometimes results in a clock that is still hours off of the realtime for me.

I've found that the following snippet (heavily adapted/reduced from the wslact utility in the WSL utilities project) works more reliably:

fix-time.sh

#!/bin/bash

set -e

function pwsh {
    local PowerShellExe="/mnt/c/Program Files/PowerShell/7/pwsh.exe"
    "$PowerShellExe" -NoProfile -NonInteractive -ExecutionPolicy Bypass -Command "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; [Console]::InputEncoding = [System.Text.Encoding]::UTF8; $*"
}

function full_date {
    date +"%F %T"
}

echo "Prev date: $(full_date)"

sudo date -u -s "$(pwsh Get-Date -AsUTC -UFormat \"%FT%TZ\")" > /dev/null

echo "New date : $(full_date)"

It just gets the current UTC time from Windows by running a PowerShell command, and then runs date to set the local WSL time to that time. It is still only to the nearest second, but that's good enough for my purposes.
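One thing worth guarding against in a script like this: if the PowerShell call fails and prints nothing, the string handed to date -s is empty, and GNU date treats an empty date string as the start of today. Output from Windows programs can also carry a trailing carriage return. The sketch below is an addition, not part of the original script. It shows one way to strip the carriage return and check the format before the clock is touched; the helper names are made up for illustration.

```shell
#!/bin/bash
# Sketch: sanity-check a timestamp string before passing it to `date -s`.
# Both helper names are hypothetical, not part of the original script.

# Strip carriage returns that Windows programs may append to their output.
strip_cr() {
    tr -d '\r'
}

# Succeed only if the argument looks like YYYY-MM-DDTHH:MM:SSZ.
is_utc_timestamp() {
    [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$ ]]
}

# Simulated PowerShell output with a CRLF line ending.
sample="$(printf '2023-05-10T07:50:18Z\r\n' | strip_cr)"
if is_utc_timestamp "$sample"; then
    echo "ok: $sample"
else
    echo "refusing to set clock from: $sample" >&2
fi
```

In the real script the check would sit between the pwsh call and the sudo date line, so a failed lookup leaves the clock alone instead of resetting it to midnight.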

The wslact time-sync command didn't work for me as it only outputs timezone information from the host to the nearest hour and so doesn't give the right result if you're on a timezone offset that isn't a whole number of hours.

benc-uk commented 1 year ago

Likewise sudo hwclock -s still results in a clock hours out of sync, NTP is the only solution I've found to work, e.g. sudo ntpdate time.windows.com

dboreham commented 1 year ago

imho posts of the form "I found I could run hwclock|ntp and it made my clock kind of right" should be prohibited here. The bug is that the hypervisor screws up the guest OS's time. There is no workaround for that. The hypervisor just needs to be fixed such that it presents a virtualized RTC to the guest that works.

(Sorry, no coffee yet this morning).

ipalopezhentsev commented 1 year ago

One more example of how it screws up work: suppose you've done some AWS S3 files downloading, then walked away, your computer went to sleep. Now you return and intend to download more files via the same console. AWS starts rejecting your attempts due to skewed time.

Lucasjuv commented 1 year ago

Another example of how this is a problem is when using terraform on WSL. Currently I've faced repeated signature issues because of a timestamp in the future when trying to run the acceptance tests of Hashicorp's Azurerm provider:

Error: reading queue properties for AzureRM Storage Account "acctestsa230529124321129": queues.Client#GetServiceProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthenticationFailed" Message="Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:fbd76c32-8003-009b-324c-925663000000\nTime:2023-05-29T16:44:19.4658229Z"

superm1 commented 1 year ago

I've recently hit this with suspending a laptop (using Modern Standby over the weekend).

To me it brings up a fundamental question - why isn't the kernel used by WSL2 suspended when the system is and resumed when it's resumed?

The Linux kernel has a robust suspend architecture, including restoring/refreshing the clock after suspend. I would think even for a VM it makes perfect sense to introduce a notification chain in Windows to suspend the VM.

jeffska commented 1 year ago

Is this related to the multiple hibernate/sleep resulting in eventual WSL CPU max and lockup issues? (#8696 for example)

If not, can we get a megathread to track those as well?

mungojam commented 1 year ago

Is this related to the multiple hibernate/sleep resulting in eventual WSL CPU max and lockup issues? (#8696 for example)

If not, can we get a megathread to track those as well?

Installing the systemd based time service seems to have reduced the number of CPU spinning hangs, but not eliminated it for me

clshortfuse commented 1 year ago

I'm off by ~9 hours:

sudo hwclock -r --verbose
hwclock from util-linux 2.37.2
System Time: 1688025159.042482
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2023/06/29 07:52:20
Hw clock time : 2023/06/29 07:52:20 = 1688025140 seconds since 1969
Time since last adjustment is 1688025140 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2023-06-29 03:52:18.904845-04:00

It's Thu Jun 29 2023 12:16:05 GMT-0400 here. Maybe it was installing Hyper-V network bridging that screwed it up?

Even with chronyd, it's giving me two numbers. One is right (that it's off by 8.3 hours), and the other says -4 seconds? Why does chronyd get two numbers?

 systemctl status chronyd.service
● chrony.service - chrony, an NTP client/server
     Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-06-28 23:13:36 EDT; 4h 41min ago
       Docs: man:chronyd(8)
             man:chronyc(1)
             man:chrony.conf(5)
    Process: 193 ExecStart=/usr/lib/systemd/scripts/chronyd-starter.sh $DAEMON_OPTS (code=exited, status=0/SUCCESS)
   Main PID: 219 (chronyd)
      Tasks: 2 (limit: 9359)
     Memory: 1.9M
     CGroup: /system.slice/chrony.service
             ├─219 /usr/sbin/chronyd -F 1 -x
             └─220 /usr/sbin/chronyd -F 1 -x

Jun 29 02:03:15 ThinkBook-PG3 chronyd[219]: Can't synchronise: no selectable sources
Jun 29 02:07:17 ThinkBook-PG3 chronyd[219]: Selected source 185.125.190.58 (ntp.ubuntu.com)
Jun 29 02:07:17 ThinkBook-PG3 chronyd[219]: System clock wrong by 30128.437928 seconds
Jun 29 02:08:22 ThinkBook-PG3 chronyd[219]: System clock wrong by -4.986587 seconds
Jun 29 02:14:47 ThinkBook-PG3 chronyd[219]: Source 38.17.55.196 replaced with 104.167.241.253 (1.ubuntu.pool.ntp.org)
Jun 29 02:32:03 ThinkBook-PG3 chronyd[219]: Selected source 72.30.35.88 (2.ubuntu.pool.ntp.org)
Jun 29 02:51:42 ThinkBook-PG3 chronyd[219]: Source 108.61.23.93 replaced with 152.70.159.102 (2.ubuntu.pool.ntp.org)
Jun 29 03:36:51 ThinkBook-PG3 chronyd[219]: Source 104.167.241.253 replaced with 198.137.202.56 (1.ubuntu.pool.ntp.org)
Jun 29 03:50:44 ThinkBook-PG3 chronyd[219]: Backward time jump detected!
Jun 29 03:50:44 ThinkBook-PG3 chronyd[219]: Can't synchronise: no selectable sources
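To put the offsets in a log like the one above into more familiar units, a small helper along these lines can be handy. It is only a sketch with a made-up function name, and it says nothing about why chronyd reported two different values; it just converts seconds into days, hours, minutes and seconds.

```shell
#!/bin/bash
# Sketch: render an offset in seconds as days/hours/minutes/seconds.
# The function name is hypothetical.
format_offset() {
    local total="${1%.*}"        # drop any fractional part
    local sign=""
    if (( total < 0 )); then
        sign="-"
        total=$(( -total ))
    fi
    printf '%s%dd %02dh %02dm %02ds\n' "$sign" \
        $(( total / 86400 )) $(( total % 86400 / 3600 )) \
        $(( total % 3600 / 60 )) $(( total % 60 ))
}

format_offset 30128.437928   # first offset in the chronyd log above
format_offset -4.986587      # second offset in the chronyd log above
```

The first value comes out at a little over 8 hours and 22 minutes, in line with the roughly 8.3 hours mentioned above.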

brunoprietog commented 1 year ago

This error is unbearable. How can so much time have passed without fixing it? Please, this is critical

dboreham commented 1 year ago

There should be an "Assign to Dave Cutler" button here ;)

jaknz commented 1 year ago

As a workaround for now, I've put this in the system crontab, and it seems to be working: 0 * * * * /sbin/hwclock -s -v

LorisZ commented 1 year ago

@jaknz There are already some workarounds in the first post. The first one seemed to work for me.

I'm sure yours works too, but only if you're okay with being out of sync for up to 59 minutes after sleep (depending on when you wake up the computer).

dboreham commented 1 year ago

Quick note that this is a long running trail of tears and none of the "workarounds" works 100% for everyone. My workaround is to force WSL2 to exit every time before I put the machine to sleep.

patricklangsonos commented 1 year ago

My Lenovo T14 AMD laptop was sleeping over the weekend, so I ran ntpdate today. 31 Jul 10:35:58 ntpdate[402243]: step time server 168.61.215.74 offset +220478.058810 sec

I used to solve this for VMs using the Hyper-V Time Synchronization Service, so I looked into what's enabled in WSL.

It looks like the component is enabled in the WSL kernel since I see this in dmesg:

[118959.971069] hv_utils: TimeSync IC version 4.0

However, I'm not sure where to look for its logs. Is there a user mode daemon missing?

I tried polling that via hwclock which seems to work. This was after I already ran ntpdate so there was no drift. I'll try it again after I wake the laptop from sleep tomorrow morning.

hwclock from util-linux 2.37.2
System Time: 1690825759.570287
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2023/07/31 17:49:20
Hw clock time : 2023/07/31 17:49:20 = 1690825760 seconds since 1969
Time since last adjustment is 1690825760 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2023-07-31 10:49:18.990314-07:00

2nd try today, after lunch: it definitely seems that the hardware clock is not working as expected after resume from sleep. I tried looking at hwclock first, and it was out of sync. I needed to use a network time server to get back in sync.

$ sudo hwclock -r --verbose
hwclock from util-linux 2.37.2
System Time: 1690834444.974326
Trying to open: /dev/rtc0
Using the rtc interface to the clock.
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2023/07/31 21:12:01
Hw clock time : 2023/07/31 21:12:01 = 1690837921 seconds since 1969
Time since last adjustment is 1690837921 seconds
Calculated Hardware Clock drift is 0.000000 seconds
2023-07-31 14:11:59.986886-07:00

$ sudo hwclock -a
Needed adjustment is less than one second, so not setting clock.

$ sudo ntpdate time.windows.com
31 Jul 14:13:21 ntpdate[433461]: step time server 168.61.215.74 offset +3474.407840 sec
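The gap between the two clocks can be read straight off the verbose output: subtract the "System Time" value from the epoch value on the "Hw clock time" line. The sketch below does that with awk, reading the hwclock text on standard input. It is an illustration added here, with a made-up function name, and it assumes the output layout shown above (util-linux 2.37.2).

```shell
#!/bin/bash
# Sketch: compute (hardware clock - system time) in whole seconds from the
# text printed by `hwclock -r --verbose`, read on standard input.
# The function name is hypothetical.
rtc_minus_system() {
    awk '
        /^System Time:/  { sys = int($3) }
        /^Hw clock time/ { for (i = 1; i <= NF; i++) if ($i == "=") rtc = $(i + 1) }
        END {
            if (sys == "" || rtc == "") exit 1   # a needed line was missing
            print rtc - sys
        }
    '
}

# The two relevant lines from the second run above.
rtc_minus_system <<'EOF'
System Time: 1690834444.974326
Hw clock time : 2023/07/31 21:12:01 = 1690837921 seconds since 1969
EOF
```

For the second run this prints 3477, which is close to the +3474 second step that ntpdate applied shortly afterwards.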

tsteven4 commented 1 year ago

The difference between hyper-v and wsl may be explained by the difference in the systems. On wsl systemd-detect-virt detects a container and hardware virtualization:

$ systemd-detect-virt -c
wsl
$ systemd-detect-virt -v
microsoft

On hyper-v only hardware virtualization is detected.

$ systemd-detect-virt -c
none
$ systemd-detect-virt -v
microsoft

/usr/lib/systemd/system/systemd-timesyncd.service contains the line

ConditionVirtualization=!container

One can force timesyncd to run in a container as outlined long ago by creating an /etc/systemd/system/systemd-timesyncd.service.d/override.conf file containing

[Unit]
ConditionVirtualization=

https://github.com/microsoft/WSL/issues/8204#issuecomment-1339506778
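As a rough sketch of that override (an illustration, not taken from the linked comment), the function below writes the drop-in file under a root directory you pass in. The function name is made up. Passing a scratch directory lets you preview the result without touching the real /etc; passing / would need root.

```shell
#!/bin/bash
# Sketch: write the timesyncd drop-in under a given root directory.
# Pass "/" (as root) for the real system, or a scratch directory to preview.
# The function name is hypothetical.
write_timesyncd_override() {
    local root="${1:?usage: write_timesyncd_override ROOT_DIR}"
    local dir="${root%/}/etc/systemd/system/systemd-timesyncd.service.d"
    mkdir -p "$dir" || return 1
    # An empty assignment resets the ConditionVirtualization=!container check.
    printf '[Unit]\nConditionVirtualization=\n' > "$dir/override.conf"
    echo "$dir/override.conf"
}

# Preview in a temporary directory instead of touching the real /etc.
preview_root="$(mktemp -d)"
cat "$(write_timesyncd_override "$preview_root")"
rm -rf "$preview_root"
```

After writing the file on the real system, sudo systemctl daemon-reload followed by sudo systemctl restart systemd-timesyncd should pick it up.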

sisrfeng commented 1 year ago

About ntpdate

ntpdate: client for setting system time from NTP servers (deprecated)

man sntp:

        sntp -S ntpserver.somewhere
            With suitable privilege,
            run as a command or  from a cron(8) job,
            sntp -S ntpserver.somewhere will
            set (step)  the local clock from a synchronized specified server,
            like the (deprecated)  ntpdate(8),  or rdate(8)  commands.

wiki: SNTP is fully interoperable with NTP since it does not define a new protocol. However, the simple algorithms provide times of reduced accuracy and thus it is inadvisable to sync time from an SNTP source

(For users in China) I found ntp.ntsc.ac.cn at https://dns.iui.im/ntp/

So, I put this in my /etc/wsl.conf

[boot]
command="sntp -S ntp.ntsc.ac.cn"

And you need to apt install sntp

shivshanks commented 1 year ago

What nobody seems to be mentioning is that this seems related to Windows 11. It does not happen on any Windows 10 systems I have. Given that I would think it's possibly a Windows bug not WSL.

dboreham commented 1 year ago

Given that I would think it's possibly a Windows bug not WSL.

Yes it's an NT Hypervisor bug probably.

Unfortunately the folks responsible for that code don't seem to know or care that it is broken.

shivshanks commented 1 year ago

Given that I would think it's possibly a Windows bug not WSL.

Yes it's an NT Hypervisor bug probably.

Unfortunately the folks responsible for that code don't seem to know or care that it is broken.

It would be great if someone from the WSL team could take ownership of this issue and raise it with the Windows/Hyper-V team.

dhensen commented 1 year ago

After 8 years exclusively on Linux switched to Windows using WSL2 to come to this issue. Sucks, but I have no choice. Please fix this. There is this post about Steve Jobs saying: make the boot time faster, 10 seconds times 5 million users == lives saved. I'm sure this issue 1 year not-fixed-yet millions of developers == many dead people.

Just fix this already

zmajeed commented 1 year ago

This is still happening on latest Windows 11 and WSL - it's a pretty huge annoyance since it affects installs and builds that need accurate timestamps - needing to possibly restart WSL and rerun automated builds is a big pain - especially because it's usually not apparent that bad system time caused the failures

Some comments on the workarounds listed in the issue description above

Regarding the first workaround that links to https://github.com/microsoft/WSL/issues/8204#issuecomment-1338334154 - systemd-timesyncd is already installed and running on my Ubuntu-22.04 instance - the time drift occurred nonetheless - is the idea to use timedatectl to manually sync the time or to restart systemd-timesyncd?

I tried the hwclock workaround a while back and only remember it being problematic and causing other issues

The third workaround of ntpdate is not practical since ntpdate is deprecated - https://ubuntu.com/server/docs/network-ntp#:~:text=ntpdate%20is%20now%20considered%20deprecated,help%20with%20more%20complex%20cases - and relying on deprecated programs is only going to lead to other issues that I'll have to troubleshoot at some point

I also wonder if there's any connection to the vmmem high cpu usage - https://github.com/microsoft/WSL/issues/6982 - that also seems to happen after sleep

levrik commented 1 year ago

@zmajeed

is the idea to use timedatectl to manually sync the time or to restart systemd-timesyncd?

That's weird. This workaround works perfect for me. Not a single second of time drift in comparison to host system for a few months.

zmajeed commented 1 year ago

Yeah - hadn't seen it in months myself - but like most workarounds there's no guarantee they'll continue to work if things change - in this case none of the suggestions actually address the root cause which is still unknown - so nobody can say if any of these suggestions will work for everyone all the time

maxb commented 1 year ago

I found I needed to combine multiple workarounds to get satisfactory behaviour:

Set up systemd-timesyncd
AND use Windows Task Scheduler to restart systemd-timesyncd upon a Windows Event Log System Kernel-Power 507 event (wsl.exe -d Ubuntu -u root --cd / systemctl restart systemd-timesyncd)

This works, but it is hard to recommend WSL when this much compensating for problems is needed.

lackovic commented 1 year ago

is the idea to use timedatectl to manually sync the time or to restart systemd-timesyncd?

That's weird. This workaround works perfect for me. Not a single second of time drift in comparison to host system for a few months.

@levrik which of the two workarounds you quoted works for you?

I have been using:

$ sudo ntpdate ee.pool.ntp.org
14 Sep 13:18:41 ntpdate[725]: adjust time server 212.7.1.131 offset -0.134454 sec

which shows the offset and adjusts the time.

Since ntpdate is deprecated (source) I have been trying to use timedatectl instead, but I am a bit confused about how exactly to use it. If I simply run timedatectl I can see some time information, but it doesn't say whether there was an offset, how large it was, or whether it actually adjusted the time to the correct one:

$ timedatectl
               Local time: Thu 2023-09-14 13:18:45 EEST
           Universal time: Thu 2023-09-14 10:18:45 UTC
                 RTC time: Thu 2023-09-14 10:18:45
                Time zone: Europe/Tallinn (EEST, +0300)
System clock synchronized: yes
              NTP service: inactive
          RTC in local TZ: no

Can anyone clarify on this?
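Plain timedatectl output has no offset line, so the most it can tell you is whether the clock is flagged as synchronized and whether the NTP service is active. The sketch below (an added illustration with a made-up function name) pulls one labelled value out of that output. As far as I know, on systemd versions that have it, timedatectl timesync-status does print an offset, but only while systemd-timesyncd is running, and the output above shows the NTP service as inactive.

```shell
#!/bin/bash
# Sketch: pull one labelled value out of plain `timedatectl` output on stdin.
# The function name is hypothetical.
timedatectl_field() {
    local label="$1"
    awk -F': ' -v want="$label" '
        { key = $1; sub(/^[ \t]+/, "", key) }   # trim the leading padding
        key == want { print $2; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Two lines copied from the output above.
sample='System clock synchronized: yes
              NTP service: inactive'

echo "$sample" | timedatectl_field "NTP service"
echo "$sample" | timedatectl_field "System clock synchronized"
```

On a live system the input would come from running timedatectl itself and piping its output into the function.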

levrik commented 1 year ago

@lackovic I'm using timedatectl and I simply checked by comparing the output of date with Windows' local time displayed in the taskbar. I didn't restart for a few weeks and there would have been a shift by more than a few seconds by now already but there's not so far.

johnlaur commented 1 year ago

At this point it's starting to feel that Microsoft is purposely ignoring these bugs to keep their own project from being as useful as it could be. I have wrestled with this issue since early 2019. 2019. That's getting close to 5 years, folks.

HyperV is supposed to be a top shelf product competing with VMWare/ESXi. How in the world is anyone able to use this system in production if it cannot keep time between the host and the VM? Meanwhile the latest release from vmware is capable of keeping the VM wallclock in check with PTP-level accuracy (~10s of ns). What gives? I am a big vmware customer, and this issue alone is enough to keep me from ever touching HyperV for a production workload. HyperV is a product I now view with immense skepticism and suspicion.

It would probably at this point be a worthwhile experiment for someone to try and make WSL-compatible tooling for a different virtualization framework.

ghost commented 1 year ago

What is often misattributed as malice, is rather ... (complete the expression)

If I were on the other side of the fence I'm sure I'd be feeling the same frustration you all do. Unfortunately there are few of us, and many of the issues worked on may be unobvious to the outside world. No promises, but this is something we're looking at now.

dhensen commented 1 year ago

I've been using WSL for years now on work laptops without this issue. Now on my own laptop, I had to switch back from Linux to Windows because of bad Teams support in combination with firmware/driver related issues. That's a topic for another day.

The only difference I could find was:

Using W11 instead of W10
Using a new AMD machine instead of Intel
Using WSL-app via the Microsoft Store instead of via the enabled windows features

How can only some people suffer from this, what is the common denominator here?
