hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Snapshot failing on Windows #11796

Open kopetersen opened 2 years ago

kopetersen commented 2 years ago

Nomad version

Output from nomad version

Operating system and Environment details

OS Name: Microsoft Windows Server 2019 Standard (x64)
OS Version: 1809 Build 17763.2366

Nomad Version: Nomad v1.2.3

Issue

When trying to take a snapshot on Windows, the command fails with:

PS C:\> nomad.exe operator snapshot save backup.nomad.bak
Filed to finalize snapshot file: rename backup.nomad.bak.tmp backup.nomad.bak: The process cannot access the file because it is being used by another process.

The log shows:

2022-01-07T10:25:05.816+0100 [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:59692
2022-01-07T10:25:05.842+0100 [INFO] nomad.raft: starting snapshot up to: index=34932
2022-01-07T10:25:05.842+0100 [INFO] snapshot: creating new snapshot: path=C:\Nomad\Data\server\raft\snapshots\6-34932-1641547505842.tmp
2022-01-07T10:25:05.880+0100 [WARN] snapshot: found temporary snapshot: name=4-24581-1640983949398.tmp
2022-01-07T10:25:05.880+0100 [INFO] snapshot: reaping snapshot: path=C:\Nomad\Data\server\raft\snapshots\6-34928-1641547315471
2022-01-07T10:25:05.882+0100 [INFO] nomad.raft: no logs to truncate
2022-01-07T10:25:05.883+0100 [INFO] nomad.raft: snapshot complete up to: index=34932
2022-01-07T10:25:05.908+0100 [DEBUG] http: request complete: method=GET path=/v1/operator/snapshot duration=66.1992ms

Reproduction steps

Run nomad.exe operator snapshot save backup.nomad.bak

Expected Result

To have a working snapshot

Actual Result

The command fails, but still manages to create a file called backup.nomad.bak.tmp

I have tried disabling Windows Defender to see if that was locking the file, but that wasn't the case. The command fails on all the Nomad servers and clients we have (3 servers, 2 clients). We are still in the very early adoption phase of Nomad.

We have a similar problem with Consul, although a different error is thrown:

consul.exe snapshot save backup.consul.bak
Error writing unverified snapshot file: sync .: The handle is invalid.

brian-lamb-software-engineer commented 1 year ago

Any progress on this issue? It's creating the .tmp file for me, and then I get the above-mentioned error.

Any chance I can use the created .tmp file for a recovery if needed? Or is that .tmp file missing info?

ghost commented 9 months ago

We have a similar problem with Consul, although a different error is thrown:

consul.exe snapshot save backup.consul.bak
Error writing unverified snapshot file: sync .: The handle is invalid.

The snapshot issue on Windows is fixed in later versions of Consul (1.14.9, 1.15.5, 1.16.1, and above).

snapshot: fix access denied and handle is invalid when we call snapshot save on windows - skip sync() for folders in windows in rboyer/safeio#3 [GH-18302]

https://github.com/hashicorp/consul/blob/main/CHANGELOG.md#1149-august-8-2023

You could download any of the fixed binaries and use it alongside the existing Consul installation to run the consul snapshot CLI command.

Note: Do not replace the existing binary as you would in an upgrade.