Open rr-rosendahl opened 5 years ago
ping @jhowardmsft
I edited the description to include a note that copying and deleting files works without problems (only rename/move fails). I also added some background that we were facing that issue when running a database instance on the mounted volume.
By the way - the same issue exists with Windows Server 2016 Standard / Windows Docker Container Processisolation (without Hyper-V).
Maybe related? https://github.com/moby/moby/issues/30512
@exodusd sounds like like one more Windows containers weakness like there is many others.
Can you share some real world use case where this matters?
Yes, i described the real world scenario in the "Additional Information" section. We're mounting a MariaDB database folder into the container. There are circumstances where MariaDB will move/rename InnoDB files on startup, which fails because of the error, resulting in the database engine not starting up. That's actually how we found the problem - we just described the simpler test case with manually renaming a file to make it easier to reproduce. ;-)
@exodusd I highly recommended you to use official MariaDB image top of Linux version of Docker and only use Windows containers for workloads which are not able run on Linux.
How ever if you have good reason to use Windows containers I highly recommended that you open ticket to Microsoft Premier Support about this because afaik it is caused by weakness/bug on Windows it selves instead of Docker.
Well, we're actually fixed on windows for that one. But we've done some research and found that if the change the container isolation level from "process" to "hyperv", then the error doesn't occur. It's not perfect, but at least it's not broken. ;-)
@exodusd
I just tried this out use docker binaries based on moby/moby master (can download from master.dockerproject.org, but I built locally) and was unable to repro. Using either Hyper-V or Process isolation.
Running on RS5/Server 2019/Build 17763. (I was on patch release 253, but noticed you were on a slightly older 134, so there may have been some servicing fixes you don't have yet)
I downloaded mariadb-10.3.12.winx64.zip from https://downloads.mariadb.org/mariadb/10.3.12/ and expanded it to c:\mariadb on the host.
docker run -it --rm [--isolation=hyperv] -v c:\mariadb:c:\mariadb microsoft/windowsservercore cmd
(In container) cd \mariadb ren README.md README.md.bak (and then back again)
It's possible what you're seeing might be a combination of a docker bug which has since been fixed, or a servicing fix in Server 2019 which wasn't present in patch release 134.
Could you try both updating your machine to the latest servicing patches, and try the binaries from moby master to verify if this still seems to be an issue.
Thanks
EDIT: I also tried this on RS1/Windows Server 2016 and was similarly unable to repro.
@jhowardmsft Thanks a lot for trying to reproduce the issue. We just tried again by just updating Microsoft Windows Server 2019 to the latest version and we also couldn't reproduce the issue. It seems that Microsoft has fixed it in one of the recent updates!
Thanks again for taking the time to check! We probably wouldn't have tried process isolation again, but now we can use that and avoid the overhead of hyperv isolation. :-)
Hi! Unfortunately the problem is back again... We're now running Windows Server Core 2019 LTS (version 10.0.17763.379) in process isolation mode. When renaming a file in a subfolder of a mounted directory, the system tells us that the path could not be found (see the attached screenshot).
We were finally able to narrow down the problem. It only seems to occur when File System Resource Management (dirquota) has been activated. Once a quota is activated, rename/move in subdirectories of mounted directories fails with "The system cannot find the path specified." The problem persists even after removing the quota and restarting the container, the File System Resource Management service or the docker daemon. Only after a reboot of the whole operating system, the problem doesn't occur. Once a quota is added, the renames/moves fail again.
This is the setup that we used to reproduce the issue:
We took video of the steps to reproduce the issue: https://www.youtube.com/watch?v=aY2vnzC76_M
@jhowardmsft could you try to reproduce the issue in process isolation with activated File System Resource Management and a dir quota?
cc @exodusd, @jhowardmsft
I can reproduce the issue on
I had the issue originally with Docker CE 19.03.1. I uninstalled and cleaned Docker then installed Docker EE manually hoping this would resolve the issue, but it didn't.
I don't have File System Resource Management or Quotas configured on my machine.
Start a new container
docker run -it --isolation process -v C:\data mcr.microsoft.com/windows/servercore:1903 powershell
Then run these statements inside the container
$ErrorActionPreference = "Stop"
cd C:\data
New-Item test.txt
while ($true) {
Write-Host (Get-Date -DisplayHint Time)
Move-Item test.txt test2.txt
Start-Sleep -Seconds 1
Move-Item test2.txt test.txt
Start-Sleep -Seconds 1
}
In another prompt, start a new container and exit
docker run --rm -it mcr.microsoft.com/windows/servercore:1903 powershell
PS C:\> exit
As soon as the second container exits, Move-Item will fail on the first container with this error:
Move-Item : Could not find a part of the path.
At line:3 char:5
+ Move-Item test.txt test2.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : WriteError: (C:\data\test.txt:FileInfo) [Move-Item], DirectoryNotFoundException
+ FullyQualifiedErrorId : MoveFileInfoItemIOError,Microsoft.PowerShell.Commands.MoveItemCommand
I hit this issue in my service where we use File.Replace to update a dynamic config file stored in a volume.
I can't reproduce the issue with hyperv isolation
I wasn't able to reproduce on other machines. I might end up reimaging my workstation to fix this issue. I can provide additional logs if needed
/cc @ddebroy @jterry75
I ended up re-imaging my workstation with a fresh install of Windows 10 1903, it solved the issue for a while but now it is back with the same repro steps as my post above.
I wonder if its related with the size of the windowsfilter folder.
I will gladly provide more logs if required.
Client: Docker Engine - Enterprise
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: c92ab06ed9
Built: 09/03/2019 16:38:11
OS/Arch: windows/amd64
Experimental: false
Server: Docker Engine - Enterprise
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.24)
Go version: go1.12.8
Git commit: c92ab06ed9
Built: 09/03/2019 16:35:47
OS/Arch: windows/amd64
Experimental: false
Debug Mode: false
Server:
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 29
Server Version: 19.03.2
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
Swarm: inactive
Default Isolation: process
Kernel Version: 10.0 18362 (18362.1.amd64fre.19h1_release.190318-1202)
Operating System: Windows 10 Enterprise Version 1903 (OS Build 18362.356)
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 31.89GiB
Name: *******
ID: **************
Docker Root Dir: C:\ProgramData\docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This problem also seems to happen on Windows 10 1909 (version 18362.752, i.e. latest at the time of writing), either with Docker EE 19.03.5 or Docker for Desktop 19.03.8. As well as on server 1909 with EE 19.03.5.
My repro is even simpler, also with two successive containers:
docker run -it --rm --user ContainerAdministrator --isolation process -v test:c:\TEMP mcr.microsoft.com/powershell:nanoserver-1909 pwsh -Command 'mkdir /TEMP/SUBDIR; echo a > /TEMP/SUBDIR/a'
docker run -it --rm --user ContainerAdministrator --isolation process -v test:c:\TEMP mcr.microsoft.com/powershell:nanoserver-1909 pwsh -Command 'mv /TEMP/SUBDIR/a /TEMP/SUBDIR/b'
> Move-Item: Could not find a part of the path.
Please note that reproduction only works in a subdirectory of the volume - on the mount point (i.e. C:\TEMP
in my case) everything works.
Creating a new container allows to always reproduce the issue, however it also happens without this in the initial container after some activity (hard to pinpoint exactly: I've also found the issue after a database failed to rotate its logs, inside a brand new container and volume).
As for the other posts on this thread, it only happens in process isolation mode, but switching to Hyper-V mode is not a possibility in our case. Also, no issues with removing the files, so this really seems to be the same issue.
No FSRM or other quotas on my machines.
Finally, when running a procmon trace... everything works dandily! This is quite a bummer, and a likely indication that there is a timing issue somewhere, as this is a common cause of "the debugger fixed it".
@thaJeztah is this issue still being worked on? I would also be glad to provide any required information if needed.
@marcanpilami I don't think someone is working on this currently.
/cc @kevpar @taylorb-microsoft
I am also still reproducing this issue on multiple machines. It could be related to the size of windowsfilter. I don`t have the issue on a freshly installed computer, than I started having the issue again after pulling/building large containers
I happened to be listening to this episode of the Kubernetes Podcast which talks about Windows Containers: https://kubernetespodcast.com/episode/070-windows-server-containers/
They discussed the problem with GUID for user accounts being unique between instances of Windows. Is it possible that the ACLs for when the volume was created don't match the ACLs of the new container instance?
Does anyone know how to check this information from the CLI?
@rcwoolley : as far as I know, ID stability is only an issue when creating new users inside an image or container. Windows has the concept of "well known user ids", and there are two users in this category that were created for docker: ContainerUser & ContainerAdministrator. My reproduction uses ContainerAdministrator (which in turn is a member of the well known group Administators).
I do not think this could be related - after all, we can delete the files for exemple, and the issue also appears by itself after a while in the same container. But thanks for trying to make this issue progress!
We are now having this issue on one of our machines as well. It used to work until a few days ago.
Docker 2.2.3.0 Windows 10 1909 18363.720
Some observations:
If I can provide you with any information to narrow this down, please let me know.
We ran into exactly the same file move failure on mounted volume when using Docker EE 19.03.11 on Windows server 2019 1809 17763.1397, which is blocking our log rolling inside containers.
@thaJeztah do you know who is the right owner of this issue? Thanks
@hongyu69 from the discussion above, looks like it may be a platform issue (something in Windows itself); perhaps @kevpar or @taylorb-microsoft knows more.
Same Problem here.
System:
Repro: On host:
> docker run -it --isolation=process -v C:\data:C:\test mcr.microsoft.com/windows/servercore:ltsc2022 cmd
On container:
C:\test>mkdir foo
C:\test>mkdir bar
C:\test>move foo bar
The system cannot find the path specified.
C:\test>dir
Volume in drive C has no label.
Volume Serial Number is xxxxxxx
Directory of C:\test
01/19/2023 10:11 AM <DIR> .
01/19/2023 10:11 AM <DIR> bar
01/19/2023 10:11 AM <DIR> foo
0 File(s) 0 bytes
3 Dir(s) 783,761,960,960 bytes free
C:\test>
When using --isolation=hyperv
the problem does not occur. When using a volume on drive D: the problem does not occur.
No other containers are running. Used a fresh pull of the image.
Is there a workaround for this issue?
Don't think so; this looks like an issue in Windows itself, and not the code in this repository. From issues linked to this ticket (in Microsoft's issue tracker), it look like they had problems reproducing the issue in the past; e.g. see this ticket (but there's other tickets linked to that one as well in their issue tracker);
Is there a workaround for this issue?
Well, kind of. We moved the host directory to a separate partition.
docker run -v C:\data:D:\test ...
should work.
Description
Renaming or moving files within mounted volumes fails with a "path not found" error. This happens when renaming the file on the command line: ren README.md README.md.bak As well as with powershell Move-Item (see the attached screenshot): Move-Item .\README.md .\README.md.bak Moving files in compiled code with the MoveFile function of the Windows API also results in an error (microsoft error code 3 - file not found).
So it looks like renaming/moving files seems to fail at the operating system level. Copying and deleting files works without problems.
Steps to reproduce the issue:
Describe the results you received:
Moving / renaming the file resulted in an error message: Could not find a part of the path.
Describe the results you expected:
Successful renamed / moved file.
Additional information you deem important (e.g. issue happens only occasionally):
We stumbled upon this issue when we were trying to start a MariaDB database instance (with InnoDB engine) on the mounted volume. Under certain circumstances, the database tries to move files on startup and shutdown and then stops with an error message. We checked the MariaDB source code and noticed that they were using the Microsoft Windows API MoveFile function. We changed that to do a CopyFile (and DeleteFile) instead and that ran without error. That led us to check the rename commands on the shell and noticed that those also failed.
Output of
docker version
:Output of
docker info
:Additional environment details (AWS, VirtualBox, physical, etc.):
Windows Server 2019 Standard / Windows Docker Container Processisolation (without Hyper-V)