Open llvmbot opened 8 years ago
Do you have benchmarks comparing its performance to the existing implementation?
Actually, no. Right there, that's enough of a reason for me to agree.
Given that IFileOperation is basically the workhorse of file deletion and is what you get every time you press Delete in windows explorer, I have to imagine that Microsoft has tried very hard to optimize it to the best of their ability, and that if it's slower than some other hand-rolled mechanism, there is a good reason for it. Otherwise why wouldn't they use the faster mechanism?
I'm not saying it's impossible they have a poor implementation of IFileOperation, but I think it would be worth doing due diligence and producing some benchmarks to solidify this belief. There's point committing complicated code if simple code will do the job.
Do you have benchmarks comparing its performance to the existing implementation? I have had probelms deleting from explorer too, but the problems I've had have not been of a racy nature. I've never experienced a problem where if I try to delete something N times it fails the first N-1 times and succeeds the N'th time. The problems have always been "this file appears to be locked for no reason and there's nothing I can to delete it".
Given that IFileOperation is basically the workhorse of file deletion and is what you get every time you press Delete in windows explorer, I have to imagine that Microsoft has tried very hard to optimize it to the best of their ability, and that if it's slower than some other hand-rolled mechanism, there is a good reason for it. Otherwise why wouldn't they use the faster mechanism?
I'm not saying it's impossible they have a poor implementation of IFileOperation, but I think it would be worth doing due diligence and producing some benchmarks to solidify this belief. There's point committing complicated code if simple code will do the job.
Maybe I'm missing something, but how about just using the Windows function whose entire purpose is to recursively remove a directory and all the files in it?
SHFILEOPSTRUCT FileOp = {0}; FileOp.hwnd = nullptr; FileOp.wFunc = FO_DELETE; FileOp.pFrom =
; FileOp.pTo = nullptr; FileOp.fFlags = FOF_NO_UI; SHFileOperation(&FileOp);
Mainly, It's horrifically slow. I'm also not confident that it's robust just because explorer uses it... I've had issues deleting directory trees in explorer under heavy load in the past.
If we don't care about supporting pre-Vista, then the "recommended" way to do this is through the IFileOperation interface, btw.
We don't care about supporting pre-Vista. We only care about Windows 7 and later (see WindowsSupport.h:30).
If we don't care about supporting pre-Vista, then the "recommended" way to do this is through the IFileOperation interface, btw.
Note this is the same code explorer uses when you select a directory and hit Delete, so it definitely is robust under system load.
Maybe I'm missing something, but how about just using the Windows function whose entire purpose is to recursively remove a directory and all the files in it?
SHFILEOPSTRUCT FileOp = {0};
FileOp.hwnd = nullptr;
FileOp.wFunc = FO_DELETE;
FileOp.pFrom =
Minor comments, the code needs to keep the UTF16 (widenPath) support, see stuff at end of WindowsSupport.h and maybe sys::fs::createTemporaryFile (or the lower level createUniqueEntity) could be reused to create the temporary filename.
(part 2, what I think the patch will do)
Since LLVM doesn't use the NT kernel/native API, we'll need to do it a little bit differently.
The biggest difference is that we will use CryptGenRandom, the Win32 public facing API for generating random bytebuffers - internally it calls RtlGenRandom - and it's already used in LLVM (see ./llvm/lib/Support/Windows/Process.inc function Process::GetRandomNumber).
We will:
Questions remaining:
So here's what I'm thinking about the patch:
It'll use the same basic algorithm that Niall Douglas used in Boost AFIO v1 (see: https://github.com/BoostGSoC13/boost.afio/blob/b35ed5b9783a4dfde9b3d611dfb940739f303c97/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L744), except using the Win32 API instead of the NT kernel/native API.
(separating into two comments so that my text doesn't evaporate)
AFIO:
So, is this a bug in remove or in the users?
Should we just change ProgramTest.cpp to not use a directory and document the restriction on remove? Anyone that really needs to delete directories then should use rename + remove.
I think I'll try and write a patch to fix this (in llvm::sys::fs::remove). It's better for the function to just do the right thing, because this issue is subtle enough to begin with that it's unlikely someone will get it right in their own implementation.
If I understand correctly, the problem happens even when erasing one file. When the system is loaded Windows may defer the actual file deletion. This may lead to all sorts of problem beside erasing the containing directory since remove() user expects the file be gone upon return.
Just below Windows\Path.inc remove(), rename() waits in a loop (with sleep of 1ms):
// Retry while we see recoverable errors. // System scanners (eg. indexer) might open the source file when it is written // and closed.
bool TryReplace = true;
for (int i = 0; i < 2000; i++) { if (i > 0) ::Sleep(1);
if (TryReplace) {
...
so remove() may do the same, waiting for the file/path to be actually deleted before return. Or it can do the rename/delete combo which makes more sense to avoid the wait loop.
So, is this a bug in remove or in the users?
Should we just change ProgramTest.cpp to not use a directory and document the restriction on remove? Anyone that really needs to delete directories then should use rename + remove.
The test had similar problem to what you describe: a file deleted by rm still existed when the next command is run.
At first some workarounds were tried to avoid removing the file and immediately using it but it still often failed and was disabled on Windows:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130450.html
The rm command may have the same problem you found in llvm::sys::fs:remove and the other programs mentioned.
Maybe bash rm on Windows has the same problem?
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
Heh, I wouldn't be surprised. I'm a bit rusty on my shell scripting - what do the changed lines in that commit do?
Maybe bash rm on Windows has the same problem?
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
Heh, I wouldn't be surprised. I'm a bit rusty on my shell scripting - what do the changed lines in that commit do?
Maybe bash rm on Windows has the same problem?
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
Note that the function with the bug is in ./llvm/lib/Support/Windows/Path.inc, NOT in ./llvm/lib/Support/Unix/Path.inc
Sidenote: maybe "System Libraries" is the proper component?
Extended Description
Under heavy IO load, removing directories on Windows with llvm::sys::fs:remove will fail unpredictably.
I've just observed the TestWriteWithSystemEncoding test in ./llvm/unittests/Support/ProgramTest.cpp failing because of this. That test creates a kinda-unique directory named "program-test-%%%%%%" (where % is a pseudorandom alphanumeric), creates a file named "international-file.txt" in it, and roundtrip-tests some text (writes some text with one encoding, reads it with another, checks if it's not garbled). After that, it deletes the file (llvm::sys::fs::remove(file_pathname.str())) and then the directory (llvm::sys::fs::remove(TestDirectory.str())). Essentially, when Windows "Deletes" a file, it doesn't actually Delete the file; when the system is under load, "international-file.txt" may STILL be in the directory by the time of llvm::sys::fs::remove(TestDirectory.str()).
The details of WHY this will fail are very counter-intuitive, and you should see "CppCon 2015: Niall Douglas “Racing The File System" (https://youtu.be/uhRWMGBjlO8?t=7m35s) for specifically why the code is wrong. Lots of programs get this wrong, Douglas mentions CPython, and I've seen it in Chromium.
Sidenote: The documentation for RemoveDirectory (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365488.aspx) is kinda crappy, because it says "RemoveDirectory removes a directory junction, even if the contents of the target are not empty; the function removes directory junctions regardless of the state of the target object.", but "Creating and Deleting Directories" (https://msdn.microsoft.com/en-us/library/windows/desktop/aa363872.aspx) says: "To delete an existing directory, use the RemoveDirectory or RemoveDirectoryTransacted function. Before removing a directory, you must ensure that the directory is empty and that you have the delete access privilege for the directory. To do the latter, call the GetSecurityInfo function."