r-lib / filelock

Cross platform file locking in R
https://r-lib.github.io/filelock/
Other
43 stars 7 forks source link

Interrupts on Windows #29

Open gaborcsardi opened 3 years ago

gaborcsardi commented 3 years ago

From Florian Balmer

Hello Gábor

Please apologize for contacting you directly via e-mail, but I don't have a GitHub account to submit an issue, and your address was immediately revealed by Googling.

While researching about how to acquire file locks, and abort after a specified timeout on already-locked files, I've come across your [r-lib/filelock] repository on GitHub.

First of all, thank you very much for sharing your code!

In my supplementary resources [0-4] besides MSDN [5-8], I found the following pattern:

OVERLAPPED structure allocated on local function stack Issue asynchronous request ... if (ERROR_IO_PENDING) { Wait for IO completion, or timeout elapse if (timeout elapsed) { CancelIo() GetOverlappedResult() --or-- WaitForSingleObject(OVERLAPPED.hEvent) ...

The emphasize here is that it's necessary to wait for the canceled IO operation to complete, before returning from the function, or the completion routine may access the implicitly free'd stack-allocated OVERLAPPED structure, and cause potential havoc.

The documentation for LockFileEx() gives the hint to call GetOverlappedResult() in case of ERROR_IO_PENDING. However, the documentation for GetOverlappedResult() does not mention this case explicitly, just Read/WriteFile(), DeviceIoControl(), and a few others. And I'm not sure if file locking goes down the DeviceIoControl() route, or is implemented above/besides the file system layer, so the documentation leaves some ambiguity, if I get everything correct.

Anyway, I tend to believe it's a good idea to wait for IO completion after calling CancelIo(), and to be overly correct, maybe only after CancelIo() returned a nonzero value (but neither of the code samples from the supplementary resources care about the CancelIo() return value).

This is just meant to be an input, feel free to do whatever you like with this feedback. I don't even expect you to reply to this message (I can't check my e-mails daily, anyway, at the moment). But if you have the knowledge, or know some expert to confirm that calling GetOverlappedResult() is wrong after canceling an asynchronous lock request, I'd appreciate a short note.

Best wishes --Florian

[0] You can use a file as a synchronization object, too | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20140905-00/?p=63

[1] Ready... cancel... wait for it! (part 1) | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20110202-00/?p=11613

[2] Ready... cancel... wait for it! (part 2) | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20110203-00/?p=11603

[3] Ready... cancel... wait for it! (part 3) | The Old New Thing https://devblogs.microsoft.com/oldnewthing/20110204-00/?p=11583

[4] Asynchronous I/O | Newcomer J. M. http://flounder.com/asynchexplorer.htm

[5] LockFileEx function https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-lockfileex

[6] CancelIo function | MSDN https://docs.microsoft.com/en-us/windows/win32/fileio/cancelio

[7] GetOverlappedResult function | MSDN https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresult

[8] DeviceIoControl function https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol

gaborcsardi commented 3 years ago

Thanks!

Why not create a GitHub account, and add this as an issue? You did a great investigation, it is kind of a waste that very few people know about this.

As for the issue, you are right, and freeing the cancelled handles can lead to undefined behavior. I have never actually seen that with filelock, because the handles are only freed somewhat later, but have seen the same with processx, for cancelled reads on pipes. I did fix this in processx: https://github.com/r-lib/processx/commit/a8f09d147fead78347a87fcf4e0fbd1c07de1c21 but forgot that I had the same issue in filelock.

Waiting for the cancellation is not great, btw. Whenever you try to synchronously wait in async code, that's a code smell. What you can do is to add the handle(s) to a free-list (thing to be freed later), and then check on the handle later, or use IOCPs.

Anyway, if you don't want to register to GitHub, then can I add your email to an issue in r-lib/filelock?

Thanks again, Gabor

gaborcsardi commented 3 years ago

Thanks!

Why not create a GitHub account, and add this as an issue? You did a great investigation, it is kind of a waste that very few people know about this.

Anyway, if you don't want to register to GitHub, then can I add your email to an issue in r-lib/filelock?

Thanks for your reply! I'm sure I will create a GitHub account, sooner or later, but I feel like I don't need one, right now. I'm managing my own code with Fossil, on my own web server.

If it simplifies management and discussion of issues for you and your contributors, please feel free to feed excerpts from my e-mails to the GitHub issue tracker. I don't mind if you add my e-mail address, but maybe with some minimal anti-spambot obfuscation to give me some minimal feeling of privacy ;-)

As for the issue, you are right, and freeing the cancelled handles can lead to undefined behavior. I have never actually seen that with filelock, because the handles are only freed somewhat later, but have seen the same with processx, for cancelled reads on pipes. I did fix this in processx: https://github.com/r-lib/processx/commit/a8f09d147fead78347a87fcf4e0fbd1c07de1c21 but forgot that I had the same issue in filelock.

Another good reason to call GetOverlappedResult(), besides the mentioned potential access-after-free to the OVERLAPPED structure on the local function stack.

Indeed, in case of the race condition, i.e. the locking IO request terminating in parallel to the CancelIo() call, one may also end up with the lock acquired when CancelIo() returns. Querying GetOverlappedResult() should catch this, and return TRUE/ERROR_OPERATION_ABORTED if CancelIo() was quick enough to take effect. If this were not the case, and there were no other errors, the lock could be kept and used normally, instead of discarded implicitly by calling CloseHandle(), even more so because MSDN recommends to unlock files by explicit UnLockFile[Ex]() calls.

Waiting for the cancellation is not great, btw. Whenever you try to synchronously wait in async code, that's a code smell. What you can do is to add the handle(s) to a free-list (thing to be freed later), and then check on the handle later, or use IOCPs.

Cancelled, but nonetheless successfully completed reads (for which CancelIo() lost the race) could be left unprocessed in the IOCP, but wouldn't immediately closing the file handles and the IOCP (without waiting for the cancellation) lead to the same situation? Maybe in the end, closing file handles with pending IO requests is a common scenario that file system drivers must be able to handle gracefully.

Yes, all this doesn't look trivial. With short lock timeouts of several 100 ms, an easier solution for my use case may be to repeat a few sleep-and-try-synchronous-locking cycles, even more so if the goal is to get a synchronous file handle opened without FILE_FLAG_OVERLAPPED.

--Florian