Open dstufft opened 8 years ago
Yup, known issue with this kind of lock type using the lock scheme we are using.
See the following for some of this:
That thread is split across 2 months; it summarizes the problem and why it's not 'just as easy' as you might think ;)
If someone wants to develop a solution though, I'm more than willing to review it.
Do note that if the Python flock API provided a little more functionality, we might be able to do some work there to get these cleaned up, because in general it appears quite hard to know which locks to even delete without trying to take over the ownership of those locks (and the associated blocking that involves). I believe there is a non-exposed API for flock usage that does let you 'get the current owner' (that at least could save some time during any cleanup process).
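For fcntl-style byte-range locks at least, there is such a query: `F_GETLK` reports the PID of a conflicting lock's holder (plain `flock(2)` locks have no equivalent). A minimal sketch, with the caveat that the `struct flock` packing below (`"hhlli"` = `l_type`, `l_whence`, `l_start`, `l_len`, `l_pid`) is an assumption that matches Linux x86-64 and is not portable:

```python
import fcntl
import os
import struct
import time

PATH = "/tmp/getlk-demo.lock"  # illustrative path, not a fasteners API

def lock_owner(fd):
    # Ask the kernel whether a write lock on byte 0 would conflict,
    # and if so, which PID holds the conflicting lock.
    probe = struct.pack("hhlli", fcntl.F_WRLCK, os.SEEK_SET, 0, 1, 0)
    result = fcntl.fcntl(fd, fcntl.F_GETLK, probe)
    l_type, _, _, _, l_pid = struct.unpack("hhlli", result)
    return None if l_type == fcntl.F_UNLCK else l_pid

r, w = os.pipe()
child = os.fork()
if child == 0:
    # Child: take an exclusive lock, tell the parent, then idle.
    cfd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o644)
    fcntl.lockf(cfd, fcntl.LOCK_EX)
    os.write(w, b"x")
    time.sleep(30)
    os._exit(0)

os.read(r, 1)                  # wait until the child holds the lock
fd = os.open(PATH, os.O_RDWR)
owner = lock_owner(fd)         # the child's PID
os.kill(child, 9)
os.waitpid(child, 0)
```

A cleanup pass could use this to decide whether a lock file's last owner is still alive, though as noted above that alone doesn't make deletion race-free.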
I was able to find a decent solution for this for my use case by using byte range locking with Python's lockf (which is confusingly fcntl underneath).
In particular, I have a bunch of install prefixes for different packages in a package manager, and each needs to be write-locked while an install is happening and read-locked while a dependency's install is happening. Lots of installs can happen in parallel.
I use a single lock file, and for each directory, I lock a single byte in the lock file identified by the 63-bit prefix of the SHA-1 of the directory name.
Yes, that is complicated, but it has some advantages:

- It uses fcntl locks, which seem to be reasonably well supported across the distributed filesystems I care about (NFS v2.6+, Lustre, GPFS).
- The lock file never has to be deleted (trying to clean up fcntl lock files gives you race conditions anyway).

Disadvantages:

- Hash collisions are possible, but the chance of any two directories colliding is ~1e-19, and the likelihood of a collision doesn't reach 1% until you have ~430 million processes (at which point you probably have other problems).
- With msvcrt on Windows, you'd have to make all the locks exclusive, or maybe unix clients could use the read locks and they would appear to be exclusive locks to a windows client. I haven't looked into how different NFS implementations handle this.

At least for my use case, the advantages outweigh the disadvantages, and I couldn't come up with a better solution that didn't require me to deploy a distributed lock server or some other coordination service.
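The scheme described above can be sketched roughly as follows (the names are illustrative, not Spack's actual API): one shared lock file, one byte per directory, chosen by the 63-bit prefix of the SHA-1 of the directory name.

```python
import fcntl
import hashlib
import os

def byte_for(name: str) -> int:
    # 63-bit prefix of SHA-1 so the lockf() offset is non-negative.
    prefix = int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")
    return prefix >> 1

def acquire(fd: int, name: str, write: bool = False) -> None:
    # lockf() here is fcntl(2) byte-range locking underneath;
    # lock exactly one byte at the hashed offset.
    op = fcntl.LOCK_EX if write else fcntl.LOCK_SH
    fcntl.lockf(fd, op, 1, byte_for(name), os.SEEK_SET)

def release(fd: int, name: str) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, byte_for(name), os.SEEK_SET)
```

An installer would then take `acquire(fd, prefix, write=True)` around its own install and `acquire(fd, dep)` (a shared read lock) for each dependency's prefix, allowing many installs to proceed in parallel.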
Anyway, an interprocess reader-writer lock with byte range support is implemented here:
There is some relevant discussion here. The SHA stuff is outside the lock class, so it probably doesn't need to go in fasteners, unless you like the idea and want some kind of lock_by_name_sha1() method.
Just wanted to throw a potential solution in the thread. Let me know if you'd have a use for this in fasteners.
@harlowja
I also had something similar @ https://github.com/harlowja/fasteners/pull/10 (though yours might be better)
Ya, nice yours is better :)
@harlowja: thanks! curious if openstack and/or fasteners has a use for this. If so I could try to make an API and submit a PR. Not sure if I can maintain for cloud environments, though, so someone in your project would have to have a need. Thoughts?
Sure, I take PRs :)
My guess is someone would have a need :)
The read and write lock stuff could be especially useful.
This issue is now almost 2 years old, and I, a new user, am just running into it.
One key thing I'd like to address: your documentation specifically says this:
> Inter-process locks: Single writer using file based locking (these automatically release on process exit, even if release or exit is never called).
However, this doesn't seem to be the case, and if I'm understanding this bug correctly, it's the same issue. Specifically, SIGTERM'ing a process (let alone SIGQUIT or SIGKILL) leaves the lock file in the filesystem.
So I think you are conflating release with file deletion; they are not the same. What is released is the file handle owned by the process, and that release is guaranteed by the operating system. Actually deleting the file is pretty complicated to get right (due to dual ownership); but I do agree the wording could be better.
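The distinction can be seen in a few lines (the path is just for the demo): closing the descriptor, which the kernel also does on process exit, drops the lock, yet the file stays on disk.

```python
import fcntl
import os

path = "/tmp/fasteners-demo.lock"
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)   # lock acquired
os.close(fd)                     # the kernel releases the *lock* here...
still_there = os.path.exists(path)
print(still_there)               # ...but the *file* remains: True
```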
Would it make sense to use a file descriptor rather than a filename, with O_TMPFILE, to ensure that the file is destroyed as soon as it is closed in all processes (or when processes are killed)? O_TMPFILE requires Linux 3.11 or newer, and is not supported by all filesystems, but more and more filesystems support it.
See how tempfile of Python stdlib uses O_TMPFILE:
I just tested on Fedora 29: btrfs, ext4 and tmpfs support O_TMPFILE. (I only tested these filesystems.)
$ cat /etc/fedora-release
Fedora release 29 (Twenty Nine)
$ uname -r
4.18.16-300.fc29.x86_64
$ python3
Python 3.7.1 (default, Nov 5 2018, 14:07:04)
>>> import os
>>> fd=os.open(".", os.O_WRONLY | os.O_TMPFILE) # btrfs
>>> os.close(fd)
>>> fd=os.open("/tmp", os.O_WRONLY | os.O_TMPFILE) # tmpfs
>>> os.close(fd)
>>> fd=os.open("/boot/test", os.O_WRONLY | os.O_TMPFILE) # ext4
>>> os.close(fd)
Problem: who creates the FD? How do we pass the FD to other processes? A UNIX socket with sendmsg()? ...
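One way to answer the second question is SCM_RIGHTS fd passing, which Python wraps as `socket.send_fds()`/`socket.recv_fds()` (3.9+). A sketch under the assumptions above: O_TMPFILE needs Linux >= 3.11 plus filesystem support, so this falls back to an unlinked temp file, which is similarly nameless.

```python
import os
import socket
import tempfile

def anonymous_fd(directory="/tmp"):
    """Return an fd for a file with no name on disk."""
    try:
        return os.open(directory, os.O_RDWR | os.O_TMPFILE, 0o600)
    except (AttributeError, OSError):
        fd, name = tempfile.mkstemp(dir=directory)
        os.unlink(name)          # nameless, like O_TMPFILE
        return fd

# One process creates the fd and hands it to another over a UNIX socket.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
fd = anonymous_fd()
socket.send_fds(parent, [b"lock"], [fd])             # SCM_RIGHTS underneath
msg, fds, flags, addr = socket.recv_fds(child, 16, 1)
# fds[0] is a duplicate of fd; the file vanishes once every copy is closed.
```

The remaining design question is still the first one: some long-lived process (or a socket in a well-known location) has to be the one that creates the fd in the first place.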
Oh hi victor, how are u :)
Hi @harlowja, what do you think about Victor's suggestion? I'm starting to review your PR #10 and am currently running some tests with it.
> Oh hi victor, how are u :)
Hey! I'm fine. I moved to a new team. I'm now maintaining Python for Red Hat (in Fedora, RHEL and upstream)! But as you can see, I'm helping @4383 who replaced me on Oslo in my previous OpenStack team.
Hi guys, just back from vacation (thanksgiving in the US) so it will take me a little while to catch all back up..
Let's do what victor suggests. I can't see any flaws with it (outside of the mentioned ones).
@harlowja do not hesitate to tell if I can help you by doing something (review, code, whatever), just ping me.
Do you want to try to implement https://github.com/harlowja/fasteners/issues/26#issuecomment-439081366?
@harlowja: I'm really a newcomer to fasteners, so if I implement Victor's suggestion I'll need some mentoring from you, so that I don't spend a lot of time understanding the fasteners architecture and so that I integrate it properly with #10. Do you agree?
@harlowja: Sorry for disappearing. I initially had issues getting a license change through our IP office when I commented, but Spack has since relicensed to Apache-2.0/MIT, so we could contribute our code pretty easily now.
If you want the byte range locking that I mentioned above, that could make sense here. We use it all the time so I believe it is pretty well tested. It doesn't directly solve this problem unless you use it the way we described above (by mapping locks to byte indexes via hashing), so it probably needs another API on top, but our approach is POSIX, so there is that.
@vstinner's idea is more general if the OS and filesystem support it.
Might be easier if you guys want to jump on toolsforhumans.slack.com or IRC?
It'd be nice to look at https://github.com/spack/spack/blob/develop/lib/spack/llnl/util/lock.py again, and see what we can take into this lib.
@harlowja: how do I get an account?
Ah, hmmm, good question, lol
Guess I have to invite people, hmmm
@harlowja It doesn't look like the spack lock implementation cleans up the lock file either. Am I missing something?
When trying to use the InterProcessLock class on OS X, it appears that when it releases the lock, it doesn't clean up after itself. It would be great if it could do this to ensure that we don't leave a bunch of leftover files lying around.