Open dstufft opened 8 years ago
Yup, known issue with this kind of lock type using the lock scheme we are using.
See the following for some of this:
That thread is split across 2 months; it summarizes the problem and why it's not 'just as easy' as you might think ;)
If someone wants to develop a solution though, I'm more than willing to review it.
Do note that if the Python flock API provided a little more functionality, we might be able to do some work there to get these cleaned up, because in general it appears quite hard to know which locks to even delete without trying to take over the ownership of those locks (and the associated blocking that involves). I believe there is a non-exposed API for flock usage that does let you 'get the current owner' (that at least could save some time during any cleanup process).
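For fcntl-style byte-range locks at least, there is such a query: `F_GETLK` reports the PID of a conflicting lock's holder (plain `flock(2)` locks have no equivalent). A minimal sketch, with the caveat that the `struct flock` packing below (`"hhlli"` = `l_type`, `l_whence`, `l_start`, `l_len`, `l_pid`) is an assumption that matches Linux x86-64 and is not portable:

```python
import fcntl
import os
import struct
import time

PATH = "/tmp/getlk-demo.lock"  # illustrative path, not a fasteners API

def lock_owner(fd):
    # Ask the kernel whether a write lock on byte 0 would conflict,
    # and if so, which PID holds the conflicting lock.
    probe = struct.pack("hhlli", fcntl.F_WRLCK, os.SEEK_SET, 0, 1, 0)
    result = fcntl.fcntl(fd, fcntl.F_GETLK, probe)
    l_type, _, _, _, l_pid = struct.unpack("hhlli", result)
    return None if l_type == fcntl.F_UNLCK else l_pid

r, w = os.pipe()
child = os.fork()
if child == 0:
    # Child: take an exclusive lock, tell the parent, then idle.
    cfd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o644)
    fcntl.lockf(cfd, fcntl.LOCK_EX)
    os.write(w, b"x")
    time.sleep(30)
    os._exit(0)

os.read(r, 1)                  # wait until the child holds the lock
fd = os.open(PATH, os.O_RDWR)
owner = lock_owner(fd)         # the child's PID
os.kill(child, 9)
os.waitpid(child, 0)
```

A cleanup pass could use this to decide whether a lock file's last owner is still alive, though as noted above that alone doesn't make deletion race-free.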
I was able to find a decent solution for this for my use case by using byte range locking with Python's lockf (which is confusingly fcntl underneath).
In particular, I have a bunch of install prefixes for different packages in a package manager, and each needs to be write-locked while an install is happening and read-locked while a dependency's install is happening. Lots of installs can happen in parallel.
I use a single lock file, and for each directory, I lock a single byte in the lock file identified by the 63-bit prefix of the SHA-1 of the directory name.
Yes, that is complicated, but it has some advantages:

- It uses fcntl locks, which seem to be reasonably well supported across the distributed filesystems I care about (NFS v2.6+, Lustre, GPFS).
- The lock file never has to be deleted (trying to clean up fcntl lock files gives you race conditions anyway).

Disadvantages:

- Hash collisions are possible, but the chance of any two directories colliding is ~1e-19, and the likelihood of a collision doesn't reach 1% until you have ~430 million processes (at which point you probably have other problems).
- With msvcrt on Windows, you'd have to make all the locks exclusive, or maybe unix clients could use the read locks and they would appear to be exclusive locks to a windows client. I haven't looked into how different NFS implementations handle this.

At least for my use case, the advantages outweigh the disadvantages, and I couldn't come up with a better solution that didn't require me to deploy a distributed lock server or some other coordination service.
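The scheme described above can be sketched roughly as follows (the names are illustrative, not Spack's actual API): one shared lock file, one byte per directory, chosen by the 63-bit prefix of the SHA-1 of the directory name.

```python
import fcntl
import hashlib
import os

def byte_for(name: str) -> int:
    # 63-bit prefix of SHA-1 so the lockf() offset is non-negative.
    prefix = int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")
    return prefix >> 1

def acquire(fd: int, name: str, write: bool = False) -> None:
    # lockf() here is fcntl(2) byte-range locking underneath;
    # lock exactly one byte at the hashed offset.
    op = fcntl.LOCK_EX if write else fcntl.LOCK_SH
    fcntl.lockf(fd, op, 1, byte_for(name), os.SEEK_SET)

def release(fd: int, name: str) -> None:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, byte_for(name), os.SEEK_SET)
```

An installer would then take `acquire(fd, prefix, write=True)` around its own install and `acquire(fd, dep)` (a shared read lock) for each dependency's prefix, allowing many installs to proceed in parallel.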
Anyway, an interprocess reader-writer lock with byte range support is implemented here:
There is some relevant discussion here. The SHA stuff is outside the lock class, so it probably doesn't need to go in fasteners, unless you like the idea and want some kind of lock_by_name_sha1() method.
Just wanted to throw a potential solution in the thread. Let me know if you'd have a use for this in fasteners.
@harlowja
I also had something similar @ https://github.com/harlowja/fasteners/pull/10 (though yours might be better)
Ya, nice yours is better :)
@harlowja: thanks! curious if openstack and/or fasteners has a use for this. If so I could try to make an API and submit a PR. Not sure if I can maintain for cloud environments, though, so someone in your project would have to have a need. Thoughts?
Sure, I take PRs :)
My guess is someone would have a need :)
The read and write lock stuff could be especially useful.
This issue is now almost 2 years old, and I, a new user, am just running into it.
One key thing I'd like to address: your documentation specifically says this:
> Inter-process locks: Single writer using file based locking (these automatically release on process exit, even if release or exit is never called).
However, this doesn't seem to be the case, and if I'm understanding this bug correctly, it's the same issue. Specifically, SIGTERM'ing a process (let alone SIGQUIT or SIGKILL) leaves the lock file in the filesystem.
So I think you are conflating release with file deletion; they are not the same. What is released is the file handle owned by the process, and that release is guaranteed by the operating system. Actually deleting the file is pretty complicated to get right (due to dual ownership); but I do agree the wording could be better.
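The distinction can be seen in a few lines (the path is just for the demo): closing the descriptor, which the kernel also does on process exit, drops the lock, yet the file stays on disk.

```python
import fcntl
import os

path = "/tmp/fasteners-demo.lock"
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)   # lock acquired
os.close(fd)                     # the kernel releases the *lock* here...
still_there = os.path.exists(path)
print(still_there)               # ...but the *file* remains: True
```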
Would it make sense to use a file descriptor rather than a filename, with O_TMPFILE, to ensure that the file is destroyed as soon as it is closed in all processes (or when processes are killed)? O_TMPFILE requires Linux 3.11 or newer, and is not supported by all filesystems, but more and more filesystems support it.
See how tempfile of Python stdlib uses O_TMPFILE:
I just tested on Fedora 29: btrfs, ext4 and tmpfs support O_TMPFILE. (I only tested these filesystems.)
$ cat /etc/fedora-release
Fedora release 29 (Twenty Nine)
$ uname -r
4.18.16-300.fc29.x86_64
$ python3
Python 3.7.1 (default, Nov 5 2018, 14:07:04)
>>> import os
>>> fd=os.open(".", os.O_WRONLY | os.O_TMPFILE) # btrfs
>>> os.close(fd)
>>> fd=os.open("/tmp", os.O_WRONLY | os.O_TMPFILE) # tmpfs
>>> os.close(fd)
>>> fd=os.open("/boot/test", os.O_WRONLY | os.O_TMPFILE) # ext4
>>> os.close(fd)
Problem: who creates the FD? How do we pass the FD to other processes? A UNIX socket with sendmsg()? ...
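One way to answer the second question is SCM_RIGHTS fd passing, which Python wraps as `socket.send_fds()`/`socket.recv_fds()` (3.9+). A sketch under the assumptions above: O_TMPFILE needs Linux >= 3.11 plus filesystem support, so this falls back to an unlinked temp file, which is similarly nameless.

```python
import os
import socket
import tempfile

def anonymous_fd(directory="/tmp"):
    """Return an fd for a file with no name on disk."""
    try:
        return os.open(directory, os.O_RDWR | os.O_TMPFILE, 0o600)
    except (AttributeError, OSError):
        fd, name = tempfile.mkstemp(dir=directory)
        os.unlink(name)          # nameless, like O_TMPFILE
        return fd

# One process creates the fd and hands it to another over a UNIX socket.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
fd = anonymous_fd()
socket.send_fds(parent, [b"lock"], [fd])             # SCM_RIGHTS underneath
msg, fds, flags, addr = socket.recv_fds(child, 16, 1)
# fds[0] is a duplicate of fd; the file vanishes once every copy is closed.
```

The remaining design question is still the first one: some long-lived process (or a socket in a well-known location) has to be the one that creates the fd in the first place.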
Oh hi victor, how are u :)
Hi @harlowja, what do you think about Victor's suggestion? I'm starting to review your PR #10 and am currently running some tests with it.
> Oh hi victor, how are u :)
Hey! I'm fine. I moved to a new team. I'm now maintaining Python for Red Hat (in Fedora, RHEL and upstream)! But as you can see, I'm helping @4383 who replaced me on Oslo in my previous OpenStack team.
Hi guys, just back from vacation (thanksgiving in the US) so it will take me a little while to catch all back up..
Let's do what victor suggests. I can't see any flaws with it (outside of the mentioned ones).
@harlowja do not hesitate to tell if I can help you by doing something (review, code, whatever), just ping me.
Do you want to try to implement https://github.com/harlowja/fasteners/issues/26#issuecomment-439081366?
@harlowja: I'm really a newcomer to fasteners, so if I implement Victor's suggestion I'll need some mentoring from you, so that I don't spend a lot of time understanding the fasteners architecture and so that I integrate it properly with #10. Do you agree?
@harlowja: Sorry for disappearing. I initially had issues getting a license change through our IP office when I commented, but Spack has since relicensed to Apache-2.0/MIT, so we could contribute our code pretty easily now.
If you want the byte range locking that I mentioned above, that could make sense here. We use it all the time so I believe it is pretty well tested. It doesn't directly solve this problem unless you use it the way we described above (by mapping locks to byte indexes via hashing), so it probably needs another API on top, but our approach is POSIX, so there is that.
@vstinner's idea is more general if the OS and filesystem support it.
Might be easier if you guys want to jump on toolsforhumans.slack.com or IRC?
It'd be nice to look at https://github.com/spack/spack/blob/develop/lib/spack/llnl/util/lock.py again, and see what we can take into this lib.
@harlowja: how do I get an account?
Ah, hmmm, good question, lol
Guess I have to invite people, hmmm
@harlowja It doesn't look like the spack lock implementation cleans up the lock file either. Am I missing something?
When trying to use the InterProcessLock class on OS X, it appears that when it releases the lock, it doesn't clean up after itself. It would be great if it could do this to ensure that we don't leave a bunch of leftover files lying around.