keiranmraine / pidlock

Simple PID based locking for cronjobs, UNIX scripts or python programs
https://pypi.python.org/pypi/pidlock
MIT License
3 stars 2 forks source link

Doesn't cope if originating process deletes the pid file #4

Open keiranmraine opened 3 years ago

keiranmraine commented 3 years ago

If another process created the pid file and completes it will remove it, but the logic here allows any process that discovers the PID is no longer current to remove the file.

https://github.com/sayanarijit/pidlock/blob/12b45c91e18aa360bb1336c6ec8d6a477e2cb035/pidlock.py#L86-L89

This results in FileNotFoundError: [Errno 2] No such file or directory....

It is fine to remove the file if it's not found when the loop starts, but not if it's been waiting for a job to complete it needs to recheck the file is still there with the same PID before attempting to delete it.

sayanarijit commented 3 years ago

Hey @keiranmraine thanks for reoprting. Could you please provide a reproducible case? Probably a small script / testcase.

keiranmraine commented 3 years ago

Reproducible example:

#!/bin/bash

for i in {1..10}; do
  rm -f $i.pidlock.log
  pidlock -n sleepy_script -w 60 -m 1 -c 'sleep 6' >& $i.pidlock.log&
done
echo 'Wait for max time it "could" take them to complete (60s)'
sleep 60
set -x
grep -F 'No such file or directory' *.pidlock.log

Output:

$ bash pidlock_bug.sh 
Wait for max time it "could" take them to complete (60s)
+ grep -F 'No such file or directory' 10.pidlock.log 1.pidlock.log 2.pidlock.log 3.pidlock.log 4.pidlock.log 5.pidlock.log 6.pidlock.log 7.pidlock.log 8.pidlock.log 9.pidlock.log
10.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
2.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
3.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
5.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
6.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
7.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
8.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'
9.pidlock.log:pidlock: [Errno 2] No such file or directory: '/.../.pidlock/sleepy_script.pid'

I hope you don't mind, but there is a major improvement that can be made on how you determine which process gets the first lock.

Creating a file isn't a blocking process. When locking the standard approach is to create a folder for the process (with a file containing the PID).

So instead of:

.pidlock/sleepy_script.pid

You would create:

.pidlock/sleepy_script/pid.txt

You wouldn't test for the sleepy_script directory just create it via os.makedirs:

try:
     os.makedirs(name, mode=0o777, exist_ok=False)
    # create the pid file, do work
except FileExistsError fe:
    # I need to check to see if the pid exits
    # if it *isn't* here the first time I check, I can clean the folder and retry.
    # if the pid is present when I started I need to leave it to be cleaned up by it's own process.

An additional extension would be to include the host name in the pid file, then you can have this working for compute farms too (where only jobs on the same host can cleanup a lagging lock).

sayanarijit commented 3 years ago

Started https://github.com/sayanarijit/pidlock/pull/5. Will continue tomorrow.

sayanarijit commented 3 years ago

Actually, if you are using this in a critical program and would like to have full control over its development, I can transfer the ownership. I used this script many years ago and it more or less served its purpose back in the days.

keiranmraine commented 3 years ago

I understand, I'm happy to do so. It has the benefit of a good name so I'd rather extend it than create something new. I've created a pypi account (keiranmraine) and my github is the same. I found an entry for doing it on pypi "How do I become an owner/maintainer of a project on PyPI?" here. GitHub is normally quite easy.

sayanarijit commented 3 years ago

Sent invitations.

sayanarijit commented 3 years ago

Don't hesitate to ping me for reviews when making changes.

keiranmraine commented 3 years ago

Thanks, shall I leave you as owner on pypi or migrate you to Maintainer?

sayanarijit commented 3 years ago

As you prefer. I don't think I'll be uploading releases.

sayanarijit commented 3 years ago

test

I think something's wrong with the latest release.

keiranmraine commented 3 years ago

Ah, I've missed testing the behaviour for same host, verbose shows the following for the second job started:

Removing old piddir...
keiranmraine commented 3 years ago

Very odd as printing the lock_pid and the list of pids I can clearly see the value, checking not in use against sets:

https://github.com/keiranmraine/pidlock/blob/102593c2a28affdc8b39575cff25facf4edb5df5/pidlock.py#L111

2621667
[..., 2621667, ...]
sayanarijit commented 3 years ago

As mentioned, you can always ping me for reviews :) . I'm pretty active on GH. Worst case, it might take at-most 1-2 days to respond. For now, I'd delete the releases then investigate.

sayanarijit commented 3 years ago

In these case, I use pdb to debug. It's pretty handy and made for these kinds of debugging.

keiranmraine commented 3 years ago

I'm using the vscode plugins, but same diff

sayanarijit commented 3 years ago

I see from line 98 it jumps to 107 even when a script is running.

sayanarijit commented 3 years ago

Ah ok to get the pid

sayanarijit commented 3 years ago

The parsed pid is a string. Needs to be converted to int.

keiranmraine commented 3 years ago

I've just got there too... spent too much time working with perl..

keiranmraine commented 3 years ago

Thanks for the assist. Will add more checks to the CI, and script this check so easier to catch.

sayanarijit commented 3 years ago

I tried to add some tests to the branch I just closed. You might want to have a look there.

hulleyrob commented 3 years ago

I've had a weird problem with the original code In that if the process runs again but the last process has just deleted the pid from the file it hangs forever. Very hard to replicate to be sure. Will this pull request fix such a scenario or should I raise a new issue?

keiranmraine commented 3 years ago

@hulleyrob it should handle that as it works on successful creation of a folder instead of the file (which doesn't suffer race conditions). Working on a v3.0.1 as I made a silly mistake.

hulleyrob commented 3 years ago

Excellent news!