trbs / pid

Pidfile featuring stale detection and file-locking, can also be used as context-manager or decorator
https://pypi.python.org/pypi/pid/
Apache License 2.0
102 stars 26 forks source link

Incorrect PID tracking when used in conjunction with python-daemon. #4

Closed nnathan closed 9 years ago

nnathan commented 9 years ago

I'm using pid module in conjunction with python-daemon.

Although I don't have a simple example yet, the way python-daemon works is by providing a DaemonContext in which you can pass a file locking context manager. Normally I would use lockfile.pidfile but it doesn't feature staleness detection and pid seems to do everything correctly (almost).

The following code is contrived and hasn't been tested, but explains the situation I'm seeing.

from daemon import DaemonContext
from pid import Pidfile

# Using detach_process, DaemonContext roughly performs the following:
#1. fork() [exit parent]
#2. setsid()
#3. fork() [exit parent]
#4. override signal handlers
#5. pidfile.__enter__()

with DaemonContext(detach_process=True, pidfile=Pidfile(pidname="test", piddir="/tmp")):
    pass

The problem is that entering the context manager at step 5, the Pidfile context gathered information about the process environment before step 1 in the above comment, and overrides the signal handler if overriding TERM.

I am trying to brainstorm a way to provide a DaemonContext friendly pid context manager. I can subclass Pidfile and implement DaemonContextPidfile, which reinitialises all variables in __init__() when __enter__() is called. Or perhaps the process environment should be gathered only when the __enter__() has been called.

The simpler solution is:

with DaemonContext(detach_process=True):
    with Pidfile(pidname="test", piddir="/tmp/"):
        pass

Which is perfectly valid, except Pidfile() may tread on an existing SIGTERM handler. Perhaps we can supply a SIGTERM handler to Pidfile which decorates the passed signal handler to ensure Pidfile.close() is called before executing the handler that is passed in.

Please let me know if you think any of these ideas seem viable.

Thanks.

trbs commented 9 years ago

@nnathan sure, let me recap to make sure I understand correctly. I see two patches here:

1) Allow to specify a custom function for sigterm_noop_handler 2) Have a way to run the "setup" code from __init__() inside __enter__()

Both seem good idea's to me.

As for (2) it would be easier if we just require using PidFile as a context manager but I kinda like that using it as a context manager is optional. It would be fairly easy to just add another flag to __init__() so that we can allow to lazily initialize the setup code, if possible we can do that in the create() method so it will work for context decorators and 'normal' code.

nnathan commented 9 years ago

Yep, I think you have a good understanding of what I'm suggesting.

I think (1) is trivial if we allow a register_term_signal_handler_function which is assigned a signal handler function and either creates a custom function which calls close() before calling register_term_signal_handler_function().

I think (2) is a little more nuanced but approachable in a few different ways. We can have __enter__() just call __init__() again (essentially doing it twice), or supply a lazily flag to notify which only determines process environment in the create() method.

The last thing I want to do is change the current behaviour of the code as it works really well. I see the above as simply features that provide a superset of functionality to remain compatible with daemon context managers.

trbs commented 9 years ago

Since python-daemon already sets a SIGTERM signal, would it not be enough to just set register_term_signal_handler=False ?

trbs commented 9 years ago

Created a 'lazy' branch and pushed a patch for this in there.

See https://github.com/trbs/pid/commit/b233bfebdce258a835e9654de77bb78015006761

Could you take a lot / test this and see if this works as expected ?

nnathan commented 9 years ago

Thanks, I'll do some thorough testing and get back to you in a few days. (I was working on a similar approach but was seeking something more pythonic, the approach you laid out seems the path of least resistance.)

trbs commented 9 years ago

If you have any suggestions for improvements or better ways, I'm all ears ! :)

Thanks for wanting to test this. Things like these pidfile's can be very tricky and I don't want to break things for people.

Although I was thinking that the 'lazy-er' approach might make more sense as the default... since like with python-daemon environment variables might change between creating the Pidfile object and executing the create() method.

nnathan commented 9 years ago

Hi, I managed to test the new pid module. It works well with python-daemon. I tried all combinations of lazy and detach_process (double fork option) to ensure that pid behaves as expected.

I agree that the lazy approach should be the default. It makes sense that you want the most accurate assessment of the process environment prior to checking/acquiring a lock. As such I set that to the default, and along the way I had to create and fix some test cases (which I've submitted in PR https://github.com/trbs/pid/issues/5).

Lastly, I moved the logging setup to _setup() since I believe that the logging stream handle may be gone from the time you import the module to the time you instantiate PidFile(). This matters when used as a "passed" context manager (in the case of python-daemon) and for non-context manager use of PidFile() where between instantiation and check/acquire you may fork and close of all file handles/streams (including logging).

If you have any suggestions for improvements or better ways, I'm all ears ! :)

I honestly couldn't find a more elegant and beautiful way to approach the delayed setup. I think you've done a good job with the current situation. Also with the tests you wrote, this should vastly help should we see a refactor in the future.

nnathan commented 9 years ago

Oh, I totally forgot to test the SIGTERM handler. I'll get back to you with results shortly.

nnathan commented 9 years ago

After working on the code a little, my mind is a little clearer.

  1. I cannot find a situation where we want lazy setup to NOT be the default behaviour. We should be interrogating the process environment at the time of creating or checking a lock.
  2. Therefore lazy should be the default behaviour, but we should get rid of the lazy option altogether. We will still need a flag to determine internally if _setup() has been called (probably call it self._is_setup).
  3. If there is some extreme case where the user needs to control when _setup() takes place, then they can explicitly call _setup().
  4. The register_term_signal_handler option is sufficient for ensuring that SIGTERM is not tread on. I was overthinking when the solution was right in my face. Therefore we don't need to add any features in that regard. (I also didn't quite understand the relation between atexit and SIGTERM until reading up on it a little, and so I understand why this feature exists). I would argue that we shouldn't even have the option and instead use the signal.getsignal() to determine if the handler has been setup appropriately, otherwise implicitly set one.

What do you think of the above assessment? If you agree, I can do the legwork in my existing PR to fix this. We will also get rid of the lazy tests you've so graciously written.

trbs commented 9 years ago

I agree with your assessment. Cannot think of any other reason as well.

Let's remove the protected status from _setup() and just call it setup() to indicate it's new and improved role :)

I added register_term_signal_handler='auto' in the lazy branch. After rechecking this we should safely be able to detect whether or not we need to install the handler. According to documentation at https://docs.python.org/2/library/atexit.html we only need this when a handler is not explicitly installed.

This should make sure we do the 'correct' thing per default. Previously we were overriding a registered function if it was already set (causing the issues with python-daemon) which was not very nice of PidFile. Since we only needed when no other handler is defined we might as well leave it alone if that has happened already.

The user can still specify True or False explicitly to do otherwise. I also added a check to see if register_term_signal_handler is a callable and if it is use the callable as the signal handler function. (same functionality as before just without the extra variable)

nnathan commented 9 years ago

I've made lazy behaviour default (see PR https://github.com/trbs/pid/pull/6).

A few observations about the automatic signal handler which I think we should rectify:

  1. pid should not tread on SIG_IGN. This means the caller had explicitly set it after python started but before the PIDFile functionality was invoked. Specifically SIG_IGN has the behaviour of ignoring TERM signals altogether (discarding them). Therefore the process will not terminate and that is perfectly fine for the behaviour for pid.
  2. For same reason as (1) I think pid should not tread when signal.getsignal(SIGTERM) == None. I think a None return value may occur if a process interfaces with C which may set the Signal handler but isn't represented as a callable object in Python (I will test this and see soon).

Again I'm sure about the functionality of (1) but not of (2).

I don't want to step on any toes, but these are the tests I envision:

def test_pid_default_term_signal():
    signal.signal(signal.SIGTERM, signal.SIG_DFL)

    with pid.PidFile():
        assert callable(signal.getsignal(signal.SIGTERM)) is True

def test_pid_ignore_term_signal():
    signal.signal(signal.SIGTERM, signal.SIG_IGN)

    with pid.PidFile():
        assert signal.getsignal(signal.SIGTERM) == signal.SIG_IGN

def test_pid_custom_term_signal():
    def _noop(*args, **kwargs):
        pass

    signal.signal(signal.SIGTERM, _noop)

    with pid.PidFile():
        assert signal.getsignal(signal.SIGTERM) == _noop

def test_pid_unknown_term_signal():
    # Not sure how to properly test this when signal.getsignal returns None
    #  - perhaps by writing a C extension which might get ugly
    #
    # with pid.PidFile():
    #     assert signal.getsignal(signal.SIGTERM) == None

    pass

Lastly, I would like to add some (very short to the point) documentation for the front page describing the new behaviour going forward and a short note on signal handling.

trbs commented 9 years ago

Agreed on SIG_IGN and None. I should have looked into signalmodule.c and read the Python documentation better. I've changed that and added your tests. I disabled test_pid_unknown_term_signal test because this indeed seems a hard one to test. I was thinking that we might be able to use ctypes (on Linux/OSX) to set the handler (or call PyOS_setsig directly). But reading the signalmodule.c code it seems the handlers needs to be set before loading the signal module for the first time.

Thanks for wanting to write a little documentation about this.

Lets change the version number to 2.0.0, a major version increase, to indicate the impact of the changes. If you want we can add an AUTHORS file ? Your name or nick should be mentioned for all the work you put into this ! if you want of course :)

nnathan commented 9 years ago

Lets change the version number to 2.0.0, a major version increase, to indicate the impact of the changes. If you want we can add an AUTHORS file ? Your name or nick should be mentioned for all the work you put into this ! if you want of course :)

:+1:

You're welcome to attribute as a (minor) contributor. I am honoured for the mention.

I think with my last pull request and a few other touch ups (such as bumping changelog), I think we can go forward to 2.0.0. Thanks again for putting in the legwork. I really found pid very pleasant with the combined effort to make the behaviour more desirable (instead of me going off on my own with a custom fork).

trbs commented 9 years ago

Secretly I dream of a day in which we can delete this repository and Python will ship with a well testing and full featured PidFile/lockfile implementation :-)

trbs commented 9 years ago

A very big THANK YOU ! :+1: