trbs / pid

Pidfile featuring stale detection and file-locking, can also be used as context-manager or decorator
https://pypi.python.org/pypi/pid/
Apache License 2.0

Test test_pid_check_samepid_two_processes fails on FreeBSD #20

Closed thnee closed 5 years ago

thnee commented 5 years ago

If I understand this error correctly, the code did not raise a PidFileAlreadyRunningError when it was supposed to. I do not yet know why.

I have no problem running the test suite on Linux (Python 3.7.1), only on FreeBSD (Python 3.6.6).

======================================================================
FAIL: test_pid.test_pid_check_samepid_two_processes
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/ports/devel/py-pid/work-py36/pid-2.2.1/tests/test_pid.py", line 280, in test_pid_check_samepid_two_processes
    pidfile_proc2.create()
  File "/usr/local/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/ports/devel/py-pid/work-py36/pid-2.2.1/tests/test_pid.py", line 42, in raising
    raise AssertionError("Failed to throw exception of type(s) %s." % (", ".join(exc_type.__name__ for exc_type in exc_types),))
AssertionError: Failed to throw exception of type(s) PidFileAlreadyRunningError, PidFileAlreadyLockedError.
-------------------- >> begin captured logging << --------------------
PidFile: DEBUG: <pid.PidFile object at 0x807eee2c8> entering setup
PidFile: DEBUG: <pid.PidFile object at 0x807eee2c8> create pidfile: /tmp/setup.py.pid
PidFile: DEBUG: <pid.PidFile object at 0x807eee2c8> check pidfile: /tmp/setup.py.pid
PidFile: DEBUG: <pid.PidFile object at 0x807eee368> entering setup
PidFile: DEBUG: <pid.PidFile object at 0x807eee368> create pidfile: /tmp/setup.py.pid
PidFile: DEBUG: <pid.PidFile object at 0x807eee368> check pidfile: /tmp/setup.py.pid
PidFile: DEBUG: <pid.PidFile object at 0x807eee2c8> closing pidfile: /tmp/setup.py.pid
PidFile: DEBUG: <pid.PidFile object at 0x807eee368> closing pidfile: /tmp/setup.py.pid
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 32 tests in 0.030s

FAILED (failures=1)
Test failed: <unittest.runner.TextTestResult run=32 errors=0 failures=1>
error: Test failed: <unittest.runner.TextTestResult run=32 errors=0 failures=1>

Also, on line 279, I don't understand why it is expecting PidFileAlreadyLockedError. If I remove it, so that the test only expects PidFileAlreadyRunningError, the test still passes (on Linux). So, what exactly is supposed to be happening in this test case?

thnee commented 5 years ago

Notice that if allow_samepid=True is removed from the initialization of pidfile_proc2, then the code does raise PidFileAlreadyLockedError, so it seems maybe this test was rewritten at some point and is not entirely accurate now? I also feel like the test case name is lacking in descriptiveness. Perhaps this thing could be refactored into two separate test cases or something like that?

trbs commented 5 years ago

I agree that this test (and probably others) is missing descriptiveness. (We could do with proper docs as well.)

I think the idea (after quickly looking at it) is that this tests the same pidfile opened by different processes with allow_samepid=True, which should raise PidFileAlreadyLockedError. (Versus if it was the same process, which should work since allow_samepid was specified.)

Could it be that on FreeBSD the mocking of patch('pid.os.getpid') does not work properly?

thnee commented 5 years ago

Ok so we should remove PidFileAlreadyLockedError from that test, since it is not actually relevant there, that's good.

I realize now that this problem only happens in a FreeBSD jail (a container), not on a regular native FreeBSD.

The issue occurs on this line: https://github.com/trbs/pid/blob/0efff53af4554dfc8e06a2627809baa84d13b732/pid/__init__.py#L138

Normally, when this test case runs, the errno code on that line is 1 "Operation not permitted", and so, the if statement is false (ESRCH is 3, not 1), and the code goes to the raise on line 142.

But, when running in a jail, the errno code is 3 "No such process", which makes the if statement true, and so, there is no exception raised.
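To make the two outcomes described above concrete, here is a minimal sketch of probing a pid with signal 0 and classifying the result. It assumes a POSIX system; `classify_kill_probe` is a hypothetical helper written for illustration and is not part of the pid package.

```python
import errno
import os


def classify_kill_probe(pid):
    """Probe `pid` with signal 0 and report how the OS answered.

    Hypothetical helper for illustration only; not the pid package's code.
    """
    try:
        # Signal 0 sends nothing; the kernel only performs its
        # existence and permission checks.
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            # The pid is not visible from here (for example, pid 1 as
            # seen from inside a jail).
            return "no such process"
        if exc.errno == errno.EPERM:
            # The pid exists but belongs to someone else.
            return "exists, not permitted"
        raise
    return "exists, signalable"
```

Called as an unprivileged user with pid 1, this would typically report "exists, not permitted" on a native host and "no such process" inside a jail, which matches the difference in errno described above.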

Do you think that maybe something more should be patched, besides 'pid.os.getpid'? But what?

Lastly, no, I think the patching itself is working just fine. I just think the logic is relying too much on platform-specific behavior.

kevans91 commented 5 years ago

Hi,

@thnee poked me about this, wondering why it happens. Here's my assessment:

os.kill should be patched here, in addition to os.getpid, to raise the expected exception. You're patching the latter to simulate pids 1 and 2 and assuming that os.kill on pid 1 will work.

While this will work on almost every *NIX system known to man since pid 1 is init, this is not necessarily true in a containerized environment that may or may not have a pid 1 depending on circumstances. In the case of jails, the jailed process cannot see pid 1 because it exists outside of the current jail, so we raise ESRCH instead.

While not terribly critical, I think that for correctness' sake os.kill should be patched as well, since you're dealing out pids that are not from this namespace and the system doesn't have to guarantee that the pids you're using exist.
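A minimal sketch of that suggestion, assuming the liveness check boils down to `os.kill(pid, 0)` plus an ESRCH test. Both `process_is_running` and `check_with_patched_kill` are hypothetical stand-ins, not code from the pid package or its test suite. In the real tests the patch target would presumably be `'pid.os.kill'`, mirroring the existing `'pid.os.getpid'` patch.

```python
import errno
import os
from unittest import mock


def process_is_running(pid):
    """Simplified stand-in for a pidfile liveness check (hypothetical)."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # no such process: the pidfile would be stale
        return True  # e.g. EPERM: the process exists but is not ours
    return True


def check_with_patched_kill(pid, kill_errno):
    """Run the check with os.kill mocked, so no real process is probed."""
    fake_error = OSError(kill_errno, os.strerror(kill_errno))
    with mock.patch("os.kill", side_effect=fake_error) as fake_kill:
        result = process_is_running(pid)
    # The check should have probed exactly once, with signal 0.
    fake_kill.assert_called_once_with(pid, 0)
    return result
```

With `os.kill` mocked, the outcome depends only on the errno the test chooses to inject, not on whether pid 1 happens to be visible on the host, in a jail, or in another container.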

trbs commented 5 years ago

Thanks @thnee and @kevans91 for your analysis of this!

@thnee could you make a PR for this? (Otherwise I will try to get around to it when I can, but I will probably have a harder time testing it :-) )

thnee commented 5 years ago

Thanks Kyle, makes sense. We need to patch os.kill so that the test never even tries to access the real process table, and then the difference between platforms should not matter.

Yeah, I will make a PR as soon as I have the time for it! :)