Fixtures not torn down after timeout

pytest-dev / pytest-timeout

MIT License

206 stars 63 forks source link

Fixtures not torn down after timeout #134

Closed ctmbl closed 1 year ago

ctmbl commented 2 years ago

Understanding

As I understand the README, and the behavior I'm noticing about pytest-timeout, fixtures teardown may not be executed if timeout is triggered. Even if using func_only=True fixtures aren't torn down, I must confess I was expecting that, using this parameter, timeout would apply to the test only, but that fixtures would be torn down as expected when using pytest.

Problem

The test suite I'm running, starts a process using subprocess.Popen and not tearing down that fixture is thus creating a remaining zombie process which makes it mandatory to being executed.

Hope and Questions

I also must admit that I'm really not familiar with the internal principle of signal of thread methods. Even if I think I understand their differences, I can't really see their "natural" limitations. That's why I'm making this Issue: Would there be a way to implement pytest-timeout that would allow to execute the tearing down of (at least) one fixture when using @pytest.mark.timeout(X, func_only=True)? Would it depends on the used method? Is the only solution would be to "force" the execution of the fixture (for example using a custom hook like proposing in the README section Extending pytest-timeout with plugings)? Is it even possible to force a fixture to be executed using pytest (it doesn't seem to be really the pytest way to do things...)?

Not satisfying solution so far

So far I tried to go with the whole "forcing the tearing down" thing and I found https://github.com/pytest-dev/pytest/issues/1738 this solution but that doesn't seems really satisfying so I keep looking !

I would really like any help, on my understanding of this plugin as well as on any solution you could think of ! Thanks a lot !

ctmbl commented 2 years ago

I continued digging into my issue. While I can't find any way to force a pytest fixture to torn down, I tried to execute a function of my own after a timeout. It seems easy to do with the signal method using for exemple pytest_sessionfinish but I can't find a way to do it using the thread method because of the call to os._exit(1) at pytest_timeout.py:447 :

def timeout_timer(item, timeout):
    ....
    try:
        ....
    except Exception:
        traceback.print_exc()
    finally:
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(1)

Would it be possible to add a handler (a python callable) to pytest-timeout to execute after such a timeout and before os._exit(1) ?

flub commented 2 years ago

So with the thread method I think the answer is simply no, it can't be done. It maybe possible to fudge something like you suggest, run some bit of custom code anyway, but it defeats the point of the method: I want to guarantee the process terminates right there and then, no ifs or buts. If this happens regularly in your test setup than you should use some other layer around pytest to do the required cleanup, like e.g. collecting your zombie processes.

For the signal method this might be made to work and like you I'm kind of surprised it doesn't work with the func-only option. IIRC we simply raise an exception at a timeout, func-only should ensure that this only happens inside your test and not inside a fixture function after which I kind of expect pytest to run the fixture teardown. So it is worth spending your efforts in understanding the pytest interactions here and maybe figure out exactly what the execution flow is to see if you can fix this.

ctmbl commented 2 years ago

Thanks for the answer!

So if I understand (for the the thread method), it would be technically feasible to insert a call to a custom function in timeout_timer(item, timeout) but it is just not the "spirit" of the method because it can alter its reliability (which I totally understand)? But just to learn a bit: If I'd want to do it anyway I could fork pytest-timeout and insert this call, for example to handler (which would be a Callable) passed as parameter of @pytest.mark.timeout(X, method="thread", func_only=True, handler=my_callable)? And after some modification to Settings and pytest_timeout_set_timer(item, settings) do:

def timeout_timer(item, timeout):
    ....
    try:
        ....
    finally:
        item.handler()
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(1)

But I could also not modify at all pytest-timeout and simply reimplement the pytest_timeout_set_timer hook (I couldn't pass my_callable in the marker but I could execute some code anyway), is that right ?

I'm kind of new with pytest hooks so I try to get a better understanding of it all 🙂

However I admit that your solution of adding a layer around pytest to handle zombie processes seems far better and I think I'll dig into it !!

As for the signal method I found out that I can't use it because ~~I also use xmlrpc.client to communicate with my other process through a socket and it seems that underlying (maybe in httplib or socket) it uses SIGALRM too~~ I already use it elsewhere and I'll have to change that surely. But that's actually another topic 😄

flub commented 2 years ago

On Wed 06 Jul 2022 at 02:41 -0700, ClementMabileau wrote:

So if I understand (for the the thread method), it would be technically feasible to insert a call to a custom function in timeout_timer(item, timeout) but it is just not the "spirit" of the method because it can alter its reliability (which I totally understand)?

Some random code you could execute, but it executes in a different thread so something like fixture teardown would not work unless the fixture is designed in a way that allows it to be done from a different thread so that requires intricate interaction.

But just to learn a bit: If I'd want to do it anyway I could fork pytest-timeout and insert this call, for example to handler (which would be a Callable) passed as parameter of @.***(X, method="thread", func_only=True, handler=my_callable)? And after some modification toSettingsandpytest_timeout_set_timer(item, settings)` do:
def timeout_timer(item, timeout):
    ....
    try:
        ....
    finally:
        item.handler()
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(1)
But I could also not modify at all pytest-timeout and simply reimplement the pytest_timeout_set_timer hook (I couldn't pass my_callable in the marker but I could execute some code anyway), is that right ?

Yes you can use the hooks to achieve this, if you look at the existing hook you can basically see how to do this:

Start a timer thread with the code to run in case of timeout.
Attach the cancel method to the item, or implement the pytest_timeout_cancel_timer hook.

Problem is that the timeout_timer() logic is not public currently, I guess this could be made more reusable for custom implementations like you want to have.

As for the signal method I found out that I can't use it because I also use xmlrpc.client to communicate with my other process through a socket and it seems that underlying (maybe in httplib or socket) it uses SIGALRM too, making the timeout to never be triggered. But that's actually another topic 😄

Oh, that's actually kind of interesting especially if that's from the stdlib.

ctmbl commented 2 years ago

Some random code you could execute, but it executes in a different thread so something like fixture teardown would not work unless the fixture is designed in a way that allows it to be done from a different thread so that requires intricate interaction.

Yes I understand this thanks a lot for the explanation !

Problem is that the timeout_timer() logic is not public currently, I guess this could be made more reusable for custom implementations like you want to have.

That's actually the issue I encountered trying to implement pytest_timeout_set_timer, I ended up copying many function from pytest-timeout (such as write_title) only to add a few code. It didn't seem really interesting in the end.

Oh, that's actually kind of interesting especially if that's from the stdlib.

My mistake! SIGALRM was not use in these libs I was just using it elsewhere and forget it...

So I guess in the end the better solution is by far using the signal method, now that I can. But thank you very much for taking the time to help me understand these methods, it has been a great help!

shaunc commented 1 year ago

I'm facing a related case: I'm testing a lot of numba-compiled CUDA kernels. For test speed, I cache them (bytecode is written to pycache). Every once in a while, one hangs, and "signal" method doesn't work in this case. "Thread" method does interrupt, but next time I run the cached bytecode is often corrupt and segfaults. I can of course delete all the caches, but then slow testing runs 5x slower. I would like to be able to clear the caches for just kernels that have been compiled in current session when I interrupt with "thread" method. Can I do this using a hook?

ctmbl commented 1 year ago

@shaunc What I learn from @flub 's answer and from my own experience is that, if the signal method fails it's because it's already used by another system in the process (in my case I was using timeout-decorator for other reason in the same process), because this method works with SIGALRM you can't nest it (it just resets SIGALRM and overwrite your first declared timeout)

So, instead of trying to recover (or tear down) from a thread method you should ask yourself why does the signal method don't work, and fix it (which could mean in your/my case to remove the other parasitizing timeout OR refactor the system so that they don't nest OR fork a process to control it from parent process as flub suggested) 😉 it would be the better solution for sure

However if you're absolutely sure that you don't want to do that then you could probably do it with a hook as suggested by flub but I don't if you'll have access to session's informaztion to clear the right cached info, you could also modify the pytest-timeout code as I did on my fork, it was quite a long ago so I don't remember very well but if I'm not mistaken it worked, and I could dig into it if you need help!

Hope that's helping you 😉

flub commented 1 year ago

@shaunc Thread method means no cleanup by the pytest process itself, nothing that can be done about that. What @ctmbl suggests to figure out why signal method doesn't work is one option. Another way it so add cleanup in a separate process wrapping pytest. Or yet another option for your case is to write some kind of lock file in the test setup (i.e. a fixture) and remove it in the teardown (i.e. fixture finaliser). That way on the next run when the fixure finds the lockfile because the test timed out and the finaliser was not run, you it can remove the possibly corrupt cache.

shaunc commented 1 year ago

All I really want is to know what has been executed so I can clear the cached info. The finalizer method sounds promising - thanks!