wolph / portalocker

An easy library for Python file locking. It works on Windows, Linux, BSD and Unix systems and can even perform distributed locking. Naturally it also supports the with statement.
http://portalocker.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
268 stars 51 forks source link

Fix `ResourceWarning` when `AlreadyLocked` #59

Closed BoniLindsley closed 3 years ago

BoniLindsley commented 3 years ago

This is issue https://github.com/WoLpH/portalocker/pull/50 appearing on another line.

When acquire-ing a portalocker.Lock, there are parameters fail_when_locked for indicating whether to raise an AlreadyLocked exception if the first attempt in locking fails, and timeout for indicating how long to attempt locking for if locking fails.

The unit test test_with_timeout suggests that a valid combination of arguments can be to fail_when_locked with a positive timeout.

with portalocker.Lock(tmpfile, timeout=0.1, mode='wb',
                      fail_when_locked=True):

So there should be no warnings issued from using it. In particular, no ResourceWarning from not closing a file handle.

Expected Behavior

In the following, we lock a temporary file, and then try to acquire another lock on the same file again. All ResourceWarning-s are set to be shown to check if they are issued.

#!/usr/bin/env python3

import portalocker
import tempfile
import warnings

warnings.filterwarnings('always', category=ResourceWarning)
with tempfile.NamedTemporaryFile() as pid_file:
    with portalocker.Lock(filename=pid_file.name):
        try:
            portalocker.Lock(
                fail_when_locked=True,
                filename=pid_file.name,
                timeout=0.1,
            ).acquire()
        except portalocker.exceptions.AlreadyLocked:
            pass
        else:
            raise RuntimeError('Expected AlreadyLocked.')

The acquire docstring says that

fail_when_locked is useful when multiple threads/processes can race
when creating a file. If set to true than the system will wait till
the lock was acquired and then return an AlreadyLocked exception.

So the acquire call should raise an AlreadyLocked exception. There should be no ResourceWarning-s issued.

Current Behavior

The exception is indeed raised. However, the warning is issued.

$ ./test_portalocker.py
/home/boni/base/src/tmp/./test_portalocker.py:17: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/tmpkk2prhbu' mode='a' encoding='UTF-8'>
  pass
ResourceWarning: Enable tracemalloc to get the object allocation traceback

This occurs because the file handle to the pid_file used for locking was not closed before raising the error.

fh = self._get_fh()
try:
    fh = self._get_lock(fh)
except exceptions.LockException as exception:
    timeout_end = current_time() + timeout
    while timeout_end > current_time():
        time.sleep(check_interval)
        try:
            if fail_when_locked:
                # ---- HERE ----
                # The `fh` is closed for timeout in `else` below,
                # but not for failing.
                raise exceptions.AlreadyLocked(exception)
            else:
                fh = self._get_lock(fh)
                break
        except exceptions.LockException:
            pass
    else:
        fh.close()
        raise exceptions.LockException(exception)

# Prepare the filehandle (truncate if needed)
fh = self._prepare_fh(fh)

self.fh = fh
return fh

Possible Solution

The simplest fix is to add fh.close() before raising the exception.

A more defensive alternative, guarding also against say KeyboardInterrupt, is to wrap the whole thing with a finally clause to close the handle. (There is also dropping Python 2.7 support and use a context manager!)

try:
    fh = self._get_fh()
    ...
    return fh

finally:
    if self.fh is None:
        fh.close()

I am not sure how to tell coverage that fh.close() -> return fh is not possible though.

Context (Environment)

I am using Debian bullseye/sid, Python 3.9.

wolph commented 3 years ago

First of all, thank you for the very detailed and well constructed bug report!

For safety and backwards compatibility I prefer the option of adding the fh.close() before the raise but I think that would introduce a new bug. I think that simply adding the fh.close() at line 169 will make the next iteration of the loop inevitably fail because the file handle will be closed. So instead of capturing exceptions.LockException I think it should capture exceptions.BaseLockException so it also captures exceptions.AlreadyLockedException.

I would actually prefer to refactor that code a little because it has a little bit of duplication anyhow.

As for wrapping in a finally clause, that would still need a special case as it would also close the handle in the success state. And that would be bad :)

Dropping Python 2.7 support is something I want to do within the next year with a "final" version downloadable for people that are still stuck.

BoniLindsley commented 3 years ago

I think that simply adding the fh.close() at line 169 will make the next iteration of the loop inevitably fail because the file handle will be closed. So instead of capturing exceptions.LockException I think it should capture exceptions.BaseLockException so it also captures exceptions.AlreadyLockedException.

Hm... if I understand the code properly, it is not supposed to catch the AlreadyLocked exception. The exception propagates out of the function as per the API, and there would not be a next iteration.

I would actually prefer to refactor that code a little because it has a little bit of duplication anyhow.

I did feel an urge to do some refactoring. Butm well, obligatory spacebar and heater xkcd!

If you intend to refactor a little, I would like to ask: is it necessary to wait for one iteration to raise AlreadyLocked? The fail_when_locked variable is essentially constant inside the function. So the exception is either raised in the first iteration, or it never will. In particular, it would make sense to pull it above the while loop, unless it is intentional to wait one iteration first. From reading the acquire docstring (copied below), perhaps it was to ensure other locking threads have a chance to finish locking?

fail_when_locked is useful when multiple threads/processes can race
when creating a file. If set to true than the system will wait till
the lock was acquired and then return an AlreadyLocked exception.

As for wrapping in a finally clause, that would still need a special case as it would also close the handle in the success state. And that would be bad :)

Yes, I agree! It does feel a little off that the preferred path requires special handling.

Dropping Python 2.7 support is something I want to do within the next year with a "final" version downloadable for people that are still stuck.

Nice! May be type hinting will come in too!

Edit: Here is a suggestion that puts the exception checking in one place. It still goes checks fail_when_locked every loop though.

# Set the timeout end time as soon as possible.
timeout_end = current_time() + timeout
# Get a new filehandler and lock.
# Ensure handle is closed unless file is locked and prepared.
try:
    fh = self._get_fh()
    # Retry till the timeout has passed.
    # A do-while style loop to always try once.
    sleep = time.sleep
    while True:
        try:
            self._get_lock(fh)
        except exceptions.LockException as exception:
            # Breaking: no sleep if `fail_when_locked`.
            if fail_when_locked:
                # TODO[Python 3]: Use `raise ... from exception`.
                raise exceptions.AlreadyLocked(exception)
            if timeout_end <= current_time():
                raise
            # Retry if not timed out and no fail_when_locked.
            # Leave the `try` scope as soon as possible.
        else:
            # We've got the lock.
            # Prepare the filehandle (truncate if needed).
            # If `_prepare_fh` fails, we still close the handle.
            self.fh = self._prepare_fh(fh)
            return fh
        # Wait a bit and retry.
        sleep(check_interval)
finally:
    if self.fh is None:  # pragma: no branch  # Returns if True.
        fh.close()  # pragma: no branch  # No return.
BoniLindsley commented 3 years ago

I believe the lastest commit addca52b22345372aee3629e264b5aece1e805a9 has fixed this. I will leave closing this issue to you, in case you want to do that only when merging to the main branch.

And thank you for your excellent work on this. It is a lot more readable (to me) now. I have even learnt that except clauses del the assigned exc variable. That is surprising to me! I had to look this up.

wolph commented 3 years ago

Yes, I'll close it with the new release. Should be out somewhere this week :)

The exception thing is indeed a tricky one, you'll never know it went wrong until it bites you. It's a "feature" that was introduced in Python 3, Python 2 didn't have that issue. I discovered it several years ago when doing research for my Python book :)

For future reference, even though you would expect it to work, this does not work:

exception = None
try:
    ...
except Something as exception:
    # `exception` is still available here
    ....

# `exception` is not available here
wolph commented 3 years ago

The new release is now online. It also includes type hinting and it has a distributed redis lock included.

wolph commented 3 years ago

And thank you for all the help of course :)