python / cpython

The Python programming language
https://www.python.org
Other
62.33k stars 29.94k forks source link

weakmethod's ref is deleted before weakref's garbage-collect callback is executed #85923

Closed 959e967f-6a11-4df7-a200-413ac8b1b028 closed 4 years ago

959e967f-6a11-4df7-a200-413ac8b1b028 commented 4 years ago
BPO 41757
Nosy @freddrake, @nascheme, @pablogsal, @kratsg
PRs
  • python/cpython#16495
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.7', 'type-bug', 'library'] title = "weakmethod's ref is deleted before weakref's garbage-collect callback is executed" updated_at = user = 'https://github.com/kratsg' ``` bugs.python.org fields: ```python activity = actor = 'kratsg' assignee = 'none' closed = True closed_date = closer = 'kratsg' components = ['Library (Lib)'] creation = creator = 'kratsg' dependencies = [] files = [] hgrepos = [] issue_num = 41757 keywords = ['patch'] message_count = 5.0 messages = ['376699', '376700', '376870', '376872', '376876'] nosy_count = 4.0 nosy_names = ['fdrake', 'nascheme', 'pablogsal', 'kratsg'] pr_nums = ['16495'] priority = 'normal' resolution = 'fixed' stage = 'patch review' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue41757' versions = ['Python 3.7'] ```

    959e967f-6a11-4df7-a200-413ac8b1b028 commented 4 years ago

    Hi, this is my first issue, so I hope to try my best to explain the problem. Unfortunately, I cannot get an easy minimum-reproducible-example of this because I can only seem to invoke this behavior using pytest (6.0.1) on two tests of our code. First, let me explain the issue.

    AttributeError: 'NoneType' object has no attribute '_alive'
    Exception ignored in: <function WeakMethod.__new__.<locals>._cb at 0x1696c2a70>
    Traceback (most recent call last):
      File "/Users/kratsg/.virtualenvs/pyhf-dev/lib/python3.7/weakref.py", line 55, in _cb
        if self._alive:

    occurs sometimes. Not always (feels like a race condition) and occurs after pytest has finished, which indicates it must be hitting garbage-collect. The backtrace isn't as helpful as it seems to jump from our code straight to the callback (indicating a garbage-collect):

    /Users/kratsg/pyhf/tests/test_validation.py(1082)test_optimizer_stitching() -> pdf = pyhf.simplemodels.hepdata_like([50.0], [100.0], [10]) /Users/kratsg/pyhf/src/pyhf/simplemodels.py(64)hepdata_like() -> return Model(spec, batch_size=batch_size) /Users/kratsg/pyhf/src/pyhf/pdf.py(590)init() -> config=self.config, batch_size=self.batch_size /Users/kratsg/pyhf/src/pyhf/pdf.py(339)init() -> self.config.auxdata_order, /Users/kratsg/pyhf/src/pyhf/parameters/paramview.py(66)init() -> self._precompute() /Users/kratsg/pyhf/src/pyhf/parameters/paramview.py(78)_precompute() -> self.allpar_viewer, self.selected_viewer, self.all_indices /Users/kratsg/pyhf/src/pyhf/parameters/paramview.py(27)extract_index_access() -> index_selection = baseviewer.split(indices, selection=subviewer.names) /Users/kratsg/pyhf/src/pyhf/tensor/common.py(59)split() -> data = tensorlib.einsum('...j->j...', tensorlib.astensor(data)) /Users/kratsg/pyhf/src/pyhf/tensor/numpy_backend.py(268)einsum() -> return np.einsum(subscripts, *operands) \<__array_function__ internals>(6)einsum()

    /Users/kratsg/.virtualenvs/pyhf-dev/lib/python3.7/weakref.py(56)_cb() -> if self._alive:

    Essentially, inside weakref.py's _cb(), I tried to figure out what "self" was:

    (Pdb) self (Pdb) !arg \<weakref at 0x16f31c5d0; dead>

    and it seems like the evaluation of "self._alive" is doomed to fail as self is None. So meth comes in, we take it apart into obj and func, define an inner function _cb that closes over a weakref to a weakref to obj and registers that function to fire when the underlying object gets gc'd. However, there seems to be an assumption that "self" is not None by the time the callback is fired.

    -------------------------------------------------------

    Steps to reproduce:

    Clone: https://github.com/scikit-hep/pyhf Set up virtual env/install: python3 -m pip install -e .[complete] Run pytest: pytest tests/test_validation.py -vvvv -k "test_optimizer_stitching[scipy-numpy or test_optimizer_stitching[minuit-numpy" -s

    959e967f-6a11-4df7-a200-413ac8b1b028 commented 4 years ago

    This PR seems highly related: https://github.com/python/cpython/pull/18189. Not sure if it should be linked to this issue or not.

    pablogsal commented 4 years ago

    Hi Giordon and thanks for your issue.

    Could you please check if you can reproduce this behaviour with Python3.8 and with the current master? We have some new code that may be preventing this to happen:

    https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L780

    Unfortunately, without a simpler reproducer is difficult to tell what's going on so is not possible for me to say that this is fixed in current master.

    pablogsal commented 4 years ago

    Indeed, I cannot reproduce this with Pyhon3.8:

    (vev) ❯ python -m pytest tests/test_validation.py -vvvv -k "test_optimizer_stitching" -v \============================================================================================================= test session starts \============================================================================================================== platform darwin -- Python 3.8.2, pytest-3.10.1, py-1.9.0, pluggy-0.13.1 -- /Users/pgalindo3/github/python/master/vev/bin/python cachedir: .pytest_cache Matplotlib: 3.3.1 Freetype: 2.6.1 benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000) rootdir: /Users/pgalindo3/github/python/master/pyhf, inifile: pytest.ini plugins: mpl-0.11, mock-3.3.1, cov-2.10.1, console-scripts-0.2.0, benchmark-3.2.3 collected 26 items / 18 deselected

    tests/test_validation.py::test_optimizer_stitching[scipy-numpy_backend] PASSED [ 12%] tests/test_validation.py::test_optimizer_stitching[scipy-jax_backend] PASSED [ 25%] tests/test_validation.py::test_optimizer_stitching[scipy-tensorflow_backend] PASSED [ 37%] tests/test_validation.py::test_optimizer_stitching[scipy-pytorch_backend] PASSED [ 50%] tests/test_validation.py::test_optimizer_stitching[minuit-numpy_backend] PASSED [ 62%] tests/test_validation.py::test_optimizer_stitching[minuit-jax_backend] PASSED [ 75%] tests/test_validation.py::test_optimizer_stitching[minuit-tensorflow_backend] PASSED [ 87%] tests/test_validation.py::test_optimizer_stitching[minuit-pytorch_backend] PASSED

    \============================================================================================= 8 passed, 18 deselected, 50 warnings in 3.29 seconds \=============================================================================================

    959e967f-6a11-4df7-a200-413ac8b1b028 commented 4 years ago

    Hi Pablo,

    Thanks for looking at this. Indeed, this seems to be fixed for me on python 3.8 (I'm using docker image python:3.8-slim) and I'm unable to force the issue to come up.

    I will go ahead and close this and link the corresponding PR (https://github.com/python/cpython/pull/16495) that fixed things up.