harlowja / fasteners

A python package that provides useful locks.
Apache License 2.0
248 stars 45 forks source link

RuntimeError: Reader to writer privilege escalation not allowed #36

Closed prometheanfire closed 2 years ago

prometheanfire commented 5 years ago

https://github.com/harlowja/fasteners/commit/467ed75ee1e9465ebff8b5edf452770befb93913

Hi, I think this commit caused an issue with nova (at the very least). Had an oslo guy (bnemec) look at it and he pointed to the commit.

Here's the log while it's still up.

http://logs.openstack.org/54/660254/2/check/cross-nova-functional/a0617a9/testr_results.html.gz

Backtraces follow.

2019-05-21 21:29:32,148 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
Traceback (most recent call last):
  File "nova/api/openstack/wsgi.py", line 671, in wrapped
    return f(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/openstack/compute/servers.py", line 661, in create
    nova_context.get_admin_context(), ['nova-compute'])
  File "nova/objects/service.py", line 515, in get_minimum_version_all_cells
    binaries)
  File "nova/context.py", line 564, in scatter_gather_all_cells
    fn, *args, **kwargs)
  File "nova/context.py", line 452, in scatter_gather_cells
    with target_cell(context, cell_mapping) as cctxt:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "nova/tests/fixtures.py", line 492, in _wrap_target_cell
    with self._cell_lock.write_lock():
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/fasteners/lock.py", line 219, in write_lock
    " escalation not allowed" % me)
RuntimeError: Reader <_MainThread(MainThread, started 140521207375680)> to writer privilege escalation not allowed
2019-05-21 21:32:12,672 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
Traceback (most recent call last):
  File "nova/api/openstack/wsgi.py", line 671, in wrapped
    return f(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/validation/__init__.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "nova/api/openstack/compute/servers.py", line 661, in create
    nova_context.get_admin_context(), ['nova-compute'])
  File "nova/objects/service.py", line 515, in get_minimum_version_all_cells
    binaries)
  File "nova/context.py", line 564, in scatter_gather_all_cells
    fn, *args, **kwargs)
  File "nova/context.py", line 452, in scatter_gather_cells
    with target_cell(context, cell_mapping) as cctxt:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "nova/tests/fixtures.py", line 492, in _wrap_target_cell
    with self._cell_lock.write_lock():
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/fasteners/lock.py", line 219, in write_lock
    " escalation not allowed" % me)
RuntimeError: Reader <_MainThread(MainThread, started 140534644954944)> to writer privilege escalation not allowed
2019-05-21 21:30:53,495 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
Traceback (most recent call last):
  File "nova/api/openstack/wsgi.py", line 671, in wrapped
    return f(*args, **kwargs)
  File "nova/api/openstack/compute/hypervisors.py", line 202, in index
    return self._index(req, limit=limit, marker=marker, links=True)
  File "nova/api/openstack/compute/hypervisors.py", line 218, in _index
    marker=marker, links=links)
  File "nova/api/openstack/compute/hypervisors.py", line 151, in _get_hypervisors
    context, hypervisor_match)
  File "nova/api/openstack/compute/hypervisors.py", line 101, in _get_compute_nodes_by_name_pattern
    context, hostname_match)
  File "nova/compute/api.py", line 5286, in compute_node_search_by_hypervisor
    with nova_context.target_cell(context, cell) as cctxt:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "nova/tests/fixtures.py", line 492, in _wrap_target_cell
    with self._cell_lock.write_lock():
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/fasteners/lock.py", line 219, in write_lock
    " escalation not allowed" % me)
RuntimeError: Reader <_MainThread(MainThread, started 140089940105024)> to writer privilege escalation not allowed

The errors look more or less the same to me. If I could get indication if I should mask 0.15 in global-requirements.txt (as a known bad release) that'd be nice, right now it's just constraints that are keeping openstack safe.

Thanks

cybertron commented 5 years ago

After a bit of initial investigation, it looks like this doesn't reproduce consistently or in isolation. Running the failed tests by themselves works fine. I was able to reproduce one failure by running the entire functional test suite for Nova, but that's a rather heavy reproducer, and suggests there may be race condition going on here. I'm going to keep poking at it, but since it will likely take a while (each Nova functional test run takes about 15 minutes for me) I thought I'd provide my initial findings.

cybertron commented 5 years ago

With https://github.com/harlowja/fasteners/commit/467ed75ee1e9465ebff8b5edf452770befb93913 reverted I haven't been able to reproduce the Nova functional test failure. Previously it reproduced pretty consistently for me.

My guess would be that there's some sort of race in the eventlet fix that hits us when we're running highly concurrent things like functional tests. That's a WAG though.

I'm going to push a PR to revert that change. While the workaround may be a little ugly, at least it works and it only affects monkey-patched environments (which are ugly by definition ;-).

psarka commented 3 years ago

@cybertron I hope you still remember this 1.5 year old issue :)

I'm hesitating to revert the eventlet thing in PR #37, so I'd like to investigate this a little more. You write that you can hit this in the Nova functional test suite, do you have some pointers on how to run it? And what system were you using, Linux?

cybertron commented 3 years ago

Okay, luckily I was able to dredge up some memories of this. :-)

It does still reproduce with the nova functional tests. I created a fresh Fedora VM and installed the necessary bits, then tweaked the package versions to use the latest fasteners, and my first test run failed on this error. Here are the highlights of what I did:

git clone https://opendev.org/openstack/nova
cd nova
sudo dnf install tox
sudo dnf install gcc-c++
tox -e bindep
# Install all of the missing packages reported by that, e.g.
sudo dnf install -y graphviz libffi-devel libxslt-devel lsscsi mysql mysql-devel mysql-server openssl pcre-devel postgresql postgresql-server sqlite-devel
tox -e functional # This should succeed, it will take a while
.tox/functional/bin/pip install -U fasteners
tox -e functional # This should eventually fail, it might take more than one run

You don't necessarily have to use Fedora, bindep is distro-aware so it will give you the right package names for your distro. You'll need to figure out the tox and gcc package names if you don't already have them, of course.

psarka commented 3 years ago

I can confirm, that this replicates, on ubuntu 20.04 in my case. I'm having some success with running a limited test

tox -e functional nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate

Which can be repeated until failure with this nice trick:

while tox -e functional nova.tests.functional.test_cross_cell_migrate.TestMultiCellMigrate; do :; done
psarka commented 3 years ago

Repro :tada: :tada: :tada:

import eventlet

import fasteners

rw_lock = fasteners.ReaderWriterLock()

def do(read):
    if read:
        with rw_lock.read_lock():
            eventlet.greenthread.sleep(1)
    else:
        with rw_lock.write_lock():
            eventlet.greenthread.sleep(1)

pool = eventlet.GreenPool(10)
list(pool.imap(do, [True] * 5 + [False] * 5))

So at the moment fasteners do not support eventlet green threads. Like at all.

Question for those who know - are there any other "thread" like objects we would like fasteners to support? Is there a list of libraries of this sort?

psarka commented 3 years ago

Actually scratch that, this is not even supposed to work, as it lacks the magic

eventlet.monkey_patch()

Adding this repro example works. Back to investigation :(

psarka commented 3 years ago

OK, I got it now - nova is not properly monkey_patching it's eventlets. So the question is then as follows - should fasteners add a support for unpached eventlet threads, or should all our library users properly patch their threading module?

I don't like monkey patching, so I'm leaning towards supporting it, but I would have to think what is the impact to our API. I'd love to keep it clear and simple, with no magic, and definitely as an optional dependency. Would love to hear some opinions!

Details from the investigation:

Fasteners ReaderWriterLock is used in the nova.tests.fixtures.CellDatabases class. Lock is created in the __init__ method, line 400, and used mainly in the _wrap_target_cell method.

I added print statements above each call to read_lock or write_lock as follows:

print(f'At 1. {threading.current_thread().__module__}')
with self._cell_lock.read_lock():
    ...

And the output is as follows:

Captured stdout:
~~~~~~~~~~~~~~~~
At 5. threading
At 5. threading
At 5. threading
At 5. threading
At 5. threading
At 1. threading
At 2. threading
At 3. threading
...

This means that the threading module is not monkey patched at that point, and all the treads are the same thread.

Then, the error itself is triggered when read lock is aquired at line 472, method yields ccontext at line 486 and another thread tries to acquire write lock at 491, before the previous one returns from the yield to release the read lock. Since the two different green threads are the same, it is considered as a reader trying to become a writer, which triggers the exception.

cybertron commented 3 years ago

Huh, that's strange. The functional tests should be fully monkey patched. The functional module has https://opendev.org/openstack/nova/src/branch/master/nova/tests/functional/__init__.py#L23 which calls https://opendev.org/openstack/nova/src/branch/master/nova/monkey_patch.py#L63 unless the debugger is enabled, in which case the thread module would be excluded, but at least in my env I wasn't using the debugger.

There is some funny business with the threading module later in that function though. Apparently it's related to https://github.com/eventlet/eventlet/issues/592 . Maybe that's causing an issue here?

psarka commented 2 years ago

I'm closing this, as the issue here most likely was related to spawn_n, which is solved in fasteners 0.18, and oslo.concurency fixed it independently as well.

I don't have any guidance on how nova should migrate to fixed versions, the project is very complicated :)

Let me know if you ever try the fixed version and it does not work.