
Memory usage in 3.3.0 #2968

Closed mplanchard closed 6 years ago

mplanchard commented 6 years ago

Bug Description

TL;DR: memory usage seems to have increased several fold in 3.3.0. With the default log capturing enabled, memory usage has the potential to increase to ludicrous levels.

I was troubleshooting issues with tests hanging on our CI server and massive swap utilization, and I found it to be due to an upgrade to pytest 3.3.0. Running our test suite with 3.3.0 quickly consumed all of the memory and began swapping.

Looking at the changelog, the capturing of log output seemed to be the most likely cause. Our application uses SQLAlchemy, which has verbose logging, and has rather verbose logging itself, so I attempted to disable log capturing via both of the options specified here, and while the first seemed to decrease the massive consumption of virtual memory seen with the default options, neither was able to reduce memory utilization enough for our test suite to run.
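
For reference, the configurations exercised in the table below were disabling the plugin with `-p no:logging` (via `addopts` in pytest.ini) and passing the `--no-print-logs` flag on the command line. A minimal sketch of the first, assuming a standard pytest.ini (our real config has more settings):

```ini
# pytest.ini (sketch only; the actual project config is larger)
[pytest]
# disable the 3.3.0 logging plugin entirely
addopts = -p no:logging
```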

I have collected the memory utilization of various configurations below. For all of the 3.3.0 runs, the CI server began swapping and the test suite hung, although some configurations resulted in more virtual memory consumption than others.

In all cases, regardless of attempting to disable log capturing, the virtual memory output jumped significantly when hitting tests that produced a large amount of log output. However, even prior to hitting those tests, virtual memory utilization was around twice that of its peak in version 3.2.5.

The tests were running in a python:3.5.2 Docker container on an Ubuntu 16.04 host. The system has 2 GB of RAM and 2 GB of swap space. In all cases, pytest was invoked via tox with the command `coverage run --source <project directory> -m py.test {posargs}`.

| pytest version | additional options | peak virtual memory consumed | peak resident memory consumed | test suite completed |
|---|---|---|---|---|
| 3.2.5 | none | 2685 MB | 243 MB | yes |
| 3.3.0 | none | 11.9 GB !!! | 418 MB | no |
| 3.3.0 | `addopts=-p no:logging` | 9105 MB | 366 MB | no |
| 3.3.0 | `--no-print-logs` | 12.2 GB !!! | 431 MB | no |
| 3.3.0 | `addopts=-p no:logging` and `--no-print-logs` | 12.2 GB !!! | 415 MB | no |

pytest is not included in the pip output below, since its version varied by test. All other packages remained consistent between tests. Some private distributions have been redacted.

We have had to pin our version of pytest to <3.3.0 for the moment. Is there anything else I can try to reduce the base memory footprint of version 3.3.0?

I'm happy to gather more data as needed.

Thanks!

Pip List

aiohttp (2.2.5) alembic (0.9.5) amqp (2.2.2) aniso8601 (1.3.0) apipkg (1.4) apispec (0.25.4) appnope (0.1.0) asn1crypto (0.23.0) astroid (1.5.3) async-timeout (2.0.0) attrs (17.3.0) autopep8 (1.3.3) Babel (2.5.1) billiard (3.5.0.3) bleach (2.1.1) celery (4.1.0) certifi (2017.7.27.1) cffi (1.11.2) chardet (3.0.4) click (6.7) coverage (4.4.2) cryptography (2.1.1) debtcollector (1.18.0) decorator (4.1.2) dill (0.2.7.1) docutils (0.14) entrypoints (0.2.3) execnet (1.5.0) flake8 (3.5.0) flasgger (0.8.0) Flask (0.12.2) Flask-RESTful (0.3.6) Flask-Script (2.0.6) frozendict (1.2) html5lib (1.0b10) httplib2 (0.10.3) idna (2.6) ipdb (0.10.3) ipykernel (4.6.1) ipython (6.2.1) ipython-genutils (0.2.0) ipywidgets (7.0.5) iso8601 (0.1.12) isort (4.2.15) itsdangerous (0.24) jedi (0.11.0) Jinja2 (2.9.6) jsonschema (2.6.0) jupyter (1.0.0) jupyter-client (5.1.0) jupyter-console (5.2.0) jupyter-core (4.4.0) keystoneauth1 (3.2.0) kombu (4.1.0) lazy-object-proxy (1.3.1) loggly-python-handler (1.0.0) Mako (1.0.7) MarkupSafe (1.0) marshmallow (2.13.6) marshmallow-sqlalchemy (0.13.1) mccabe (0.6.1) mistune (0.7.4) monotonic (1.3) msgpack-python (0.4.8) multidict (3.2.0) mypy (0.550) mysqlclient (1.3.12) nbconvert (5.3.1) nbformat (4.4.0) netaddr (0.7.19) netifaces (0.10.6) notebook (5.2.1) oslo.config (4.13.1) oslo.i18n (3.18.0) oslo.policy (1.28.1) oslo.serialization (2.21.1) oslo.utils (3.30.0) pandocfilters (1.4.2) parso (0.1.0) passlib (1.7.1) pbr (3.1.1) pexpect (4.2.1) pickleshare (0.7.4) pika (0.11.0) pip (9.0.1) pluggy (0.6.0) positional (1.2.1) prompt-toolkit (1.0.15) psutil (5.4.1) psycopg2 (2.7.3.1) ptyprocess (0.5.2) py (1.5.2) pycodestyle (2.3.1) pycparser (2.18) pydecor (1.1.3) pydocstyle (2.1.1) pyflakes (1.6.0) Pygments (2.2.0) PyJWT (1.5.3) pylint (1.7.4) pymongo (3.5.1) pyparsing (2.2.0) pyrabbit (1.1.0) pytest-aiohttp (0.1.3) pytest-cov (2.5.1) pytest-forked (0.2) pytest-xdist (1.20.1) python-dateutil (2.6.1) python-editor (1.0.3) python-keystoneclient (3.13.0) python-memcached (1.58) pytz (2017.2) PyYAML (3.12) pyzmq (16.0.3) qtconsole (4.3.1) redis (2.10.6) requests (2.18.4) requests-futures (0.9.7) restructuredtext-lint (1.1.2) rfc3986 (1.1.0) setuptools (20.10.1) simplegeneric (0.8.1) simplejson (3.11.1) six (1.11.0) snowballstemmer (1.2.1) SQLAlchemy (1.1.14) SQLAlchemy-Utils (0.32.18) stevedore (1.27.1) stripe (1.67.0) terminado (0.8.1) testpath (0.3.1) tornado (4.5.2) tox (2.9.1) traitlets (4.3.2) typed-ast (1.1.0) typing (3.6.2) urllib3 (1.22) vine (1.1.4) virtualenv (15.1.0) wcwidth (0.1.7) webargs (1.8.1) webencodings (0.5.1) Werkzeug (0.12.2) widgetsnbextension (3.0.8) wrapt (1.10.11) yarl (0.13.0)


mplanchard commented 6 years ago

I was just putting together a reproducible example of this issue, and it turned out to be relatively easy. The file here uses a ton of RAM during testing, without all of the docker/tox/coverage complications of our real environment:
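
(The attachment itself isn't reproduced here; the sketch below is a hypothetical reconstruction of the same idea: a module that generates a large number of tests, each emitting a burst of log records. Names and counts are illustrative only.)

```python
# Hypothetical reconstruction, not the actual attached file: many generated
# tests that each emit a burst of log records, similar to our
# SQLAlchemy-heavy tests.
import logging

logger = logging.getLogger(__name__)


def _make_test(test_number):
    def test(test_number=test_number):
        for i in range(1000):
            logger.warning("test %d, message %d: %s", test_number, i, "x" * 100)

    return test


# generate a few hundred log-heavy tests so that captured output accumulates
# across collected items
for _n in range(500):
    globals()["test_log_heavy_%d" % _n] = _make_test(_n)
```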

OS: MacOS 10.13.2 Python: 3.5

| pytest version | options | virtual memory | resident memory | test time |
|---|---|---|---|---|
| 3.2.5 | none | 132 MB | 32 MB | 78.76 seconds |
| 3.3.0 | none | 890 MB | 731 MB | 140.96 seconds |
| 3.3.0 | `-p no:logging` | 125 MB | 32 MB | 74.27 seconds |
| 3.3.0 | `--no-print-logs` | 819 MB | 711 MB | 140.96 seconds |

This indicates that the additional memory overhead seen in our full test environment when -p no:logging is set is due to something other than the log capturing. Has the fixture setup/teardown lifecycle changed significantly?

Edit: it's also possible that this test doesn't really replicate our actual test environment well enough. We've got ~2500 tests, hundreds of fixtures, etc.

nicoddemus commented 6 years ago

@mplanchard thanks a ton for the reproducible example, we really appreciate it.

I can reproduce your memory usage on my Windows computer. Indeed, I suspected the logging module when I first read your post a few minutes ago, because it appends the captured log to the test items, which are kept alive for the entire test session. Setting --no-print-logs should also be enough, as this prevents the logging module from appending the captured output to the items:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L307-L310

But of course this does not explain why, even with -p no:logging, you still see an increase in memory in your full test environment. There has not been any significant change in fixture setup/teardown that I can remember.

Are you passing -p no:logging on the command line or in the pytest.ini file? As #2964 shows, there might be a bug related to that.

mplanchard commented 6 years ago

In the full environment, I was passing it via the pytest.ini file, although in my examples from my local machine I was just appending it to the command line (which seemed to work).

I can enable debugging, config tracing, etc. and run any particular combination that you think might be helpful. The tests take quite a while, so just let me know and I can run them in the background during the workday.

mplanchard commented 6 years ago

It's odd that --no-print-logs shows virtually the same memory consumption as no options at all, in both my local and the full test environments. Maybe it's the record list being kept by the LogCaptureHandler? It seems like that might get appended to even if self.print_logs is False:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L117-L128
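
For context, the pattern I'm worried about is roughly the following (a simplified sketch, not the actual pytest code):

```python
import logging


class ListCapturingHandler(logging.Handler):
    """Simplified sketch of a record-capturing handler.

    Not the real LogCaptureHandler; it only illustrates why an unbounded
    ``records`` list can retain memory if the handler (or whatever holds a
    reference to it) stays alive for the whole session.
    """

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # every record is kept, regardless of whether it is ever printed
        self.records.append(record)
```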

nicoddemus commented 6 years ago

Maybe it's the record list being kept by the LogCaptureHandler?

Good point and I thought about that, but the handler is deleted from the item at the end of each phase:

https://github.com/pytest-dev/pytest/blob/4d2f05e4b967a8df6becb939fe236f5f0b3f11cb/_pytest/logging.py#L296-L310

I'm really puzzled that even with logging disabled with -p no:logging you are still seeing the abnormal memory increase...

mplanchard commented 6 years ago

The memory usage seems to be higher from the start, which is what made me ask about differences in fixture setup. I'm doing a run now with -p no:logging, --debug, and --trace-config. I'll need to review the output for sensitive information before sending it. Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

nicoddemus commented 6 years ago

Thanks for the patience in debugging this!

Would it be helpful to also have a comparison of a run with the same options on 3.2.5?

Hopefully yes, so it would be great if you could do that as well. 😁

mplanchard commented 6 years ago

No problem! We love pytest, and use it for all our projects. It's the least we can do.

Also, just collecting statistics: our test suite outputs about 54000 log messages, which is about 10 MB, so even if every log message were held in active memory simultaneously, it would not explain the sudden jump in virtual memory by several gigabytes.

mplanchard commented 6 years ago

I'm having some issues with collecting tests in --debug mode, because several of our fixtures rely on setting/transmitting a Flask application context, and having --debug on seems to interfere with that. There's a relatively complex fixture hierarchy relying on those fixtures, and of course a lot of tests. Any ideas as to how to get around that?

___________________________________ ERROR collecting invision_api/tests/integration/v3/test_log.py ____________________________________
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:617: in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:222: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:179: in __call__
    self.before(hook.name, hook_impls, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:441: in before
    hooktrace(hook_name, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:160: in __call__
    self.root.processmessage(self.tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:136: in processmessage
    lines = self.format_message(tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:131: in format_message
    lines.append("%s    %s: %s\n" % (indent, name, value))
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:366: in <lambda>
    __str__ = lambda x: str(x._get_current_object())
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:306: in _get_current_object
    return self.__local()
.tox/py35/lib/python3.5/site-packages/flask/globals.py:51: in _find_app
    raise RuntimeError(_app_ctx_err_msg)
E   RuntimeError: Working outside of application context.
E
E   This typically means that you attempted to use functionality that needed
E   to interface with the current application object in some way.  To solve
E   this, set up an application context with app.app_context().  See the
E   documentation for more information.

mplanchard commented 6 years ago

I think I may have found something interesting. Looking at htop, I felt like more threads were being used in 3.3.0, so I counted the threads every two seconds. With version 3.3.0, the thread count is roughly 10x higher for the majority of the run, and then climbs to roughly 25x higher when it hits the latter tests (which produce many more log messages, although I'm not sure that's causal).

Version 3.3.0 with -p no:logging 2 -> 4 -> 8 -> 28 -> 70 -> 136 -> 180 -> 226 -> 372 (majority of tests) -> 816 (beginning of latter tests till freeze)

Version 3.2.5 2 -> 4 -> 8 -> 30 -> 114 -> 148 -> 32 -> 34 (majority of tests) -> 166 (beginning of latter tests) -> 92 -> 150 -> 222 -> 90 -> 216 -> 82 -> 186 -> 220 -> 222 -> 230 -> 57 -> 38 -> 32 (until tests finished)

So, it looks like pytest may be holding onto threads that it was letting go previously, or spinning up more threads than it was before.

Edit: thread count collected with `while true; do ps -eLf | grep -c "py.test"; sleep 2; done`
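
A roughly equivalent sampler using psutil (already in the pip list above) is sketched below; the command-line matching is an approximation of the shell one-liner rather than an exact port.

```python
# Sketch: print the total thread count of py.test processes every two
# seconds. psutil is assumed to be importable; matching on the process
# command line is a rough equivalent of grepping ps output.
import time

import psutil


def sample_pytest_threads(interval=2.0):
    while True:
        total = sum(
            proc.info["num_threads"] or 0
            for proc in psutil.process_iter(attrs=["cmdline", "num_threads"])
            if any("py.test" in arg for arg in (proc.info["cmdline"] or []))
        )
        print(total)
        time.sleep(interval)


if __name__ == "__main__":
    sample_pytest_threads()
```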

mplanchard commented 6 years ago

And the thread count gets even higher with the logging plugin enabled.

Version 3.3.0 with default options 2 -> 4 -> 8 -> 4 -> 28 -> 30 -> 76 -> 130 -> 162 -> 202 -> 290 -> 348 -> 372 (majority of tests) -> 554 (beginning of latter tests) -> 712 -> 922 -> 1178 (freezes)

nicoddemus commented 6 years ago

Hi @mplanchard, thanks a lot for the detailed statistics. The part about the high number of threads is interesting: pytest itself does not spawn any threads AFAIK (nor do py and pluggy). This also suggests to me that it might be a side effect of something not being torn down at the appropriate time (fixtures being one candidate, as you suggest). I would love to help debug this further, but it is hard to speculate without looking at the source code.

I would try to reduce the tests and code involved step by step until I got to a minimal reproducer: for example, running half of the test suite, or looking for fixtures which might spawn threads and checking whether the same number of threads still spawn when only tests which don't use those fixtures are executed, and so on. Really sorry I can't help more than that, but I will be sure to come back here if more suggestions pop into my head.

mplanchard commented 6 years ago

No worries. I didn't find any reference to the threading module in the pytest code, so I'm inclined to think you're right. I wonder if it might be due to the changes referenced in #2959? Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

I'll try to get a minimal example by this evening that demonstrates a difference in thread consumption at the lowest possible level.

nicoddemus commented 6 years ago

Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.

Actually the change in #2959 refers to the fact that the params argument of pytest.fixture could be mutated after the declaration and it would affect the parametrization with the mutated list; returning mutable objects from fixtures is common practice and should work.

Another thing to look for: do all your fixtures implement explicit cleanup (where it makes sense)? For example, explicitly closing a db connection or destroying a thread pool, as in the sketch below. I can see a bug having sneaked in that keeps the fixture frame alive for some reason, so a fixture which implicitly relied on the locals of that frame being destroyed for cleanup might start acting up (#2798 comes to mind, but that was reported in 3.2).
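
To make that concrete, here is a sketch of the explicit-cleanup pattern (the resources are stand-ins, not anything from this project):

```python
# Sketch of fixtures with explicit teardown; sqlite3 and a thread pool are
# stand-ins for the real database connections / pools under discussion.
import sqlite3
from concurrent.futures import ThreadPoolExecutor

import pytest


@pytest.fixture(scope="session")
def db_connection():
    conn = sqlite3.connect(":memory:")
    yield conn
    # close explicitly instead of relying on garbage collection
    conn.close()


@pytest.fixture(scope="session")
def worker_pool():
    pool = ThreadPoolExecutor(max_workers=4)
    yield pool
    # shut the pool down so its worker threads do not linger
    pool.shutdown(wait=True)
```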

mplanchard commented 6 years ago

You were absolutely right about explicit cleanup. We had a couple of test fixtures that were relying on an implicit closing of a MongoClient connection upon garbage collection (sneakily, because they were instantiating objects that instantiated clients). Updating these to close() the clients explicitly fixed the memory leak so that the -p no:logging utilization more closely mirrors that of 3.2.5.

I should have time this evening both to check the memory utilization with logging enabled and to hopefully generate a test based on our fixtures that is able to demonstrate the difference between 3.2.5 and 3.3.0.

The-Compiler commented 6 years ago

I haven't really had time to follow this discussion, but FWIW I've had good experiences tracking down memory leaks using tracemalloc and/or pympler.muppy.
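
For example, a minimal tracemalloc sketch that could be dropped into a conftest.py to print the largest allocation diffs after each test; the hook placement and output format are just one possible setup, not something prescribed here:

```python
# conftest.py sketch: snapshot allocations after every test and print the
# five largest diffs. Placement and formatting are illustrative only.
import tracemalloc

import pytest

tracemalloc.start()
_last_snapshot = tracemalloc.take_snapshot()


@pytest.hookimpl(trylast=True)
def pytest_runtest_teardown(item):
    global _last_snapshot
    snapshot = tracemalloc.take_snapshot()
    print("\n[tracemalloc] top diffs after %s:" % item.nodeid)
    for stat in snapshot.compare_to(_last_snapshot, "lineno")[:5]:
        print("  %s" % stat)
    _last_snapshot = snapshot
```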

nicoddemus commented 6 years ago

@mplanchard that's both great news (that you found a solution to your problem) and a bit of sad news (it seems we introduced a regression about leaking frames/fixtures in 3.3). 😅

We would really appreciate it if you could come up with a reproducible example that demonstrates the differences between 3.2 and 3.3 regarding fixture cleanup. 👍

nicoddemus commented 6 years ago

Rá managed to find a reproducer in #2981! 👍

mplanchard commented 6 years ago

That is great news! I can verify that our case is also a session fixture being passed to a regular fixture.

Thanks so much for all your help and your quick responses. We'll be pushing fixes for our test teardown tomorrow, which should allow us to get back to the most recent pytest. We'd be happy to help verify any fixes as they come!


nicoddemus commented 6 years ago

Thanks for the patience as well!

our test suite outputs about 54000 log messages, which is about 10 MB

With the fixed teardown code, could you please check the memory consumption of your test suite with the logging plugin active?

mplanchard commented 6 years ago

@nicoddemus It seems that after integrating our teardown fixes, thread utilization and memory consumption are pretty much back to normal, even with the logging plugin enabled. In a run with pytest 3.3.0 and the logging plugin enabled, both metrics look pretty much identical to a run with 3.2.5.

The changes required were remarkably minimal, but of course the fixtures are used in hundreds and hundreds of tests. I've included all of the required fixture changes for our test suite below:

-@pytest.yield_fixture()
-def legacy_app():
+@pytest.fixture(scope='session')
+def legacy_app_session():
+    """Yield a legacy application for the session."""
+    instance = api.create_custom_app()
+    yield instance
+    instance.mongo.db.client.close()
+
+
+@pytest.fixture()
+def legacy_app(legacy_app_session):
     """Yield an isolated flask test app instance"""
-    app = api.create_custom_app()
-    context = app.app_context()
+    context = legacy_app_session.app_context()
     context.push()
-    yield app
+    yield legacy_app_session
     context.pop()

 @pytest.fixture(scope='session')
 def session_app(sql_session):
     """Return an app for the session, with no context"""
     instance = api.create_app()
     instance.config['TESTING'] = True
     scoped_session = instance.sql.session
     instance.sql.session = instance.sql.session()
     yield instance
+    instance.mongo.db.client.close()
     instance.sql.session.expunge_all()
     instance.sql.session.rollback()
     instance.sql.session.close()

 @pytest.fixture()
-def db(mongo_util):
+def db(mongo_util, db_client):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client.get_database(mongo_util.db_info.db)
     return db

 @pytest.fixture(scope='session')
-def db_session(mongo_util):
+def db_session(mongo_util, db_client_session):
     """Return a pymongo Database instance"""
-    db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+    db = db_client_session.get_database(mongo_util.db_info.db)
     return db

The mongo_util.get_db() method used previously created a new client each time it was called, and that client was never getting closed. Also, the db_client fixtures, which were not previously used in the fixtures where they've now been added, already had explicit teardowns.
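
For reference, the db_client / db_client_session fixtures mentioned above follow roughly this pattern (a sketch; the real fixtures aren't shown in this thread, so the connection details are illustrative):

```python
# Sketch of a session-scoped Mongo client fixture with an explicit teardown;
# the URI and fixture bodies are illustrative, not the project's real code.
import pymongo
import pytest


@pytest.fixture(scope="session")
def db_client_session():
    client = pymongo.MongoClient("mongodb://localhost:27017")
    yield client
    # closing releases the client's sockets and monitoring threads
    client.close()


@pytest.fixture()
def db_client(db_client_session):
    return db_client_session
```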

nicoddemus commented 6 years ago

That's great news, thanks @mplanchard for the followup! I'm really glad we got to the bottom of the issue and the solution was simple.

I'm closing this for now because the root of the issue is detailed in #2981 already. Thanks again for all the patience and understanding while we tracked down the problem! 👍