Closed mplanchard closed 6 years ago
I was just trying to make a repeatable example of this issue, and I have found it to be relatively easy. The file here uses a ton of RAM during testing, without all the docker/tox/coverage complications of our real environment:
OS: MacOS 10.13.2 Python: 3.5
pip version | options | virtual memory | resident memory | test time |
---|---|---|---|---|
3.2.5 | none | 132 MB | 32 MB | 78.76 seconds |
3.3.0 | none | 890 MB | 731 MB | 140.96 seconds |
3.3.0 | -p no:logging |
125 MB | 32 MB | 74.27 seconds |
3.3.0 | --no-print-logs |
819 MB | 711 MB | 140.96 seconds |
This indicates that the additional memory overhead seen in our full test environment when -p no:logging
is set is due to something other than the log capturing. Has the fixture setup/teardown lifecycle changed significantly?
Edit: it's also possible that this test doesn't really replicate our actual test environment well enough. We've got ~2500 tests, hundreds of fixtures, etc.
@mplanchard thanks a ton for the reproducible example, we really appreciate it.
I can reproduce your memory usage on my Windows computer. Indeed I suspected the logging
module when I first read your post a few minutes ago because it will append the captured log to the test items, which are kept alive for the entire test session. Setting --no-print-logs
should be enough as well as this prevents the logging module from appending the captured output to the items:
But this does not explain why even with -p no:logging
you still see an increase in memory in your full test environment of course. There has not been any significant change in fixture setup/teardown that I remember.
Are you passing -p no:logging
on the command line or in the pytest.ini
file? As #2964 shows, there might be a bug related to that.
In the full environment, I was passing it via the pytest.ini
file, although in my examples from my local machine I was just appending it to the command line (which seemed to work).
I can enable debugging, config tracing, etc. and run any particular combination that you think might be helpful. The tests take quite a while, so just let me know and I can run them in the background during the workday.
It's odd that --no-print-logs
is showing virtually the same memory consumption as with no options in both my local and the full test environment. Maybe it's the record list being kept by the LogCaptureHandler
? It seems like that might get appended to even if self.print_logs
is False
Maybe it's the record list being kept by the LogCaptureHandler?
Good point and I thought about that, but the handler is deleted from the item at the end of each phase:
I'm really puzzled that even with logging disabled with -p no:logging
you are still seeing the abnormal memory increase...
The memory usage seems to be higher from the start, which is what made me ask about differences in fixture setup. I'm doing a run now with -p no:logging
, --debug
, and --trace-config
. I'll need to review the output for sensitive information before sending it. Would it be helpful to also have a comparison of a run with the same options on 3.2.5?
Thanks for the patience in debugging this!
Would it be helpful to also have a comparison of a run with the same options on 3.2.5?
Hopefully yes, so it would be great if you could that as well. ๐
No problem! We love pytest, and use it for all our projects. It's the least we can do.
Also, just collecting statistics, out test suite outputs about 54000 log messages, which is about 10 MB, so even if every log message were held in active memory simultaneously, it would not explain the sudden jump in virtual memory by several gigabytes.
I'm having some issues with collecting tests in --debug
mode, because several of our fixtures rely on setting/transmitting a Flask application context, and having --debug
on seems to interfere with that. There's a relatively complex fixture hierarchy relying on those fixtures, and of course a lot of tests. Any ideas as to how to get around that?
___________________________________ ERROR collecting invision_api/tests/integration/v3/test_log.py ____________________________________
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:617: in __call__
return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:222: in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:179: in __call__
self.before(hook.name, hook_impls, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:441: in before
hooktrace(hook_name, kwargs)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:160: in __call__
self.root.processmessage(self.tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:136: in processmessage
lines = self.format_message(tags, args)
.tox/py35/lib/python3.5/site-packages/pluggy/__init__.py:131: in format_message
lines.append("%s %s: %s\n" % (indent, name, value))
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:366: in <lambda>
__str__ = lambda x: str(x._get_current_object())
.tox/py35/lib/python3.5/site-packages/werkzeug/local.py:306: in _get_current_object
return self.__local()
.tox/py35/lib/python3.5/site-packages/flask/globals.py:51: in _find_app
raise RuntimeError(_app_ctx_err_msg)
E RuntimeError: Working outside of application context.
E
E This typically means that you attempted to use functionality that needed
E to interface with the current application object in a way. To solve
E this set up an application context with app.app_context(). See the
E documentation for more information.
I think I may have found something interesting. Looking at htop, I felt like there were more threads being used in 3.3.0, so I counted the threads every two seconds. With version 3.3.0, the thread count is increased by a factor of ~10 for the majority of the test, and then bumps up to a thread count increase of ~25 when it hits the latter tests (which have many more log messages, although I'm not sure that's causal).
Version 3.3.0 with -p no:logging
2 -> 4 -> 8 -> 28 -> 70 -> 136 -> 180 -> 226 -> 372 (majority of tests) -> 816 (beginning of latter tests till freeze)
Version 3.2.5 2 -> 4 -> 8 -> 30 -> 114 -> 148 -> 32 -> 34 (majority of tests) -> 166 (beginning of latter tests) -> 92 -> 150 -> 222 -> 90 -> 216 -> 82 -> 186 -> 220 -> 222 -> 230 -> 57 -> 38 -> 32 (until tests finished)
So, it looks like pytest may be holding onto threads that it was letting go previously, or spinning up more threads than it was before.
Edit: thread count collected with while true; do ps -eLf | grep -c "py.test"; sleep 2; done
And the thread count gets even higher with the logging plugin enabled.
Version 3.3.0 with default options 2 -> 4 -> 8 -> 4 -> 28 -> 30 -> 76 -> 130 -> 162 -> 202 -> 290 -> 348 -> 372 (majority of tests) -> 554 (beginning of latter tests) -> 712 -> 922 -> 1178 (freezes)
Hi @mplanchard, thanks a lot for the detailed statistics. The part about the high number of threads is interesting: pytest itself (nor py and pluggy) spawn any threads AFAIK. This to me also suggests it might be a side effect of something not being teardown at the appropriate time (fixtures might be one case as you suggest). I would love to help debug this further but it is hard to speculate further without looking at the source code.
I would try to reduce the tests and code involved step by step until I got to a minimal reproducer. For example running half of the test suite, looking for fixtures which might spawn threads and see if the same number of threads keep spawning if tests which don't use that fixture are executed, and so on. Really sorry I can't help more than that, but I will be sure to come back here if more suggestions pop in my head.
No worries. I didn't find any reference to the threading module in the pytest code, so I'm inclined to think you're right. I wonder if it might be due to the changes referenced in #2959? Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr
implements the whole frozen
thing, so I'm not sure whether that's a reasonable conjecture.
I'll try to get a minimal example by this evening that demonstrates a difference in thread consumption at the lowest possible level.
Several of our fixtures return mutable objects, and some of them, like DB connection instances, function via thread pools. I'm not sure how attr implements the whole frozen thing, so I'm not sure whether that's a reasonable conjecture.
Actually the change in #2959 refers to the fact that the params
argument of pytest.fixture
could be mutated after the declaration and it would affect the parametrization with the mutated list; returning mutable objects from fixtures is common practice and should work.
Another thing to look for: do all your fixtures implement an explicit cleanup (if makes sense)? For example, explicitly closing a db connection or a destroying a thread pool? I can see some bug sneaking in that would keep the fixture frame alive for some reason, and a fixture which implicitly relied on the fact that the locals of that frame were destroyed for cleanup might start acting up (#2798 came to mind, but that was reported in 3.2).
You were absolutely right about explicit cleanup. We had a couple of test fixtures that were relying on an implicit closing of a MongoClient connection upon garbage collection (sneakily, because they were instantiating objects that instantiated clients). Updating these to close()
the clients explicitly fixed the memory leak so that the -p no:logging
utilization more closely mirrors that of 3.2.5.
I should have time this evening both to check the memory utilization with logging enabled and to hopefully generate a test based on our fixtures that is able to demonstrate the difference between 3.2.5 and 3.3.0.
I haven't really had time to follow this discussion, but FWIW I've had good experiences tracking down memory leaks using tracemalloc and/or pympler.muppy.
@mplanchard that's both great news (that you found a solution to your problem) and a bit of sad news (it seems we introduced a regression about leaking frames/fixtures in 3.3). ๐
We would really appreciate if you can come up with a reproducible example that demonstrates the differences between 3.2 and 3.3 regarding fixture cleanup. ๐
Rรก managed to find a reproducer in #2981! ๐
That is great news! I can verify that our case is also a session fixture being passed to a regular fixture.
Thanks so much for all your help and your quick responses. Weโll be pushing fixes for our test tear down tomorrow, which should allow us to get back to the most recent pytest. Weโd be happy to help verify any fixes as they come!
On Wed, Nov 29, 2017 at 5:53 PM Bruno Oliveira notifications@github.com wrote:
Rรก managed to find a reproducer in #2981 https://github.com/pytest-dev/pytest/issues/2981! ๐
โ You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pytest-dev/pytest/issues/2968#issuecomment-348036393, or mute the thread https://github.com/notifications/unsubscribe-auth/AGcI4Wg66Y5iti9nHY16VwSGRwvwgxLzks5s7e6OgaJpZM4Qt8Hv .
Thanks for the patience as well!
out test suite outputs about 54000 log messages, which is about 10 MB
With the fixed tear down code, could you please check the memory consumption of your test suite with the logging
plugin active?
@nicoddemus It seems like after integrating our teardown fixes, thread utilization and memory consumption is pretty much back to normal, even with the logging plugin enabled. In a run with pytest 3.3.0 and the logging plugin enabled, both metrics look pretty much identical to running with 3.2.5.
The changes required were remarkably minimal, but of course the fixtures are used in hundreds and hundreds of tests. I've included all of the required fixture changes for our test suite below:
-@pytest.yield_fixture()
-def legacy_app():
+@pytest.fixture(scope='session')
+def legacy_app_session():
+ """Yield a legacy application for the session."""
+ instance = api.create_custom_app()
+ yield instance
+ instance.mongo.db.client.close()
+
+
+@pytest.fixture()
+def legacy_app(legacy_app_session):
"""Yield an isolated flask test app instance"""
- app = api.create_custom_app()
- context = app.app_context()
+ context = legacy_app_session.app_context()
context.push()
- yield app
+ yield legacy_app_session
context.pop()
@pytest.fixture(scope='session')
def session_app(sql_session):
"""Return an app for the session, with no context"""
instance = api.create_app()
instance.config['TESTING'] = True
scoped_session = instance.sql.session
instance.sql.session = instance.sql.session()
yield instance
+ instance.mongo.db.client.close()
instance.sql.session.expunge_all()
instance.sql.session.rollback()
instance.sql.session.close()
@pytest.fixture()
-def db(mongo_util):
+def db(mongo_util, db_client):
"""Return a pymongo Database instance"""
- db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+ db = db_client.get_database(mongo_util.db_info.db)
return db
@pytest.fixture(scope='session')
-def db_session(mongo_util):
+def db_session(mongo_util, db_client_session):
"""Return a pymongo Database instance"""
- db = mongo_util.get_db(db_name=mongo_util.db_info.db)
+ db = db_client_session.get_database(mongo_util.db_info.db)
return db
The mongo_util.get_db()
method being used previously created a client when it was called, which was then not getting closed. Also, the db_client
fixtures, which were not being used previously in the fixtures where they've been added, already had explicit teardowns.
That's great news, thanks @mplanchard for the followup! I'm really glad we got to the bottom of the issue and the solution was simple.
I'm closing this for now because the root of the issue is detailed in #2981 already. Thanks again for all the patience and understanding while we tracked down the problem! ๐
Bug Description
TL;DR: memory usage seems to have increased several fold in 3.3.0. With the default log capturing enabled, memory usage has the potential to increase to ludicrous levels.
I was troubleshooting issues with tests hanging on our CI server and massive swap utilization, and I found it to be due to an upgrade to pytest 3.3.0. Running our test suite with 3.3.0 quickly consumed all of the memory and began swapping.
Looking at the changelog, the capturing of log output seemed to be the most likely cause. Our application uses SQLAlchemy, which has verbose logging, and has rather verbose logging itself, so I attempted to disable log capturing via both of the options specified here, and while the first seemed to decrease the massive consumption of virtual memory seen with the default options, neither was able to reduce memory utilization enough for our test suite to run.
I have collected the memory utilization of various configurations below. For all of the 3.3.0 runs, the CI server began swapping and the test suite hung, although some configurations resulted in more virtual memory consumption than others.
In all cases, regardless of attempting to disable log capturing, the virtual memory output jumped significantly when hitting tests that produced a large amount of log output. However, even prior to hitting those tests, virtual memory utilization was around twice that of its peak in version 3.2.5.
The tests were running in a python:3.5.2 docker container, via an Ubuntu 16.04 OS. The system has 2 GB of RAM and 2 GB of swap space. In all cases, pytest was invoked via tox, with the command
coverage run --source <project directory> -m py.test {posargs}
addopts=-p no:logging
--no-print-logs
addopts=-p no:logging
and--no-print-logs
pytest is not included in the pip output below, since its version varied by test. All other packages remained consistent between tests. Some private distributions have been redacted.
We have had to pin our version of pytest to
<3.3.0
for the moment. Is there anything else I can try to reduce the base memory footprint of version 3.3.0?I'm happy to gather more data as needed.
Thanks!
Pip List
aiohttp (2.2.5) alembic (0.9.5) amqp (2.2.2) aniso8601 (1.3.0) apipkg (1.4) apispec (0.25.4) appnope (0.1.0) asn1crypto (0.23.0) astroid (1.5.3) async-timeout (2.0.0) attrs (17.3.0) autopep8 (1.3.3) Babel (2.5.1) billiard (3.5.0.3) bleach (2.1.1) celery (4.1.0) certifi (2017.7.27.1) cffi (1.11.2) chardet (3.0.4) click (6.7) coverage (4.4.2) cryptography (2.1.1) debtcollector (1.18.0) decorator (4.1.2) dill (0.2.7.1) docutils (0.14) entrypoints (0.2.3) execnet (1.5.0) flake8 (3.5.0) flasgger (0.8.0) Flask (0.12.2) Flask-RESTful (0.3.6) Flask-Script (2.0.6) frozendict (1.2) html5lib (1.0b10) httplib2 (0.10.3) idna (2.6) ipdb (0.10.3) ipykernel (4.6.1) ipython (6.2.1) ipython-genutils (0.2.0) ipywidgets (7.0.5) iso8601 (0.1.12) isort (4.2.15) itsdangerous (0.24) jedi (0.11.0) Jinja2 (2.9.6) jsonschema (2.6.0) jupyter (1.0.0) jupyter-client (5.1.0) jupyter-console (5.2.0) jupyter-core (4.4.0) keystoneauth1 (3.2.0) kombu (4.1.0) lazy-object-proxy (1.3.1) loggly-python-handler (1.0.0) Mako (1.0.7) MarkupSafe (1.0) marshmallow (2.13.6) marshmallow-sqlalchemy (0.13.1) mccabe (0.6.1) mistune (0.7.4) monotonic (1.3) msgpack-python (0.4.8) multidict (3.2.0) mypy (0.550) mysqlclient (1.3.12) nbconvert (5.3.1) nbformat (4.4.0) netaddr (0.7.19) netifaces (0.10.6) notebook (5.2.1) oslo.config (4.13.1) oslo.i18n (3.18.0) oslo.policy (1.28.1) oslo.serialization (2.21.1) oslo.utils (3.30.0) pandocfilters (1.4.2) parso (0.1.0) passlib (1.7.1) pbr (3.1.1) pexpect (4.2.1) pickleshare (0.7.4) pika (0.11.0) pip (9.0.1) pluggy (0.6.0) positional (1.2.1) prompt-toolkit (1.0.15) psutil (5.4.1) psycopg2 (2.7.3.1) ptyprocess (0.5.2) py (1.5.2) pycodestyle (2.3.1) pycparser (2.18) pydecor (1.1.3) pydocstyle (2.1.1) pyflakes (1.6.0) Pygments (2.2.0) PyJWT (1.5.3) pylint (1.7.4) pymongo (3.5.1) pyparsing (2.2.0) pyrabbit (1.1.0) pytest-aiohttp (0.1.3) pytest-cov (2.5.1) pytest-forked (0.2) pytest-xdist (1.20.1) python-dateutil (2.6.1) python-editor (1.0.3) python-keystoneclient (3.13.0) python-memcached (1.58) pytz (2017.2) PyYAML (3.12) pyzmq (16.0.3) qtconsole (4.3.1) redis (2.10.6) requests (2.18.4) requests-futures (0.9.7) restructuredtext-lint (1.1.2) rfc3986 (1.1.0) setuptools (20.10.1) simplegeneric (0.8.1) simplejson (3.11.1) six (1.11.0) snowballstemmer (1.2.1) SQLAlchemy (1.1.14) SQLAlchemy-Utils (0.32.18) stevedore (1.27.1) stripe (1.67.0) terminado (0.8.1) testpath (0.3.1) tornado (4.5.2) tox (2.9.1) traitlets (4.3.2) typed-ast (1.1.0) typing (3.6.2) urllib3 (1.22) vine (1.1.4) virtualenv (15.1.0) wcwidth (0.1.7) webargs (1.8.1) webencodings (0.5.1) Werkzeug (0.12.2) widgetsnbextension (3.0.8) wrapt (1.10.11) yarl (0.13.0)
Submission Checklist
Thanks for submitting an issue!
Here's a quick checklist in what to include:
pip list
of the virtual environment you are using