amine-ammor closed this 1 year ago
Looks like the tests are failing. What OS are you using? psutil works differently on different OSes, so that might be the cause.
I am using a dockerized Ubuntu container. My test assumes that the CI pipeline executes the tests sequentially, in order to track and compare memory usage; if other tests run in parallel, the outcome of the test won't be deterministic, and a lock would have to be added during the tests to prevent memory consumption by other processes.
Is that the case?
I also think I will remove the assertions done in the middle, to keep only the final comparison and make the test simpler to maintain. I'll replace the epsilon value with a threshold on the ratio between the memory consumed at the end minus the initial value, and the memory consumed at the peak minus the initial value.
This ratio should be negligible when there is no memory leak.
Is that the case?
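The ratio described above could be sketched like this (a minimal illustration with made-up numbers; the helper name and the values are assumptions, not the actual test code):

```python
def leak_ratio(start_mb, peak_mb, end_mb):
    """Memory retained at the end vs. memory used at the peak, both
    measured relative to the starting baseline. Close to 0 when
    everything allocated during the run was released."""
    retained = end_mb - start_mb
    transient = peak_mb - start_mb
    return retained / transient

# No leak: a 500 MB peak during execution, almost all released afterwards
assert leak_ratio(100.0, 600.0, 101.0) < 0.01

# Leak: most of the peak allocation is still held at the end
assert leak_ratio(100.0, 600.0, 550.0) > 0.5
```

Unlike a fixed epsilon in absolute MB, this ratio scales with the size of the data the test allocates.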
Yes, tests are executed serially.
The test is also failing locally for me (mac M2):
(engine) eduardo@macbookair dev/ploomber-engine (memory-release) » pytest tests/test_memory_release_nb.py
================================================================================ test session starts ================================================================================
platform darwin -- Python 3.10.9, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/eduardo/dev/ploomber-engine, configfile: pyproject.toml
plugins: anyio-3.6.2
collected 1 item
tests/test_memory_release_nb.py F [100%]
===================================================================================== FAILURES ======================================================================================
_____________________________________________________________________ test_if_memory_leak_within_notebook[0.01] _____________________________________________________________________
path_notebook = PosixPath('/private/var/folders/p_/tdg15_tn0rq_b63np2rwrrpc0000gn/T/pytest-of-eduardo/pytest-41/test_if_memory_leak_within_not0/noteboook.ipynb'), epsilon = 0.01
@pytest.mark.parametrize(
"epsilon,",
[
0.01,
],
)
def test_if_memory_leak_within_notebook(path_notebook, epsilon):
"""
epsilon is an amount of memory that is negligeable with respect
to the data declared on the test,
"""
memory_usage_start = get_current_memory_usage()
for _ in range(2):
# memory_usages.append(get_current_memory_usage())
memory_usage_begining_loop = get_current_memory_usage()
client = PloomberClient.from_path(path_notebook)
# injecting new variables in namespace
namespace = client.get_namespace(
namespace=dict(a=[1 for _ in range(10**7)], b=[1 for _ in range(10**7)])
)
memory_usage_with_namespace_in_mem = get_current_memory_usage()
# executing notebook with execute method
nb_node = client.execute()
del nb_node
memory_usage_after_exec_method = get_current_memory_usage()
assert (
abs(
memory_usage_after_exec_method["used"]
- memory_usage_with_namespace_in_mem["used"]
)
< epsilon
)
> assert (
abs(
memory_usage_after_exec_method["free"]
- memory_usage_with_namespace_in_mem["free"]
)
< epsilon
)
E assert 80.0 < 0.01
E + where 80.0 = abs((115.0 - 35.0))
tests/test_memory_release_nb.py:80: AssertionError
------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------
free 33.9
used 5.6
free 33.9
used 5.6
free 35.0
used 5.6
free 115.0
used 5.6
------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------
Executing cell: 3: 100%|██████████| 3/3 [00:00<00:00, 19.40it/s]
Executing cell: 3: 100%|██████████| 3/3 [00:00<00:00, 25.98it/s]
================================================================================= warnings summary ==================================================================================
../../miniconda3/envs/engine/lib/python3.10/site-packages/ansiwrap/core.py:6
/Users/eduardo/miniconda3/envs/engine/lib/python3.10/site-packages/ansiwrap/core.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
../../miniconda3/envs/engine/lib/python3.10/site-packages/jupyter_client/connect.py:20
/Users/eduardo/miniconda3/envs/engine/lib/python3.10/site-packages/jupyter_client/connect.py:20: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library. To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================== short test summary info ==============================================================================
FAILED tests/test_memory_release_nb.py::test_if_memory_leak_within_notebook[0.01] - assert 80.0 < 0.01
=========================================================================== 1 failed, 2 warnings in 2.79s ===========================================================================
We've experienced this before: testing memory consumption is challenging since many factors can influence the measurements. Perhaps we could test that the variables are released after running a notebook (check that they no longer exist in the user_ns)? It's not a perfect test, but it will be reliable.
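As a side note on why process-level numbers are so noisy: the standard-library tracemalloc module only counts the interpreter's own allocations, so it is far more deterministic than psutil's system-wide metrics. A small sketch (an alternative approach for illustration, not what this PR does):

```python
import tracemalloc

# tracemalloc tracks blocks allocated by Python itself, so other processes
# (or the OS cache) cannot perturb the measurement.
tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

data = [0] * 10**6  # ~8 MB of pointers on CPython
current, _ = tracemalloc.get_traced_memory()
assert current - baseline > 1_000_000  # the allocation is clearly visible

del data  # release the references
current, _ = tracemalloc.get_traced_memory()
assert current - baseline < 100_000  # back near the baseline
tracemalloc.stop()
```

The trade-off is that it cannot see memory held by C extensions or the kernel, which is exactly what psutil-based checks try (noisily) to capture.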
Yes, that's a good idea. I integrated this test into the pipeline using monkey-patching. I also added the "memory" marker to these tests (it might be interesting to group all the tests related to memory usage under the same marker), and added the metric suggested above as a print, for visual indication.
Also, I tried to run the code on a VM instance from GCP. I followed the Ploomber contribution docs (https://ploomber-contributing.readthedocs.io/en/latest/contributing/setup.html) and got 5 tests that failed on the main branch (and also on my branch). Are these tests ignored in the CI pipeline run by GitHub?
I can propose another solution as soon as I have more time to work on it, this week or the next.
great work! feel free to submit another PR if you find a better approach!
I'll make a release now
Describe your changes
Added the private method "_get_interactive_variables" to "PloomberShell" to detect all the variables defined within the notebook. Added the method "delete_interactive_variables" to "PloomberShell" to delete all references to the interactive variables.
Applied this last method after the notebook execution, in the exit method.
Added a test in test_memory_release_nb.py to track the memory usage.
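The idea behind these two methods can be sketched as follows, simulating the shell's user namespace with a plain dict (the function names and baseline set here are illustrative assumptions, not the actual PloomberShell code):

```python
# Names present in a fresh shell namespace before the notebook runs
# (hypothetical baseline for this sketch).
BASELINE = {"__name__", "__builtins__"}

def get_interactive_variables(user_ns):
    """Return the names the notebook defined: everything not in the baseline."""
    return {name for name in user_ns if name not in BASELINE}

def delete_interactive_variables(user_ns):
    """Drop the references so the objects can be garbage-collected."""
    for name in get_interactive_variables(user_ns):
        del user_ns[name]

ns = {"__name__": "__main__", "__builtins__": {}, "a": [1] * 5, "b": "hello"}
assert get_interactive_variables(ns) == {"a", "b"}
delete_interactive_variables(ns)
assert get_interactive_variables(ns) == set()  # notebook variables released
```

Deleting the dict entries removes the namespace's references; the underlying objects are then freed by the garbage collector once nothing else holds them.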
Closes #74
:books: Documentation preview :books:: https://ploomber-engine--75.org.readthedocs.build/en/75/