ploomber / ploomber-engine

A toolbox ๐Ÿงฐ for Jupyter notebooks ๐Ÿ“™: testing, experiment tracking, debugging, profiling, and more!
https://engine.ploomber.io
BSD 3-Clause "New" or "Revised" License
59 stars 14 forks source link

Memory leak resolution #75

Closed amine-ammor closed 1 year ago

amine-ammor commented 1 year ago

Describe your changes

added the private method "_get_interactive_variables" to "PloomberShell" to detect all the variables defined within the notebook. added the method "delete_interactive_variables" to "PlommberShell" to delete all the references of the interactive variables.

applied this last method after the notebook exectuion at the exit method.

added test on test_memory_release_nb.py to track the memory usage.

Closes #74


:books: Documentation preview :books:: https://ploomber-engine--75.org.readthedocs.build/en/75/

edublancas commented 1 year ago

looks like the tests are failing, what OS are you using? psutil works differently in difference OSs, so it might be that.

amine-ammor commented 1 year ago

I am using a dockerized ubuntu container, my test assume that the CI pipeline executes the tests sequentially, in order to track and compare the memory usage, if other tests runs in parallel, the outcome of the test won't be deterministic, and a lock must be added during the tests to prevent memory consumption by other process.

Is is the case ?

I think that I will also remove assertions done in the middle, to make the test simpler to maintain, and keep only the final comparison, I'll change the epsilon value to a threshold that compares the rate between the memory consumption at the end subtracted by the initial value and the memory consumption at its peak subtracted by the initial value.

This rate should be negligible in the case of absence of memory leak.

edublancas commented 1 year ago

Is is the case ?

yes, tests are executed serially.


The test is also failing locally for me (mac M2):

(engine)  eduardo@macbookair dev/ploomber-engine (memory-release) ยป pytest tests/test_memory_release_nb.py
================================================================================ test session starts ================================================================================
platform darwin -- Python 3.10.9, pytest-7.2.2, pluggy-1.0.0
rootdir: /Users/eduardo/dev/ploomber-engine, configfile: pyproject.toml
plugins: anyio-3.6.2
collected 1 item

tests/test_memory_release_nb.py F                                                                                                                                             [100%]

===================================================================================== FAILURES ======================================================================================
_____________________________________________________________________ test_if_memory_leak_within_notebook[0.01] _____________________________________________________________________

path_notebook = PosixPath('/private/var/folders/p_/tdg15_tn0rq_b63np2rwrrpc0000gn/T/pytest-of-eduardo/pytest-41/test_if_memory_leak_within_not0/noteboook.ipynb'), epsilon = 0.01

    @pytest.mark.parametrize(
        "epsilon,",
        [
            0.01,
        ],
    )
    def test_if_memory_leak_within_notebook(path_notebook, epsilon):
        """
        epsilon is an amount of memory that is negligeable with respect
        to the data declared on the test,
        """
        memory_usage_start = get_current_memory_usage()

        for _ in range(2):
            # memory_usages.append(get_current_memory_usage())
            memory_usage_begining_loop = get_current_memory_usage()
            client = PloomberClient.from_path(path_notebook)

            # injecting new variables in namespace
            namespace = client.get_namespace(
                namespace=dict(a=[1 for _ in range(10**7)], b=[1 for _ in range(10**7)])
            )
            memory_usage_with_namespace_in_mem = get_current_memory_usage()

            # executing notebook with execute method

            nb_node = client.execute()

            del nb_node
            memory_usage_after_exec_method = get_current_memory_usage()

            assert (
                abs(
                    memory_usage_after_exec_method["used"]
                    - memory_usage_with_namespace_in_mem["used"]
                )
                < epsilon
            )
>           assert (
                abs(
                    memory_usage_after_exec_method["free"]
                    - memory_usage_with_namespace_in_mem["free"]
                )
                < epsilon
            )
E           assert 80.0 < 0.01
E            +  where 80.0 = abs((115.0 - 35.0))

tests/test_memory_release_nb.py:80: AssertionError
------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------
free 33.9
used 5.6
free 33.9
used 5.6
free 35.0
used 5.6
free 115.0
used 5.6
------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------
Executing cell: 3: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 3/3 [00:00<00:00, 19.40it/s]
Executing cell: 3: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 3/3 [00:00<00:00, 25.98it/s]
================================================================================= warnings summary ==================================================================================
../../miniconda3/envs/engine/lib/python3.10/site-packages/ansiwrap/core.py:6
  /Users/eduardo/miniconda3/envs/engine/lib/python3.10/site-packages/ansiwrap/core.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    import imp

../../miniconda3/envs/engine/lib/python3.10/site-packages/jupyter_client/connect.py:20
  /Users/eduardo/miniconda3/envs/engine/lib/python3.10/site-packages/jupyter_client/connect.py:20: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================== short test summary info ==============================================================================
FAILED tests/test_memory_release_nb.py::test_if_memory_leak_within_notebook[0.01] - assert 80.0 < 0.01
=========================================================================== 1 failed, 2 warnings in 2.79s ===========================================================================

We've experienced this before: testing for memory consumption is challenging since many factors can influence the measurements. Perhaps we could test that the variables are released after running a notebook? (check they no longer exist in the user_ns) It's not a perfect test but will be reliable.

amine-ammor commented 1 year ago

Yes, that's a good idea, I integrated this test into the pipeline using monkey-patching. I also added the "memory" marker to this tests (maybe it would be interesting, to regroup , all the tests linked to memory usage with the same marker), and added the suggested metric above as a print , for visual indication.

amine-ammor commented 1 year ago

Also, I tried to run the code on a vm instance from GCP. I followed the docs for contribution to ploomber (https://ploomber-contributing.readthedocs.io/en/latest/contributing/setup.html) I got 5 tests that failed on the 'main branch' (and also for my branch). image Are these tests ignored on the CI pipeline ran by github ?

amine-ammor commented 1 year ago

I can propose another solution, as soon as I have more time to work on it, this week or the next

edublancas commented 1 year ago

great work! feel free to submit another PR if you find a better approach!

edublancas commented 1 year ago

I'll make a release now