ploomber / ploomber-engine

A toolbox 🧰 for Jupyter notebooks 📙: testing, experiment tracking, debugging, profiling, and more!
https://engine.ploomber.io
BSD 3-Clause "New" or "Revised" License

memory leak with "PloomberClient" #74

Closed amine-ammor closed 1 year ago

amine-ammor commented 1 year ago

Hello, I am a new user of the framework. I encountered the following problem when using ploomber-engine:

When I run a notebook with ploomber-engine through the 'get_namespace' or 'execute' methods of PloomberClient, the memory reserved by the kernel that runs the notebook is not released afterwards.

For example, consider the following one-cell notebook, saved with jupytext as a Python file named "temp_test.py":

import torch
with torch.no_grad():
    tensor = torch.ones((10000,10000))

If we execute it with this script:

import jupytext
import psutil
from ploomber_engine.ipython import PloomberClient


def print_memory():
    """Print the system-wide free and used memory."""
    memory = psutil.virtual_memory()
    # bytes2human is a private psutil helper (psutil._common)
    print(psutil._common.bytes2human(memory.free),
          psutil._common.bytes2human(memory.used))


path_notebook_as_py = "./temp_test.py"
path_notebook = path_notebook_as_py.replace(".py", ".ipynb")

# convert the jupytext .py file into an .ipynb notebook
jupytext.write(nb=jupytext.read(path_notebook_as_py), fp=path_notebook)

for _ in range(3):
    print_memory()

    client = PloomberClient.from_path(path_notebook)

    namespace = client.get_namespace()
    del client
    del namespace
print_memory()

Deleting the client doesn't free the memory reserved by the notebook.

In fact, I get the following output (free memory, then used memory), which indicates that the memory is not entirely freed:

20.6G 2.2G

Executing cell: 1: 100%|██████████████████████████| 1/1 [00:01<00:00,  1.23s/it]

20.1G 2.7G

Executing cell: 1: 100%|██████████████████████████| 1/1 [00:00<00:00, 17.93it/s]

19.7G 3.1G

Executing cell: 1: 100%|██████████████████████████| 1/1 [00:00<00:00, 18.98it/s]

19.4G 3.5G

Is there a clean way to free the memory used by the executed notebook?

I also tried this variant:

for _ in range(3):
    print_memory()

    client = PloomberClient.from_path(path_notebook)

    nb = client.execute()

print_memory()

But I got the same problem.

Also, am I using the API correctly? Thank you for your reply.
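One general workaround, independent of ploomber-engine, is to run each execution in a separate interpreter process, since the operating system reclaims all of a process's memory when it exits. The sketch below is only an illustration of that idea: `run_isolated` and `CHILD_CODE` are hypothetical names, and the child code allocates a buffer as a stand-in for the notebook run.

```python
import subprocess
import sys

# Stand-in for the notebook's work: allocate a large buffer and report its size.
# In practice this string would create the client and execute the notebook.
CHILD_CODE = "buf = bytearray(50_000_000); print(len(buf))"


def run_isolated(code):
    """Run `code` in a fresh interpreter and return its stdout, stripped."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()


for _ in range(3):
    # Each iteration starts and ends its own process, so nothing the child
    # allocated stays resident in the parent.
    print(run_isolated(CHILD_CODE))
```

The trade-off is that objects such as the namespace cannot be handed back directly; anything the parent needs has to be serialized (printed, written to a file, or pickled).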

edublancas commented 1 year ago

Hi, thanks for your feedback!

This is the object we use for executing the code, so my guess is that the problem is there.

This is a subclass of InteractiveShell from IPython, but the docs don't mention anything about releasing memory.

Python doesn't offer much flexibility for managing memory, but there might be things we can do with the garbage collection (gc) module. Do you have time to give it a try?
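To illustrate why the gc module could matter here, the following minimal sketch shows an object that survives `del` because of a reference cycle and is only freed by `gc.collect()`. `FakeShell` is a hypothetical stand-in, not the actual shell class, and whether the real object holds such cycles is an assumption that would need to be checked.

```python
import gc
import weakref


class FakeShell:
    """Hypothetical stand-in for a shell object that owns a namespace."""

    def __init__(self):
        # The namespace holds a large payload and a reference back to the
        # shell, which creates a reference cycle.
        self.user_ns = {"payload": bytearray(10_000_000), "shell": self}


def shell_survives_del():
    """Return (alive_after_del, alive_after_collect) for one FakeShell."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from interfering with the demo
    try:
        shell = FakeShell()
        probe = weakref.ref(shell)  # lets us check liveness without a strong ref
        del shell  # reference counting alone cannot free a cycle
        alive_after_del = probe() is not None
        gc.collect()  # the cycle collector finds and frees the shell
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(shell_survives_del())
```

If cycles like this are the cause, calling `gc.collect()` after deleting the client and namespace might release the memory sooner; memory held by native allocators may still not be returned to the operating system immediately.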

amine-ammor commented 1 year ago

Yes, thank you for the references. I would like to work on the pull request. I have some free time this week, and I'll keep you updated.

edublancas commented 1 year ago

awesome!

amine-ammor commented 1 year ago

I just posted a pull request for this issue; you can now review the proposed solution.