holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License

Panel + Datashader app indicate serious memory leak #4096

Open MarcSkovMadsen opened 1 year ago

MarcSkovMadsen commented 1 year ago

At awesome-panel.org and at work I can see the memory of my apps steadily increase to a point where I need to restart them. It's been like that for a long period of time. A liveness probe has to some extent solved the problem, so I've been focusing on other issues.

But I see awesome-panel.org getting slower and slower over time until I restart it. Prompted by @sophiamyang, who reached out yesterday about the slowness, I now want to identify the cause.

One issue I see once in a while in the logs is

bokeh.document.modules - ERROR - Module <module 'bokeh_app_35f6d52a842147efb3c893bb26bdfdb1' from '/app/examples/lib_datashader.py'> has extra unexpected referrers! This could indicate a serious memory leak. Extra referrers: [<cell at 0x7efd70a65550: module object at 0x7efd47b41900>]
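The "extra referrers" warning comes from Bokeh asking the garbage collector who still references the session module. The same check can be explored directly with the standard library's `gc` module; this is an illustrative sketch (the `fake_bokeh_app` module and `make_closure` helper are made up for the demo), showing how a closure cell shows up as a referrer, just like the `<cell at ...>` in the log:

```python
import gc
from types import ModuleType

# A throwaway module, standing in for Bokeh's per-session app modules.
mod = ModuleType("fake_bokeh_app")

def make_closure(m):
    # The inner function's closure cell captures the module object.
    def inner():
        return m
    return inner

keeper = make_closure(mod)

# gc.get_referrers lists every object currently referencing the module;
# the closure cell appears here, like the "<cell at ...>" in the log above.
cells = [r for r in gc.get_referrers(mod) if type(r).__name__ == "cell"]
print(len(cells))  # → 1
```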

It is caused by this app

"""
The purpose of this app is to demonstrate that Panel works with the tools you know and love
&#10084;&#65039;, including Datashader. It supports both light and dark theme.
"""
import hvplot.xarray  # pylint: disable=unused-import
import panel as pn
import xarray as xr

pn.extension(template="fast")

ACCENT = "#1f77b4"

if "air" not in pn.state.cache:
    air = pn.state.cache["air"] = xr.tutorial.open_dataset("air_temperature").load().air
else:
    air = pn.state.cache["air"]

def get_plot(accent_base_color=ACCENT):
    """Returns a datashaded hvplot"""
    plot = air.hvplot.scatter(
        "time",
        groupby=[],
        rasterize=True,
        dynspread=True,
        responsive=True,
        cmap="YlOrBr",
        colorbar=True,
    ) * air.mean(["lat", "lon"]).hvplot.line("time", color=accent_base_color, responsive=True)
    plot.opts(responsive=True, active_tools=["box_zoom"])
    return plot

PLOT = get_plot()

pn.pane.HoloViews(PLOT, min_height=500, sizing_mode="stretch_both").servable()

Unfortunately, I cannot trigger the problem.

Name: panel
Version: 0.14.1
Name: hvplot
Version: 0.8.1
Name: datashader
Version: 0.14.2
Name: holoviews
Version: 1.15.1
TheoMathurin commented 1 year ago

The same error in relation to a datashader import has been reported before (see #2640). In my case I don't see it anymore though.

Edit: My mistake, I still see it with panel 0.14.1 and datashader 0.14.2!

hoxbro commented 1 year ago

I don't think this problem is related to datashader as I can see the memory leak without it.

I think there are two underlying issues. I'm not entirely sure if it lies in hvplot, holoviews, or both.

Problem 1

The conversion from xarray dataset to pandas DataFrame is non-optional. What is done is essentially this (ref):

import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature").load().air
dfm = ds.to_dataframe()
df = dfm.reset_index()

print(f"xarray dataset:    {ds.nbytes / 1024 ** 2:0.2f} MB")
print(f"pandas multiindex: {dfm.memory_usage().sum() / 1024 ** 2:0.2f} MB")
print(f"pandas:            {df.memory_usage().sum() / 1024 ** 2:0.2f} MB")
xarray dataset:    14.76 MB
pandas multiindex: 29.61 MB
pandas:            103.31 MB

Problem 2

The memory leak, then, is this big DataFrame not being cleaned up. It could be related to some DynamicMap cache.

When reloading the page, the memory is never released.

import hvplot.xarray  # noqa
import panel as pn
import xarray as xr

def get_plot():
    with xr.tutorial.open_dataset("air_temperature") as ds:
        air = ds.air
    plot = air.hvplot.line("time")
    return plot

pn.pane.HoloViews(get_plot()).servable()

[screenshot: memory usage keeps growing and is not released on page reload]
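The suspected mechanism, a cache holding a strong reference so the per-session data can never be collected, can be sketched in isolation with the standard library (`BigFrame` and the list `cache` are stand-ins, not HoloViews internals):

```python
import gc
import weakref

class BigFrame:
    """Stand-in for the large per-session DataFrame."""

cache = []                 # stand-in for a DynamicMap-style cache
frame = BigFrame()
cache.append(frame)        # the cache keeps a strong reference

ref = weakref.ref(frame)
del frame
gc.collect()
alive_while_cached = ref() is not None
print(alive_while_cached)  # → True: the cache alone keeps the object alive

cache.clear()
gc.collect()
print(ref() is None)       # → True: dropping the cache entry releases it
```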

An example of a dashboard that releases the memory could be the following. Though I have noticed I need to close all sessions before the memory is released, which I also think is wrong. The smaller memory footprint is because this code bypasses problem 1.

import hvplot.xarray  # noqa
import panel as pn
import xarray as xr
import holoviews as hv

def get_plot():
    with xr.tutorial.open_dataset("air_temperature") as ds:
        air = ds.air
    plot = hv.Curve(air, kdims=["time"], vdims=["lat", "lon"])
    return plot

pn.pane.HoloViews(get_plot()).servable()

[screenshot: memory is released, but only after all sessions are closed]

TheoMathurin commented 1 year ago

There are two distinct origins for this error then, because datashader is definitely also involved (albeit indirectly; see the discussion in the linked thread).

I get it with this simple app script:

import panel as pn
import datashader

pn.Row('Hey').servable()
hoxbro commented 1 year ago

I think what you are seeing is a red herring relative to the actual problem (though still a problem in itself): when you use datashader you often have a large dataset, which makes the memory leak easier to spot.

The error message in your simple script is related to numba's JIT: I don't get the error message when running NUMBA_DISABLE_JIT=1 panel serve app.py, where app.py is your simple script. However, I still get the memory leak when running the original piece of code with NUMBA_DISABLE_JIT=1.

philippjfr commented 1 year ago

At some point I had tracked down the memory leak in numba and they pushed a fix here: https://github.com/numba/numba/commit/c7752db9b61e131f6cd435d72069129d4d088f11

However, it looks like there's probably another leak somewhere. If I had to guess, the JIT cache holds onto a reference to the stack that triggered its compilation, which means that the session associated with it never gets released. Here was my original writeup of the issue:

So I’ve been investigating a potential memory leak in Bokeh/Panel applications and this unfortunately led me to you fine folks. Bokeh server does something kind of wild when you serve an application, which is to dynamically create modules. That code looks something like this:

nodes = ast.parse(source, os.fspath(path))
self._code = compile(nodes, filename=path, mode='exec', dont_inherit=True)

module_name = 'bokeh_app_' + make_globally_unique_id().replace('-', '')
module = ModuleType(module_name)
module.__dict__['__file__'] = os.path.abspath(self._path)
if self._package:
    module.__package__ = self._package.__name__
    module.__path__ = [os.path.dirname(self._path)]
if basename(self.path) == "__init__.py":
    module.__package__ = module_name
    module.__path__ = [os.path.dirname(self._path)]

When a user kills a session this module is meant to be cleaned up, ensuring that any variables and data declared in the module are garbage collected. Where numba comes into this is that when I run import numba.cuda, something seems to keep a reference to this dynamically created module, which means it doesn't get cleaned up. Does anyone have an initial guess what that might be? Otherwise I'll keep investigating.
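The expected lifecycle can be demonstrated in isolation: with no extra referrers, deleting the dynamically created module frees it and everything it declared. A simplified sketch of the Bokeh mechanism quoted above (`bokeh_app_demo` and `fake_app.py` are made-up names):

```python
import ast
import gc
import weakref
from types import ModuleType

# Mimic Bokeh's per-session module creation (simplified from the code above).
source = "data = list(range(10_000))\n"
nodes = ast.parse(source, "fake_app.py")
code = compile(nodes, filename="fake_app.py", mode="exec")

module = ModuleType("bokeh_app_demo")
exec(code, module.__dict__)
declared = len(module.data)   # the app code really ran: 10_000 items

ref = weakref.ref(module)
del module
gc.collect()

# With no extra referrers, the module and everything it declared is freed.
print(ref() is None)  # → True
```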

Migacz85 commented 1 year ago

Hi Guys,

I ran into exactly the same issue today, and for me it is a serious one.

panel=14.4 bokeh=2.4 holoviews=1.15.4 hvplot=0.8.2

[screenshot: RAM and CPU usage climbing over two days]

Restarting after 2 days reduced RAM & CPU usage and the app was working again. Do we have any progress on this issue? Even a partial/unsafe solution from @Hoxbro would be something. I don't want to implement the ugly workaround of automatically restarting the server every few hours. If I understand correctly, not using a pandas DataFrame should help mitigate the issue? What do we know about the current state of this bug? This is really serious and eliminates the possibility of using this project in more advanced situations. :|