holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License
1.07k stars 105 forks source link

Memory leak #501

Open jmontoyam opened 4 years ago

jmontoyam commented 4 years ago

Hello,

first of all, thank you very much for this amazing project! ;). I think I have detected a possible memory leak. In my use case, I use a function that generates a big xarray DataArray, and another function that receives as input such big xarray DataArray an generates an image from it using hvplot.image(rasterize=True). I use jupyterlab to interact with such functions and visualize the generated image. I have noticed that every time I re-execute the code cell containing both functions, the memory usage keeps growing and growing. Please see the memory usage showed in the attached images (top right corner):

First execution of the code cell: execution1

Second execution of the code cell: execution2

Third execution of the code cell: execution3

I tried the suggestion given by @philippjfr in holoviz/holoviews#1821 (%reset out) but it did not work.

Expected behavior: I expected the memory usage to keep constant between re-execution of the same code cell (the memory occupied but the data generated in the previous execution is supposed to be garbage collected), am I right?, or am I missing something?.

Software info:

hvplot: 0.6.0 holoviews: 1.13.3 datashader: 0.11.0 bokeh: 2.1.1

jupyter core : 4.6.3 jupyter-notebook : 6.0.3 qtconsole : 4.7.5 ipython : 7.16.1 ipykernel : 5.3.0 jupyter client : 6.1.3 jupyter lab : 2.1.5 nbconvert : 5.6.1 ipywidgets : 7.5.1 nbformat : 5.0.7 traitlets : 4.3.3

JupyterLab v2.1.5 Known labextensions: @bokeh/jupyter_bokeh v2.0.2 enabled OK @jupyter-widgets/jupyterlab-manager v2.0.0 enabled OK @pyviz/jupyterlab_pyviz v1.0.4 enabled OK

Google Chrome Version 84.0.4147.125 (Official Build) (64-bit)

OS Ubuntu 18.04.4 LTS

Thank you very much for all your help! ;)

jmakov commented 2 years ago

Same with df[df.columns[0]].hvplot(datashade=True) - memory increases on every execution (hvplot=0.7.3). Cannot use %reset out (causes an exception) because following https://docs.ray.io/en/latest/using-ray-with-jupyter.html?highlight=jupyter.

conda's env.yaml:

name: test
channels:
  - pyviz
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - abseil-cpp=20210324.2=h9c3ff4c_0
  - alembic=1.7.3=pyhd8ed1ab_0
  - alsa-lib=1.2.3=h516909a_0
  - anyio=3.3.0=py37h89c1867_0
  - argcomplete=1.12.3=pyhd8ed1ab_2
  - argon2-cffi=20.1.0=py37h5e8e339_2
  - arrow-cpp=5.0.0=py37hdf48254_5_cpu
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - autopage=0.4.0=pyhd8ed1ab_0
  - aws-c-cal=0.5.11=h95a6274_0
  - aws-c-common=0.6.2=h7f98852_0
  - aws-c-event-stream=0.2.7=h3541f99_13
  - aws-c-io=0.10.5=hfb6a706_0
  - aws-checksums=0.1.11=ha31a3da_7
  - aws-sdk-cpp=1.8.186=hb4091e7_3
  - babel=2.9.1=pyh44b312d_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - backports.zoneinfo=0.2.1=py37h5e8e339_4
  - bleach=4.1.0=pyhd8ed1ab_0
  - bokeh=2.3.3=py37h89c1867_0
  - brotlipy=0.7.0=py37h5e8e339_1001
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.2=h7f98852_0
  - ca-certificates=2021.5.30=ha878542_0
  - certifi=2021.5.30=py37h89c1867_0
  - cffi=1.14.6=py37hc58025e_0
  - chardet=4.0.0=py37h89c1867_1
  - charset-normalizer=2.0.0=pyhd8ed1ab_0
  - click=8.0.1=py37h89c1867_0
  - clickhouse-cityhash=1.0.2.3=py37h3340039_2
  - clickhouse-driver=0.2.1=py37h5e8e339_0
  - cliff=3.9.0=pyhd8ed1ab_0
  - cloudpickle=2.0.0=pyhd8ed1ab_0
  - cmaes=0.8.2=pyh44b312d_0
  - cmd2=2.2.0=py37h89c1867_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - colorcet=2.0.6=pyhd8ed1ab_0
  - colorlog=6.4.1=py37h89c1867_0
  - conda=4.10.3=py37h89c1867_1
  - conda-package-handling=1.7.3=py37h5e8e339_0
  - cramjam=2.3.1=py37h5e8e339_1
  - cryptography=3.4.7=py37h5d9358c_0
  - cycler=0.10.0=py_2
  - cytoolz=0.11.0=py37h5e8e339_3
  - dask=2021.9.0=pyhd8ed1ab_0
  - dask-core=2021.9.0=pyhd8ed1ab_0
  - datashader=0.13.0=pyh6c4a22f_0
  - datashape=0.5.4=py_1
  - dbus=1.13.6=h48d8840_2
  - debugpy=1.4.1=py37hcd2ae1e_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - distributed=2021.9.0=py37h89c1867_0
  - entrypoints=0.3=py37hc8dfbb8_1002
  - expat=2.4.1=h9c3ff4c_0
  - fastparquet=0.7.1=py37hb1e94ed_0
  - filelock=3.0.12=pyh9f0ad1d_0
  - fontconfig=2.13.1=hba837de_1005
  - freetype=2.10.4=h0708190_1
  - fsspec=2021.8.1=pyhd8ed1ab_0
  - gettext=0.19.8.1=h0b5b191_1005
  - gflags=2.2.2=he1b5a44_1004
  - gitdb=4.0.7=pyhd8ed1ab_0
  - gitpython=3.1.23=pyhd8ed1ab_1
  - glib=2.68.4=h9c3ff4c_0
  - glib-tools=2.68.4=h9c3ff4c_0
  - glog=0.5.0=h48cff8f_0
  - greenlet=1.1.1=py37hcd2ae1e_0
  - grpc-cpp=1.40.0=h850795e_0
  - gst-plugins-base=1.18.5=hf529b03_0
  - gstreamer=1.18.5=h76c114f_0
  - heapdict=1.0.1=py_0
  - holoviews=1.14.5=py_0
  - hvplot=0.7.3=py_0
  - icu=68.1=h58526e2_0
  - idna=3.1=pyhd3deb0d_0
  - importlib-metadata=4.8.1=py37h89c1867_0
  - importlib_metadata=4.8.1=hd8ed1ab_0
  - importlib_resources=5.2.2=pyhd8ed1ab_0
  - ipykernel=6.4.1=py37h6531663_0
  - ipympl=0.7.0=pyhd8ed1ab_0
  - ipython=7.27.0=py37h6531663_0
  - ipython_genutils=0.2.0=py_1
  - ipywidgets=7.6.5=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.18.0=py37h89c1867_2
  - jinja2=3.0.1=pyhd8ed1ab_0
  - joblib=1.0.1=pyhd8ed1ab_0
  - jpeg=9d=h36c2ea0_0
  - json5=0.9.5=pyh9f0ad1d_0
  - jsonschema=3.2.0=py37hc8dfbb8_1
  - jupyter-server-mathjax=0.2.3=pyhd8ed1ab_0
  - jupyter_client=7.0.2=pyhd8ed1ab_0
  - jupyter_contrib_core=0.3.3=py_2
  - jupyter_contrib_nbextensions=0.5.1=py37hc8dfbb8_1
  - jupyter_core=4.7.1=py37h89c1867_0
  - jupyter_highlight_selected_word=0.2.0=py37h89c1867_1002
  - jupyter_latex_envs=1.4.6=py37h89c1867_1001
  - jupyter_nbextensions_configurator=0.4.1=py37h89c1867_2
  - jupyter_server=1.11.0=pyhd8ed1ab_0
  - jupyterlab=3.1.11=pyhd8ed1ab_0
  - jupyterlab-git=0.32.2=pyhd8ed1ab_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - jupyterlab_server=2.8.1=pyhd8ed1ab_0
  - jupyterlab_widgets=1.0.2=pyhd8ed1ab_0
  - kiwisolver=1.3.2=py37h2527ec5_0
  - krb5=1.19.2=hcc1bbae_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=2.2.1=h9c3ff4c_0
  - libarchive=3.5.2=hccf745f_0
  - libblas=3.9.0=11_linux64_openblas
  - libbrotlicommon=1.0.9=h7f98852_5
  - libbrotlidec=1.0.9=h7f98852_5
  - libbrotlienc=1.0.9=h7f98852_5
  - libcblas=3.9.0=11_linux64_openblas
  - libclang=11.1.0=default_ha53f305_1
  - libcurl=7.78.0=h2574ce0_0
  - libdeflate=1.7=h7f98852_5
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.10=hcdb4288_3
  - libffi=3.3=h58526e2_2
  - libgcc-ng=11.1.0=hc902ee8_8
  - libgfortran-ng=11.1.0=h69a702a_8
  - libgfortran5=11.1.0=h6c583b3_8
  - libglib=2.68.4=h3e27bee_0
  - libgomp=11.1.0=hc902ee8_8
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=11_linux64_openblas
  - libllvm11=11.1.0=hf817b99_2
  - libnghttp2=1.43.0=h812cca2_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.17=pthreads_h8fe5266_1
  - libopus=1.3.1=h7f98852_1
  - libpng=1.6.37=h21135ba_2
  - libpq=13.3=hd57d9b9_0
  - libprotobuf=3.16.0=h780b84a_0
  - libsodium=1.0.18=h36c2ea0_1
  - libsolv=0.7.19=h780b84a_5
  - libssh2=1.10.0=ha56f1ee_0
  - libstdcxx-ng=11.1.0=h56837e0_8
  - libta-lib=0.4.0=h516909a_0
  - libthrift=0.14.2=he6d91bd_1
  - libtiff=4.3.0=hf544144_1
  - libutf8proc=2.6.1=h7f98852_0
  - libuuid=2.32.1=h7f98852_1000
  - libuv=1.42.0=h7f98852_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libwebp-base=1.2.1=h7f98852_0
  - libxcb=1.13=h7f98852_1003
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h72842e0_0
  - libxslt=1.1.33=h15afd5d_2
  - llvmlite=0.37.0=py37h9d7f4d0_0
  - locket=0.2.0=py_2
  - lxml=4.6.3=py37h77fd288_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - lzo=2.10=h516909a_1000
  - mako=1.1.5=pyhd8ed1ab_0
  - mamba=0.15.3=py37h7f483ca_0
  - markdown=3.3.4=pyhd8ed1ab_0
  - markupsafe=2.0.1=py37h5e8e339_0
  - matplotlib=3.4.3=py37h89c1867_0
  - matplotlib-base=3.4.3=py37h1058ff1_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mistune=0.8.4=py37h5e8e339_1004
  - modin-core=0.10.2=py37h89c1867_3
  - modin-ray=0.10.2=py37h89c1867_3
  - msgpack-python=1.0.2=py37h2527ec5_1
  - multipledispatch=0.6.0=py_0
  - mysql-common=8.0.25=ha770c72_2
  - mysql-libs=8.0.25=hfa10184_2
  - nb_conda_kernels=2.3.1=py37h89c1867_0
  - nbclassic=0.3.1=pyhd8ed1ab_1
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.1.0=py37h89c1867_0
  - nbdime=3.1.0=pyhd8ed1ab_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - ncurses=6.2=h58526e2_4
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.3=pyha770c72_0
  - nspr=4.30=h9c3ff4c_0
  - nss=3.69=hb5efdd6_0
  - numba=0.54.0=py37h2d894fd_0
  - numpy=1.20.3=py37h038b26d_1
  - olefile=0.46=pyh9f0ad1d_1
  - openjpeg=2.4.0=hb52868f_1
  - openssl=1.1.1l=h7f98852_0
  - optuna=2.9.1=pyhd8ed1ab_0
  - orc=1.6.10=h58a87f1_0
  - packaging=21.0=pyhd8ed1ab_0
  - pandas=1.3.2=py37he8f5f7f_0
  - pandoc=2.14.2=h7f98852_0
  - pandocfilters=1.4.2=py_1
  - panel=0.12.1=py_0
  - param=1.11.1=pyh6c4a22f_0
  - parquet-cpp=1.5.1=1
  - parso=0.8.2=pyhd8ed1ab_0
  - partd=1.2.0=pyhd8ed1ab_0
  - patsy=0.5.2=pyhd8ed1ab_0
  - pbr=5.6.0=pyhd8ed1ab_0
  - pcre=8.45=h9c3ff4c_0
  - pexpect=4.8.0=py37hc8dfbb8_1
  - pickle5=0.0.11=py37h5e8e339_0
  - pickleshare=0.7.5=py37hc8dfbb8_1002
  - pillow=8.3.2=py37h0f21c89_0
  - pip=21.2.4=pyhd8ed1ab_0
  - prettytable=2.2.0=pyhd8ed1ab_0
  - prometheus_client=0.11.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.20=pyha770c72_0
  - psutil=5.8.0=py37h5e8e339_1
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pyarrow=5.0.0=py37h58331f5_5_cpu
  - pycosat=0.6.3=py37h5e8e339_1006
  - pycparser=2.20=pyh9f0ad1d_2
  - pyct=0.4.6=py_0
  - pyct-core=0.4.6=py_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pykalman=0.9.5=py_1
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pyperclip=1.8.2=pyhd8ed1ab_2
  - pyqt=5.12.3=py37h89c1867_7
  - pyqt-impl=5.12.3=py37he336c9b_7
  - pyqt5-sip=4.19.18=py37hcd2ae1e_7
  - pyqtchart=5.12=py37he336c9b_7
  - pyqtwebengine=5.12.1=py37he336c9b_7
  - pyrsistent=0.17.3=py37h5e8e339_2
  - pysocks=1.7.1=py37h89c1867_3
  - python=3.7.10=hffdb5ce_100_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.7=2_cp37m
  - pytz=2021.1=pyhd8ed1ab_0
  - pyviz_comms=2.1.0=py_0
  - pyyaml=5.4.1=py37h5e8e339_1
  - pyzmq=22.2.1=py37h336d617_0
  - qt=5.12.9=hda022c4_4
  - ray-core=1.6.0=py37hf931bba_0
  - re2=2021.09.01=h9c3ff4c_0
  - readline=8.1=h46c0cb4_0
  - redis-py=3.5.3=pyh9f0ad1d_0
  - reproc=14.2.3=h7f98852_0
  - reproc-cpp=14.2.3=h9c3ff4c_0
  - requests=2.26.0=pyhd8ed1ab_0
  - requests-unixsocket=0.2.0=py_0
  - ruamel_yaml=0.15.80=py37h5e8e339_1004
  - s2n=1.0.10=h9b69904_0
  - scikit-learn=0.24.2=py37hf0f1638_1
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setproctitle=1.1.10=py37h5e8e339_1004
  - setuptools=58.0.4=py37h89c1867_0
  - six=1.16.0=pyh6c4a22f_0
  - smmap=3.0.5=pyh44b312d_0
  - snappy=1.1.8=he1b5a44_3
  - sniffio=1.2.0=py37h89c1867_1
  - sortedcontainers=2.4.0=pyhd8ed1ab_0
  - sqlalchemy=1.4.25=py37h5e8e339_0
  - sqlite=3.36.0=h9cd32fc_1
  - statsmodels=0.12.2=py37hb1e94ed_0
  - stevedore=3.4.0=py37h89c1867_0
  - ta-lib=0.4.19=py37ha21ca33_2
  - tabulate=0.8.9=pyhd8ed1ab_0
  - tblib=1.7.0=pyhd8ed1ab_0
  - tensorboardx=2.4=pyhd8ed1ab_0
  - terminado=0.12.1=py37h89c1867_0
  - testpath=0.5.0=pyhd8ed1ab_0
  - threadpoolctl=2.2.0=pyh8a188c0_0
  - thrift=0.13.0=py37hcd2ae1e_2
  - tk=8.6.11=h27826a3_1
  - toolz=0.11.1=py_0
  - tornado=6.1=py37h5e8e339_1
  - tqdm=4.62.2=pyhd8ed1ab_0
  - traitlets=5.1.0=pyhd8ed1ab_0
  - typing_extensions=3.10.0.0=pyha770c72_0
  - tzdata=2021a=he74cb21_1
  - tzlocal=3.0=py37h89c1867_2
  - urllib3=1.26.6=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - websocket-client=0.57.0=py37h89c1867_4
  - wheel=0.37.0=pyhd8ed1ab_1
  - widgetsnbextension=3.5.1=py37h89c1867_4
  - xarray=0.19.0=pyhd8ed1ab_1
  - xeus=2.0.0=h7d0c39e_0
  - xeus-python=0.13.0=py37h4b46df4_1
  - xeus-python-shell=0.1.5=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h516909a_0
  - zeromq=4.3.4=h9c3ff4c_1
  - zict=2.0.0=py_0
  - zipp=3.5.0=pyhd8ed1ab_0
  - zlib=1.2.11=h516909a_1010
  - zstandard=0.15.2=py37h5e8e339_0
  - zstd=1.5.0=ha95c52a_0
  - pip:
    - absl-py==0.13.0
    - aiohttp==3.7.4.post0
    - aiohttp-cors==0.7.0
    - aioredis==1.3.1
    - async-timeout==3.0.1
    - autograd==1.3
    - bayesian-optimization==1.2.0
    - blessings==1.7
    - cachetools==4.2.2
    - cma==2.7.0
    - colorful==0.5.4
    - cython==0.29.24
    - future==0.18.2
    - google-api-core==1.31.2
    - google-auth==1.35.0
    - google-auth-oauthlib==0.4.6
    - googleapis-common-protos==1.53.0
    - gpustat==0.6.0
    - gpy==1.10.0
    - gpytorch==1.5.1
    - grpcio==1.40.0
    - hebo==0.1.0
    - hiredis==2.0.0
    - multidict==5.1.0
    - nevergrad==0.4.3.post8
    - nvidia-ml-py3==7.352.0
    - oauthlib==3.1.1
    - opencensus==0.7.13
    - opencensus-context==0.1.2
    - paramz==0.9.5
    - protobuf==3.17.3
    - py-spy==0.3.9
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pymoo==0.4.2.2
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - scipy==1.5.4
    - sklearn==0.0
    - tensorboard==2.6.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.0
    - torch==1.9.1
    - werkzeug==2.0.1
    - yarl==1.6.3
jmakov commented 1 year ago

Bump. I have to restart the kernel all the time in the notebook. Any suggestions very welcome (gc.collect() doesn't change anything).

maximlt commented 1 year ago

Hi @jmakov,

Could you provide:

Reproducing and tracking down memory leaks is notoriously difficult :)

jmakov commented 1 year ago

@maximlt thanks for the quick response! I can reproduce this min example below in jupyter-lab:

import pandas
pandas.options.plotting.backend = "holoviews"  # `datashade=True` doesn't work without this line

# I'm gonna eat about 5GB of your memory and won't give it back :)
df = pandas.DataFrame({"col1": range(0, 100_000_000)})
df.plot(datashade=True)

# output includes this warnings
# WARNING:param.datashade: Parameter(s) [line_width] not consumed by any element rasterizer.
# WARNING:param.datashade: Parameter(s) [line_width] not consumed by any element rasterizer.

OS: Ubuntu 22.04.1 LTS uname info: Linux 5.15.0-56-generic 62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 GNU/Linux conda_list_output.txt

jmakov commented 1 year ago

Would just like to bump this issue a bit since it's almost a blocker for me - imagine a long computation, then trying to plot with diff params, each plot eats memory, and you have to restart the kernel (and again wait for the computation).

tomerroditi commented 1 year ago

I'm experiencing the same issue, ram is piling up after every cell execution. minimal reproducible code:

import pandas as pd
import numpy as np
import hvplot.pandas

df = pd.DataFrame(np.random.rand(1000000,100), columns=[str(i) for i in range(100)])
plots = df.hvplot.hist('0')

installed packages:

GitPython==3.1.29 holoviews==1.15.3 joblib==1.2.0 mat73==0.60 matplotlib==3.6.2 numpy==1.23.0 pandas==1.5.2 PyAutoGUI==0.9.53 pytest==7.2.0 scikit-learn==1.2.1 scipy==1.9.3 seaborn==0.12.1 shapely==2.0.0 sktime==0.15.0 tqdm==4.64.1 tsfel==0.1.4 tsfresh==0.19.0 xgboost==1.7.2 h5py==3.8.0 lightgbm==3.3.4 sklearn==0.0.post1 bokeh==2.4.3

jmakov commented 8 months ago

What's the status of this after half a year?

maximlt commented 8 months ago

Hi @jmakov , I'll investigate this week and see what I can find. Feel all free to investigate too, memory leaks aren't the easiest thing to debug :)