holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License

Improve performance of `import hvplot` by lazily loading dependencies #1252

Open stinodego opened 10 months ago

stinodego commented 10 months ago

Is your feature request related to a problem? Please describe.

Importing hvplot takes around 1.5 seconds on my machine. That's relatively slow.
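For anyone wanting to reproduce a figure like this, here is a minimal sketch of timing a cold import from Python. The helper name `time_import` is made up for illustration, and the number it returns is only meaningful when the module has not already been imported in the same process. CPython's built-in `python -X importtime -c "import hvplot"` gives a per-module breakdown and is probably the better tool for finding which dependency dominates.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing ``module_name``.

    Hypothetical helper for illustration. If the module is already in
    ``sys.modules`` the import is a cache hit and the result is near zero,
    so run this in a fresh interpreter for a realistic number.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is used here only so the sketch runs anywhere;
    # swap in "hvplot" to measure the import discussed in this issue.
    print(f"import took {time_import('json'):.3f} s")
```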

I ran into this issue when trying to plot a DataFrame in Polars. The first call to df.plot is remarkably slow.

Describe the solution you'd like

hvplot seems to be importing a lot of unnecessary things on the first import. By utilizing lazy imports, functionality is only imported when used, which is more efficient. I think this is really important for a library such as hvplot which intends to support all kinds of backends.
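To make the idea concrete, here is a minimal sketch of deferred loading using the standard library's `importlib.util.LazyLoader`. It is not taken from hvplot or any other library, and `lazy_import` is a hypothetical helper name. The module object is created immediately, but its code only runs on the first attribute access.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Return a module whose body executes on first attribute access.

    Hypothetical helper for illustration, based on the LazyLoader
    recipe in the importlib documentation.
    """
    # Reuse the module if something else has already imported it.
    if name in sys.modules:
        return sys.modules[name]

    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")

    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the module; does not run its body yet
    return module


# Cheap: nothing from colorsys has executed at this point (unless it was
# already imported elsewhere in the process).
colorsys = lazy_import("colorsys")

# The first attribute access triggers the real import.
hsv = colorsys.rgb_to_hsv(0.0, 0.0, 0.0)
```

One trade-off of this approach is that import errors surface at first use rather than at import time, which can make failures harder to trace.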

Describe alternatives you've considered

I am no expert on this topic, so I don't really know the best way to solve it. But in Polars we have greatly reduced our import times using lazy loading. The following module does the heavy lifting: https://github.com/pola-rs/polars/blob/main/py-polars/polars/dependencies.py

Additional context

I initially reported the issue at the Polars repo; there is some context there: https://github.com/pola-rs/polars/issues/13500#issuecomment-1880078547

hoxbro commented 10 months ago

I have already noticed the polars implementation of lazy import (in the original polars plot PR), and I definitely want to add something similar to our packages.

Other than initializing imports, df.plot (and by extension hvplot) also sets up communication between the notebook and the Python backend, which is likely also a factor in why the initial run is slow.

maximlt commented 10 months ago

Hi @stinodego, indeed importing hvPlot is pretty slow and we surely could improve that (also in HoloViews and Panel). However, when it comes to Polars users, I have some doubt this will make a big difference since Pandas/Bokeh/Panel are all required for generating plots. And indeed as noted by Simon, the first call to df_polars.plot will do some I/O, injecting some front-end code in the notebook.

hmijail commented 7 months ago

Found this issue while investigating why hvplot is taking about 10 sec in my simple test program (4000 data points, which I assume is on the smaller side of things). Additionally, in the VSCode debugger the import hvplot line takes 20 seconds, and the first df.hvplot() takes almost 1 minute. Python 3.12.2, hvplot 0.9.2, macOS 14.4.1

maximlt commented 7 months ago

> Found this issue while investigating why hvplot is taking about 10 sec in my simple test program (4000 data points, which I assume is on the smaller side of things). Additionally, in the VSCode debugger the import hvplot line takes 20 seconds, and the first df.hvplot() takes almost 1 minute. Python 3.12.2, hvplot 0.9.2, macOS 14.4.1

These timings are definitely not expected! 😮 They remind me of this issue https://github.com/holoviz/colorcet/issues/121