aaronspring commented 5 years ago

I just learned about this nice wrapper, but for me it takes too long to plot large xr.DataArray objects.

I call control3d.isel(time=1).plot(), and after one second <matplotlib.collections.QuadMesh at 0x1c203e94a8> appears, as it should. Then it takes around one minute for the plot itself to appear. It's a (time: 1, x: 256, y: 220) DataArray of 1.35MB.

When I use hydrogen inside the atom editor it works like a charm (fast).

aaronspring commented 5 years ago

Without proplot this plots fast. Even when I use cartopy projections.

bradyrx commented 5 years ago

@aaronspring, the code below runs in 364ms in my Jupyter notebook. This might be an atom issue or some other issue in the demo notebook. I can look at it more closely tomorrow.

import proplot as plot
from climpred.loadutils import open_dataset
control3d = open_dataset('MPI-control-3D')['tos']
control3d.isel(time=1).plot()  # the call reported as slow above

aaronspring commented 5 years ago

Maybe some extension in jupyter lab I installed. Less than a second is totally fine. I will check this tomorrow.

aaronspring commented 5 years ago

I just checked with jupyter notebook. I also get the problem there.

aaronspring commented 5 years ago

aaron.spring@d147-139:~/Coding/climpred$ jupyter labextension list
JupyterLab v0.34.12
Known labextensions:
   app dir: /Users/aaron.spring/anaconda3/share/jupyter/lab
        @jupyterlab/celltags v0.1.4  enabled  OK
        @jupyterlab/git v0.2.2  enabled  OK
        @jupyterlab/github v0.9.0  enabled  OK
        @jupyterlab/latex v0.5.0  enabled  OK
        @jupyterlab/shortcutui v0.3.1  enabled  OK
        @jupyterlab/statusbar v0.4.3  enabled  OK
        @jupyterlab/toc v0.5.0  enabled  OK
        @ryantam626/jupyterlab_code_formatter v0.1.5  enabled  OK
        nbdime-jupyterlab v0.6.0  enabled   X

"nbdime-jupyterlab@0.6.0" is not compatible with the current JupyterLab
Conflicting Dependencies:
JupyterLab             Extension        Package
>=0.18.4 <0.19.0       >=0.19.1 <0.20.0 @jupyterlab/apputils
>=0.18.5 <0.19.0       >=0.19.1 <0.20.0 @jupyterlab/notebook
>=0.18.4 <0.19.0       >=0.19.1 <0.20.0 @jupyterlab/rendermime

bradyrx commented 5 years ago

This might be an issue to raise in climpred and close here. It seems like a problem with your workflow/environment and not proplot itself.

bradyrx commented 5 years ago

@lukelbd, actually, I'm with @aaronspring on this one. proplot is slowing down any sort of cartopy plotting to the point of being unusable. Without proplot, cartopy plots in O(100ms); with it, plotting takes O(1 minute). I tested this on my personal machine and on Cheyenne.

Generate data:

import numpy as np
import xarray as xr
x = np.random.rand(180, 360)
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(-179.5, 179.5, 360)
data = xr.DataArray(x, dims=['lat', 'lon'], coords=[lat, lon])

Without proplot:

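import matplotlib.pyplot as plt
import cartopy.crs as ccrs
# subplot_kw is an assumption: the original arguments after figsize are missing
f, ax = plt.subplots(figsize=(8,3), subplot_kw={'projection': ccrs.PlateCarree()})
ax.pcolormesh(data.lon, data.lat, data, transform=ccrs.PlateCarree())
# runtime: 271ms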

With proplot:

import proplot as plot
f, ax = plot.subplots(proj='hammer')
ax.pcolormesh(data.lon, data.lat, data.transpose())
# runtime: 72s

lukelbd commented 5 years ago

Hmm, very weird. Thanks for the working example.

Was the plot produced by proplot way higher res? Because when I use the default (blurry; think it's called "GTK") matplotlib backend, both cells execute in milliseconds (matplotlib around 0.04s, proplot around 0.07s). But with the hi-res backend enabled by plot.nbsetup(), I get matplotlib at 40s, and proplot at 47s.

Could your rendering backend (the library used to turn matplotlib.figure.Figure objects into inline graphics) have changed between those two tests? Try the following three cells, both with and without the plot.nbsetup() command. You will have to restart your notebook session between tests.

import numpy as np
import xarray as xr
import proplot as plot
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
plot.nbsetup() # try commenting out this line
x = np.random.rand(180, 360)
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(-179.5, 179.5, 360)
data = xr.DataArray(x, dims=['lat', 'lon'], coords=[lat, lon])

Matplotlib test:

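def test1():
    # subplot_kw is an assumption: the original arguments after figsize are missing
    f, ax = plt.subplots(figsize=(8,3), subplot_kw={'projection': ccrs.PlateCarree()})
    ax.pcolormesh(data.lon, data.lat, data, transform=ccrs.PlateCarree())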

Proplot test:

def test2():
    f, ax = plot.subplots(proj='platecarree')
    ax.pcolormesh(data.lon, data.lat, data.T)

plot.nbsetup() enables a hi-res default viewer, more suitable for my default "small" font sizes, axes sizes, etc. Generally I save figures as PDFs, so that when figures show up in print, they do not have to be scaled up or down -- font sizes around 8-12 are correct I think for journals. For presentations/posters/posting online, that's when I scale stuff up and down, because these are more loose formats anyway.

For this particular case, I recommend not using plot.nbsetup(), or preferably, when you have reasonably hi-res and not-particularly-noisy data, use contour or contourf instead of pcolor. The former takes way less file space/processing power, because with pcolor, you have to render tens to hundreds of thousands of little boxes, but with contour, you just have to render 10-20 lines.
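The pcolor versus contour cost can be checked outside the notebook. The following is a minimal sketch, assuming plain matplotlib with the off-screen Agg backend and a smooth synthetic field rather than the random data used earlier in this thread. The helper name render_seconds is hypothetical, and the dpi argument only stands in for a higher-resolution inline backend; it is not what plot.nbsetup() does internally.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # off-screen raster backend, no notebook needed
import matplotlib.pyplot as plt
import numpy as np

# Smooth synthetic field on a 1-degree grid (cell edges and cell centers).
lon_edges = np.linspace(-180, 180, 361)
lat_edges = np.linspace(-90, 90, 181)
lon = 0.5 * (lon_edges[:-1] + lon_edges[1:])
lat = 0.5 * (lat_edges[:-1] + lat_edges[1:])
field = np.cos(np.deg2rad(lat))[:, None] * np.sin(np.deg2rad(lon))[None, :]


def render_seconds(kind, dpi):
    """Draw one plot, rasterize it to an in-memory PNG, return elapsed seconds."""
    fig, ax = plt.subplots(figsize=(8, 3))
    start = time.perf_counter()
    if kind == "pcolormesh":
        ax.pcolormesh(lon_edges, lat_edges, field)  # one quad per grid cell
    else:
        ax.contourf(lon, lat, field, levels=10)  # a handful of filled regions
    fig.savefig(io.BytesIO(), format="png", dpi=dpi)
    plt.close(fig)
    return time.perf_counter() - start


timings = {
    (kind, dpi): render_seconds(kind, dpi)
    for kind in ("pcolormesh", "contourf")
    for dpi in (72, 200)
}
for (kind, dpi), seconds in timings.items():
    print(f"{kind:>10} @ {dpi:3d} dpi: {seconds:.3f}s")
```

The absolute numbers depend on the machine and matplotlib version, so the sketch prints timings rather than asserting which method wins.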

Maybe my hypothesis is wrong though?

Sidenote: I realize now I should be more accommodating to different workflows. One thing on the post-Masters to-do list is to allow users to change the defaults that I have currently set. Maybe I'll let people have a ~/.proplotrc file that can be used to change the default "global" properties, analogous to ~/.matplotlibrc. And I can also make ProPlot respect any settings found in ~/.matplotlibrc, instead of overriding everything like it currently does.

bradyrx commented 5 years ago

> Was the plot produced by proplot way higher res? But with the hi-res backend enabled by plot.nbsetup(), I get matplotlib at 40s, and proplot at 47s.

So this is after running plot.nbsetup(). I run this blindly at the start of every notebook, so maybe I should see what all it's doing. One thing I noticed is that my y-axis labels get cut off the notebook figures unless I run that setup command, so I always do it. I didn't think about it causing high-res matplotlib images.

Here are the results of my testing:

                        plot.nbsetup() ON    plot.nbsetup() OFF
matplotlib pcolormesh   30.1s                0.2s
matplotlib contourf     6.3s                 3.8s
proplot pcolormesh      35.2s                2.5s
proplot contourf        175s                 60.7s

Okay, maybe not the most fair comparison with contourf. I realize now I fed in that np.random() which made contouring ridiculous, but I don't have time right now to re-run everything.

> For this particular case, I recommend not using plot.nbsetup(), or preferably, when you have reasonably hi-res and not-particularly-noisy data, use contour or contourf instead of pcolor. The former takes way less file space/processing power, because with pcolor, you have to render tens to hundreds of thousands of little boxes, but with contour, you just have to render 10-20 lines.

I think the best deal for me is to avoid plot.nbsetup(). I work mainly with POP (irregular structured mesh) or MPAS (unstructured mesh) so contour/contourf do not work unless I remap to a standard grid. Maybe you have a workaround for this, but if you feed contourf a grid with the pole centered on Greenland for instance, it plots all these weird streaks across it. So I tend to remap to 360x180 using CDO/NCO and plot the contour from there.

> Sidenote: I realize now I should be more accommodating to different workflows. One thing on the post-Masters to-do list is to allow users to change the defaults that I have currently set. Maybe I'll let people have a ~/.proplotrc file that can be used to change the default "global" properties, analogous to ~/.matplotlibrc. And I can also make ProPlot respect any settings found in ~/.matplotlibrc, instead of overriding everything like it currently does.

This sounds reasonable to me. It's probably best practice to have a global config whose settings one can override via .proplotrc, which then takes precedence. I like it. matplotlibrc > proplotrc > globalrc
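A minimal sketch of that precedence order, assuming ">" means "overrides". The setting names and values are hypothetical, and the dictionaries stand in for whatever would be parsed from the real files; this is not how ProPlot loads its configuration.

```python
from collections import ChainMap

# Hypothetical built-in defaults ("globalrc").
global_defaults = {"fontsize": 8, "linewidth": 0.6, "cmap": "viridis"}
# Hypothetical contents parsed from ~/.proplotrc.
proplotrc = {"fontsize": 9}
# Hypothetical contents parsed from ~/.matplotlibrc.
matplotlibrc = {"linewidth": 1.0}

# ChainMap searches its mappings left to right, so earlier ones win:
# matplotlibrc > proplotrc > globalrc.
settings = ChainMap(matplotlibrc, proplotrc, global_defaults)

for key in ("fontsize", "linewidth", "cmap"):
    print(key, "=", settings[key])
```

Keeping the layers separate, instead of merging them into one dict, makes it easy to see which file a given value came from.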

lukelbd commented 5 years ago

Thanks a lot for the table. Those times are definitely not ideal... I will do some more extensive testing to figure out where the bottleneck is when I have the chance. I don't know why my test resulted in a much smaller time difference.

lukelbd commented 5 years ago

Hey, are you guys still having this problem?

I tried both the examples in this thread:

import proplot as plot
from climpred.loadutils import open_dataset
control3d = open_dataset('MPI-control-3D')['tos']


import numpy as np
import xarray as xr
import proplot as plot
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
plot.nbsetup() # try commenting out this line
x = np.random.rand(180, 360)
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(-179.5, 179.5, 360)
data = xr.DataArray(x, dims=['lat', 'lon'], coords=[lat, lon])

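# Pyplot test
from proplot.utils import _timer
@_timer
def test1():
    # subplot_kw is an assumption: the original arguments after figsize are missing
    f, ax = plt.subplots(figsize=(8,3), subplot_kw={'projection': ccrs.PlateCarree()})
    ax.pcolormesh(data.lon, data.lat, data, transform=ccrs.PlateCarree())

# Proplot test
@_timer
def test2():
    f, ax = plot.subplots(proj='pcarree', figsize=(8,3))
    ax.pcolormesh(data.lon, data.lat, data)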

and there's basically no time difference. Was there another dataset you were testing? Maybe one of the changes I've made in the last month magically fixed it?

P.S. Note there are a few differences since this example was last posted: (1) the _timer decorator is now hidden, and you have to import it directly, (2) you no longer have to transpose your data -- the convention is now the same as with matplotlib, "y by x", and (3) the projection name is now "pcarree" instead of "platecarree" -- valid projection names are shown in this table and for the most part correspond to their PROJ.4 names. The documentation is much better now, with the "quick start" and "showcase" merged into one comprehensive tutorial. And it's more suitable for collaboration now if you guys have any of your own ideas/want to submit a PR.
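For anyone who wants to time the test functions without importing a private helper, here is a minimal sketch of a timing decorator using only the standard library. The name timer and the last_elapsed attribute are hypothetical; this is not the implementation of proplot's _timer.

```python
import functools
import time


def timer(func):
    """Print how long each call to func takes (hypothetical stand-in for _timer)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {wrapper.last_elapsed:.3f}s")
        return result

    wrapper.last_elapsed = None  # filled in after the first call
    return wrapper


@timer
def test_dummy():
    # Placeholder workload; swap in the plotting calls from test1 or test2.
    return sum(i * i for i in range(100_000))


total = test_dummy()
```

Note that a decorator like this only measures the function call. In a notebook, the inline backend renders the figure after the cell finishes, so the time until the image appears on screen can be longer than what is printed.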

lukelbd commented 5 years ago

Actually, ProPlot is slower, but not unusably slower. With or without proplot.nbsetup (using the second example with random data), I'm getting about 0.3s for pyplot and about 1s for proplot.

It's really strange why proplot takes >0.5s longer. After some profiling all of the slowdown seems to be from the proplot.Figure.draw command (usually takes around 0.8s), but none of the slowdown is from my custom functions like smart_tight_layout (about 0.008s) -- it is all from the call to the native matplotlib matplotlib.figure.Figure.draw command. It even happens when I disable all calls to format. Maybe it has something to do with inherent inefficiencies in using subclassed axes and figure objects.
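A minimal sketch of this kind of profiling with the standard library's cProfile and pstats. The function draw_figure is a hypothetical stand-in for the real call being profiled (for example, drawing a figure); it does not touch matplotlib or proplot.

```python
import cProfile
import io
import pstats


def draw_figure():
    """Hypothetical stand-in for the call being profiled, e.g. a figure draw."""
    return sum(i ** 0.5 for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
draw_figure()
profiler.disable()

# Collect the report in a string so it can be printed or saved.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)  # top 10 entries by cumulative time
print(report.getvalue())
```

Sorting by cumulative time is what points at a single expensive parent call, such as a draw method, even when the time is actually spent in functions it calls.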

I have a hunch for where the original problem might have come from. I may have had reso='hi' as the default for all geographic features before. Plotting high-resolution geographic features takes a lot longer... up to 6s on my machine. Maybe it's even worse on yours? The ax.coastline() command uses low-resolution by default.

lukelbd commented 5 years ago

Figured out where the ~1s time difference was coming from! After fixing the following two things, pyplot and proplot speeds are roughly identical (+/- a few hundredths of a second):

  1. By default, wrapper_cmap fixes white lines between contour edges and white lines between pcolor edges, as discussed here. But this adds a whole bunch of new lines to render, which slows things down. Now, you can use fix=False in your call to ax.pcolormesh or ax.contourf to disable this.
  2. Previously, I always used LinearSegmentedNorm for colormap plots ("normalizers" convert data values into coordinates in the range [0,1], which correspond to points along the colormap). LinearSegmentedNorm gives you "even" color gradations across arbitrarily-spaced monotonic levels -- for example, with levels=[0, 0.1, 9.9, 10], the color intensity change between 0 and 0.1 will be the same as between 0.1 and 9.9. This is useful when you want levels that span a broad range of magnitudes, but the algorithm is pretty slow.

    Now, if the levels were automatically selected by matplotlib or if the user input levels are linearly spaced, I don't use this normalizer.
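To illustrate the idea behind the second point, here is a minimal pure-Python sketch of a piecewise-linear normalizer in which every interval between levels gets an equal share of [0, 1]. The function name segmented_norm is hypothetical, and this is not ProPlot's LinearSegmentedNorm code; it only reproduces the behavior described above.

```python
from bisect import bisect_right


def segmented_norm(value, levels):
    """Map value to [0, 1] so each interval between levels spans an equal share."""
    if value <= levels[0]:
        return 0.0
    if value >= levels[-1]:
        return 1.0
    i = bisect_right(levels, value) - 1  # index of the interval containing value
    frac = (value - levels[i]) / (levels[i + 1] - levels[i])
    return (i + frac) / (len(levels) - 1)


levels = [0, 0.1, 9.9, 10]
for v in (0.05, 0.1, 5.0, 9.9, 9.95):
    print(v, round(segmented_norm(v, levels), 3))
```

With these levels, 0.1 maps to 1/3 and 9.9 maps to 2/3, so the narrow interval from 0 to 0.1 covers as much of the colormap as the wide interval from 0.1 to 9.9. The per-value interval lookup is the extra work that a plain linear normalizer avoids.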

Download the latest version to try it out. Note nbsetup makes no difference in these times -- now that I know this, nbsetup is called every time you import ProPlot. You can disable this with the new nbsetup setting in your .proplotrc file.

Let me know if you guys still have issues.

aaronspring commented 5 years ago

Now proplot only makes it a little slower for xr.plot() and even faster for xr.plot(cartopy.projections). However, when I run plot.nbsetup(), I still get fast timings with %time, but it still takes 2 extra seconds for the plot to appear on screen.

bradyrx commented 5 years ago

Thanks @lukelbd for all this work. Looking forward to being able to jump fully to proplot with these faster timings!