psyplot / psy-maps

The psyplot plugin for visualizations on a map
https://psyplot.github.io/psy-maps
8 stars 5 forks source link

Plotting operational ICON data and benchmarking #24

Open guidocioni opened 3 years ago

guidocioni commented 3 years ago

This is not a bug but instead a request of clarifications and additional infos.

I have a suite of scripts which download, process and plot data from the operational ICON model run at DWD and available on opendata.dwd.de. This data contains a LOT of points so optimizing the processing and plotting of data is paramount in order to maintain the script operational every day.

I'm trying to understand whether a transition from my combination of basemap and matplotlib to psyplot would be possible. In order to do that I'm trying to benchmark execution time of psyplot when doing some of my standard plots. In theory, as far as I understand, for filled contour plots psyplot is calling pyplot.tricontourf under the hood, so the difference should not be that big. The only difference, I guess, is an additional layer of preprocessing on the data, which I'm doing anyway in my script.

However I'm having some problems with psy-maps.

First of all, as the files downloaded from DWD opendata server do not have grid information inside (presumably to save space) I need to do a cdo setgrid before passing the file to psyplot.project.plot.mapplot function otherwise the plotting does not work. As far as I understand psyplot does not only need clat and clon from the grid but also the edges. Is it possible to use one of the more primitive methods before mapplot to pass only three 1-dimensional arrays (e.g. clat, clon and data) as done in pyplot.tricontourf? This way I can use an external file to store only the grid information as I currently do.

Second, I tried to reproduce my basic plot that you see here download (5) which I simply do like this:

dset= read_dataset(variables=['T'])
lat, lon = get_coordinates()
temp_850 = dset['t'].metpy.sel(vertical=850 * units.hPa).load()
temp_850.metpy.convert_units('degC')
fig = plt.figure(figsize=(15, 10))
ax  = plt.gca()
_, x, y = get_projection(lon, lat, "world")
cs = ax.tricontourf(x, y, temp_850[0], extend='both', cmap=get_colormap("temp"),
                                    levels=np.arange(-35., 30., 1.))

read_dataset, get_projection and get_coordinates are functions that I wrote to simplify the code. They only wrap some xarray and basemap calls. This takes about 34 seconds of wall time; the input array has dimensions time: 93, ncells: 2949120. The coordinates lat and lon are saved in a different file.

Trying to do the same with psy-maps I first created a file where I also store the entire grid information inside (not only clat and clon otherwise it doesn't work as said before)

psy.plot.mapplot('~/Downloads/temp/test_psy/T_2020093006_global_grid.nc',
                 name='t', projection='robin',
                 levels=np.arange(238, 303, 2.), cmap=get_colormap("temp"),
                 )

I get this result download (6) which takes about 50 seconds. It seems that the levels parameter is completely ignored and somehow I cannot pass the pyplot axis to mapplot to control the figure size.

Do you have any idea on how to create a plotting script that I could use to reproduce my plotting setup so that I could benchmark it and see whether it is really faster?

Right now in the documentation of psy-maps there are only examples with really small grids so I'm not sure how the packages would behave for larger grids.

Chilipp commented 3 years ago

hey @guidocioni! thanks for your input here! I am currently releasing the new version 1.3.0 for each package (psyplot, psy-simple, psy-maps and psyplot-gui) and has just been finished. I'd be happy to do some benchmarking here.

A few comments:

Chilipp commented 3 years ago

somehow I cannot pass the pyplot axis to mapplot to control the figure size.

you can pass an ax keyword to the plotmethod, but this then needs to be initialized with a projection, e.g.

import cartopy.crs as ccrs
ax = ccrs.Robinson()
psy.plot.mapplot(ax=ax, ...)

Alternatively you can also do this afterwards, then you need to do a

sp = psy.plot.mapplot(...)
fig = sp.plotters[0].ax.figure
fig.set_figwidth(...)
fig.set_figheight(...)
guidocioni commented 3 years ago

Hey Philipp, thanks for the feedback. Let me come back to you after testing with your options. It would be really nice to benchmark psy-maps with this "large" data. But you're right, I want to use the pcolormesh kind of method to plot the data as I'm pretty sure this would be quite faster than tricontourf.

Chilipp commented 3 years ago

Hey @guidocioni, sorry, there seems to be a misunderstanding. tricontourf is the faster method as the pcolormesh kind of method (which, for unstructured data is rather a pcolor kind of method) needs to draw every individual grid cell. the pcolormesh kind of method is the default for psyplot, which is why it is slower than your small script.

guidocioni commented 3 years ago

Uhm actually from my previous tests with regular lat-lon grids I found out that pcolormesh is actually faster than contourf in the same way imshow is, as to make the contour you need to interpolate somehow the original data, while with "raster" fill methods you just have to fill the polygons with colors based on the levels. But as a matter of fact I have never tried to compare pcolormesh and tricontourf on unstructured grids as in the standard version of matplotlib if I remember well you cannot use unstructured data in pcolormesh. I'm pretty sure for large grids the overhead of having to interpolate the data to make the contour lines would overcome the "loop" that actually takes care of filling the polygons, but maybe I'm wrong :) Anyway that's exactly why I wanted to perform some benchmarks. I will try to set up a script where we can actually compare the two methods.

Chilipp commented 3 years ago

hey @guidocioni, yes, this is indeed true for regular lat-lon grids. But you cannot use pcolormesh for triangular grids (or unstructured grids in general). Here you would use tripcolor which under the hood uses pcolor. For regular grids, pcolor and pcolormesh produce the same result, but pcolormesh is much faster. However, pcolormesh only works for regular lat-lon grids and does not support the visualization of unstructured data. That is why you fall back to tripcolor (i.e. pcolor) and this is much slower than tricontourf.

PS: As mentioned before, this is how you would do it manually. In psy-maps I am avoiding the use tripcolor because the standard matplotlib way of transforming the coordinates is way to slow, and because it would only work for triangular data. However, it is still slower than the contour plot because, as for pcolor, matplotlib still needs to plot each and every gridcell (patch) individually.