Open guidocioni opened 3 years ago
hey @guidocioni! thanks for your input here! I am currently releasing the new version 1.3.0 for each package (psyplot, psy-simple, psy-maps and psyplot-gui) and has just been finished. I'd be happy to do some benchmarking here.
A few comments:
the default for mapplot is to draw each and every grid cell individually. So it is not making a contour plot (and is therefore ignoring the levels
formatoption (see also the docs). This is why it is taking a bit longer.
You can make contour plots (for triangular or rectilinear data) via psy.plot.mapplot(plot='contourf', ...)
(see the docs). I did not yet do a lot of experiments with contourf
because I prefer plots on the grid cell level, and I remember that there were some issues with warping the data at the map boundaries. Nevertheless, I'd be happy to improve this further.
when you draw each and every grid cell, you need to format the color levels via the bounds formatoption. In general, bounds
controls the levels of the colorbar (i.e. the norm
in the plt.pcolormesh()
function, for instance), and levels
controls the levels of the contour plot (as you use it in plt.contourf
, for instance). By default, levels
is the same as bounds
, but you can also tell it to be something else.
concerning your grid info: You can specify a gridfile
when you use the psyplot.data.open_dataset function (which is also available via psyplot.project.open_dataset
). This would merge the grid information into the xarray dataset, so you should not have to do this by yourself. Your call would then be something like
import psyplot.project as psy
ds = psy.open_dataset(
'~/Downloads/temp/test_psy/T_2020093006_global_grid.nc',
gridfile="path-to-file-with-gridcell-bounds.nc")
ds.psy.plot.mapplot(...)
The reason, why psyplot needs this information is to determine, whether you are plotting unstructured data, or not.
Concerning the data size:
Right now in the documentation of psy-maps there are only examples with really small grids so I'm not sure how the packages would behave for larger grids.
Yes, as the documentation is automatically built on every commit I do not want to have to large datafiles in the examples. And plotting the individual grid cells is quite time-consuming with matplotlib, as it runs on the CPU (although I did already speed it up quite a lot, as I am transforming the coordinates independently of the standard matplotlib approach, see https://github.com/psyplot/psyplot/issues/6 for instance). There is a potential for gaining a lot of speed if I'd add functionality based on VTK (see the psy-vtk project), but so far there has not been time for me to work on this further (and for some reason, the examples on mybinder don't work anymore).
somehow I cannot pass the pyplot axis to mapplot to control the figure size.
you can pass an ax
keyword to the plotmethod, but this then needs to be initialized with a projection, e.g.
import cartopy.crs as ccrs
ax = ccrs.Robinson()
psy.plot.mapplot(ax=ax, ...)
Alternatively you can also do this afterwards, then you need to do a
sp = psy.plot.mapplot(...)
fig = sp.plotters[0].ax.figure
fig.set_figwidth(...)
fig.set_figheight(...)
Hey Philipp,
thanks for the feedback. Let me come back to you after testing with your options.
It would be really nice to benchmark psy-maps
with this "large" data. But you're right, I want to use the pcolormesh
kind of method to plot the data as I'm pretty sure this would be quite faster than tricontourf
.
Hey @guidocioni, sorry, there seems to be a misunderstanding. tricontourf
is the faster method as the pcolormesh
kind of method (which, for unstructured data is rather a pcolor
kind of method) needs to draw every individual grid cell. the pcolormesh
kind of method is the default for psyplot, which is why it is slower than your small script.
Uhm actually from my previous tests with regular lat-lon grids I found out that pcolormesh
is actually faster than contourf
in the same way imshow
is, as to make the contour you need to interpolate somehow the original data, while with "raster" fill methods you just have to fill the polygons with colors based on the levels. But as a matter of fact I have never tried to compare pcolormesh
and tricontourf
on unstructured grids as in the standard version of matplotlib
if I remember well you cannot use unstructured data in pcolormesh
.
I'm pretty sure for large grids the overhead of having to interpolate the data to make the contour lines would overcome the "loop" that actually takes care of filling the polygons, but maybe I'm wrong :)
Anyway that's exactly why I wanted to perform some benchmarks. I will try to set up a script where we can actually compare the two methods.
hey @guidocioni, yes, this is indeed true for regular lat-lon grids. But you cannot use pcolormesh
for triangular grids (or unstructured grids in general). Here you would use tripcolor which under the hood uses pcolor. For regular grids, pcolor
and pcolormesh
produce the same result, but pcolormesh
is much faster. However, pcolormesh
only works for regular lat-lon grids and does not support the visualization of unstructured data. That is why you fall back to tripcolor
(i.e. pcolor
) and this is much slower than tricontourf
.
PS: As mentioned before, this is how you would do it manually. In psy-maps I am avoiding the use tripcolor
because the standard matplotlib way of transforming the coordinates is way to slow, and because it would only work for triangular data. However, it is still slower than the contour plot because, as for pcolor
, matplotlib still needs to plot each and every gridcell (patch) individually.
This is not a bug but instead a request of clarifications and additional infos.
I have a suite of scripts which download, process and plot data from the operational ICON model run at DWD and available on opendata.dwd.de. This data contains a LOT of points so optimizing the processing and plotting of data is paramount in order to maintain the script operational every day.
I'm trying to understand whether a transition from my combination of
basemap
andmatplotlib
topsyplot
would be possible. In order to do that I'm trying to benchmark execution time ofpsyplot
when doing some of my standard plots. In theory, as far as I understand, for filled contour plotspsyplot
is callingpyplot.tricontourf
under the hood, so the difference should not be that big. The only difference, I guess, is an additional layer of preprocessing on the data, which I'm doing anyway in my script.However I'm having some problems with
psy-maps
.First of all, as the files downloaded from DWD opendata server do not have grid information inside (presumably to save space) I need to do a
cdo setgrid
before passing the file topsyplot.project.plot.mapplot
function otherwise the plotting does not work. As far as I understandpsyplot
does not only needclat
andclon
from the grid but also the edges. Is it possible to use one of the more primitive methods beforemapplot
to pass only three 1-dimensional arrays (e.g.clat
,clon
anddata
) as done inpyplot.tricontourf
? This way I can use an external file to store only the grid information as I currently do.Second, I tried to reproduce my basic plot that you see here which I simply do like this:
read_dataset
,get_projection
andget_coordinates
are functions that I wrote to simplify the code. They only wrap somexarray
andbasemap
calls. This takes about 34 seconds of wall time; the input array has dimensionstime: 93, ncells: 2949120
. The coordinateslat
andlon
are saved in a different file.Trying to do the same with
psy-maps
I first created a file where I also store the entire grid information inside (not onlyclat
andclon
otherwise it doesn't work as said before)I get this result which takes about 50 seconds. It seems that the
levels
parameter is completely ignored and somehow I cannot pass thepyplot
axis tomapplot
to control the figure size.Do you have any idea on how to create a plotting script that I could use to reproduce my plotting setup so that I could benchmark it and see whether it is really faster?
Right now in the documentation of
psy-maps
there are only examples with really small grids so I'm not sure how the packages would behave for larger grids.