jswhit / pygrib

Python interface for reading and writing GRIB data
https://jswhit.github.io/pygrib
MIT License

Slow retrieval of point data #28

Open dhwx opened 8 years ago

dhwx commented 8 years ago

Hi, I use pygrib to plot charts from GFS GRIB2 files (a UK subset). This works great and is quick enough for my purposes. I am now looking into plotting time series of point data from the same GRIB files, but am finding the process of extracting the data values for a given point to be far too slow.

I was hoping to create time series plots of variables such as temperature and winds on the fly, but on my system it takes over 30 seconds to get and process the data for one grid point from the first timestep to the 81st (10 days).

I have tried looping through individual GRIB files for each time step, extracting the values for the desired grid point and storing them in a list. I also tried concatenating the GRIB files into one master GRIB2 file containing all time steps and then looping through the messages for a given variable, but this was much slower.

I am used to using GrADS, where a full GRIB file covering all timesteps can be opened, the desired timesteps selected (set t 1 81) and a plot produced for those times for a given variable. Is a similar usage possible with pygrib/matplotlib?

For example, is there a way to specify the required lat/lon subset at the open/index stage, or can this only be done on individual GRIB messages, e.g. grb.data(lat1=lat, lat2=lat, lon1=lon, lon2=lon)? I searched through all the docs but can't see that time sequences for point data are dealt with anywhere. I'd be grateful for any pointers you can provide. Cheers, Dan
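For reference, here is a minimal sketch of the per-file loop described above. The filenames, variable name ('2 metre temperature') and example point are illustrative assumptions, not from the thread; grb.data() with the lat1/lat2/lon1/lon2 keywords is the pygrib call in question.

    import numpy as np
    import pygrib

    # hypothetical per-timestep GFS files (names are illustrative only)
    files = ["gfs_f%03d.grb2" % fh for fh in range(0, 243, 3)]

    # example point; GFS longitudes usually run 0-360, hence 359.9 rather than -0.1
    lat, lon = 51.5, 359.9
    series = []

    for fname in files:
        grbs = pygrib.open(fname)
        # select() filters messages by key; '2 metre temperature' is the usual GFS name
        grb = grbs.select(name="2 metre temperature")[0]
        # data() still decodes the whole 2D field before subsetting to the bounding
        # box, which is where most of the time goes
        data, lats, lons = grb.data(lat1=lat - 0.5, lat2=lat + 0.5,
                                    lon1=lon - 0.5, lon2=lon + 0.5)
        # pick the grid point nearest the requested location
        i = np.argmin((lats - lat) ** 2 + (lons - lon) ** 2)
        series.append(data.flatten()[i])
        grbs.close()

    print(series)  # one value per timestep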

jswhit commented 8 years ago

I think this is a fundamental shortcoming of GRIB - you have to decode the entire message (a message is a 2D grid) just to access one grid point. I don't see any way around this, other than rewriting the data in some other format (netCDF?) so that time series access is more efficient.
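As a rough illustration of that idea, here is a sketch that pays the full 2D decode once per message and rewrites the stack of fields to a netCDF file with time as the record dimension, so a point time series later is just an array slice. It assumes the netCDF4 package, a concatenated single-variable GRIB file and illustrative file/variable names.

    import pygrib
    from netCDF4 import Dataset

    # "all_steps.grb2" is an illustrative name for a concatenated multi-timestep file
    grbs = pygrib.open("all_steps.grb2")
    msgs = grbs.select(name="2 metre temperature")

    lats, lons = msgs[0].latlons()
    ny, nx = lats.shape

    nc = Dataset("t2m_timeseries.nc", "w")
    nc.createDimension("time", None)          # unlimited record dimension
    nc.createDimension("y", ny)
    nc.createDimension("x", nx)
    nc.createVariable("latitude", "f4", ("y", "x"))[:] = lats
    nc.createVariable("longitude", "f4", ("y", "x"))[:] = lons
    t2m = nc.createVariable("t2m", "f4", ("time", "y", "x"))

    # pay the full 2D decode once per message, up front
    for i, grb in enumerate(msgs):
        t2m[i, :, :] = grb.values

    nc.close()
    grbs.close()

    # later, a point time series is a cheap slice along the time axis
    nc = Dataset("t2m_timeseries.nc")
    series = nc.variables["t2m"][:, 100, 200]   # grid indices chosen for illustration
    nc.close()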

dhwx commented 8 years ago

Thanks for the reply. I'll have a look into conversion to netCDF, or possibly writing the data into a MySQL database using wgrib2. Thanks for all the work you've put into pygrib - using it in combination with matplotlib and basemap produces some beautiful spatial plots!

webisu commented 8 years ago

The slowness is probably caused by the very slow jpeg2000 compression. Fortunately, the GFS is moving to complex packing, which is 20x faster. Until then, you can convert from jpeg2000 to complex packing with:

 wgrib2 IN.grb -set_grib_type c3 -grib_out OUT.grb

This job is trivially parallelizable, so you can use N cores with:

 wgrib2ms N IN.grb -set_grib_type c3 -grib_out OUT.grb
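If it helps, the single-file conversion can also be driven from Python before opening the output with pygrib. This is only a subprocess wrapper around the wgrib2 command above, with illustrative filenames and wgrib2 assumed to be on the PATH.

    import subprocess
    import pygrib

    # repack the jpeg2000-compressed fields as complex packing,
    # using exactly the wgrib2 command shown above (filenames illustrative)
    subprocess.run(
        ["wgrib2", "IN.grb", "-set_grib_type", "c3", "-grib_out", "OUT.grb"],
        check=True,
    )

    # decoding the repacked file with pygrib should now be much faster
    grbs = pygrib.open("OUT.grb")
    print(grbs[1])   # decode and print the first message as a quick check
    grbs.close()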