pygeode / pygeode

Python package for dealing with large gridded datasets
http://pygeode.github.io/

method for making a time axis regularly spaced #27

Open neishm opened 9 years ago

neishm commented 9 years ago

Original issue 27 created by neishm on 2011-09-29T18:49:10.000Z:

Sometimes, the data being fed into PyGeode has an irregular time axis. This can happen, for instance, if we are using 'multifile' to join a bunch of input files together and there are a few missing months (files) within the set. It may be easier to implement certain operations if we can assume we have a regularly spaced time axis going in (even if there's missing data). We could define a helper method that detects the gaps and returns the data on a regularly spaced time axis, with the missing time steps blanked out (e.g. filled with NaN).

This may also help for plotting timeseries, as it would force the missing data to be blanked out. At the moment (I assume) a line (1D) or contours (2D) would extend across the gaps.
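A minimal sketch of that idea, using plain numpy rather than any existing PyGeode API; the function name, the median-spacing heuristic, and the NaN fill are assumptions for illustration, not the behaviour the issue settled on:

```python
import numpy as np

def regularize_time_axis(times, values):
    """Sketch: expand (times, values) onto a regularly spaced time axis,
    filling the missing time steps with NaN.  `times` is assumed to be a
    1-D array of numeric time values (e.g. months since some reference)."""
    # Infer the regular spacing as the typical difference between
    # consecutive time values.
    step = np.median(np.diff(times))
    # Build the full, regularly spaced time axis covering the same range.
    full_times = np.arange(times[0], times[-1] + step / 2, step)
    # Map each original time onto the nearest slot of the regular axis.
    indices = np.round((times - times[0]) / step).astype(int)
    # Start with all-NaN data, then fill in the slots we actually have.
    full_values = np.full(full_times.shape + values.shape[1:], np.nan)
    full_values[indices] = values
    return full_times, full_values

# Example: monthly data with months 3 and 6 missing.
times = np.array([0., 1., 2., 4., 5., 7.])
values = np.arange(len(times), dtype=float)
full_times, full_values = regularize_time_axis(times, values)
# full_values now has NaN at months 3 and 6, so plots show gaps there.
```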

neishm commented 9 years ago

Comment #1 originally posted by aph42 on 2013-11-06T15:47:10.000Z:

Set target for release 1.0; probably some should not be included.

neishm commented 9 years ago

Comment #4 originally posted by neishm on 2014-12-21T21:10:47.000Z:

Or, maybe this logic should go directly in openall / open_multi, and have the gaps established while the input files are being assembled together?
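A rough illustration of what "establishing the gaps while the files are assembled" could look like; the helper below is hypothetical, works on each file's starting time only, and is not the actual openall / open_multi logic:

```python
import numpy as np

def find_gaps(file_start_times, expected_step):
    """Sketch: given the starting time of each input file (in order) and the
    expected spacing between files, report gaps between consecutive files."""
    file_start_times = np.asarray(file_start_times, dtype=float)
    gaps = []
    for i, delta in enumerate(np.diff(file_start_times)):
        if delta > expected_step * 1.5:   # tolerance for slightly uneven spacing
            n_missing = int(round(delta / expected_step)) - 1
            gaps.append((file_start_times[i], file_start_times[i + 1], n_missing))
    return gaps

# Example: monthly files with two months missing between the 3rd and 4th file.
print(find_gaps([0, 1, 2, 5, 6], expected_step=1))
# [(2.0, 5.0, 2)]
```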

aph42 commented 8 years ago

Detecting missing data in large datasets is still something that comes up often for me, and it is not always an easy task. The check_dataset stuff has been very useful in some cases, though. For instance, I recently had a dataset in which a couple of files were filled with NaNs: the dataset was perfectly well-formed, but I wanted to know which files were corrupted, and it wasn't entirely straightforward to determine this.
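One way to hunt for that kind of corruption, sketched here with the netCDF4 package directly rather than through PyGeode (the file names and variable name in the example are hypothetical):

```python
import numpy as np
from netCDF4 import Dataset

def files_with_all_nan(filenames, varname):
    """Sketch: report which netCDF files contain a variable that is
    entirely NaN (or entirely masked)."""
    bad = []
    for fn in filenames:
        with Dataset(fn) as nc:
            raw = nc.variables[varname][:]
            # Treat masked values and non-finite values alike.
            missing = np.ma.getmaskarray(raw) | ~np.isfinite(np.ma.getdata(raw).astype(float))
            if missing.all():
                bad.append(fn)
    return bad

# Example (hypothetical file and variable names):
# print(files_with_all_nan(['jan.nc', 'feb.nc', 'mar.nc'], 'temperature'))
```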

One approach I've used is to call interpolate to fill in the (internal) gaps. It's a very efficient call and patches over both missing data and NaNs well, though it wouldn't do the job if one wants to avoid this kind of synthetic filling.
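The idea, illustrated with plain numpy rather than the actual PyGeode interpolate call (the numbers are made up): linearly interpolate the irregular series onto a regular axis, so the gaps get synthetic values instead of NaNs.

```python
import numpy as np

# Irregular monthly series with months 3 and 6 missing.
times = np.array([0., 1., 2., 4., 5., 7.])
values = np.array([10., 11., 12., 14., 15., 17.])

full_times = np.arange(times[0], times[-1] + 1)   # regular monthly axis
filled = np.interp(full_times, times, values)     # synthetic values at the gaps
```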

From my point of view it's more useful to work on the tools for detecting the gaps (and problems) in large datasets rather than filling them, so I would prioritize improvements to check_dataset.