spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0

New use case: 'av' data with no time coordinate or meaningful calendar #144

Open spencerahill opened 7 years ago

spencerahill commented 7 years ago

As configured on Caltech's Fram cluster, the idealized models run via FMS produce what is essentially dtype_in_time='av' data. Unlike GFDL's 'av' output, which retains a time coordinate with a single time value, this data has no time coordinate:

>>> xr.open_dataset('/home/shill/fms_output/default_idealized/test_dry/history/day0300h00.nc')

<xarray.Dataset>
Dimensions:     (lat: 64, legendre: 43, lon: 128, rhum_bin: 41, sigma: 30, theta: 30, times: 1, zon_waven: 43)
Coordinates:
  * lat         (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 -73.95 ...
  * lon         (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 19.69 ...
  * sigma       (sigma) float64 0.004601 0.01082 0.01455 0.01937 0.02552 ...
  * theta       (theta) float64 650.0 416.8 353.0 318.0 294.7 277.5 264.0 ...
  * zon_waven   (zon_waven) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * legendre    (legendre) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 ...
  * rhum_bin    (rhum_bin) float64 -0.0125 0.0125 0.0375 0.0625 0.0875 ...
  * times       (times) float64 1.2e+03
Data variables:
    u           (sigma, lat, lon) float64 0.9051 0.9288 0.9504 0.9697 0.9867 ...
    v           (sigma, lat, lon) float64 -0.4785 -0.4305 -0.3811 -0.3302 ...
    w           (sigma, lat, lon) float64 1.325e-11 1.526e-11 1.728e-11 ...
    temp        (sigma, lat, lon) float64 216.7 216.7 216.7 216.7 216.7 ...
    pot_temp    (sigma, lat, lon) float64 1.101e+03 1.101e+03 1.101e+03 ...
    p_full      (sigma, lat, lon) float64 338.4 338.4 338.4 338.4 338.4 ...
    ps          (lat, lon) float64 9.996e+04 9.996e+04 9.996e+04 9.996e+04 ...
    vu          (sigma, lat, lon) float64 -2.018 -2.094 -2.139 -2.153 -2.135 ...
    wu          (sigma, lat, lon) float64 7.356e-10 7.465e-10 7.533e-10 ...
    utemp       (sigma, lat, lon) float64 198.8 203.9 208.6 212.8 216.5 ...
    vtemp       (sigma, lat, lon) float64 -103.1 -92.74 -82.11 -71.19 -59.99 ...
    wtemp       (sigma, lat, lon) float64 3.477e-09 3.889e-09 4.301e-09 ...
    utheta      (sigma, lat, lon) float64 1.01e+03 1.036e+03 1.06e+03 ...
    vtheta      (sigma, lat, lon) float64 -525.4 -472.9 -418.8 -363.3 -306.3 ...
    wtheta      (sigma, lat, lon) float64 1.76e-08 1.969e-08 2.179e-08 ...
    u2          (sigma, lat, lon) float64 14.42 14.71 15.01 15.31 15.62 ...
    v2          (sigma, lat, lon) float64 16.27 16.07 15.87 15.67 15.46 ...
    w2          (sigma, lat, lon) float64 9.298e-19 9.297e-19 9.296e-19 ...
    temp2       (sigma, lat, lon) float64 4.726e+04 4.726e+04 4.726e+04 ...
    pot_temp2   (sigma, lat, lon) float64 1.219e+06 1.219e+06 1.219e+06 ...
    z_mean      (sigma, lat, lon) float64 3.702e+04 3.702e+04 3.702e+04 ...
    uz          (sigma, lat, lon) float64 3.369e+04 3.456e+04 3.535e+04 ...
    vz          (sigma, lat, lon) float64 -1.745e+04 -1.569e+04 -1.388e+04 ...
    wz          (sigma, lat, lon) float64 5.845e-07 6.554e-07 7.262e-07 ...
    dt_conv     (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    dt_rad      (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    dt_diff     (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    QdT         (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    QrT         (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    QdiffT      (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Qdtheta     (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Qrtheta     (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    Qdifftheta  (sigma, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    uek         (sigma, lat, lon) float64 110.7 116.3 121.7 126.7 131.4 ...
    vek         (sigma, lat, lon) float64 -101.9 -95.46 -88.53 -81.16 -73.35 ...
    upot_temp2  (sigma, lat, lon) float64 1.128e+06 1.157e+06 1.183e+06 ...
    vpot_temp2  (sigma, lat, lon) float64 -5.771e+05 -5.197e+05 -4.606e+05 ...
    omegalpha   (sigma, lat, lon) float64 2.912e-09 3.262e-09 3.612e-09 ...
Attributes:
    title: OfflineDiag Analyses

All of our logic, including for 'av' data, assumes that the data has a time coordinate.

The first thing that comes to my mind is simply to check, once any 'av' data has been loaded via data_loader._load_data_from_disk, whether the desired data has a time coordinate.

@spencerkclark, let me know if you think that wouldn't be a good idea; otherwise I will start down this path.
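A minimal sketch of the proposed check, assuming the loaded data is an xarray Dataset. The helper has_time_coord and its default tuple of names are hypothetical, not part of aospy; both a dimension coordinate and a scalar coordinate count as a time coordinate here:

```python
import xarray as xr


def has_time_coord(ds, time_names=("time",)):
    """Return True if ds has any of time_names as a coordinate.

    Hypothetical helper: dimension coordinates and scalar coordinates
    (like the scalar 'time' in GFDL 'av' output) both count.
    """
    return any(name in ds.coords for name in time_names)


# Toy stand-ins for the two kinds of 'av' output discussed in this issue.
with_time = xr.Dataset(
    {"ps": (("lat", "lon"), [[1.0e5, 1.0e5]])},
    coords={"lat": [0.0], "lon": [0.0, 2.8], "time": 1841.0},
)
without_time = xr.Dataset(
    {"ps": (("lat", "lon"), [[1.0e5, 1.0e5]])},
    coords={"lat": [0.0], "lon": [0.0, 2.8]},
)

print(has_time_coord(with_time))     # True
print(has_time_coord(without_time))  # False
```

Where exactly such a check would live (inside _load_data_from_disk or right after it) is left open here.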

spencerkclark commented 7 years ago

I want to say this should be fixed as of #135 so long as 'times' is a listed alternative name for time in internal_names.GRID_ATTRS.

E.g., if you read in one of the test files (for which aospy works), you get a similar situation:

In [3]: xr.open_dataset('00060101.sphum_monthly.nc', decode_times=False)
Out[3]:
<xarray.Dataset>
Dimensions:      (lat: 64, latb: 65, lon: 128, lonb: 129, nv: 2, pfull: 30, phalf: 31)
Coordinates:
  * lonb         (lonb) float64 -1.406 1.406 4.219 7.031 9.844 12.66 15.47 ...
  * lon          (lon) float64 0.0 2.812 5.625 8.438 11.25 14.06 16.88 19.69 ...
  * latb         (latb) float64 -90.0 -86.58 -83.76 -80.96 -78.16 -75.36 ...
  * lat          (lat) float64 -87.86 -85.1 -82.31 -79.53 -76.74 -73.95 ...
  * phalf        (phalf) float64 0.0 9.202 12.44 16.66 22.07 28.97 37.63 ...
  * pfull        (pfull) float64 3.385 10.78 14.5 19.3 25.44 33.2 42.9 54.88 ...
  * nv           (nv) float64 1.0 2.0
    time         float64 1.841e+03
Data variables:
    ps           (lat, lon) float64 1.001e+05 1.001e+05 1.001e+05 1.001e+05 ...
    bk           (phalf) float64 0.0 0.009202 0.01244 0.01666 0.02207 ...
    pk           (phalf) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
    sphum        (pfull, lat, lon) float64 1.952e-07 1.952e-07 1.952e-07 ...
    time_bounds  (nv) float64 1.825e+03 1.856e+03
    average_DT   float64 31.0
Attributes:
    grid_tile: N/A
    title: FMS Model results
    grid_type: regular
    filename: 00060101.atmos_month.nc

That said, there is no time_bounds or dt variable in your Dataset, right?
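The two points in this comment can be sketched as follows: treating 'times' as an alias that gets renamed to the canonical 'time', and checking whether the averaging-period metadata is present. This is a minimal sketch assuming xarray; ALT_TIME_NAMES, AVG_METADATA_NAMES, and both helper functions are hypothetical and need not match how internal_names.GRID_ATTRS is actually structured:

```python
import xarray as xr

# Hypothetical alias list; aospy's internal_names.GRID_ATTRS may differ.
ALT_TIME_NAMES = ("time", "times")
# Names as they appear in the GFDL test file above; other output may use
# different names (the comment above also mentions a 'dt' variable).
AVG_METADATA_NAMES = ("time_bounds", "average_DT")


def rename_time_to_canonical(ds, alt_names=ALT_TIME_NAMES, canonical="time"):
    """Rename the first alias of the time coordinate found in ds."""
    if canonical in ds.variables:
        return ds  # already uses the canonical name
    for name in alt_names:
        if name in ds.variables:
            # Dataset.rename renames the variable and any dimension of that name.
            return ds.rename({name: canonical})
    return ds


def missing_avg_metadata(ds, names=AVG_METADATA_NAMES):
    """Return the averaging-period variables that ds lacks."""
    return [name for name in names if name not in ds.variables]


gfdl_like = xr.Dataset(
    {"time_bounds": ("nv", [1825.0, 1856.0]), "average_DT": ((), 31.0)},
    coords={"nv": [1.0, 2.0], "time": 1841.0},
)
fram_like = xr.Dataset(coords={"times": [1200.0]})

print(list(rename_time_to_canonical(fram_like).coords))  # ['time']
print(missing_avg_metadata(gfdl_like))  # []
print(missing_avg_metadata(fram_like))  # ['time_bounds', 'average_DT']
```

As the following replies point out, renaming alone is not enough for this particular output, because 'times' counts averaged instants rather than marking a time.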

spencerahill commented 7 years ago

Good point re: times v. time.

> That said, there is no time_bounds or dt variable in your Dataset, right?

That's right. Also, 'times' is not an actual time; it is (something like) the number of timesteps included in the averaging period:

In [7]: ds.times
Out[7]:
<xarray.DataArray 'times' (times: 1)>
array([ 1200.])
Coordinates:
  * times    (times) float64 1.2e+03
Attributes:
    long_name: number of instants

So it's a little trickier.
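One possible way to keep such a 'times' coordinate from being mistaken for a time axis is to remove it while preserving the count as a Dataset attribute. This is a minimal sketch assuming xarray; drop_instant_count and the n_instants attribute name are hypothetical, and whether aospy should keep this information at all is an open design question:

```python
import xarray as xr


def drop_instant_count(ds, name="times", attr_name="n_instants"):
    """Remove a single-valued 'number of instants' coordinate from ds.

    Hypothetical helper: the count is stored as a Dataset attribute so the
    information is not lost.
    """
    if name not in ds.coords:
        return ds
    count = ds[name]
    if count.size != 1:
        raise ValueError(f"expected a single value for {name!r}, got {count.size}")
    n_instants = int(count.item())
    if name in ds.dims:
        # squeeze(..., drop=True) removes the length-1 dimension and its coordinate.
        out = ds.squeeze(name, drop=True)
    else:
        out = ds.drop_vars(name)
    # assign_attrs returns a new object, leaving ds.attrs untouched.
    return out.assign_attrs({attr_name: n_instants})


ds = xr.Dataset(
    {"ps": (("lat", "lon"), [[9.996e4, 9.996e4]])},
    coords={"lat": [0.0], "lon": [0.0, 2.8], "times": [1200.0]},
)
out = drop_instant_count(ds)
print("times" in out.dims, out.attrs["n_instants"])  # False 1200
```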

spencerkclark commented 7 years ago

~That seems inconvenient (whether you're using aospy or not). Is that the only way the model can be configured?~

~(Though of course if you're handed that data, you really have no choice)~

spencerkclark commented 7 years ago

Oh I see, this is from a dry model (I'm assuming with no seasonal cycle, so the time of year doesn't really matter); I think that's run with "no calendar" on GFDL systems, so the output is also a bit unusual with regard to how time is handled. I've run it a couple of times (to help debug / find workarounds for some past issues with FRE), but never tried reading the data into aospy.

It might be messy, but I think it could be worthwhile to figure out how to handle this type of output (however rare it might be).

> The first thing that comes to my mind is simply to check, once any 'av' data has been loaded via data_loader._load_data_from_disk, whether the desired data has a time coordinate.

I agree; starting here is probably your best bet. Just to keep the logic clean, might it be worth writing a separate DataLoader for this kind of output?

Sorry for my ignorant prior comment.

spencerahill commented 7 years ago

> Oh I see, this is from a dry model (I'm assuming with no seasonal cycle, so the time of year doesn't really matter); I think that's run with "no calendar" on GFDL systems, so the output is also a bit unusual with regard to how time is handled.

Sorry, should have mentioned that. Yes, it's a dry run with constant forcing, so there isn't even a physically meaningful calendar.

> Just to keep the logic clean, might it be worth writing a separate DataLoader for this kind of output?

Yes, most likely. But unlike the other concrete DataLoader implementations so far, this one would override the load_variable method, since that's where _prep_time_data and the other related calls that need to be modified are made.

spencerkclark commented 7 years ago

I think you could potentially extend DictDataLoader or NestedDictDataLoader, if you wanted to retain the file-finding / constructor logic within those (and override load_variable as necessary).

spencerahill commented 7 years ago

> I think you could potentially extend DictDataLoader or NestedDictDataLoader, if you wanted to retain the file-finding / constructor logic within those (and override load_variable as necessary).

Was just starting in on exactly that :)