pacificclimate / pdp

The PCIC Data Portal - Server software to run the entire web application
GNU General Public License v3.0

PDP front end assumes all time variables start at 0 #264

Open corviday opened 2 years ago

corviday commented 2 years ago

The PDP front end reads metadata about a file's time variable so that it can translate a user's request for data between certain calendar dates into the indexes of the time variable it needs to send in its data request.

Currently in pdp_raster_map.js, the front end consults the two PyDAP metadata APIs, DAS and DDS.

The DAS API provides metadata on all variables, including the time variable:

Attributes {
    NC_GLOBAL {
        String domain "Canada";
        String method_id "BCCAQv2";
        ...
    }
    lon {
        ...
    }
    lat {
        ...
    }
    time {
        String long_name "Time";
        String standard_name "Time";
        String NAME "time";
        String units "days since 1950-01-01";
        Int32 _Netcdf4Dimid 2;
        String calendar "365_day";
        String CLASS "DIMENSION_SCALE";
    }
    tasmax {
        ...
    }
}

The DDS API tells it how many timestamps the file has:

Dataset {
    Float64 time[time = 55115];
} tasmax_day_BCCAQv2%2BANUSPLIN300_CanESM2_historical%2Brcp26_r1i1p1_19500101-21001231%2Enc;

So the front end knows how many timestamps there are and the units of the timestamps (days since 1950-01-01), but it has no way to tell from the metadata what the actual first timestamp is. It therefore always assumes the first timestamp is greater than or equal to 0 and less than 1, i.e. that the data starts on the reference date given in the units.
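To make the assumption concrete, here is a minimal sketch of the kind of calculation it implies. This is not the actual code in pdp_raster_map.js; the names `DAYS_BEFORE_MONTH`, `noLeapDaysBetween`, and `naiveTimeIndex` are hypothetical, and the sketch assumes daily data in a 365_day calendar.

```javascript
// Cumulative days before each month in a 365_day (no-leap) calendar.
const DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];

// Days from the reference date to the target date, ignoring leap days.
// Dates are plain objects {year, month, day} with month in 1..12.
function noLeapDaysBetween(ref, target) {
  const dayOfYear = (d) => DAYS_BEFORE_MONTH[d.month - 1] + (d.day - 1);
  return (target.year - ref.year) * 365 + dayOfYear(target) - dayOfYear(ref);
}

// Hypothetical illustration of the assumption: if the first timestamp lies
// in [0, 1), the day offset from the units reference date IS the index.
function naiveTimeIndex(ref, target) {
  return noLeapDaysBetween(ref, target);
}

const ref1950 = { year: 1950, month: 1, day: 1 };
const firstIndex = naiveTimeIndex(ref1950, { year: 1950, month: 1, day: 1 });
const lastIndex = naiveTimeIndex(ref1950, { year: 2100, month: 12, day: 31 });
```

For a file with units "days since 1950-01-01" covering 1950-01-01 to 2100-12-31, this gives indexes 0 through 55114, which lines up with the 55115 timestamps reported by the DDS.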

The front end will correctly handle this file:

netcdf pr_day_BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp85_r1i1p1_19500101-21001231 {
dimensions:
    lon = 1068 ;
    lat = 510 ;
    time = 55115 ;
variables:
    double lon(lon) ;
    double lat(lat) ;
    double time(time) ;
        time:units = "days since 1950-01-01 00:00:00" ;
        time:calendar = "365_day" ;
        time:long_name = "Time" ;
        time:standard_name = "Time" ;
    short pr(time, lat, lon) ;

data:

 time = 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5,
...
}

But not this one:

netcdf pr_day_BCCAQv2+ANUSPLIN300_bcc-csm1-1-m_historical+rcp85_r1i1p1_19500101-21001231 {
dimensions:
    lon = 1068 ;
    lat = 510 ;
    time = 55115 ;
variables:
    double lon(lon) ;
    double lat(lat) ;
    double time(time) ;
        time:units = "days since 1850-01-01 00:00:00" ;
        time:calendar = "365_day" ;
        time:long_name = "Time" ;
        time:standard_name = "Time" ;
    short pr(time, lat, lon) ;
data:

 time = 36500.5, 36501.5, 36502.5, 36503.5, 36504.5, 36505.5, 36506.5,
...
}

Historically, when we've added new datasets to the PDP, we've normalized the time variable so that the first timestamp falls in that range. We could skip that step if the PDP front end were a little smarter about time variables.

In order for the front end to handle files with offset time variables, it would need to look at the actual time data, and not just the time metadata.
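As a rough sketch of what that could look like, suppose the front end has already obtained the first value of the time variable (for example through a DAP data request for the first element of `time`; how it is fetched is left out here). The function `timeIndex` and its `firstTimestamp` parameter are hypothetical names, and the sketch assumes one timestamp per day in a 365_day calendar.

```javascript
// Cumulative days before each month in a 365_day (no-leap) calendar.
const DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];

// Days from the reference date to the target date, ignoring leap days.
// Dates are plain objects {year, month, day} with month in 1..12.
function noLeapDaysBetween(ref, target) {
  const dayOfYear = (d) => DAYS_BEFORE_MONTH[d.month - 1] + (d.day - 1);
  return (target.year - ref.year) * 365 + dayOfYear(target) - dayOfYear(ref);
}

// firstTimestamp is the actual first value of the time variable, read from
// the data rather than inferred from metadata (assumption: daily spacing).
function timeIndex(ref, target, firstTimestamp) {
  return noLeapDaysBetween(ref, target) - Math.floor(firstTimestamp);
}

const request = { year: 1950, month: 1, day: 1 };

// Normalized file: units "days since 1950-01-01", time starts at 0.5.
const normalizedIndex = timeIndex({ year: 1950, month: 1, day: 1 }, request, 0.5);

// Offset file: units "days since 1850-01-01", time starts at 36500.5.
const offsetIndex = timeIndex({ year: 1850, month: 1, day: 1 }, request, 36500.5);
```

Both files then resolve a request for 1950-01-01 to index 0, whereas ignoring the first timestamp would yield 36500 for the second file.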