Open amsnyder opened 10 months ago
In the c404 hourly zarr dataset, the precipitation variable, PREC_ACC_NC, represents an accumulation of precipitation for the prior 60 minutes at each timestep. For example, timestep 2023-01-01_00:00:00 is the accumulated precip for the prior 60 minutes. When creating the daily-from-hourly zarr dataset, this accumulation was taken into account when computing the daily accumulated precipitation. For example, to compute the daily accumulated precipitation from the hourly for 2023-01-01, we summed the precipitation from timesteps 2023-01-01_01:00:00 to 2023-01-02_00:00:00.
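The daily summation described above can be sketched in a few lines. This is only an illustration using the Python standard library, not the code that built the daily zarr store; the function name `daily_totals` and the sample values are hypothetical. It assumes every hourly value is labeled with the end of its 60-minute window.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_totals(hourly: dict[datetime, float]) -> dict[date, float]:
    """Sum end-labeled hourly accumulations into calendar-day totals."""
    totals: dict[date, float] = defaultdict(float)
    for stamp, value in hourly.items():
        # Each value covers the 60 minutes ending at `stamp`, so step back
        # one hour to find the calendar day the precipitation belongs to.
        totals[(stamp - timedelta(hours=1)).date()] += value
    return dict(totals)


# Hypothetical data: 1.0 unit of precipitation at each of 49 hourly timesteps,
# from 1979-10-01_00:00:00 through 1979-10-03_00:00:00.
start = datetime(1979, 10, 1, 0)
hourly = {start + timedelta(hours=h): 1.0 for h in range(49)}
totals = daily_totals(hourly)
```

With this grouping, the 01:00:00 through next-day 00:00:00 timesteps land on the same calendar day, and the very first timestep (1979-10-01_00:00:00) is attributed to 1979-09-30 rather than to 1979-10-01.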
With the daily diagnostic xtrm zarr dataset the files were converted as-is to the zarr format. It does appear the given dates (e.g. 1980-01-01_00:00:00) do represent the prior day. We could fix this by adjusting the time values in the dataset.
Does fixing it mean dropping the reported daily accumulation for 1979-10-01 from the dataset?
No, it just means that 1979-10-01 becomes 1979-09-30. We'll have the same number of days in the dataset, we're just shifting the dates to reflect reality.
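As a rough sketch of that relabeling (standard-library Python only; `shift_to_start_of_day` is a hypothetical helper, not something that exists in the conversion scripts), every end-of-period label moves back by one day and the number of timesteps stays the same:

```python
from datetime import datetime, timedelta


def shift_to_start_of_day(end_labels: list[datetime]) -> list[datetime]:
    """Relabel end-of-period daily timestamps with the day they describe."""
    # A value stamped 1979-10-01_00:00:00 summarizes the prior 24 hours,
    # so its shifted label is 1979-09-30.
    return [stamp - timedelta(days=1) for stamp in end_labels]


# Hypothetical slice of the daily diagnostic time axis.
labels = [datetime(1979, 10, 1), datetime(1979, 10, 2), datetime(1979, 10, 3)]
shifted = shift_to_start_of_day(labels)
```

Only the time coordinate changes here; the data values themselves are untouched.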
For the hourly dataset, the first time step is '1979-10-01T00:00:00.000000000'. This would be the rainfall between '1979-09-30T00:00:00.000000000' and '1979-10-01T00:00:00.000000000' - is that right?
For wrfxtrm, I guess our options are either to shift the time labels so that the value you get on a given day represents the flux for that day, or to add an attribute integration_length of "flux over prior 24 hours" - but perhaps that is confusing given that it is a flux, rather than an accumulated value.
It sounds like both of our zarr stores currently match the data format of the raw data output, in terms of how the dates/values align, right @pnorton-usgs ? So if we shifted the dates, we would be making the data more intuitive for a data user, but it would now be a slight mismatch with the raw output format?
For the hourly dataset, the first time step is '1979-10-01T00:00:00.000000000'. This would be the rainfall between '1979-09-30T00:00:00.000000000' and '1979-10-01T00:00:00.000000000' - is that right?
For 1979-10-01_00:00:00 it would represent rainfall for 1979-09-30_01:00:00 to 1979-10-01_00:00:00
The raw hourly output time values have not been modified, and in the case of the PREC_ACC_NC variable the integration_length is set to "accumulated over prior 60 minutes". It was only with the daily (and monthly) datasets that I adjusted the time values to reflect an intuitive understanding of what they represent. The integration_length for the PREC_ACC_NC variable was set to "24-hour accumulation" in the daily dataset and to "month accumulation" in the monthly dataset.
I think we might both have typos lol - we mean 1979-09-30_23:00:00 for the hourly data, right? And for the daily data you aggregated from hourly, that first time step of the dataset is thrown out because 1979-10-01T00:00:00.000000000 would use 1979-10-01T01:00:00.000000000 to 1979-10-02T00:00:00.000000000?
The integration_length labeling makes sense. I guess I am asking whether we should add a label like this to the conus404-daily-diagnostic data from wrfxtrm, or adjust the dates to make them more intuitive (but then out of line with the raw output date formatting).
I totally missed you were talking about hourly. :)
For the daily timestep: 1979-10-01 would represent rainfall from the hourly for 1979-10-01_01:00:00 to 1979-10-02_00:00:00
I think we should adjust the dates; IMO it will be a headache for others if we don't.
Ok - I am ok with that plan. Maybe you can adjust when you do the update to get the rest of the data through 2022 into the zarr?
From Changhai Liu: The data in wrfxtrm files represent the results in the past 24 hours, and the timestamp corresponds to the end time of the 24 hours. The standard timestamp of these files is yyyy-mm-dd_00:00:00. For example, the values in the file with a timestamp 1979-10-02_00:00:00 correspond to the simulation results between 1979-10-01_00:00:00 and 1979-10-02_00:00:00. (note that the time at 1979-10-01_00:00:00 is NOT included.) Since CONUS404 started at 1979-10-01_00:00:00, the first wrfxtrm file (1979-10-01_00:00:00) is all zeros.
As for whether to shift the dates in conus404-daily-diagnostic, we will wait for input from NCAR via email.
Consider adding a time_bnds variable to denote the period of time each time step represents, and adding an attribute to the time variable that points to the time_bnds variable.
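A minimal sketch of what such bounds could look like for the daily diagnostic data, assuming each timestamp marks the end of its 24-hour window. It uses plain Python rather than the zarr/xarray tooling; `daily_time_bounds` is a hypothetical helper, and the attribute name `bounds` follows the CF conventions for pointing a coordinate at its boundary variable.

```python
from datetime import datetime, timedelta


def daily_time_bounds(end_labels: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Return (start, end) pairs for end-labeled 24-hour periods."""
    # The period runs from 24 hours before the label up to the label itself.
    return [(stamp - timedelta(days=1), stamp) for stamp in end_labels]


# Hypothetical time axis kept exactly as in the raw wrfxtrm output.
time = [datetime(1979, 10, 2), datetime(1979, 10, 3)]
time_bnds = daily_time_bounds(time)

# Attribute on the time variable that names its bounds variable (CF "bounds").
time_attrs = {"bounds": "time_bnds"}
```

One appeal of this approach is that the time labels can stay aligned with the raw output while the covered period is still stated explicitly for each step.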
The conus404-daily-diagnostic data seems to contain all zeros on the first day (1979-10-01). Then the data values pick up on the second day (1979-10-02). Is this intentional or do we need to shift the data by one day?