Problem description
Note: This issue is filed in this repo because the root of the problem appears to be here, not in the higher-level PDP app construction.
When a station does not have values for a variable at all time points in the station's total observation set, a fill value is provided for the absent values.
For example, suppose a station records temperature for one year, then precipitation for a second year. The total observation record spans 2 years. In the data file downloaded for this station, fill values are written to the temperature variable for the second year, and to the precipitation variable for the first year. Both variables have the same length, 2 years.
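The padding described in this example can be sketched in a few lines of Python. This is only an illustration of the expected behavior, not the server's actual code: `align_on_union`, the `FILL` constant, and the toy timestamps and readings are hypothetical names and data introduced here, and `FILL` stands in for whatever missing_value the file metadata declares.

```python
# Hypothetical stand-in for the missing_value declared in the file metadata.
FILL = -9999.0

def align_on_union(series_by_var, fill=FILL):
    """Pad every variable to the union of all observation times."""
    # The station's total observation set is the union of every variable's times.
    times = sorted({t for series in series_by_var.values() for t in series})
    aligned = {
        name: [series.get(t, fill) for t in times]
        for name, series in series_by_var.items()
    }
    return times, aligned

# Toy data: temperature observed in the first year only,
# precipitation observed in the second year only.
observations = {
    "temperature": {"2018-01-01": 1.5, "2018-07-01": 18.0},
    "precipitation": {"2019-01-01": 4.2, "2019-07-01": 0.0},
}

times, aligned = align_on_union(observations)
print(times)
print(aligned["temperature"])    # real values first, then fill
print(aligned["precipitation"])  # fill first, then real values
```

Under this sketch both variables end up with the same length, and every absent value is the single declared fill value, which is the behavior one would expect from the downloaded file.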
The fill values are peculiar, in the following ways:
The value is not (at least in observed cases) the missing_value value specified in the metadata (available in NetCDF files).
The value is not anything easily identifiable such as NaN, maximum or minimum value.
The value changes when the server is restarted.
So far, the values observed from several servers (the production server and server instances running locally on a dev machine) are large floats, e.g., 1.48655878184907e+158.
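The checks listed above (declared missing_value, NaN, maximum or minimum value) can be expressed as a small classifier. This is a minimal sketch using only the Python standard library; `classify_fill` is a hypothetical helper, and the `declared` value below is an assumed placeholder, since the actual missing_value in the metadata is not quoted in this issue.

```python
import math
import sys

def classify_fill(value, declared_missing):
    """Label a suspected fill value by the recognizable category it matches."""
    if math.isnan(value):
        return "NaN"
    if value == declared_missing:
        return "declared missing_value"
    if math.isinf(value):
        return "infinity"
    if abs(value) == sys.float_info.max:
        return "largest finite double"
    return "unrecognized"

observed = 1.48655878184907e+158  # value reported in this issue
declared = -9999.0                # assumed placeholder, not taken from the real file

print(classify_fill(observed, declared))
```

The observed value is a finite double well below the largest representable one, so it falls through every recognizable category.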
To reproduce
If experimenting with a local dev instance, install pdp and run PCDS locally:
  Mount /storage
  Activate the virtual env in which PDP is installed.
  Set environment variables:
  Run python scripts/rast_serve.py -p 8000
Open the PCDS app in the browser.
Select start date: 2019/09/25
Select variables: Temperature (Mean)
Draw a polygon around the Williams Lake station.
In Download Data, select format NetCDF, then click Timeseries.
Save file to disk.
Unzip it and examine the contents of 0550502.nc. Since the file is fairly small, ncdump 0550502.nc is usable for this. Note the fill values at the beginning of variable HUMIDITY and at the end of WDIR_VECT. These are the peculiar values which vary by server instance.
Additional information:
The non-fill values that appear for the variables have, in this particular example, been (partially) verified by querying the database directly. The only thing that seems to be amiss is the fill values.
If you restart the local server, a different peculiar fill value is provided for the same dataset.
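Because the fill value changes across restarts while the real observations do not, comparing two downloads of the same dataset is one way to locate the fill positions. The sketch below assumes the values of one variable have already been read into plain Python lists (for example, copied from ncdump output); `unstable_positions` is a hypothetical helper, and the second fill value and the humidity readings are made-up toy numbers, not data from the real file.

```python
def unstable_positions(run_a, run_b):
    """Return indices where two downloads of the same variable disagree."""
    if len(run_a) != len(run_b):
        raise ValueError("downloads must cover the same time axis")
    # Note: NaN compares unequal to itself, so NaN entries would also be
    # reported here; the values described in this issue are not NaN.
    return [i for i, (a, b) in enumerate(zip(run_a, run_b)) if a != b]

# Toy HUMIDITY series: two fill values at the start, then two observations.
before_restart = [1.48655878184907e+158, 1.48655878184907e+158, 71.0, 68.5]
after_restart = [2.5e+201, 2.5e+201, 71.0, 68.5]  # made-up second fill value

print(unstable_positions(before_restart, after_restart))
```

Positions that differ between the two runs are the ones to treat as fill, regardless of the particular value the server happened to write.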