Open dpsnowden opened 8 years ago
@tcarval and @kevin-obrien has there been any discussion on strategy for doing this?
@dpsnowden - As you know, we've received some funding from BEDI to do just this. However, it will be later in November when NDBC can put some time to this. Until then, I do have a copy of some of the data and am experimenting with it in ERDDAP.....
I have raised this with IT management in several meetings. IT agreed to give higher priority to setting up Tomcat/THREDDS/ERDDAP on a new web server, although several main web servers are currently being upgraded. We will see how soon IT can arrange a new Tomcat/ERDDAP test environment for this effort.
An ERDDAP server is installed at Ifremer: http://www.ifremer.fr/erddap
As a starting point, it distributes data from the W1M3A mooring site. However, fill values are not handled correctly: http://www.ifremer.fr/erddap/tabledap/oceansitesW1M3A_Tabledap.graph?time%2CTEMP&time%3E=2000-08-22T00%3A00%3A00Z&time%3C=2016-08-29T00%3A00%3A00Z&.draw=lines&.color=0x000000&.bgColor=0xffccccff
We are in contact with Kevin O'Brien to fix this issue.
We installed ERDDAP version 1.74 to fix the fill_value issue, but the problem is still there (see the temperature chart): http://www.ifremer.fr/erddap/tabledap/oceansitesW1M3A_Tabledap.graph?time%2CTEMP&time%3E=2000-08-22T00%3A00%3A00Z&time%3C=2016-08-29T00%3A00%3A00Z&.draw=lines&.color=0x000000&.bgColor=0xffccccff
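To look at the raw numbers behind the chart, the same query can be requested as text rather than as a graph. This is a minimal sketch, assuming the usual ERDDAP tabledap URL layout. The helper name `tabledap_url` and the choice of the `.csv` response type are illustrative and not from this thread.

```python
from urllib.parse import quote


def tabledap_url(base, dataset_id, variables, constraints, file_type=".csv"):
    """Build a percent-encoded tabledap request URL (hypothetical helper).

    variables   -- list of variable names, e.g. ["time", "TEMP"]
    constraints -- list of (name, operator, value) tuples
    """
    # Variables are comma-separated; the comma itself is percent-encoded.
    query = quote(",".join(variables), safe="")
    for name, op, value in constraints:
        # Keep '=' literal so '>=' becomes '%3E=' as in the URLs above.
        query += "&" + quote(name, safe="") + quote(op, safe="=") + quote(value, safe="")
    return f"{base}/tabledap/{dataset_id}{file_type}?{query}"


url = tabledap_url(
    "http://www.ifremer.fr/erddap",
    "oceansitesW1M3A_Tabledap",
    ["time", "TEMP"],
    [("time", ">=", "2000-08-22T00:00:00Z"), ("time", "<=", "2016-08-29T00:00:00Z")],
)
print(url)
```

Any TEMP value of 99999 or around -1e35 in the returned rows would confirm that fill values are being served as data.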
Thierry, Is there a way to specify that all values with absolute value over, say, 1e33, are treated as NaN/missing? It appears that this dataset contains some extreme negative values (-1e35?) mixed with the actual data.
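On the client side, the threshold idea in this question can be sketched in a few lines of Python. This is only an illustration of the rule being asked about, not something ERDDAP does. The function name `mask_extremes` and the sample values are made up, and the 1e33 limit is the one suggested above.

```python
import math


def mask_extremes(values, limit=1e33):
    """Replace any value whose magnitude exceeds `limit` with NaN."""
    return [math.nan if math.isnan(v) or abs(v) > limit else v for v in values]


# Hypothetical temperature samples with two extreme placeholders mixed in.
samples = [12.5, -1e35, 13.1, 1e36]
cleaned = mask_extremes(samples)
print(cleaned)
```

A threshold like this hides the symptom but does not explain where the extreme values come from, so it is a workaround rather than a fix for the fill value handling.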
Take, for example, the temperature variable from ftp://ftp.ifremer.fr/ifremer/oceansites/DATA/W1M3A/OS_W1M3A_2004_R.nc
The fill value for NaN/missing is set to 99999.f:
float TEMP(TIME, DEPTH) ;
TEMP:standard_name = "sea_water_temperature" ;
TEMP:units = "degree_Celsius" ;
TEMP:_FillValue = 99999.f ;
...
I think that ERDDAP should ignore the temperature values of "99999" (they are fill values).
Make sure also that the ERDDAP instance appears on OceanSITES website...
Thierry, Is there a way to specify that all values with absolute value over, say, 1e33, are treated as NaN/missing? It appears that this dataset contains some extreme negative values (-1e35?) mixed with the actual data. Just wondering, does ERDDAP NOT look at the _FillValue field?
Hi Nan,
My understanding is that the downstream tools should fill with the _FillValue when the data is outside the valid_max and valid_min range. If you want the downstream tools to fill with NaN, then set the _FillValue to NaN. If you pre-fill the array with _FillValue, then any missing values end up being set to the _FillValue. The _FillValue should be outside the valid_max and valid_min range.
Pete
Hmm, I think _FillValue is defined a little differently.
From the NetCDF Users Guide (NUG):
"Sometimes there are missing values in the data, and some value is needed to represent them. ... In netCDF, you can create an attribute for the variable (and of the same type as the variable) called “_FillValue” that contains a value that you have used for missing data."
Yes, thanks, _FillValue was not defined the way I had thought. I was confused by how Python handles it. Here is an example:
netcdf example { // example of CDL notation
dimensions:
lon = 3 ;
lat = 8 ;
variables:
float rh(lon, lat) ;
rh:units = "percent" ;
rh:long_name = "Relative humidity" ;
rh:_FillValue = 1000.0f ;
rh:valid_max = 100.0f ;
rh:valid_min = 0.0f ;
// global attributes
:title = "Simple example, lacks some conventions" ;
data:
rh =
2, 3, 5, 7, 11, 13, 17, 19,
23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, 1000, -1, 101 ;
}
Using ncgen and ncdump I get this output,
ncdump test.nc
netcdf test {
dimensions:
lon = 3 ;
lat = 8 ;
variables:
float rh(lon, lat) ;
rh:units = "percent" ;
rh:long_name = "Relative humidity" ;
rh:_FillValue = 1000.f ;
rh:valid_max = 100.f ;
rh:valid_min = 0.f ;
// global attributes:
:title = "Simple example, lacks some conventions" ;
data:
rh =
2, 3, 5, 7, 11, 13, 17, 19,
23, 29, 31, 37, 41, 43, 47, 53,
59, 61, 67, 71, _, -1, 101, _ ;
}
So ncdump prints _ wherever a value equals _FillValue,
but the Python netCDF4 module masks values equal to _FillValue and also values outside the valid range, unless I change the mask.
Here is the discussion of this behaviour in the netcdf4-python issue tracker: https://github.com/Unidata/netcdf4-python/issues/576, which points to the netCDF text "Generic applications should treat values outside the valid range as missing."
>>> from netCDF4 import Dataset
>>> ds = Dataset('test.nc', 'r')
>>> rh = ds.variables["rh"]
>>> rh
<class 'netCDF4._netCDF4.Variable'>
float32 rh(lon, lat)
units: percent
long_name: Relative humidity
_FillValue: 1000.0
valid_max: 100.0
valid_min: 0.0
unlimited dimensions:
current shape = (3, 8)
filling on
>>> values = rh[:]
>>> values
masked_array(
data=[[2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0],
[23.0, 29.0, 31.0, 37.0, 41.0, 43.0, 47.0, 53.0],
[59.0, 61.0, 67.0, 71.0, --, --, --, --]],
mask=[[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, True, True, True, True]],
fill_value=1000.0,
dtype=float32)
>>> values.mask = False
>>> values
masked_array(
data=[[2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0],
[23.0, 29.0, 31.0, 37.0, 41.0, 43.0, 47.0, 53.0],
[59.0, 61.0, 67.0, 71.0, 1000.0, -1.0, 101.0, 1000.0]],
mask=[[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False],
[False, False, False, False, False, False, False, False]],
fill_value=1000.0,
dtype=float32)
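The two behaviours seen in this thread can be reproduced without any netCDF library. This is a plain-Python sketch of the two rules, not the code that ncdump or netCDF4 actually run. The `decode` helper is hypothetical, and the values are the last row of `rh` from the example above.

```python
import math


def decode(values, fill_value, valid_min=None, valid_max=None):
    """Return a copy of `values` with missing entries replaced by NaN.

    A value is missing if it equals `fill_value`, or if a valid range is
    given and the value falls outside it.
    """
    out = []
    for v in values:
        missing = v == fill_value
        if valid_min is not None and v < valid_min:
            missing = True
        if valid_max is not None and v > valid_max:
            missing = True
        out.append(math.nan if missing else v)
    return out


row = [59.0, 61.0, 67.0, 71.0, 1000.0, -1.0, 101.0, 1000.0]

# Rule 1: only _FillValue is missing (what the ncdump output shows).
fill_only = decode(row, fill_value=1000.0)

# Rule 2: _FillValue and anything outside valid_min..valid_max is missing
# (what the masked array from netCDF4 shows before the mask is cleared).
fill_and_range = decode(row, fill_value=1000.0, valid_min=0.0, valid_max=100.0)

print(fill_only)
print(fill_and_range)
```

Under the first rule -1 and 101 survive as data; under the second they are hidden along with the fill values, which is the difference that caused the confusion here.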
MATLAB seems to handle this as expected:
>> rh = ncread('test.nc', 'rh')
rh =
2 23 59
3 29 61
5 31 67
7 37 71
11 41 NaN
13 43 -1
17 47 101
19 53 NaN
Kevin O
We decided to focus the ERDDAP efforts on the product (formerly data_gridded) directory initially.