oceansites / dmt

Activities of the OceanSITES Data Management Team
http://www.oceansites.org/data

Create ERDDAP instance aggregating long time series in the DATA_GRIDDED directory #28

Open dpsnowden opened 8 years ago

dpsnowden commented 8 years ago

Kevin O

We decided to focus the ERDDAP efforts on the product (formerly data_gridded) directory initially.

dpsnowden commented 7 years ago

@tcarval and @kevin-obrien, has there been any discussion of a strategy for doing this?

kevin-obrien commented 7 years ago

@dpsnowden - As you know, we've received some funding from BEDI to do just this. However, it will be later in November before NDBC can put some time into this. Until then, I do have a copy of some of the data and am experimenting with it in ERDDAP.

jing-at-ndbc commented 7 years ago

I have raised this with IT management in several meetings. IT agreed to give higher priority to Tomcat/THREDDS/ERDDAP on a new web server, although several main web servers are currently being upgraded. We will see how soon IT can arrange a new Tomcat/ERDDAP testing environment for this effort.

tcarval commented 7 years ago

An ERDDAP server is installed at Ifremer: http://www.ifremer.fr/erddap

As a starting point, it distributes data from the W1M3A mooring site. However, fill values are not handled correctly: http://www.ifremer.fr/erddap/tabledap/oceansitesW1M3A_Tabledap.graph?time%2CTEMP&time%3E=2000-08-22T00%3A00%3A00Z&time%3C=2016-08-29T00%3A00%3A00Z&.draw=lines&.color=0x000000&.bgColor=0xffccccff

We are in contact with Kevin O'Brien to fix this issue.

tcarval commented 7 years ago

We installed ERDDAP version 1.74 to fix the fill_value issue, but the problem is still there (see the temperature chart): http://www.ifremer.fr/erddap/tabledap/oceansitesW1M3A_Tabledap.graph?time%2CTEMP&time%3E=2000-08-22T00%3A00%3A00Z&time%3C=2016-08-29T00%3A00%3A00Z&.draw=lines&.color=0x000000&.bgColor=0xffccccff

nanderson123 commented 7 years ago

Thierry, Is there a way to specify that all values with absolute value over, say, 1e33, are treated as NaN/missing? It appears that this dataset contains some extreme negative values (-1e35?) mixed with the actual data.
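
As a rough illustration of the workaround asked about here, the following is a minimal Python sketch, not an ERDDAP setting. The function name mask_extreme, the 1e33 default and the sample values are all hypothetical; the sketch only shows the idea of treating any value with a very large magnitude as missing.

```python
import math

def mask_extreme(values, limit=1e33):
    """Return a copy of `values` where anything with |v| > limit becomes NaN.

    `limit` is the hypothetical threshold suggested in the comment above.
    """
    return [math.nan if abs(v) > limit else v for v in values]

# Made-up sample: two plausible temperatures mixed with extreme sentinels.
temps = [13.2, -1e35, 14.1, 1e35]
cleaned = mask_extreme(temps)
print(cleaned)
```

A threshold like this is only a stopgap; it hides the symptom without telling the server which sentinel value the file actually declares as missing.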

tcarval commented 7 years ago

Take, for example, the temperature variable from ftp://ftp.ifremer.fr/ifremer/oceansites/DATA/W1M3A/OS_W1M3A_2004_R.nc

The fill value for NaN/missing is set to 99999.f:

  float TEMP(TIME, DEPTH) ;
      TEMP:standard_name = "sea_water_temperature" ;
      TEMP:units = "degree_Celsius" ;
      TEMP:_FillValue = 99999.f ;
      ...

I think that ERDDAP should ignore temperature values of 99999 (they are fill values).

kevin-obrien commented 5 years ago

Make sure also that the ERDDAP instance appears on OceanSITES website...

ngalbraith commented 4 years ago

Just wondering, does ERDDAP NOT look at the _FillValue field?

petejan commented 4 years ago

Hi Nan,

My understanding is that downstream tools should substitute the _FillValue when the data are outside the valid_min to valid_max range. If you want downstream tools to fill with NaN, set the _FillValue to NaN. If you pre-fill the array with _FillValue, any missing values end up set to the _FillValue. The _FillValue should be outside the valid_min to valid_max range.

Pete

ngalbraith commented 4 years ago

Hmm, I think _FillValue is defined a little differently.

From the NetCDF Users Guide (NUG):

"Sometimes there are missing values in the data, and some value is needed to represent them. ... In netCDF, you can create an attribute for the variable (and of the same type as the variable) called “_FillValue” that contains a value that you have used for missing data."

petejan commented 4 years ago

Yes, thanks, _FillValue is not defined the way I had thought. I was confused by the behaviour I see in Python. An example:

netcdf example {   // example of CDL notation
  dimensions:
      lon = 3 ;
      lat = 8 ;
  variables:
      float rh(lon, lat) ;
          rh:units = "percent" ;
          rh:long_name = "Relative humidity" ;
          rh:_FillValue = 1000.0f ;
          rh:valid_max = 100.0f ;
          rh:valid_min = 0.0f ;
  // global attributes
      :title = "Simple example, lacks some conventions" ;
  data:
   rh =
    2, 3, 5, 7, 11, 13, 17, 19,
    23, 29, 31, 37, 41, 43, 47, 53,
    59, 61, 67, 71, 1000, -1, 101 ;
  }

Using ncgen and ncdump I get this output,

ncdump test.nc
netcdf test {
dimensions:
        lon = 3 ;
        lat = 8 ;
variables:
        float rh(lon, lat) ;
                rh:units = "percent" ;
                rh:long_name = "Relative humidity" ;
                rh:_FillValue = 1000.f ;
                rh:valid_max = 100.f ;
                rh:valid_min = 0.f ;

// global attributes:
                :title = "Simple example, lacks some conventions" ;
data:

 rh =
  2, 3, 5, 7, 11, 13, 17, 19,
  23, 29, 31, 37, 41, 43, 47, 53,
  59, 61, 67, 71, _, -1, 101, _ ;
}

So ncdump prints _ wherever the stored value equals _FillValue,

but Python netCDF4 masks both the values equal to _FillValue and the values outside valid_min/valid_max, unless I change the mask.

The discussion of this in the Python issue tracker is at https://github.com/Unidata/netcdf4-python/issues/576, which points to the netCDF text "Generic applications should treat values outside the valid range as missing."

>>> from netCDF4 import Dataset
>>> ds = Dataset('test.nc', 'r')
>>> rh = ds.variables["rh"]
>>> rh
<class 'netCDF4._netCDF4.Variable'>
float32 rh(lon, lat)
    units: percent
    long_name: Relative humidity
    _FillValue: 1000.0
    valid_max: 100.0
    valid_min: 0.0
unlimited dimensions: 
current shape = (3, 8)
filling on
>>> values = rh[:]
>>> values
masked_array(
  data=[[2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0],
        [23.0, 29.0, 31.0, 37.0, 41.0, 43.0, 47.0, 53.0],
        [59.0, 61.0, 67.0, 71.0, --, --, --, --]],
  mask=[[False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [False, False, False, False,  True,  True,  True,  True]],
  fill_value=1000.0,
  dtype=float32)
>>> values.mask = False
>>> values
masked_array(
  data=[[2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0],
        [23.0, 29.0, 31.0, 37.0, 41.0, 43.0, 47.0, 53.0],
        [59.0, 61.0, 67.0, 71.0, 1000.0, -1.0, 101.0, 1000.0]],
  mask=[[False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False],
        [False, False, False, False, False, False, False, False]],
  fill_value=1000.0,
  dtype=float32)

MATLAB seems to handle this as expected: only the fill values become NaN, and -1 and 101 are left as they are.

>> rh = ncread('test.nc', 'rh')

rh =

     2    23    59
     3    29    61
     5    31    67
     7    37    71
    11    41   NaN
    13    43    -1
    17    47   101
    19    53   NaN
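
To make the difference between the tools concrete, here is a small pure-Python sketch of the two masking policies seen above, applied to the last row of rh. It does not call netCDF4, ncdump or MATLAB; the function names mask_fill_only and mask_fill_and_range are hypothetical and only mimic the observed behaviour.

```python
import math

# Attribute values taken from the rh variable in the CDL example above.
FILL_VALUE = 1000.0
VALID_MIN, VALID_MAX = 0.0, 100.0

# Last row of rh as stored in test.nc (the final slot holds the fill value).
stored = [59.0, 61.0, 67.0, 71.0, 1000.0, -1.0, 101.0, 1000.0]

def mask_fill_only(values):
    """Policy seen with ncdump and MATLAB: only _FillValue is treated as missing."""
    return [math.nan if v == FILL_VALUE else v for v in values]

def mask_fill_and_range(values):
    """Policy seen with netCDF4-python: _FillValue and out-of-range values are missing."""
    return [
        math.nan if v == FILL_VALUE or not (VALID_MIN <= v <= VALID_MAX) else v
        for v in values
    ]

fill_only = mask_fill_only(stored)
fill_and_range = mask_fill_and_range(stored)
print(fill_only)       # -1.0 and 101.0 survive
print(fill_and_range)  # -1.0 and 101.0 are masked as well
```

With the first policy -1 and 101 come through as data, matching the ncdump and MATLAB output; with the second they are masked, matching the Python masked array. As an alternative to resetting the mask after reading, netCDF4-python also provides Variable.set_auto_mask(False) to turn off the automatic masking.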