set_auto_maskandscale on variables without _FillValue attribute

GoogleCodeExporter commented 8 years ago

Hi,

I am trying to use set_auto_maskandscale on a scaled uint8 variable that has no 
_FillValue attribute (the full uint8 range is used for the scaling). The problem 
is that the library forces a _FillValue of 255 when I use set_auto_maskandscale, 
which is not what I want.

Is this a bug? Or is there a way to handle this kind of variable?

Thanks in advance

PS: I use netCDF4-python v1.0.5

Original issue reported on code.google.com by DeM...@gmail.com on 27 Nov 2013 at 8:49

GoogleCodeExporter commented 8 years ago

The netcdf C library uses a default fill value of 255 for uint8 (set by 
NC_FILL_UBYTE in netcdf.h).  If you don't want a fill value, you must set the 
fill_value keyword to False when you create the variable with the 
createVariable Dataset method.  If you don't do this, then the netcdf C library 
will use the default fill value, and you should only use the range 0-254 for 
scaling.

Original comment by whitaker.jeffrey@gmail.com on 27 Nov 2013 at 4:47
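
For anyone writing such a file themselves, the advice above can be sketched as follows. This is a minimal sketch, assuming `ds` is an open, writable `netCDF4.Dataset`; the helper name `create_unfilled_ubyte` and the variable and dimension names are made up for illustration.

```python
def create_unfilled_ubyte(ds, name, dimensions, scale_factor=1.0, add_offset=0.0):
    """Create a uint8 variable with pre-filling turned off.

    `ds` is assumed to be a writable netCDF4.Dataset. Passing
    fill_value=False asks createVariable not to pre-fill the variable,
    so the whole 0-255 range stays available for packed data.
    """
    var = ds.createVariable(name, "u1", dimensions, fill_value=False)
    # Standard packing attributes used by the automatic scaling.
    var.scale_factor = scale_factor
    var.add_offset = add_offset
    return var


# Example use (hypothetical file and names):
# from netCDF4 import Dataset
# ds = Dataset("packed.nc", "w")
# ds.createDimension("x", 256)
# v = create_unfilled_ubyte(ds, "packed", ("x",), scale_factor=0.1)
# ds.close()
```

This only helps when you control how the file is written; it does nothing for existing files.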

GoogleCodeExporter commented 8 years ago

If you didn't create this dataset, then a workaround would be to set 
var.set_auto_maskandscale(False), and then do the scaling manually.

Original comment by whitaker.jeffrey@gmail.com on 27 Nov 2013 at 4:52
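
The workaround above might look roughly like this. It is a sketch, assuming `var` is a `netCDF4.Variable` that uses the conventional `scale_factor` and `add_offset` attributes; the helper name `read_unpacked` is hypothetical.

```python
import numpy as np


def read_unpacked(var):
    """Return unpacked data as a plain ndarray, with no masking.

    `var` is assumed to be a netCDF4.Variable (any object with the same
    interface works).
    """
    # Turn off automatic conversion so indexing returns the raw integers.
    var.set_auto_maskandscale(False)
    raw = np.asarray(var[...])
    # Missing packing attributes fall back to the identity transform.
    scale = getattr(var, "scale_factor", 1.0)
    offset = getattr(var, "add_offset", 0.0)
    return raw * scale + offset
```

Because no mask is ever built, a raw value of 255 is unpacked like any other value.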

GoogleCodeExporter commented 8 years ago

Yes, I have the problem when reading the variable; I didn't create it.
For the moment I have implemented your workaround, but would it be possible to 
modify the behavior of set_auto_maskandscale so that it returns a masked array 
only if the variable has a fill or missing value attribute? For me, a variable 
that is only scaled, with no fill value, should be automatically scaled into a 
plain numpy.ndarray, not into a masked array.
What do you think about that?

Original comment by DeM...@gmail.com on 28 Nov 2013 at 9:01
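
The behavior requested above can be approximated on the user side. The following is a sketch under stated assumptions, not the library's implementation: `var` is assumed to be a `netCDF4.Variable`, the helper name `read_mask_if_declared` is hypothetical, and only the `_FillValue` and `missing_value` attributes are consulted.

```python
import numpy as np


def read_mask_if_declared(var):
    """Unpack `var`; mask only if it declares _FillValue or missing_value."""
    var.set_auto_maskandscale(False)
    raw = np.asarray(var[...])
    scale = getattr(var, "scale_factor", 1.0)
    offset = getattr(var, "add_offset", 0.0)
    data = raw * scale + offset

    # Only attributes actually stored on the variable count here;
    # the C library's default fill value is deliberately ignored.
    declared = [
        np.ravel(getattr(var, name))
        for name in ("_FillValue", "missing_value")
        if name in var.ncattrs()
    ]
    if not declared:
        return data  # plain ndarray, nothing masked

    # Compare against the raw (packed) values, not the unpacked ones.
    mask = np.isin(raw, np.concatenate(declared))
    return np.ma.masked_array(data, mask=mask)
```

A variable with only packing attributes comes back as an ordinary array, while one with an explicit `_FillValue` comes back masked.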

GoogleCodeExporter commented 8 years ago

Sounds reasonable, but...  Technically speaking, every netcdf variable has a 
_FillValue, since the library sets one by default.  That is, unless 
nc_def_var_fill was used to explicitly disable filling.

Ultimately, I think your data provider was wrong to provide a dataset with 
valid data equal to the fill value.  They should have disabled filling.

Original comment by whitaker.jeffrey@gmail.com on 28 Nov 2013 at 3:47

GoogleCodeExporter commented 8 years ago

I just realized that I was not checking to see if filling was disabled before 
masking data equal to the default _FillValue.  This is now fixed.  Can you try 
updating from SVN?  It's possible your data provider did disable filling, in 
which case you should get the desired result now.

Original comment by whitaker.jeffrey@gmail.com on 28 Nov 2013 at 3:50

GoogleCodeExporter commented 8 years ago

OK, thanks for the explanations, which match what I found in the NetCDF4 
documentation; it is clearer to me now. I have been writing and reading NetCDF 
files for a long time but never understood this point about the default fill 
value. So a variable has a fill value even if it doesn't have an explicit 
_FillValue attribute (and so by default you cannot use the full range of the 
variable unless you set the fill mode). This does not seem very intuitive to me, 
and I think a lot of files in the world don't follow this rule... but anyway 
this is not your problem, because this is the NetCDF specification :-)

Note that it seems there is an exception for byte variables:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/Fill-Values.html#Fill-Values
"If you need a fill value for a byte variable, it is recommended that you 
explicitly define an appropriate _FillValue attribute, as generic utilities 
such as ncdump will not assume a default fill value for byte variables."
Explained here too:
http://www.unidata.ucar.edu/software/netcdf/docs/known_problems.html#ncdump_ubyte_fill
"There should be no default fill values when reading any byte type, signed or 
unsigned, because the byte ranges are too small to assume one of the values 
should appear as a missing value unless a _FillValue attribute is set 
explicitly."

I suppose you haven't implemented this exception, since my test was on a byte 
variable?

Unfortunately we can't update all our data providers, and we have tons of 
existing files that assume there is no fill value if the attribute is missing, 
so for the moment I will stick to the workaround.

Thanks a lot for your great NetCDF4-Python library, it's very useful, and very 
well designed!

Original comment by DeM...@gmail.com on 29 Nov 2013 at 3:50

GoogleCodeExporter commented 8 years ago

I forgot: is there an easy way to dump the fill mode information of each 
variable with ncdump or with your ncinfo?

Original comment by DeM...@gmail.com on 29 Nov 2013 at 3:57

GoogleCodeExporter commented 8 years ago

I had not seen that exception for byte variables in the docs - thank you for 
pointing that out.  I have now implemented that exception in svn, so no default 
fill_value is assumed for signed or unsigned byte data dtypes.

ncdump does not print fill mode information.  I just modified ncinfo so it will 
print fill mode info when you do 'ncinfo -v <varname> <filename>'.

Original comment by whitaker.jeffrey@gmail.com on 29 Nov 2013 at 4:33
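
The rule the thread converges on can be summarized in a few lines. This is an illustrative sketch of that rule, not the code committed to svn: the function name `fill_value_for_masking` and its parameters are hypothetical, the table lists only the integer defaults from netcdf.h, and float types are left out.

```python
import numpy as np

# Default integer fill values from netcdf.h (NC_FILL_*), keyed by dtype code.
DEFAULT_FILLS = {
    "i1": -127, "u1": 255,
    "i2": -32767, "u2": 65535,
    "i4": -2147483647, "u4": 4294967295,
}


def fill_value_for_masking(dtype, explicit_fill=None, filling_enabled=True):
    """Return the value to mask on, or None if nothing should be masked."""
    # An explicit _FillValue attribute always wins.
    if explicit_fill is not None:
        return explicit_fill
    # Filling was disabled when the variable was defined: no default applies.
    if not filling_enabled:
        return None
    dt = np.dtype(dtype)
    # Byte exception: signed and unsigned bytes get no default fill value.
    if dt.kind in "iu" and dt.itemsize == 1:
        return None
    return DEFAULT_FILLS.get(dt.kind + str(dt.itemsize))
```

For example, `fill_value_for_masking("u1")` returns `None`, while `fill_value_for_masking("i2")` returns `-32767`.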

GoogleCodeExporter commented 8 years ago

Original comment by whitaker.jeffrey@gmail.com on 26 Feb 2014 at 2:04

Changed state: Fixed

junxiemq / netcdf4-python

set_auto_maskandscale on variables without _FillValue attribute #209