Closed Kodiologist closed 6 months ago
Possibly relevant:
The AOD and its valid range and filled value are stored as packed 16-bit unsigned integers in the product files. This may be important when reading data that was generated prior to September 4, 2020 as some software may interpret the data as signed integers. For example, the packed values of AOD valid range [-0.05, 5] may be shown as [0, -6]. To read the correct values users are required to manually convert the AOD and its valid range and fill value attributes to unsigned integers before unscaling. This was fixed on September 4, 2020, and the software should read the AOD and the attributes as unsigned integer correctly. Note that the '_unsigned' attribute has been removed during this fix.
This is related to an (invalid?) valid range specification.
First note that the file has a valid_range and a scale and offset
d <- describe('NETCDF:"example.nc":AOD')
d[138]
#[1] " valid_range={0,65530}"
d[125]
#[1] " Offset: -0.0500000007450581, Scale:7.70600017858669e-05"
Setting up the data for comparison
v.nc1 = as.numeric(ncdf4::ncvar_get(ncdf4::nc_open("example.nc"), "AOD"))
v.nc2 = as.numeric(ncdf4::ncvar_get(ncdf4::nc_open("example.nc"), "AOD", raw=TRUE))
v.terra = values(terra::rast("example.nc", "AOD", raw=TRUE))
r <- terra::rast("example.nc", "AOD", raw=FALSE)
so <- scoff(r)
Get the cells that are missing when reading with terra but not when reading with ncdf4
i <- which(is.na(v.terra) & (!is.na(v.nc1)))
head(v.nc1[i])
#[1] -0.05023118 -0.05023118 -0.05023118 -0.05023118 -0.05023118 -0.05023118
The raw values are
head(v.nc2[i])
#[1] -3 -3 -3 -3 -3 -3
And these are converted to
-3 * so[,"scale"] + so[,"offset"]
# scale
#-0.05023118
"terra" reads the file via GDAL. GDAL does not return these values because -3 is outside the valid_range
. "ncdf4" leaves this to the user to deal with.
With terra/GDAL you can ignore the valid_range
by using opts="HONOUR_VALID_RANGE=NO"
. With that, the NA
counts match:
x <- rast("example.nc", "AOD", opts="HONOUR_VALID_RANGE=NO")
global(x, "isNA")
# isNA
#AOD 2411771
sum(is.na(v.nc))
#[1] 2411771
Perhaps the "ncdf4" approach is better. But my thinking is that, if these values should not be exluded, the file would need to be fixed by the data provider. After all, it is not required to set a "valid_range" in a NetCDF file. Since they did set it, I will assume that the raw values of "-3" ("-0.05" after transformation) are not valid and should be considered to be equivalent to "missing".
Note that the valid_range applies to the raw values, not to the transformed values, but in this case that does not matter as they are both below zero. Also see https://github.com/rspatial/terra/issues/1360
What a mess. Thanks for the thorough explanation.
A copy of the file
I'm using R 4.2.1,
terra
1.7.65, andncdf4
1.22 on Ubuntu 20.04.