rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
535 stars 88 forks source link

`ncdf4` can see values in a NetCDF 4 file that appear as NA to `terra` #1399

Closed Kodiologist closed 6 months ago

Kodiologist commented 6 months ago
$ aws s3 cp --no-sign-request s3://noaa-goes16/ABI-L2-AODC/2019/037/18/OR_ABI-L2-AODC-M3_G16_s20190371812140_e20190371814513_c20190371817010.nc example.nc
$ R

A copy of the file

> v.nc = as.numeric(ncdf4::ncvar_get(ncdf4::nc_open("example.nc"), "AOD"))
> v.terra = as.data.frame(terra::rast("example.nc")$AOD, na.rm = F)$AOD
> table(is.na(v.nc))

  FALSE    TRUE 
1338229 2411771 
> table(is.na(v.terra))

   TRUE 
3750000 
> head(which(!is.na(v.nc)))
[1] 47110 47111 47113 47114 47115 47116
> head(na.omit(v.nc))
[1] 0.08053964 0.10280998 0.06936594 0.06928888 0.06135170 0.06173700

I'm using R 4.2.1, terra 1.7.65, and ncdf4 1.22 on Ubuntu 20.04.

Kodiologist commented 6 months ago

Possibly relevant:

The AOD and its valid range and filled value are stored as packed 16-bit unsigned integers in the product files. This may be important when reading data that was generated prior to September 4, 2020 as some software may interpret the data as signed integers. For example, the packed values of AOD valid range [-0.05, 5] may be shown as [0, -6]. To read the correct values users are required to manually convert the AOD and its valid range and fill value attributes to unsigned integers before unscaling. This was fixed on September 4, 2020, and the software should read the AOD and the attributes as unsigned integer correctly. Note that the '_unsigned' attribute has been removed during this fix.

From https://www.star.nesdis.noaa.gov/atmospheric-composition-training/documents/GOES-16_ABI_L2_AOD_Provisional_ReadMe_v3.pdf

rhijmans commented 6 months ago

This is related to an (invalid?) valid range specification.

First note that the file has a valid_range and a scale and offset

d <- describe('NETCDF:"example.nc":AOD')
d[138]
#[1] "    valid_range={0,65530}"
d[125]
#[1] "  Offset: -0.0500000007450581,   Scale:7.70600017858669e-05"

Setting up the data for comparison

v.nc1 = as.numeric(ncdf4::ncvar_get(ncdf4::nc_open("example.nc"), "AOD"))
v.nc2 = as.numeric(ncdf4::ncvar_get(ncdf4::nc_open("example.nc"), "AOD", raw=TRUE))
v.terra = values(terra::rast("example.nc", "AOD", raw=TRUE))
r <- terra::rast("example.nc", "AOD", raw=FALSE)
so <- scoff(r)

Get the cells that are missing when reading with terra but not when reading with ncdf4

i <- which(is.na(v.terra) & (!is.na(v.nc1)))
head(v.nc1[i])
#[1] -0.05023118 -0.05023118 -0.05023118 -0.05023118 -0.05023118 -0.05023118

The raw values are

head(v.nc2[i])
#[1] -3 -3 -3 -3 -3 -3

And these are converted to

-3 * so[,"scale"] + so[,"offset"]
#      scale 
#-0.05023118 

"terra" reads the file via GDAL. GDAL does not return these values because -3 is outside the valid_range. "ncdf4" leaves this to the user to deal with.

With terra/GDAL you can ignore the valid_range by using opts="HONOUR_VALID_RANGE=NO". With that, the NA counts match:

x <- rast("example.nc", "AOD", opts="HONOUR_VALID_RANGE=NO") 
global(x, "isNA")
#      isNA
#AOD 2411771
sum(is.na(v.nc))
#[1] 2411771

Perhaps the "ncdf4" approach is better. But my thinking is that, if these values should not be exluded, the file would need to be fixed by the data provider. After all, it is not required to set a "valid_range" in a NetCDF file. Since they did set it, I will assume that the raw values of "-3" ("-0.05" after transformation) are not valid and should be considered to be equivalent to "missing".

Note that the valid_range applies to the raw values, not to the transformed values, but in this case that does not matter as they are both below zero. Also see https://github.com/rspatial/terra/issues/1360

Kodiologist commented 6 months ago

What a mess. Thanks for the thorough explanation.