rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
544 stars 94 forks source link

Issues with reading HDF5 files using `terra::rast()` #686

Closed dimfalk closed 2 years ago

dimfalk commented 2 years ago

Hello! I'm facing some problems with reading HDF5 files using terra at the moment but, to be honest, I have to mention I'm not completely sure whether this is the right place to ask for help - ergo, if this is really terra-related - or just a result of my lack of understanding of the HDF5 format or terra functionality respectively.

However, I'll give it a try and if this is not the right place, please tell me.

I'm working with precipitation raster data stored in HDF5 containers (data sample: hd2011010000.zip).

terra::rast("hd2011010000.scu", subds = "/dataset_DXk/image")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

Created on 2022-06-23 by the reprex package (v2.0.1)

Reading seems fine at first sight until I attempt to plot() the resulting SpatRaster object or perform some basic calculations (like a division by a certain factor to adjust units):

HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
  #000: ../../src/H5Dio.c line 192 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: ../../src/H5VLcallback.c line 2080 in H5VL_dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #002: ../../src/H5VLcallback.c line 2046 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: ../../src/H5VLnative_dataset.c line 167 in H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #004: ../../src/H5Dio.c line 567 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #005: ../../src/H5Dchunk.c line 2594 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #006: ../../src/H5Dchunk.c line 3957 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #007: ../../src/H5Z.c line 1311 in H5Z_pipeline(): required filter 'szip' is not registered
    major: Data filters
    minor: Read failed
  #008: ../../src/H5PLint.c line 274 in H5PL_load(): search in path table failed
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #009: ../../src/H5PLpath.c line 604 in H5PL__find_plugin_in_path_table(): search in path C:\ProgramData\hdf5\lib\plugin encountered an error
    major: Plugin for dynamically loaded library
    minor: Can't get value
  #010: ../../src/H5PLpath.c line 734 in H5PL__find_plugin_in_path(): can't open directory
    major: Plugin for dynamically loaded library
    minor: Can't open directory or file
Error: [readValues] cannot read values
In addition: Warning messages:
1: [rast] unknown extent

2: H5Dread() failed for block. (GDAL error 1) 
3: HDF5:"hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for block. (GDAL error 1) 

Although *.scu does not seem to be a common file extension for the HDF5 format, hdf5-tools seems to be able to read this, c.f. output from h5debug and h5dump.

Moreover, the file is recognized using HDFView 3.14 in terms of the content being plotable. Making use of Panoply 5.1.0 the image layer is not recognized but this could also be a result of the need to change the file extension to *.hdf5 in order to be recognized by the software, no idea tbh. Maybe the structure of the container is not quite up to standards so that reading is failing with some products?

Before trying to migrate from raster to terra I was making use of rhdf5 to read these files but I'd like to reduce further dependencies if possible, hence my attempt to solve this using terra only.

rhdf5::h5read("hd2011010000.scu", "/dataset_DXk/image") |> t() |> raster::raster()
#> class      : RasterLayer 
#> dimensions : 536, 536, 287296  (nrow, ncol, ncell)
#> resolution : 0.001865672, 0.001865672  (x, y)
#> extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> crs        : NA 
#> source     : memory
#> names      : layer 
#> values     : -999, 15  (min, max)

Created on 2022-06-23 by the reprex package (v2.0.1)

I'd really like to understand why reading fails here. Is there something I can do to be able to read these files using terra?

Thank you very much in advance!

rhijmans commented 2 years ago

Below is what GDAL sees (with "GDALinfo"). It suggests that this file is not organized as a typical geospatial raster file.

describe("hd2011010000.scu")
# [1] "Driver: HDF5Image/HDF5 Dataset"                        
# [2] "Files: hd2011010000.scu"                               
# [3] "Size is 536, 536"                                      
# [4] "Metadata:"                                             
# [5] "  dataset_DXk_what_elev_[deg]=0.5 "                    
# [6] "  dataset_DXk_what_enddate_[YYYYMMDD]=20201101"        
# [7] "  dataset_DXk_what_endtime_[HHmmss]=000000"            
# [8] "  dataset_DXk_what_gain=1 "                            
# [9] "  dataset_DXk_what_nodata=-999 "                       
#[10] "  dataset_DXk_what_ori_format=DX"                      
#[11] "  dataset_DXk_what_ori_name=HD2011010000.scu"          
#[12] "  dataset_DXk_what_product=COMP  "                     
#[13] "  dataset_DXk_what_quantity=RATE "                     
#[14] "  dataset_DXk_what_rad=1 "                             
#[15] "  dataset_DXk_where_xsize=536 "                        
#[16] "  dataset_DXk_where_ysize=536 "                        
#[17] "  how_a1gate=1512001264 "                              
#[18] "  how_adjustment=1 "                                   
#[19] "  how_angles_[deg]=0.5 "                               
#[20] "  how_doppler=4 "                                      
#[21] "  how_maxlevel_[dBZ]=65 "                              
#[22] "  how_minlevel_[dBZ]=0 "                               
#[23] "  how_nodes=ESS,NHB,FLD,"                              
#[24] "  how_nodesn=3 "                                       
#[25] "  how_place=fld"                                       
#[26] "  how_resolution_[dBZ]=0.5 "                           
#[27] "  how_wavelength_[cm]=5 "                              
#[28] "  how_WMO=121 "                                        
#[29] "  how_zr_a=256 "                                       
#[30] "  how_zr_b=1.42 "                                      
#[31] "  what_date_[YYYYMMDD]=20201101\\"                     
#[32] "  what_object=IMAGE"                                   
#[33] "  what_sets=1 "                                        
#[34] "  what_time_[HHmmss]=000000\\"                         
#[35] "  what_version_[H5Vol_0.2]=SCP_BIG 5.0.11.24   "       
#[36] "  where_poserror="                                     
#[37] "  where_projdef=UTM_N32"                               
#[38] "  where_range=128 "                                    
#[39] "  where_start_lat_[deg]=5943 "                         
#[40] "  where_start_lon_[deg]=195 "                          
#[41] "  where_UL_ipixel=195 "                                
#[42] "  where_UL_jpixel=5943 "                               
#[43] "  where_xscale_[m]=1000 "                              
#[44] "  where_xsize=536 "                                    
#[45] "  where_yscale_[m]=1000 "                              
#[46] "  where_ysize=536 "                                    
#[47] "Corner Coordinates:"                                   
#[48] "Upper Left  (    0.0,    0.0)"                         
#[49] "Lower Left  (    0.0,  536.0)"                         
#[50] "Upper Right (  536.0,    0.0)"                         
#[51] "Lower Right (  536.0,  536.0)"                         
#[52] "Center      (  268.0,  268.0)"                         
#[53] "Band 1 Block=64x72 Type=Float32, ColorInterp=Undefined"
#[54] "  Metadata:"                                           
#[55] "    dataset_DXk_image_CLASS=IMAGE"                     
#[56] "    dataset_DXk_image_IMAGE_BACKGROUNDINDEX=0 "        
#[57] "    dataset_DXk_image_IMAGE_COLORMODEL=RGB"            
#[58] "    dataset_DXk_image_IMAGE_MINMAXRANGE=0 0 "          
#[59] "    dataset_DXk_image_IMAGE_SUBCLASS=BITMAP"           
#[60] "    dataset_DXk_image_IMAGE_VERSION=1.2"               
#[61] "    dataset_DXk_image_PALETTE="  

I suspected that the GDAL HDF5 driver made a wrong assumption, and thus failed when reading the file. However, the GDAL command line tool gdal_translate works, as this creates a valid GTiff

gdal_translate -of GTiff hd2011010000.scu test.tif

So that suggests that terra makes a wrong assumption. I need to investigate that.

dimfalk commented 2 years ago

Thank your for your fast reply! I was able to reproduce this behaviour with GDAL 3.0.4 - thanks for the hint using gdalinfo and gdal_translate! Let me know if you need further information on the data.

By the way, stars::read_stars("hd2011010000.scu") fails with the same error message.

edzer commented 2 years ago

By the way, stars::read_stars("hd2011010000.scu") fails with the same error message.

I see no errors, and

library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#                   Min. 1st Qu. Median      Mean 3rd Qu. Max.
# hd2011010000.scu  -999    -999   -999 -587.3865       0   15
# dimension(s):
#   from  to offset delta refsys point values x/y
# x    1 536      0     1     NA    NA   NULL [x]
# y    1 536    536    -1     NA    NA   NULL [y]
r[r == -999] = NA
plot(r)

gives

x

The file seems to lack a missing value flag, and the array is not georeferenced, as you noticed.

rhijmans commented 2 years ago

This is what I see with CRAN-stars:

library(stars)
#Loading required package: abind
#Loading required package: sf
#Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
#Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  : 
#  read failure
#In addition: Warning messages:
#1: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
#  GDAL Error 1: H5Dread() failed for block.
#2: In CPL_read_gdal(as.character(x), as.character(options), as.character(driver),  :
#  GDAL Error 1: HDF5:"C:\Users\rhijm\Downloads\hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X #offset 0, Y offset 0: H5Dread() failed for block.

On windows with this sessionInfo()

R Under development (unstable) (2022-05-14 r82360 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.5-5 sf_1.0-7    abind_1.4-5

loaded via a namespace (and not attached):
 [1] crayon_1.5.1       vctrs_0.4.1        cli_3.3.0          rlang_1.0.2        DBI_1.1.2          KernSmooth_2.23-20 purrr_0.3.4       
 [8] generics_0.1.2     assertthat_0.2.1   glue_1.6.2         lwgeom_0.2-8       e1071_1.7-9        fansi_1.0.3        grid_4.3.0        
[15] classInt_0.4-3     tibble_3.1.7       ellipsis_0.3.2     lifecycle_1.0.1    compiler_4.3.0     dplyr_1.0.9        Rcpp_1.0.8.3      
[22] pkgconfig_2.0.3    R6_2.5.1           class_7.3-20       tidyselect_1.1.2   utf8_1.2.2         parallel_4.3.0     pillar_1.7.0      
[29] magrittr_2.0.3     tools_4.3.0        proxy_0.4-26       units_0.8-0       
rhijmans commented 2 years ago

And with the development version as well. So perhaps this was fixed somewhere in GDAL > 3.3.2?

rhijmans commented 2 years ago

It works for me on Ubuntu with GDAL 3.4.0:

library(stars)
#Loading required package: abind
#Loading required package: sf
#Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0; sf_use_s2() is TRUE
(r = read_stars("hd2011010000.scu"))
#stars object with 2 dimensions and 1 attribute
#attribute(s):
#                  Min. 1st Qu. Median      Mean 3rd Qu. Max.
#hd2011010000.scu  -999    -999   -999 -587.3865       0   15
#dimension(s):
#  from  to offset delta refsys point values x/y
#x    1 536      0     1     NA    NA   NULL [x]
#y    1 536    536    -1     NA    NA   NULL [y]

r <- rast("hd2011010000.scu")
NAflag(r) <- -999
r * 1
#class       : SpatRaster
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84
#source      : memory
#name        : hd2011010000
#min value   :            0
#max value   :           15

Although on another Ubuntu with GDAL 2.2.3 it works with stars but not with terra. So there may be more to it.

rhijmans commented 2 years ago

It could also possibly be related to how R GDAL HDF5 is built on windows: https://github.com/OSGeo/gdal/issues/1428

edzer commented 2 years ago

So terra puts a raster without georeference on the unit square, where GDAL (and stars) put them on the rectangle from (0,0) to (rcol,nrow). Is that on purpose?

dimfalk commented 2 years ago

I was about to say it cannot be ruled out I messed up a bit in my last late-night-session considering stars, but seems like you're already some steps ahead of this.. 😏

Nevertheless, just to add this as a supplement:

library(stars)
#> Lade nötiges Paket: abind
#> Lade nötiges Paket: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.2, PROJ 7.2.1; sf_use_s2() is TRUE

(r <- read_stars("hd2011010000.scu"))
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: H5Dread() failed for block.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: HDF5:"hd2011010000.scu":
#> //dataset_DXk/image, band 1: IReadBlock failed at X offset 0, 
#> Y offset 0: H5Dread() failed for block.
#> Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : read failure
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base    
#> 
#> other attached packages:
#> [1] stars_0.5-5 sf_1.0-7    abind_1.4-5
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3       pillar_1.7.0       compiler_4.2.0     highr_0.9         
#>  [5] class_7.3-20       R.methodsS3_1.8.2  R.utils_2.11.0     tools_4.2.0       
#>  [9] digest_0.6.29      evaluate_0.15      lifecycle_1.0.1    tibble_3.1.7      
#> [13] R.cache_0.15.0     pkgconfig_2.0.3    rlang_1.0.2        reprex_2.0.1      
#> [17] cli_3.3.0          DBI_1.1.3          rstudioapi_0.13    parallel_4.2.0    
#> [21] yaml_2.3.5         xfun_0.31          fastmap_1.1.0      e1071_1.7-11      
#> [25] dplyr_1.0.9        withr_2.5.0        styler_1.7.0       stringr_1.4.0     
#> [29] knitr_1.39         generics_0.1.2     fs_1.5.2           vctrs_0.4.1       
#> [33] tidyselect_1.1.2   grid_4.2.0         classInt_0.4-7     glue_1.6.2        
#> [37] R6_2.5.1           fansi_1.0.3        rmarkdown_2.14     purrr_0.3.4       
#> [41] magrittr_2.0.3     units_0.8-0        ellipsis_0.3.2     htmltools_0.5.2   
#> [45] assertthat_0.2.1   KernSmooth_2.23-20 utf8_1.2.2         proxy_0.4-27      
#> [49] stringi_1.7.6      lwgeom_0.2-8       crayon_1.5.1       R.oo_1.25.0
edzer commented 2 years ago

Yes, sorry, I forgot to say that I was looking at stars from github, remotes::install_github("r-spatial/stars")

rhijmans commented 2 years ago

@edzer: terra assigns an extent of (0, ncol, 0, nrow) when creating a SpatRaster from a matrix (#1); but it is (0,1,0,1) when a file does not supply an extent. I cannot say that there is a strong reason for that.

To get the dev version of stars on windows I would instead do install.packages('terra', repos='https://rspatial.r-universe.dev') --- but either way, that won't change anything in this case.

I assume that this is a GDAL bug that will go away once R-tools gets to GDAL >= 3.4.0; so I am closing the issue.

dimfalk commented 2 years ago

Ok, thank you very much for your efforts and all the explanations!

terra assigns an extent of (0, ncol, 0, nrow) when creating a SpatRaster from a matrix (https://github.com/rspatial/terra/issues/1); but it is (0,1,0,1) when a file does not supply an extent. I cannot say that there is a strong reason for that.

I tried providing crs and extent parameters before opening this issue - presumably I was hoping to be able to override the defaults of the container resp. supply missing values - but this did not really seem to change anything:

library(terra)
#> terra 1.5.34

terra::rast("hd2011010000.scu")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

terra::rast("hd2011010000.scu", 
            crs = sf::st_crs(25832)$proj4string)
#> Error in .local(x, ...): unused argument (crs = "+proj=utm +zone=32 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")

terra::rast("hd2011010000.scu", 
            extent = c(195000, 731000, 5407000, 5943000) |> terra::ext())
#> Error in .local(x, ...): unused argument (extent = new("SpatExtent", ptr = new("Rcpp_SpatExtent", .xData = <environment>)))

I assume that this is a GDAL bug that will go away once R-tools gets to GDAL >= 3.4.0; so I am closing the issue.

Being quite a newbie here, let me ask a little naive question: How long would it take from your point of view approximately until Rtools gets to GDAL >= 3.4.0?

In the meantime, I would still be able to make use of rhdf5 - so this should not be that critical. However, since these are my first attempts to work with these containers - and as far as I understand, there are lots of flavors of hdf5 - is there some kind of a standard how spatial information attributes are be named and where to be placed in order to make the file readable for e.g. GDAL?

I'm asking because the information seems to be available per se (crs + origin + ncols/nrows + cellsize should add up to the extent) according to gdalinfo, but is named quite unconventionally from my personal point of view:

#[9] "  dataset_DXk_what_nodata=-999 "

#[37] "  where_projdef=UTM_N32"
#[41] "  where_UL_ipixel=195 "
#[42] "  where_UL_jpixel=5943 "
#[43] "  where_xscale_[m]=1000 "
#[44] "  where_xsize=536 "
#[45] "  where_yscale_[m]=1000 "
#[46] "  where_ysize=536 "

Thanks again!

rhijmans commented 2 years ago

After a bit more digging, I now see that:

So, given that the file can be read on linux and OSX, but not with RTools GDAL on windows, it seems more likely an HDF5/GDAL building issue; perhaps similar to the issue discussed here: https://github.com/OSGeo/gdal/issues/1428. I am adding @jeroen who runs some of this, but I realize that he may not be able to do much with the limited information at hand.

As for the extent and crs. You can set the correct extent after you open the file, by extracting it from the metadata (I do not know how but I suppose that is documented somewhere), just like I show for the NAflag. It won't be automatic because the way it is stored does not appear to follow a more standard approach.

dimfalk commented 2 years ago

No idea if this is relevant but I was also able to display the data in QGIS 3.20.1-Odense using GDAL 3.3.1 without any problems - making it seem like this is not an issue being fixed automatically with GDAL >= 3.4.0.

jeroen commented 2 years ago

Can you test if the problem appears on both R-4.1 and R-4.2? Note that R-4.2 uses a different GDAL (from Tomas Kaliber) than R-4.1 (from rwinlib).

dimfalk commented 2 years ago

Yeah, sure.

library(terra)
#> Warning: Paket 'terra' wurde unter R Version 4.1.3 erstellt
#> terra 1.5.34

terra::rast("hd2011010000.scu")
#> Warning: [rast] unknown extent
#> class       : SpatRaster 
#> dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#> resolution  : 0.001865672, 0.001865672  (x, y)
#> extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source      : hd2011010000.scu 
#> name        : hd2011010000

terra::rast("hd2011010000.scu") |> plot()
#> Warning: [rast] unknown extent
#> Warning: H5Dread() failed for block. (GDAL error 1)
#> Warning: HDF5:"hd2011010000.scu":
#> //dataset_DXk/image, band 1: IReadBlock failed at X offset 0,
#> Y offset 0: H5Dread() failed for block. (GDAL error 1)
#> Error: [readValues] cannot read values
library(stars)
#> Warning: Paket 'stars' wurde unter R Version 4.1.3 erstellt
#> Lade nötiges Paket: abind
#> Warning: Paket 'abind' wurde unter R Version 4.1.1 erstellt
#> Lade nötiges Paket: sf
#> Warning: Paket 'sf' wurde unter R Version 4.1.3 erstellt
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE

stars::read_stars("hd2011010000.scu")
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: H5Dread() failed for block.
#> Warning in CPL_read_gdal(as.character(x), as.character(options),
#> as.character(driver), : GDAL Error 1: HDF5:"hd2011010000.scu"://dataset_DXk/
#> image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for
#> block.
#> Error in CPL_read_gdal(as.character(x), as.character(options), as.character(driver), : read failure
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17763)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] stars_0.5-5  sf_1.0-7     abind_1.4-5  terra_1.5-34
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3       compiler_4.1.0     pillar_1.7.0       highr_0.9         
#>  [5] class_7.3-20       tools_4.1.0        digest_0.6.29      tibble_3.1.7      
#>  [9] evaluate_0.15      lifecycle_1.0.1    pkgconfig_2.0.3    rlang_1.0.2       
#> [13] reprex_2.0.1       DBI_1.1.3          cli_3.3.0          rstudioapi_0.13   
#> [17] parallel_4.1.0     yaml_2.3.5         xfun_0.31          fastmap_1.1.0     
#> [21] e1071_1.7-11       withr_2.5.0        stringr_1.4.0      dplyr_1.0.9       
#> [25] knitr_1.39         generics_0.1.2     fs_1.5.2           vctrs_0.4.1       
#> [29] tidyselect_1.1.2   classInt_0.4-7     grid_4.1.0         glue_1.6.2        
#> [33] R6_2.5.1           fansi_1.0.3        rmarkdown_2.14     purrr_0.3.4       
#> [37] magrittr_2.0.3     codetools_0.2-18   htmltools_0.5.2    ellipsis_0.3.2    
#> [41] units_0.8-0        utf8_1.2.2         KernSmooth_2.23-20 stringi_1.7.6     
#> [45] proxy_0.4-27       lwgeom_0.2-8       crayon_1.5.1
rhijmans commented 2 years ago

I also see this in R 4.1 and 4.3 --- both on windows

# R version 4.1.3 (2022-03-10) -- "One Push-Up"
library(terra)
#terra 1.5.48
gdal()
#[1] "3.4.1"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#Error: [*] cannot read from hd2011010000.scu
#R Under development (unstable) (2022-06-26 r82524 ucrt)
library(terra)
#terra 1.5.48
gdal()
#[1] "3.4.3"
r = rast("hd2011010000.scu")
#Warning message:
#[rast] unknown extent
r * 1
#Error: [*] cannot read from hd2011010000.scu
dimfalk commented 2 years ago

Guess I am not contributing anything significantly new to this discussion but since I found some time to mess around with Linux, I can at least confirm that reading the data works on Ubuntu 20.04 with R 4.1.3 as well as R 4.2.1 using both, terra and stars, in contrast to the behaviour on windows (s. above).

# R version 4.1.3 (2022-03-10) -- "One Push-Up"
library(terra)
#terra 1.5.34
gdal()
#[1] "3.0.4"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#class       : SpatRaster 
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source      : memory 
#name        : hd2011010000 
#min value   :         -999 
#max value   :           15 
# R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
library(terra)
#terra 1.5.34
gdal()
#[1] "3.0.4"
r = rast("hd2011010000.scu")
Warning message:
#[rast] unknown extent
r * 1
#class       : SpatRaster 
#dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
#resolution  : 0.001865672, 0.001865672  (x, y)
#extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source      : memory 
#name        : hd2011010000 
#min value   :         -999 
#max value   :           15 

Since I feel like this is OS-related and not an issue with terra (or stars) per se, I do not know if this is really the right place to investigate (is it e.g. possible to move issues to another repo?)... However, if there is a way I might help, please let me know, @jeroen.

Thank you very much in advance!

rhijmans commented 2 years ago

Perhaps @roualt can comment?

rsbivand commented 2 years ago

followup from https://github.com/r-spatial/sf/issues/1995#issuecomment-1250199092

Windows HDF5 version seems to be 1.12.0 https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/toolchain_libs/mxe/src/hdf5.mk, confirmed in Rtools42 include/H5pubconf.h.

Fedora 36, HDF5 1.12.1, GDAL 3.5.2:

> library(terra)
terra 1.6.18
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent

> x
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : hd2011010000.scu 
name        : hd2011010000 
> y <- x + 1
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -998 
max value   :           16 

Windows 10, my binary build of terra from the sf issue:

> library(terra)
terra 1.6.18
> gdal()
[1] "3.5.0"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent

> x
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : hd2011010000.scu 
name        : hd2011010000 
> y <- x * 1
|---------|---------|---------|---------|
=========================================

Warning messages:
1: H5Dread() failed for block. (GDAL error 1) 
2: HDF5:"C:/Users/RB/work/hd2011010000.scu"://dataset_DXk/image, band 1: IReadBlock failed at X offset 0, Y offset 0: H5Dread() failed for block. (GDAL error 1) 
> 
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
> 

No ideas from me, never use HDF at all; if HDF5 is not UCRT-ready, or uses std::regex, it might fail if the problem only occurred R >= 4.2, but is reported above in R 4.1 on Windows.

For completeness, R 4.2.1, HDF5 1.12.0 macOS arm64:

> library(terra)
terra 1.6.7
> gdal()
[1] "3.4.2"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent

> #Warning message:
> #[rast] unknown extent
> y <- x * 1
> 
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 0.001865672, 0.001865672  (x, y)
extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 
rsbivand commented 2 years ago

Windows OSGeo4W:

>gdalinfo  hd2011010000.scu
Driver: netCDF/Network Common Data Format
Files: hd2011010000.scu
Size is 536, 536
Metadata:
  /dataset_DXk/image#IMAGE_BACKGROUNDINDEX=0
  /dataset_DXk/image#IMAGE_COLORMODEL=RGB
  /dataset_DXk/image#IMAGE_MINMAXRANGE={0,0}
  /dataset_DXk/image#IMAGE_SUBCLASS=BITMAP
  /dataset_DXk/image#IMAGE_VERSION=1.2
  /dataset_DXk/what/NC_GLOBAL#elev [deg]=0.5
  /dataset_DXk/what/NC_GLOBAL#enddate [YYYYMMDD]=20201101
  /dataset_DXk/what/NC_GLOBAL#endtime [HHmmss]=000000
  /dataset_DXk/what/NC_GLOBAL#gain=1
  /dataset_DXk/what/NC_GLOBAL#nodata=-999
  /dataset_DXk/what/NC_GLOBAL#ori_format=DX
  /dataset_DXk/what/NC_GLOBAL#ori_name=HD2011010000.scu
  /dataset_DXk/what/NC_GLOBAL#product=COMP
  /dataset_DXk/what/NC_GLOBAL#quantity=RATE
  /dataset_DXk/what/NC_GLOBAL#rad=1
  /dataset_DXk/where/NC_GLOBAL#xsize=536
  /dataset_DXk/where/NC_GLOBAL#ysize=536
  /how/NC_GLOBAL#a1gate=1512001264
  /how/NC_GLOBAL#adjustment=1
  /how/NC_GLOBAL#angles [deg]=0.5
  /how/NC_GLOBAL#doppler=4
  /how/NC_GLOBAL#maxlevel [dBZ]=65
  /how/NC_GLOBAL#minlevel [dBZ]=0
  /how/NC_GLOBAL#nodes=ESS,NHB,FLD,
  /how/NC_GLOBAL#nodesn=3
  /how/NC_GLOBAL#place=fld
  /how/NC_GLOBAL#resolution [dBZ]=0.5
  /how/NC_GLOBAL#wavelength [cm]=5
  /how/NC_GLOBAL#WMO=121
  /how/NC_GLOBAL#zr_a=256
  /how/NC_GLOBAL#zr_b=1.42
  /what/NC_GLOBAL#date [YYYYMMDD]=20201101\
  /what/NC_GLOBAL#object=IMAGE
  /what/NC_GLOBAL#sets=1
  /what/NC_GLOBAL#time [HHmmss]=000000\
  /what/NC_GLOBAL#version [H5Vol 0.2]=SCP_BIG 5.0.11.24
  /where/NC_GLOBAL#poserror=
  /where/NC_GLOBAL#projdef=UTM_N32
  /where/NC_GLOBAL#range=128
  /where/NC_GLOBAL#start_lat [deg]=5943
  /where/NC_GLOBAL#start_lon [deg]=195
  /where/NC_GLOBAL#UL_ipixel=195
  /where/NC_GLOBAL#UL_jpixel=5943
  /where/NC_GLOBAL#xscale [m]=1000
  /where/NC_GLOBAL#xsize=536
  /where/NC_GLOBAL#yscale [m]=1000
  /where/NC_GLOBAL#ysize=536
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  536.0)
Upper Right (  536.0,    0.0)
Lower Right (  536.0,  536.0)
Center      (  268.0,  268.0)
Band 1 Block=64x72 Type=Float32, ColorInterp=Undefined
  Metadata:
    IMAGE_BACKGROUNDINDEX=0
    IMAGE_COLORMODEL=RGB
    IMAGE_MINMAXRANGE={0,0}
    IMAGE_SUBCLASS=BITMAP
    IMAGE_VERSION=1.2
    NETCDF_VARNAME=image
There are 2 HDF5 objects open!
>gdal_translate -of GTiff hd2011010000.scu hd2011010000.tif
Input file size is 536, 536
0...10...20...30...40...50...60...70...80...90...100 - done.
There are 2 HDF5 objects open!

Report: open objects on 72057594037927936
Type = File(72057594037927936) name='/'Type = Attribute(432345564227567616) name='CLASS'ERROR 1: netcdf error #-101 : NetCDF: HDF error .
at (D:\src\osgeo4w\src\gdal\gdal-3.5.1\frmts\netcdf\netcdfdataset.cpp,netCDFDataset::~netCDFDataset,2847)

Report: open objects on 72057594037927936
Type = File(72057594037927936) name='/'Type = Attribute(432345564227567616) name='CLASS'ERROR 1: netcdf error #-101 : NetCDF: HDF error .
at (D:\src\osgeo4w\src\gdal\gdal-3.5.1\frmts\netcdf\netcdfdataset.cpp,netCDFDataset::~netCDFDataset,2847)

The file converted with OSGeo4W's gdal_translate may or may not be complete, gdalinfo is showing errors of some kind. This is not the GDAL build used by Rtools42.

> library(terra)
terra 1.6.18
> x <- rast("hd2011010000.tif")
Warning message:
[rast] unknown extent

> y <- x * 1
|---------|---------|---------|---------|
=========================================

> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 
>
rhijmans commented 2 years ago

It appears that the cause of the problem is that the windows HDF5 library was built without the "szip" library. That library is needed to read this file (presumably because it was compressed using szip). See https://github.com/r-spatial/sf/issues/1995#issuecomment-1250364222 by @kalibera.

rsbivand commented 2 years ago

@rhijmans I think that the problem is that szip has had an unfortunate license: https://support.hdfgroup.org/doc_resource/SZIP/. I'm unsure whether the HDF5 driver should expect to write to a file created with szip. It is an open question whether packagers of hdf5 should build with szip read/write or just default read. Could some check what for example Debian says about szip?

jeroen commented 2 years ago

Homebrew has replaced szip with libaec, see: https://gitlab.dkrz.de/k202009/libaec/-/blob/master/README.SZIP

rsbivand commented 2 years ago

Fedora also distributes libaec, and HDF5 is built using it, so that explains being able to write to Szip-encoded files. MXE has aec, and the hdf5 MXE build now uses it. Untested with the latest MXE development versions.

rsbivand commented 2 years ago

With current library binaries in https://www.r-project.org/nosvn/winutf8/ucrt3/gcc10_ucrt3_full_5349.tar.zst, copied into Rtools42, locally manually patched terra/src/Makevars.ucrt (https://www.r-project.org/nosvn/winutf8/ucrt3/patches/CRAN/terra.diff), and r-devel 5349 :

> library(terra)
terra 1.6.21
> gdal()
[1] "3.5.2"
> x <- rast("hd2011010000.scu")
Warning message:
[rast] unknown extent

> y <- x * 1
> y
class       : SpatRaster 
dimensions  : 536, 536, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 536, 0, 536  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : memory 
name        : hd2011010000 
min value   :         -999 
max value   :           15 
>

So when the next release of Rtools occurs, libael will be used when building HDF5. Thanks to @kalibera for adding AEC to the Rtools42 MXE build tree.

rhijmans commented 2 years ago

Thank you very much @kalibera, @jeroen and @rsbivand!

dimfalk commented 2 years ago

Thank you very much to everyone for digging into this!