r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
554 stars 94 forks source link

Make c work for disparate windows of the same layer #210

Closed kendonB closed 3 years ago

kendonB commented 5 years ago

There are two cases to consider. The input stars objects either have:

  1. same deltas and the offsets differ by delta*N (N integer).
  2. Above condition doesn't hold (i.e. the input objects come from different parent grids)

One approach could be to:

A real life example that currently fails (case 2.):

elevation_nz.zip

library(stars)
library(raster)
library(tidyverse)
library(cowplot)

stars_objects = list.files("elevation_nz", full.names = TRUE) %>% 
  str_subset(".tif") %>% 
  map(read_stars)

stars_objects = stars_objects %>% 
  map(as, "Raster") %>% 
  map(raster::aggregate, fact = 30) %>% 
  map(st_as_stars)

plot1 = ggplot() + 
  geom_stars(data = stars_objects[[1]]) +
  coord_equal() + 
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank()) + 
  ggtitle("object 1")

plot2 = ggplot() + 
  geom_stars(data = stars_objects[[2]]) + 
  coord_equal() + 
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank()) + 
  ggtitle("object 2")

plot3 = ggplot() + 
  geom_stars(data = stars_objects[[1]]) +
  geom_stars(data = stars_objects[[2]]) + 
  coord_equal() + 
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank()) + 
  ggtitle("objects 1 and 2")

plot_grid(plot_grid(plot1, plot2, ncol = 1, rel_heights = c(1, 1.65)), plot3, nrow = 1)


# Offsets don't quite align and don't have the same x range
st_dimensions(stars_objects[[1]])
#>   from  to  offset delta                       refsys point values    
#> x    1 187 1498853   750 +proj=tmerc +lat_0=0 +lon...    NA   NULL [x]
#> y    1 112 6227902  -750 +proj=tmerc +lat_0=0 +lon...    NA   NULL [y]
st_dimensions(stars_objects[[2]])
#>   from  to  offset delta                       refsys point values    
#> x    1 186 1499155   750 +proj=tmerc +lat_0=0 +lon...    NA   NULL [x]
#> y    1 209 6144000  -750 +proj=tmerc +lat_0=0 +lon...    NA   NULL [y]

# Fails
c(stars_objects[[1]], stars_objects[[2]])
#> Error in c.stars(stars_objects[[1]], stars_objects[[2]]): don't know how to merge arrays: please specify parameter along
# Fails
c(stars_objects[[1]], stars_objects[[2]], along = "x")
#> Error in (function (..., along = N, rev.along = NULL, new.names = NULL, : arg 'X2' has dims=186, 209; but need dims=X, 112

Created on 2019-08-14 by the reprex package (v0.3.0)

edzer commented 5 years ago

Could you make the tif files available somehow?

edzer commented 5 years ago

Not sure whether c is the right tool for this: c has an along argument that lets you combine along any dimension, but in the sense of gluing, like your example, not like compositing, as you describe. Maybe sth like this?

library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.7.0, GDAL 2.4.0, PROJ 5.2.0

st_vrt = function(..., options = character(0)) {
    dst = tempfile(fileext = ".tif")
    lst_write = function(obj) {
        fname = tempfile(fileext = ".tif")
        write_stars(obj, fname)
        fname
    }
    src = sapply(list(...), lst_write)
    gdal_utils("buildvrt", src, dst, options)
    #sapply(src, unlink)
    ret = read_stars(dst)
    #unlink(dst)
    ret
}

tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)

x1 = x[,5:50, 7:80,]
x2 = x[,10:150, 5:20,]
x3 = x[,80:90, 100:115,]
x4 = x[,3:253, 22:33,]
x5 = x[,33:44, 55:66,]
x6 = x[,9:90, 8:80,]

out = st_vrt(x1,x2,x3,x4,x5,x6)
warnings()
out[[1]][out[[1]]==0] = NA
out
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#  file652e2bae051d.tif 
#  Min.   : 11.00       
#  1st Qu.: 46.00       
#  Median : 61.00       
#  Mean   : 62.22       
#  3rd Qu.: 75.00       
#  Max.   :255.00       
#  NA's   :109314       
# dimension(s):
#      from  to  offset delta                       refsys point values    
# x       1 251  288833  28.5 +proj=utm +zone=25 +south...    NA   NULL [x]
# y       1 111 9120647 -28.5 +proj=utm +zone=25 +south...    NA   NULL [y]
# band    1   6      NA    NA                           NA    NA   NULL    
plot(x[,,,1], reset = FALSE)
plot(out[,,,1], col = sf.colors(9), add = TRUE, breaks = seq(10,255,length.out=10))

x

kendonB commented 5 years ago

Whoops I intended to add the files. Added above now.

kendonB commented 5 years ago

That function works quite well!

Only thing is it looks like it adds 0s in some bits where they should be NAs:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(raster)
#> Loading required package: sp
library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(cowplot)
#> 
#> Attaching package: 'cowplot'
#> The following object is masked from 'package:ggplot2':
#> 
#>     ggsave

st_vrt = function(..., options = character(0)) {
  dst = tempfile(fileext = ".tif")
  lst_write = function(obj) {
    fname = tempfile(fileext = ".tif")
    write_stars(obj, fname)
    fname
  }
  src = sapply(list(...), lst_write)
  gdal_utils("buildvrt", src, dst, options)
  #sapply(src, unlink)
  ret = read_stars(dst)
  #unlink(dst)
  ret
}

stars_objects = list.files("C:/Users/bellk/Downloads/elevation_nz", full.names = TRUE) %>% 
  str_subset(".tif") %>% 
  map(read_stars)

merged = st_vrt(stars_objects[[1]], stars_objects[[2]])
merged = merged %>% 
  as("Raster") %>% 
  raster::aggregate(fact = 30) %>% 
  st_as_stars()

stars_objects = stars_objects %>% 
  map(as, "Raster") %>% 
  map(raster::aggregate, fact = 30) %>% 
  map(st_as_stars)

plot1 = ggplot() + 
  geom_stars(data = stars_objects[[1]]) +
  geom_stars(data = stars_objects[[2]]) + 
  coord_equal() + 
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank()) + 
  ggtitle("objects 1 and 2")

plot2 = ggplot() + 
  geom_stars(data = merged) +
  coord_equal() + 
  theme(legend.position = "none",
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank()) + 
  ggtitle("st_vrted")

plot_grid(plot1, plot2, nrow = 1)

Created on 2019-08-16 by the reprex package (v0.3.0)

edzer commented 5 years ago

Thanks! Specifying options = c("-vrtnodata", "-9999") in the call to st_vrt seems to solve the zero problem.

edzer commented 5 years ago

Would st_mosaic be the better name for this function?

adrfantini commented 5 years ago

Would st_mosaic be the better name for this function?

That'd be nice.

kendonB commented 5 years ago

The file name comes in as the layer name when reading tifs. I think it would be good to create a temporary directory then write the new tif inside that. You could expose name as an argument with layer as the default which would write layer.tif. What do you think?

kendonB commented 5 years ago

Should the unlink call be in an on.exit?

Nowosad commented 4 years ago

Hi @edzer, I've tried to use st_moisac() lately and stumbled across two things:

  1. The user must provide each object separately (the function does not accept lists). See approach 1 and 3 below.
  2. The st_moisac() function does not work on the outputs of read_ncdf(). It is important in the example files, because read_stars() does not preserves NA values. You can find the files at mod.zip.
library(stars)
files = dir(full.names = TRUE, pattern = "*.hdf$")

# approach 1 --------------------------------------------------------------
ncdf_list_out = lapply(files, read_ncdf, var = "Npp_1km")
out1 = st_mosaic(ncdf_list_out)

# approach 2 --------------------------------------------------------------
ncdf1 = read_ncdf(files[1])
ncdf2 = read_ncdf(files[2])
ncdf3 = read_ncdf(files[3])
out2 = st_mosaic(ncdf1, ncdf2, ncdf3)

# approach 3 --------------------------------------------------------------
stars_list_out = lapply(files, read_stars, sub = "Npp_1km")
out3 = st_mosaic(stars_list_out)

# approach 4 --------------------------------------------------------------
stars1 = read_stars(files[1])
stars2 = read_stars(files[2])
stars3 = read_stars(files[3])
out2 = st_mosaic(stars1, stars2, stars3)

# works!
plot(out2)
edzer commented 4 years ago
  1. Maybe learn using do.call?
do.call(st_mosaic, stars_list_out)

Maybe we should add an example in the help of `st_mosaic of its use.

  1. I see only failure trying to read these HDF5 files with read_ncdf:
    > ncdf1 = read_ncdf(files[1])
    Error in nc_meta.character(.x) : 
    failed to open 'x', value given was: "./MOD17A3.A2014001.h31v09.055.2016008121113.hdf"

    and

    gdal_utils("info", "MOD17A3.A2014001.h31v09.055.2016008121113.hdf")

    also does not recognise these as NetCDF files (nor does the ncdump utility)

mdsumner commented 4 years ago

(because they are HDF4 not HDF5 , still used in lots of streams - NetCDF-4 is derived from HDF5, but HDF4 is the old legacy form)

gdalinfo MOD17A3.A2014001.h31v09.055.2016008121113.hdf 
Driver: HDF4/Hierarchical Data Format Release 4
Files: MOD17A3.A2014001.h31v09.055.2016008121113.hdf
Size is 512, 512
Coordinate System is `'
Nowosad commented 4 years ago

Hi @edzer,

  1. I do know do.call(). However, I think users would prefer that st_mosaic() just working on the list of file names (if that's possible).
  2. I am able to read the files using read_ncdf().
> library(stars)
Loading required package: abind
Loading required package: sf
Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
> 
> files = dir(full.names = TRUE, pattern = "*.hdf$")
> 
> ncdf1 = read_ncdf(files[1], var = "Npp_1km")
Warning messages:
1: In .get_nc_projection(meta$attribute, rep_var, all_coord_var) :
  No projection information found in nc file.
2: ignoring unrecognized unit: kg_C/m^2 
> ncdf1
stars object with 2 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
    Npp_1km     
 Min.   :0.00   
 1st Qu.:0.81   
 Median :0.88   
 Mean   :0.81   
 3rd Qu.:0.93   
 Max.   :1.25   
 NA's   :92473  
dimension(s):
                      from   to offset delta refsys point values    
XDim:MOD_Grid_MOD17A3    1 1200    0.5     1     NA    NA   NULL [x]
YDim:MOD_Grid_MOD17A3    1 1200    0.5     1     NA    NA   NULL [y]
edzer commented 4 years ago
  1. Of course this can be made to work, the question is whether we want this, but it's bikeshedding IMO. It would give the same mess as sf::st_sf() (and data.frame, for that sake). What if you'd give two lists to st_mosaic? Or a vector of data source names? And so on.
  2. What is your sessionInfo()?
Nowosad commented 4 years ago
  1. Yep. I understand the issues there.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 30 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.4-0 sf_0.8-0    abind_1.4-5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2         rstudioapi_0.10    magrittr_1.5       units_0.6-4        tidyselect_0.2.5  
 [6] R6_2.4.0           rlang_0.4.1.9000   dplyr_0.8.3        tools_3.6.1        parallel_3.6.1    
[11] grid_3.6.1         packrat_0.5.0      KernSmooth_2.23-15 ncmeta_0.2.0       e1071_1.7-2       
[16] DBI_1.0.0          class_7.3-15       assertthat_0.2.1   lifecycle_0.1.0    tibble_2.1.3      
[21] lwgeom_0.1-7       RNetCDF_2.1-1      crayon_1.3.4       tidyr_1.0.0        purrr_0.3.3       
[26] vctrs_0.2.0.9007   zeallot_0.1.0      glue_1.3.1         compiler_3.6.1     pillar_1.4.2      
[31] backports_1.1.5    classInt_0.4-1     pkgconfig_2.0.3   

And st_drivers():

name long_name write copy is_raster is_vector vsi
PCIDSK PCIDSK PCIDSK Database File TRUE FALSE TRUE TRUE TRUE
netCDF netCDF Network Common Data Format TRUE TRUE TRUE TRUE FALSE
JP2OpenJPEG JP2OpenJPEG JPEG-2000 driver based on OpenJPEG library FALSE TRUE TRUE TRUE TRUE
JPEG2000 JPEG2000 JPEG-2000 part 1 (ISO/IEC 15444-1), based on Jasper library FALSE TRUE TRUE TRUE TRUE
PDF PDF Geospatial PDF TRUE TRUE TRUE TRUE TRUE
MBTiles MBTiles MBTiles TRUE TRUE TRUE TRUE TRUE
ESRI Shapefile ESRI Shapefile ESRI Shapefile TRUE FALSE FALSE TRUE TRUE
MapInfo File MapInfo File MapInfo File TRUE FALSE FALSE TRUE TRUE
UK .NTF UK .NTF UK .NTF FALSE FALSE FALSE TRUE TRUE
OGR_SDTS OGR_SDTS SDTS FALSE FALSE FALSE TRUE TRUE
S57 S57 IHO S-57 (ENC) TRUE FALSE FALSE TRUE TRUE
DGN DGN Microstation DGN TRUE FALSE FALSE TRUE TRUE
OGR_VRT OGR_VRT VRT - Virtual Datasource FALSE FALSE FALSE TRUE TRUE
REC REC EPIInfo .REC FALSE FALSE FALSE TRUE FALSE
Memory Memory Memory TRUE FALSE FALSE TRUE FALSE
BNA BNA Atlas BNA TRUE FALSE FALSE TRUE TRUE
CSV CSV Comma Separated Value (.csv) TRUE FALSE FALSE TRUE TRUE
NAS NAS NAS - ALKIS FALSE FALSE FALSE TRUE TRUE
GML GML Geography Markup Language (GML) TRUE FALSE FALSE TRUE TRUE
GPX GPX GPX TRUE FALSE FALSE TRUE TRUE
LIBKML LIBKML Keyhole Markup Language (LIBKML) TRUE FALSE FALSE TRUE TRUE
KML KML Keyhole Markup Language (KML) TRUE FALSE FALSE TRUE TRUE
GeoJSON GeoJSON GeoJSON TRUE FALSE FALSE TRUE TRUE
ESRIJSON ESRIJSON ESRIJSON FALSE FALSE FALSE TRUE TRUE
TopoJSON TopoJSON TopoJSON FALSE FALSE FALSE TRUE TRUE
Interlis 1 Interlis 1 Interlis 1 TRUE FALSE FALSE TRUE TRUE
Interlis 2 Interlis 2 Interlis 2 TRUE FALSE FALSE TRUE TRUE
OGR_GMT OGR_GMT GMT ASCII Vectors (.gmt) TRUE FALSE FALSE TRUE TRUE
GPKG GPKG GeoPackage TRUE TRUE TRUE TRUE TRUE
SQLite SQLite SQLite / Spatialite TRUE FALSE FALSE TRUE TRUE
OGR_DODS OGR_DODS OGR_DODS FALSE FALSE FALSE TRUE FALSE
ODBC ODBC ODBC TRUE FALSE FALSE TRUE FALSE
WAsP WAsP WAsP .map format TRUE FALSE FALSE TRUE TRUE
PGeo PGeo ESRI Personal GeoDatabase FALSE FALSE FALSE TRUE FALSE
MSSQLSpatial MSSQLSpatial Microsoft SQL Server Spatial Database TRUE FALSE FALSE TRUE FALSE
OGR_OGDI OGR_OGDI OGDI Vectors (VPF, VMAP, DCW) FALSE FALSE FALSE TRUE FALSE
PostgreSQL PostgreSQL PostgreSQL/PostGIS TRUE FALSE FALSE TRUE FALSE
MySQL MySQL MySQL TRUE FALSE FALSE TRUE FALSE
OpenFileGDB OpenFileGDB ESRI FileGDB FALSE FALSE FALSE TRUE TRUE
XPlane XPlane X-Plane/Flightgear aeronautical data FALSE FALSE FALSE TRUE TRUE
DXF DXF AutoCAD DXF TRUE FALSE FALSE TRUE TRUE
CAD CAD AutoCAD Driver FALSE FALSE TRUE TRUE TRUE
Geoconcept Geoconcept Geoconcept TRUE FALSE FALSE TRUE TRUE
GeoRSS GeoRSS GeoRSS TRUE FALSE FALSE TRUE TRUE
GPSTrackMaker GPSTrackMaker GPSTrackMaker TRUE FALSE FALSE TRUE TRUE
VFK VFK Czech Cadastral Exchange Data Format FALSE FALSE FALSE TRUE FALSE
PGDUMP PGDUMP PostgreSQL SQL dump TRUE FALSE FALSE TRUE TRUE
OSM OSM OpenStreetMap XML and PBF FALSE FALSE FALSE TRUE TRUE
GPSBabel GPSBabel GPSBabel TRUE FALSE FALSE TRUE FALSE
SUA SUA Tim Newport-Peace's Special Use Airspace Format FALSE FALSE FALSE TRUE TRUE
OpenAir OpenAir OpenAir FALSE FALSE FALSE TRUE TRUE
OGR_PDS OGR_PDS Planetary Data Systems TABLE FALSE FALSE FALSE TRUE TRUE
WFS WFS OGC WFS (Web Feature Service) FALSE FALSE FALSE TRUE TRUE
WFS3 WFS3 OGC WFS 3 client (Web Feature Service) FALSE FALSE FALSE TRUE FALSE
HTF HTF Hydrographic Transfer Vector FALSE FALSE FALSE TRUE TRUE
AeronavFAA AeronavFAA Aeronav FAA FALSE FALSE FALSE TRUE TRUE
Geomedia Geomedia Geomedia .mdb FALSE FALSE FALSE TRUE FALSE
EDIGEO EDIGEO French EDIGEO exchange format FALSE FALSE FALSE TRUE TRUE
GFT GFT Google Fusion Tables TRUE FALSE FALSE TRUE FALSE
SVG SVG Scalable Vector Graphics FALSE FALSE FALSE TRUE TRUE
CouchDB CouchDB CouchDB / GeoCouch TRUE FALSE FALSE TRUE FALSE
Cloudant Cloudant Cloudant / CouchDB TRUE FALSE FALSE TRUE FALSE
Idrisi Idrisi Idrisi Vector (.vct) FALSE FALSE FALSE TRUE TRUE
ARCGEN ARCGEN Arc/Info Generate FALSE FALSE FALSE TRUE TRUE
SEGUKOOA SEGUKOOA SEG-P1 / UKOOA P1/90 FALSE FALSE FALSE TRUE TRUE
SEGY SEGY SEG-Y FALSE FALSE FALSE TRUE TRUE
XLS XLS MS Excel format FALSE FALSE FALSE TRUE FALSE
ODS ODS Open Document/ LibreOffice / OpenOffice Spreadsheet TRUE FALSE FALSE TRUE TRUE
XLSX XLSX MS Office Open XML spreadsheet TRUE FALSE FALSE TRUE TRUE
ElasticSearch ElasticSearch Elastic Search TRUE FALSE FALSE TRUE FALSE
Walk Walk Walk FALSE FALSE FALSE TRUE FALSE
Carto Carto Carto TRUE FALSE FALSE TRUE FALSE
AmigoCloud AmigoCloud AmigoCloud TRUE FALSE FALSE TRUE FALSE
SXF SXF Storage and eXchange Format FALSE FALSE FALSE TRUE TRUE
Selafin Selafin Selafin TRUE FALSE FALSE TRUE TRUE
JML JML OpenJUMP JML TRUE FALSE FALSE TRUE TRUE
PLSCENES PLSCENES Planet Labs Scenes API FALSE FALSE TRUE TRUE FALSE
CSW CSW OGC CSW (Catalog Service for the Web) FALSE FALSE FALSE TRUE FALSE
VDV VDV VDV-451/VDV-452/INTREST Data Format TRUE FALSE FALSE TRUE TRUE
GMLAS GMLAS Geography Markup Language (GML) driven by application schemas FALSE TRUE FALSE TRUE TRUE
MVT MVT Mapbox Vector Tiles TRUE FALSE FALSE TRUE TRUE
TIGER TIGER U.S. Census TIGER/Line TRUE FALSE FALSE TRUE TRUE
AVCBin AVCBin Arc/Info Binary Coverage FALSE FALSE FALSE TRUE TRUE
AVCE00 AVCE00 Arc/Info E00 (ASCII) Coverage FALSE FALSE FALSE TRUE TRUE
HTTP HTTP HTTP Fetching Wrapper FALSE FALSE TRUE TRUE FALSE
meteosimon commented 4 years ago

Would it be an option to have an use_mosaic as a logical argument in read_stars, or alternatively make along in read_stars (and c.stars) accept "use_mosaic"?

Right now I am doing

x <- lapply(hgtfiles, read_stars)
dem <- do.call("st_mosaic", x)

but would prefer doing

dem <- read_stars(hgtfiles, along = "use_mosaic")

or

dem <- read_stars(hgtfiles, use_mosaic = TRUE)

What do you think?

BTW: Could you add st_mosaic to the see also list of c.stars?

edzer commented 4 years ago

Looking back at how things are implemented, and imagining that users will want to do this on large files, I think I'm going to go back to what @Nowosad proposed: having multiple ways to invoke mosaic. I suggest to make st_mosaic a generic, and have a method that works on file names only; in your workflow that would become

dem = read_stars(st_mosaic(hgtfiles))

This would avoid reading all hgtfiles first, then writing them to disk before creating the mosaic, and might save a lot of memory.

edzer commented 4 years ago

... and that should now work.

mdsumner commented 4 years ago

works nicely with .vrt too, so no copy of the data required with read_stars(vrt, proxy = TRUE)

meteosimon commented 4 years ago

Very nice!

However, the st_mosaic method for character should return the name of the dst file (Otherwise sf::gdal_utils return a TRUE invisibly). Something like this:

st_mosaic.character = function(.x, ...,  dst = tempfile(fileext = file_ext),
                options = c("-vrtnodata", "-9999"), file_ext = ".tif") {
        sf::gdal_utils("buildvrt", .x, dst, options)
        dst
}