r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
554 stars 94 forks source link

Prediction on multiband rasters #448

Closed pat-s closed 3 years ago

pat-s commented 3 years ago

AFAICS there is no way around to "split" a multiband raster, i.e. convert it to a data.frame, if one wants to predict on it - is that correct?

In ?predict.stars I read:

separate predictors in object need to be separate attributes in object; in case they are e.g. in a band dimension, use 'split(object)'

I've found this part hard to understand in the first place when getting started with {stars}. It became clearer after looking at the modelling vignette, especially at section "stars objects as data.frame" (just as a feedback).

Both terra::predict() and raster::predict() seem to do this internally, i.e. if the predictors are stored in multiple bands, users can just pass the stack and they will be found/used.

Is it likely that this will stay like this in the future? Asking because I am currently about to add spatial prediction support in {mlr3spatial}. I was wondering if potentially {stars} could detect a multiband raster and try to do the split internally - the model should then complain anyhow if some variables are missing when doing the final predict call.

edzer commented 3 years ago

The terra data model is much simpler: it doesn't have attributes, and only copes with stacks of raster - whatever the stack "dimension" represents.

I'd like to make this simpler, but suppose you have a time series of spectral imagery, and time and spectral are both dimensions (meaning you have a single attribute, four-dimensional data cube) - how could a predict know what the different predictors are: the time instances, the spectral bands, or all unique combinations of time x band?

pat-s commented 3 years ago

Yes I see the issues.

I haven't yet dived into the multi-dim capabilities of {stars}, e.g. a variable with observations across multiple timestamps. For now, I was solely focusing on getting spatial-only prediction right. Maybe one option would be to let {stars} try some "smart" defaults, based on the dimension metadata information it has and report this during the call (something like "Detected XY, using Z as TZ" and so forth?

Some potentially helpful ideas:

I also see the advantages of unfolding the stars object into a data.frame first and then do predict.data.frame() and handing the variable curation over to the user. In {mlr3spatial} the return would be a {mlr3::Prediction} object and optionally a stars object written to disk - in {mlr3spatial} we anyhow do the prediction on a data.table internally, also for {terra} and {raster} objects. The use case for me only came up as I am about to benchmark predict() calls for various spatial packages and stumbled over the multi-band predict issue mentioned here :)

edzer commented 3 years ago

I agree that the simple case: a 3 D, single attr cube, where the third needs to be split for predict, could be a good default (with a message, and after trying & failing to call predict on that single attribute), and will look into that.

edzer commented 3 years ago

This should do it; would be great if you could check!

pat-s commented 3 years ago

Thanks!

In my reprex it does not work yet:

library(mlr3spatial)
#> Loading required package: mlr3
# remotes::install_github("mlr-org/mlr3spatial")

library(mlr3)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.1.0

stack_rasterbrick <- demo_stack_rasterbrick(size = 1, layers = 5)
stack_stars <- stars::st_as_stars(stack_rasterbrick)
backend_stars <- as_stars_backend(stack_stars, quiet = TRUE, response = "y.1", response_is_factor = TRUE)
task_stars <- as_task_classif(backend_stars, target = "y.1", positive = "1")

set.seed(42)
row_ids <- sample(1:task_stars$nrow, 500)
e1071svm <- e1071::svm(y.1 ~ ., task_stars$data(rows = row_ids))

predict(e1071svm, stack_stars)
#> Error in eval(predvars, data, env): object 'x_2' not found

# works
stack_stars_df <- as.data.frame(split(stack_stars, "band"))
head(predict(e1071svm, stack_stars_df))
#> 1 2 3 4 5 6 
#> 1 1 1 1 1 1 
#> Levels: 1 0

Created on 2021-08-31 by the reprex package (v2.0.1)

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.1 Patched (2021-08-27 r80829) #> os macOS Big Sur 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Zurich #> date 2021-08-31 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib #> abind * 1.4-5 2016-07-21 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.2.1 2020-12-09 [1] #> checkmate 2.0.0 2020-02-06 [1] #> class 7.3-19 2021-05-03 [2] #> classInt 0.4-3 2020-04-07 [1] #> cli 3.0.1 2021-07-17 [1] #> codetools 0.2-18 2020-11-04 [2] #> crayon 1.4.1 2021-02-08 [1] #> data.table 1.14.0 2021-02-21 [1] #> DBI 1.1.1 2021-01-15 [1] #> digest 0.6.27 2020-10-24 [1] #> dplyr 1.0.7 2021-06-18 [1] #> e1071 1.7-8 2021-07-28 [1] #> ellipsis 0.3.2 2021-04-29 [1] #> evaluate 0.14 2019-05-28 [1] #> fansi 0.5.0 2021-05-25 [1] #> fastmap 1.1.0 2021-01-25 [1] #> fs 1.5.0 2020-07-31 [1] #> future 1.22.1 2021-08-25 [1] #> generics 0.1.0 2020-10-31 [1] #> globals 0.14.0 2020-11-22 [1] #> glue 1.4.2 2020-08-27 [1] #> highr 0.9 2021-04-16 [1] #> htmltools 0.5.2 2021-08-25 [1] #> KernSmooth 2.23-20 2021-05-03 [2] #> knitr 1.33 2021-04-24 [1] #> lattice 0.20-44 2021-05-02 [2] #> lgr 0.4.2 2021-01-10 [1] #> lifecycle 1.0.0 2021-02-15 [1] #> listenv 0.8.0 2019-12-05 [1] #> lwgeom 0.2-7 2021-07-28 [1] #> magrittr 2.0.1 2020-11-17 [1] #> mlr3 * 0.12.0-9000 2021-08-26 [1] #> mlr3misc 0.9.3.9000 2021-08-23 [1] #> mlr3spatial * 0.0.0.9000 2021-08-28 [1] #> palmerpenguins 0.1.0 2020-07-23 [1] #> paradox 0.7.1 2021-03-07 [1] #> parallelly 1.27.0-9002 2021-08-30 [1] #> pillar 1.6.2 2021-07-29 [1] #> pkgconfig 2.0.3 2019-09-22 [1] #> proxy 0.4-26 2021-06-07 [1] #> purrr 0.3.4 2020-04-17 [1] #> R.cache 0.15.0 2021-04-30 [1] #> R.methodsS3 1.8.1 2020-08-26 [1] #> R.oo 1.24.0 2020-08-26 [1] #> R.utils 2.10.1 2020-08-26 [1] #> R6 2.5.1 2021-08-19 [1] #> raster 3.4-13 2021-06-18 [1] #> Rcpp 1.0.7 2021-07-07 [1] #> rematch2 2.1.2 2020-05-01 [1] #> reprex 2.0.1 2021-08-05 [1] #> rlang 0.4.11 2021-04-30 [1] #> rmarkdown 2.10.6 2021-08-30 [1] #> rstudioapi 0.13 2020-11-12 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> sf * 1.0-2 2021-07-26 [1] #> sp 1.4-5 2021-01-10 [1] #> stars * 0.5-4 2021-08-31 [1] #> stringi 1.7.4 2021-08-25 [1] #> stringr 1.4.0 2019-02-10 [1] #> styler 1.5.1.9001 2021-08-23 [1] #> tibble 3.1.4 2021-08-25 [1] #> tidyselect 1.1.1 2021-04-30 [1] #> units 0.7-2 2021-06-08 [1] #> utf8 1.2.2 2021-07-24 [1] #> uuid 0.1-4 2020-02-26 [1] #> vctrs 0.3.8 2021-04-29 [1] #> withr 2.4.2 2021-04-18 [1] #> xfun 0.25 2021-08-06 [1] #> yaml 2.2.1 2020-02-01 [1] #> source #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.0.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> standard (@1.0.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> local #> Github (mlr-org/mlr3misc@e02079a) #> local #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (HenrikBengtsson/parallelly@39c2056) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@0.3.4) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.1.2) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (rstudio/rmarkdown@c8c502b) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (r-spatial/stars@4bfedd7) #> CRAN (R 4.1.1) #> standard (@1.4.0) #> Github (r-lib/styler@2b1eeac) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.2.1) #> #> [1] /Users/pjs/Library/R/x86_64/4.1/library #> [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ```
edzer commented 3 years ago

That's because predict.stars seems to be never called - how would stars know it needs to call split() on the object?

> predict(e1071svm, stack_stars)
Error in eval(predvars, data, env) : object 'x_2' not found
> traceback()
8: eval(predvars, data, env)
7: eval(predvars, data, env)
6: model.frame.default(object, data, xlev = xlev)
5: model.frame(object, data, xlev = xlev)
4: model.matrix.default(delete.response(terms(object)), as.data.frame(newdata))
3: model.matrix(delete.response(terms(object)), as.data.frame(newdata))
2: predict.svm(e1071svm, stack_stars)
1: predict(e1071svm, stack_stars)
pat-s commented 3 years ago

Ah well, that was a dumb ordering mistake by me - ofc the stars object needs to be first in line.

Yet I get the following then

predict(stack_stars, e1071svm)
prediction on the entire object failed; will try to split() bands over attributes
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class ‘"try-error"’ to a data.frame
Backtrace:
1: stop(gettextf("cannot coerce class %s to a data.frame", sQuote(deparse(class(x))[1L])), 2: as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
3: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
4: data.frame(prediction = pr)
5: predict.stars(stack_stars, e1071svm)
6: predict(stack_stars, e1071svm)

Could it be that predict(split(object), model, ..., drop_dimensions = drop_dimensions) should be predict(split(object, "band"), model, ..., drop_dimensions = drop_dimensions) to make it work?

edzer commented 3 years ago

Ah, yes, recursion always needs care. This should do it - pls try.

pat-s commented 3 years ago

Looking good!

I am wondering in which case "prediction on the entire object" would succeed here. I got to admit that I am also having a bit of trouble with the terminology used in the message, especially "entire object". In this particular case I would assume that "the entire object" would make use of all attributes of the object and hence succeed - or would that mean here that only the first attribute would be used unless split() is applied? Maybe a refinement of the message using the terms "dimension", "attribute" and "columns" could help clarify from a users perspective?

library(mlr3spatial)
#> Loading required package: mlr3
# remotes::install_github("mlr-org/mlr3spatial")

library(mlr3)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.1.0

stack_rasterbrick <- demo_stack_rasterbrick(size = 1, layers = 5)
stack_stars <- stars::st_as_stars(stack_rasterbrick)
backend_stars <- as_stars_backend(stack_stars, quiet = TRUE, response = "y.1", response_is_factor = TRUE)
task_stars <- as_task_classif(backend_stars, target = "y.1", positive = "1")

set.seed(42)
row_ids <- sample(1:task_stars$nrow, 500)
e1071svm <- e1071::svm(y.1 ~ ., task_stars$data(rows = row_ids))

predict(stack_stars, e1071svm)
#> prediction on the entire object failed; will try to split() bands over attributes
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>  prediction 
#>  1:28126    
#>  0:21603    
#> dimension(s):
#>   from  to offset      delta refsys point values x/y
#> x    1 223      0  0.0044843     NA    NA   NULL [x]
#> y    1 223      1 -0.0044843     NA    NA   NULL [y]

Created on 2021-08-31 by the reprex package (v2.0.1)

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.1 Patched (2021-08-27 r80829) #> os macOS Big Sur 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Zurich #> date 2021-08-31 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib #> abind * 1.4-5 2016-07-21 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.2.1 2020-12-09 [1] #> checkmate 2.0.0 2020-02-06 [1] #> class 7.3-19 2021-05-03 [2] #> classInt 0.4-3 2020-04-07 [1] #> cli 3.0.1 2021-07-17 [1] #> codetools 0.2-18 2020-11-04 [2] #> crayon 1.4.1 2021-02-08 [1] #> data.table 1.14.0 2021-02-21 [1] #> DBI 1.1.1 2021-01-15 [1] #> digest 0.6.27 2020-10-24 [1] #> dplyr 1.0.7 2021-06-18 [1] #> e1071 1.7-8 2021-07-28 [1] #> ellipsis 0.3.2 2021-04-29 [1] #> evaluate 0.14 2019-05-28 [1] #> fansi 0.5.0 2021-05-25 [1] #> fastmap 1.1.0 2021-01-25 [1] #> fs 1.5.0 2020-07-31 [1] #> future 1.22.1 2021-08-25 [1] #> generics 0.1.0 2020-10-31 [1] #> globals 0.14.0 2020-11-22 [1] #> glue 1.4.2 2020-08-27 [1] #> highr 0.9 2021-04-16 [1] #> htmltools 0.5.2 2021-08-25 [1] #> KernSmooth 2.23-20 2021-05-03 [2] #> knitr 1.33 2021-04-24 [1] #> lattice 0.20-44 2021-05-02 [2] #> lgr 0.4.2 2021-01-10 [1] #> lifecycle 1.0.0 2021-02-15 [1] #> listenv 0.8.0 2019-12-05 [1] #> lwgeom 0.2-7 2021-07-28 [1] #> magrittr 2.0.1 2020-11-17 [1] #> mlr3 * 0.12.0-9000 2021-08-26 [1] #> mlr3misc 0.9.3.9000 2021-08-23 [1] #> mlr3spatial * 0.0.0.9000 2021-08-28 [1] #> palmerpenguins 0.1.0 2020-07-23 [1] #> paradox 0.7.1 2021-03-07 [1] #> parallelly 1.27.0-9002 2021-08-30 [1] #> pillar 1.6.2 2021-07-29 [1] #> pkgconfig 2.0.3 2019-09-22 [1] #> proxy 0.4-26 2021-06-07 [1] #> purrr 0.3.4 2020-04-17 [1] #> R.cache 0.15.0 2021-04-30 [1] #> R.methodsS3 1.8.1 2020-08-26 [1] #> R.oo 1.24.0 2020-08-26 [1] #> R.utils 2.10.1 2020-08-26 [1] #> R6 2.5.1 2021-08-19 [1] #> raster 3.4-13 2021-06-18 [1] #> Rcpp 1.0.7 2021-07-07 [1] #> rematch2 2.1.2 2020-05-01 [1] #> reprex 2.0.1 2021-08-05 [1] #> rlang 0.4.11 2021-04-30 [1] #> rmarkdown 2.10.6 2021-08-30 [1] #> rstudioapi 0.13 2020-11-12 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> sf * 1.0-2 2021-07-26 [1] #> sp 1.4-5 2021-01-10 [1] #> stars * 0.5-4 2021-08-31 [1] #> stringi 1.7.4 2021-08-25 [1] #> stringr 1.4.0 2019-02-10 [1] #> styler 1.5.1.9001 2021-08-23 [1] #> tibble 3.1.4 2021-08-25 [1] #> tidyselect 1.1.1 2021-04-30 [1] #> units 0.7-2 2021-06-08 [1] #> utf8 1.2.2 2021-07-24 [1] #> uuid 0.1-4 2020-02-26 [1] #> vctrs 0.3.8 2021-04-29 [1] #> withr 2.4.2 2021-04-18 [1] #> xfun 0.25 2021-08-06 [1] #> yaml 2.2.1 2020-02-01 [1] #> source #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.0.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> standard (@1.0.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> local #> Github (mlr-org/mlr3misc@e02079a) #> local #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (HenrikBengtsson/parallelly@39c2056) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@0.3.4) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.1.2) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (rstudio/rmarkdown@c8c502b) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (r-spatial/stars@89b74ae) #> CRAN (R 4.1.1) #> standard (@1.4.0) #> Github (r-lib/styler@2b1eeac) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.2.1) #> #> [1] /Users/pjs/Library/R/x86_64/4.1/library #> [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ```
edzer commented 3 years ago

This now gives the message

prediction on array(s) `x_1' failed; will try to split() dimension `band' over attributes
pat-s commented 3 years ago

Thanks, this is more descriptive for users I'd say :)

Happy with the current setup, hence closing here.