Prediction on multiband rasters

pat-s commented 3 years ago

AFAICS there is no way around to "split" a multiband raster, i.e. convert it to a data.frame, if one wants to predict on it - is that correct?

In ?predict.stars I read:

separate predictors in object need to be separate attributes in object; in case they are e.g. in a band dimension, use 'split(object)'

I've found this part hard to understand in the first place when getting started with {stars}. It became clearer after looking at the modelling vignette, especially at section "stars objects as data.frame" (just as a feedback).

Both terra::predict() and raster::predict() seem to do this internally, i.e. if the predictors are stored in multiple bands, users can just pass the stack and they will be found/used.

Is it likely that this will stay like this in the future? Asking because I am currently about to add spatial prediction support in {mlr3spatial}. I was wondering if potentially {stars} could detect a multiband raster and try to do the split internally - the model should then complain anyhow if some variables are missing when doing the final predict call.

edzer commented 3 years ago

The terra data model is much simpler: it doesn't have attributes, and only copes with stacks of raster - whatever the stack "dimension" represents.

I'd like to make this simpler, but suppose you have a time series of spectral imagery, and time and spectral are both dimensions (meaning you have a single attribute, four-dimensional data cube) - how could a predict know what the different predictors are: the time instances, the spectral bands, or all unique combinations of time x band?

pat-s commented 3 years ago

Yes I see the issues.

I haven't yet dived into the multi-dim capabilities of {stars}, e.g. a variable with observations across multiple timestamps. For now, I was solely focusing on getting spatial-only prediction right. Maybe one option would be to let {stars} try some "smart" defaults, based on the dimension metadata information it has and report this during the call (something like "Detected XY, using Z as TZ" and so forth?

Some potentially helpful ideas:

If no time or only one time step is found (i.e. spatial-only case), stars::predict() could just focus on the spatial dimension and try to "split" these internally and then do the prediction.
If time is present (i.e. multiple time steps), users could be requested to specify the time step to predict to - all others are included as predictors in a generic naming scheme (whether the auto-generated variable names will match with the model is a different thing)

I also see the advantages of unfolding the stars object into a data.frame first and then do predict.data.frame() and handing the variable curation over to the user. In {mlr3spatial} the return would be a {mlr3::Prediction} object and optionally a stars object written to disk - in {mlr3spatial} we anyhow do the prediction on a data.table internally, also for {terra} and {raster} objects. The use case for me only came up as I am about to benchmark predict() calls for various spatial packages and stumbled over the multi-band predict issue mentioned here :)

edzer commented 3 years ago

I agree that the simple case: a 3 D, single attr cube, where the third needs to be split for predict, could be a good default (with a message, and after trying & failing to call predict on that single attribute), and will look into that.

edzer commented 3 years ago

This should do it; would be great if you could check!

pat-s commented 3 years ago

Thanks!

In my reprex it does not work yet:

library(mlr3spatial)
#> Loading required package: mlr3
# remotes::install_github("mlr-org/mlr3spatial")

library(mlr3)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.1.0

stack_rasterbrick <- demo_stack_rasterbrick(size = 1, layers = 5)
stack_stars <- stars::st_as_stars(stack_rasterbrick)
backend_stars <- as_stars_backend(stack_stars, quiet = TRUE, response = "y.1", response_is_factor = TRUE)
task_stars <- as_task_classif(backend_stars, target = "y.1", positive = "1")

set.seed(42)
row_ids <- sample(1:task_stars$nrow, 500)
e1071svm <- e1071::svm(y.1 ~ ., task_stars$data(rows = row_ids))

predict(e1071svm, stack_stars)
#> Error in eval(predvars, data, env): object 'x_2' not found

# works
stack_stars_df <- as.data.frame(split(stack_stars, "band"))
head(predict(e1071svm, stack_stars_df))
#> 1 2 3 4 5 6 
#> 1 1 1 1 1 1 
#> Levels: 1 0

^{Created on 2021-08-31 by the reprex package (v2.0.1)}

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.1 Patched (2021-08-27 r80829) #> os macOS Big Sur 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Zurich #> date 2021-08-31 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib #> abind * 1.4-5 2016-07-21 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.2.1 2020-12-09 [1] #> checkmate 2.0.0 2020-02-06 [1] #> class 7.3-19 2021-05-03 [2] #> classInt 0.4-3 2020-04-07 [1] #> cli 3.0.1 2021-07-17 [1] #> codetools 0.2-18 2020-11-04 [2] #> crayon 1.4.1 2021-02-08 [1] #> data.table 1.14.0 2021-02-21 [1] #> DBI 1.1.1 2021-01-15 [1] #> digest 0.6.27 2020-10-24 [1] #> dplyr 1.0.7 2021-06-18 [1] #> e1071 1.7-8 2021-07-28 [1] #> ellipsis 0.3.2 2021-04-29 [1] #> evaluate 0.14 2019-05-28 [1] #> fansi 0.5.0 2021-05-25 [1] #> fastmap 1.1.0 2021-01-25 [1] #> fs 1.5.0 2020-07-31 [1] #> future 1.22.1 2021-08-25 [1] #> generics 0.1.0 2020-10-31 [1] #> globals 0.14.0 2020-11-22 [1] #> glue 1.4.2 2020-08-27 [1] #> highr 0.9 2021-04-16 [1] #> htmltools 0.5.2 2021-08-25 [1] #> KernSmooth 2.23-20 2021-05-03 [2] #> knitr 1.33 2021-04-24 [1] #> lattice 0.20-44 2021-05-02 [2] #> lgr 0.4.2 2021-01-10 [1] #> lifecycle 1.0.0 2021-02-15 [1] #> listenv 0.8.0 2019-12-05 [1] #> lwgeom 0.2-7 2021-07-28 [1] #> magrittr 2.0.1 2020-11-17 [1] #> mlr3 * 0.12.0-9000 2021-08-26 [1] #> mlr3misc 0.9.3.9000 2021-08-23 [1] #> mlr3spatial * 0.0.0.9000 2021-08-28 [1] #> palmerpenguins 0.1.0 2020-07-23 [1] #> paradox 0.7.1 2021-03-07 [1] #> parallelly 1.27.0-9002 2021-08-30 [1] #> pillar 1.6.2 2021-07-29 [1] #> pkgconfig 2.0.3 2019-09-22 [1] #> proxy 0.4-26 2021-06-07 [1] #> purrr 0.3.4 2020-04-17 [1] #> R.cache 0.15.0 2021-04-30 [1] #> R.methodsS3 1.8.1 2020-08-26 [1] #> R.oo 1.24.0 2020-08-26 [1] #> R.utils 2.10.1 2020-08-26 [1] #> R6 2.5.1 2021-08-19 [1] #> raster 3.4-13 2021-06-18 [1] #> Rcpp 1.0.7 2021-07-07 [1] #> rematch2 2.1.2 2020-05-01 [1] #> reprex 2.0.1 2021-08-05 [1] #> rlang 0.4.11 2021-04-30 [1] #> rmarkdown 2.10.6 2021-08-30 [1] #> rstudioapi 0.13 2020-11-12 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> sf * 1.0-2 2021-07-26 [1] #> sp 1.4-5 2021-01-10 [1] #> stars * 0.5-4 2021-08-31 [1] #> stringi 1.7.4 2021-08-25 [1] #> stringr 1.4.0 2019-02-10 [1] #> styler 1.5.1.9001 2021-08-23 [1] #> tibble 3.1.4 2021-08-25 [1] #> tidyselect 1.1.1 2021-04-30 [1] #> units 0.7-2 2021-06-08 [1] #> utf8 1.2.2 2021-07-24 [1] #> uuid 0.1-4 2020-02-26 [1] #> vctrs 0.3.8 2021-04-29 [1] #> withr 2.4.2 2021-04-18 [1] #> xfun 0.25 2021-08-06 [1] #> yaml 2.2.1 2020-02-01 [1] #> source #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.0.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> standard (@1.0.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> local #> Github (mlr-org/mlr3misc@e02079a) #> local #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (HenrikBengtsson/parallelly@39c2056) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@0.3.4) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.1.2) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (rstudio/rmarkdown@c8c502b) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (r-spatial/stars@4bfedd7) #> CRAN (R 4.1.1) #> standard (@1.4.0) #> Github (r-lib/styler@2b1eeac) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.2.1) #> #> [1] /Users/pjs/Library/R/x86_64/4.1/library #> [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ```

edzer commented 3 years ago

That's because predict.stars seems to be never called - how would stars know it needs to call split() on the object?

> predict(e1071svm, stack_stars)
Error in eval(predvars, data, env) : object 'x_2' not found
> traceback()
8: eval(predvars, data, env)
7: eval(predvars, data, env)
6: model.frame.default(object, data, xlev = xlev)
5: model.frame(object, data, xlev = xlev)
4: model.matrix.default(delete.response(terms(object)), as.data.frame(newdata))
3: model.matrix(delete.response(terms(object)), as.data.frame(newdata))
2: predict.svm(e1071svm, stack_stars)
1: predict(e1071svm, stack_stars)

pat-s commented 3 years ago

Ah well, that was a dumb ordering mistake by me - ofc the stars object needs to be first in line.

Yet I get the following then

predict(stack_stars, e1071svm)
prediction on the entire object failed; will try to split() bands over attributes
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class ‘"try-error"’ to a data.frame
Backtrace:
1: stop(gettextf("cannot coerce class %s to a data.frame", sQuote(deparse(class(x))[1L])), 2: as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
3: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
4: data.frame(prediction = pr)
5: predict.stars(stack_stars, e1071svm)
6: predict(stack_stars, e1071svm)

Could it be that predict(split(object), model, ..., drop_dimensions = drop_dimensions) should be predict(split(object, "band"), model, ..., drop_dimensions = drop_dimensions) to make it work?

edzer commented 3 years ago

Ah, yes, recursion always needs care. This should do it - pls try.

pat-s commented 3 years ago

Looking good!

I am wondering in which case "prediction on the entire object" would succeed here. I got to admit that I am also having a bit of trouble with the terminology used in the message, especially "entire object". In this particular case I would assume that "the entire object" would make use of all attributes of the object and hence succeed - or would that mean here that only the first attribute would be used unless split() is applied? Maybe a refinement of the message using the terms "dimension", "attribute" and "columns" could help clarify from a users perspective?

library(mlr3spatial)
#> Loading required package: mlr3
# remotes::install_github("mlr-org/mlr3spatial")

library(mlr3)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.1.0

stack_rasterbrick <- demo_stack_rasterbrick(size = 1, layers = 5)
stack_stars <- stars::st_as_stars(stack_rasterbrick)
backend_stars <- as_stars_backend(stack_stars, quiet = TRUE, response = "y.1", response_is_factor = TRUE)
task_stars <- as_task_classif(backend_stars, target = "y.1", positive = "1")

set.seed(42)
row_ids <- sample(1:task_stars$nrow, 500)
e1071svm <- e1071::svm(y.1 ~ ., task_stars$data(rows = row_ids))

predict(stack_stars, e1071svm)
#> prediction on the entire object failed; will try to split() bands over attributes
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>  prediction 
#>  1:28126    
#>  0:21603    
#> dimension(s):
#>   from  to offset      delta refsys point values x/y
#> x    1 223      0  0.0044843     NA    NA   NULL [x]
#> y    1 223      1 -0.0044843     NA    NA   NULL [y]

^{Created on 2021-08-31 by the reprex package (v2.0.1)}

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.1 Patched (2021-08-27 r80829) #> os macOS Big Sur 10.16 #> system x86_64, darwin17.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/Zurich #> date 2021-08-31 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib #> abind * 1.4-5 2016-07-21 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.2.1 2020-12-09 [1] #> checkmate 2.0.0 2020-02-06 [1] #> class 7.3-19 2021-05-03 [2] #> classInt 0.4-3 2020-04-07 [1] #> cli 3.0.1 2021-07-17 [1] #> codetools 0.2-18 2020-11-04 [2] #> crayon 1.4.1 2021-02-08 [1] #> data.table 1.14.0 2021-02-21 [1] #> DBI 1.1.1 2021-01-15 [1] #> digest 0.6.27 2020-10-24 [1] #> dplyr 1.0.7 2021-06-18 [1] #> e1071 1.7-8 2021-07-28 [1] #> ellipsis 0.3.2 2021-04-29 [1] #> evaluate 0.14 2019-05-28 [1] #> fansi 0.5.0 2021-05-25 [1] #> fastmap 1.1.0 2021-01-25 [1] #> fs 1.5.0 2020-07-31 [1] #> future 1.22.1 2021-08-25 [1] #> generics 0.1.0 2020-10-31 [1] #> globals 0.14.0 2020-11-22 [1] #> glue 1.4.2 2020-08-27 [1] #> highr 0.9 2021-04-16 [1] #> htmltools 0.5.2 2021-08-25 [1] #> KernSmooth 2.23-20 2021-05-03 [2] #> knitr 1.33 2021-04-24 [1] #> lattice 0.20-44 2021-05-02 [2] #> lgr 0.4.2 2021-01-10 [1] #> lifecycle 1.0.0 2021-02-15 [1] #> listenv 0.8.0 2019-12-05 [1] #> lwgeom 0.2-7 2021-07-28 [1] #> magrittr 2.0.1 2020-11-17 [1] #> mlr3 * 0.12.0-9000 2021-08-26 [1] #> mlr3misc 0.9.3.9000 2021-08-23 [1] #> mlr3spatial * 0.0.0.9000 2021-08-28 [1] #> palmerpenguins 0.1.0 2020-07-23 [1] #> paradox 0.7.1 2021-03-07 [1] #> parallelly 1.27.0-9002 2021-08-30 [1] #> pillar 1.6.2 2021-07-29 [1] #> pkgconfig 2.0.3 2019-09-22 [1] #> proxy 0.4-26 2021-06-07 [1] #> purrr 0.3.4 2020-04-17 [1] #> R.cache 0.15.0 2021-04-30 [1] #> R.methodsS3 1.8.1 2020-08-26 [1] #> R.oo 1.24.0 2020-08-26 [1] #> R.utils 2.10.1 2020-08-26 [1] #> R6 2.5.1 2021-08-19 [1] #> raster 3.4-13 2021-06-18 [1] #> Rcpp 1.0.7 2021-07-07 [1] #> rematch2 2.1.2 2020-05-01 [1] #> reprex 2.0.1 2021-08-05 [1] #> rlang 0.4.11 2021-04-30 [1] #> rmarkdown 2.10.6 2021-08-30 [1] #> rstudioapi 0.13 2020-11-12 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> sf * 1.0-2 2021-07-26 [1] #> sp 1.4-5 2021-01-10 [1] #> stars * 0.5-4 2021-08-31 [1] #> stringi 1.7.4 2021-08-25 [1] #> stringr 1.4.0 2019-02-10 [1] #> styler 1.5.1.9001 2021-08-23 [1] #> tibble 3.1.4 2021-08-25 [1] #> tidyselect 1.1.1 2021-04-30 [1] #> units 0.7-2 2021-06-08 [1] #> utf8 1.2.2 2021-07-24 [1] #> uuid 0.1-4 2020-02-26 [1] #> vctrs 0.3.8 2021-04-29 [1] #> withr 2.4.2 2021-04-18 [1] #> xfun 0.25 2021-08-06 [1] #> yaml 2.2.1 2020-02-01 [1] #> source #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.0.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> standard (@1.0.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> local #> Github (mlr-org/mlr3misc@e02079a) #> local #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (HenrikBengtsson/parallelly@39c2056) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@0.3.4) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.1.2) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (rstudio/rmarkdown@c8c502b) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> Github (r-spatial/stars@89b74ae) #> CRAN (R 4.1.1) #> standard (@1.4.0) #> Github (r-lib/styler@2b1eeac) #> CRAN (R 4.1.1) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> CRAN (R 4.1.0) #> standard (@2.2.1) #> #> [1] /Users/pjs/Library/R/x86_64/4.1/library #> [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ```

edzer commented 3 years ago

This now gives the message

prediction on array(s) `x_1' failed; will try to split() dimension `band' over attributes

pat-s commented 3 years ago

Thanks, this is more descriptive for users I'd say :)

Happy with the current setup, hence closing here.

r-spatial / stars

Prediction on multiband rasters #448