r-tidy-remote-sensing / tidyrgee

Create tidyverse methods for dealing with GEE image and imageCollections.

error with `tidyee_ob |>group_by(a,b,c) |> summarise(stat=stat)` when grouping creates >n groups #29

Open zackarno opened 2 years ago

zackarno commented 2 years ago

When a grouping creates a very large number of groups to summarise the tidyee object over, there seems to be an issue. This won't happen with typical group_by(year) or group_by(year, month) workflows, but it can happen if you include doy in the grouping. I have not figured out the limiting number of groups or the exact source of the problem, but the reprex below shows the issue, gets past the first error message, and runs into the next.

library(tidyrgee)

library(rgee)
ee_Initialize()
#> -- rgee 1.1.2.9000 ---------------------------------- earthengine-api 0.1.295 -- 
#>  v user: not_defined
#>  v Initializing Google Earth Engine: v Initializing Google Earth Engine:  DONE!
#> --------------------------------------------------------------------------------
ic <- ee$ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")
ic_tidy <- as_tidyee(ic)
ic_tidy
#> band names: [ NO2_column_number_density, tropospheric_NO2_column_number_density, stratospheric_NO2_column_number_density, NO2_slant_column_number_density, tropopause_pressure, absorbing_aerosol_index, cloud_fraction, sensor_altitude, sensor_azimuth_angle, sensor_zenith_angle, solar_azimuth_angle, solar_zenith_angle ] 
#> 
#> $ee_ob
#> EarthEngine Object: ImageCollection
#> $vrt
#> # A tibble: 21,185 x 8
#>    id           time_start          syste~1 date       month  year   doy band_~2
#>    <chr>        <dttm>              <chr>   <date>     <dbl> <dbl> <dbl> <list> 
#>  1 COPERNICUS/~ 2018-06-28 10:45:42 201806~ 2018-06-28     6  2018   179 <chr>  
#>  2 COPERNICUS/~ 2018-06-28 12:27:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  3 COPERNICUS/~ 2018-06-28 14:52:09 201806~ 2018-06-28     6  2018   179 <chr>  
#>  4 COPERNICUS/~ 2018-06-28 15:50:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  5 COPERNICUS/~ 2018-06-28 17:31:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  6 COPERNICUS/~ 2018-06-28 19:13:12 201806~ 2018-06-28     6  2018   179 <chr>  
#>  7 COPERNICUS/~ 2018-06-28 20:54:41 201806~ 2018-06-28     6  2018   179 <chr>  
#>  8 COPERNICUS/~ 2018-06-28 22:36:11 201806~ 2018-06-28     6  2018   179 <chr>  
#>  9 COPERNICUS/~ 2018-06-29 00:17:40 201806~ 2018-06-29     6  2018   180 <chr>  
#> 10 COPERNICUS/~ 2018-06-29 01:59:11 201806~ 2018-06-29     6  2018   180 <chr>  
#> # ... with 21,175 more rows, and abbreviated variable names 1: system_index,
#> #   2: band_names
#> # i Use `print(n = ...)` to see more rows
#> 
#> attr(,"class")
#> [1] "tidyee"

# the L3_NO2 ic has multiple records per day, so I want to summarise by date (i.e. year, month, doy)
# there is a silent failure going on here
ic_summarised_daily <- ic_tidy |>
  group_by(year, month,doy) |>
  summarise(stat = "mean")

# this often happens with `rgee` and thus `tidyrgee`... it seems like the best
# way to check if the object has been created successfully is to try a `$getInfo` call

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): RecursionError: maximum recursion depth exceeded in comparison

# okay a maximum recursion issue - seems reasonable.. under the hood we are splitting 
# the `vrt` and `ic` into thousands of groups... I can increase the recursion limit and see what 
# happens (default is 1000)

sys <-  reticulate::import("sys")
sys$setrecursionlimit(as.integer(5000))

# let's run `$getInfo()` again with the recursion limit increased....

ic_summarised_daily$ee_ob$first()$bandNames()$getInfo()
#> Error in py_call_impl(callable, dots$args, dots$keywords): ee.ee_exception.EEException: Collection.first: merge() is too deeply nested.

# we get a new error, which took a lot longer to appear than the first.

Created on 2022-08-19 by the reprex package (v2.0.1)
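
To get a feel for how quickly the group count grows once doy is included, the distinct grouping combinations can be counted before calling summarise(). The sketch below uses a toy data frame (`vrt_toy`, a hypothetical stand-in with three records per day over two years) instead of the real `ic_tidy$vrt`, so it runs without Earth Engine. The same `unique()` call should apply to the real `vrt` columns, but that is an assumption and has not been run against it.

```r
# Toy stand-in for the `vrt` tibble: three images per day over two years.
# (Hypothetical data; the real table comes from as_tidyee(ic)$vrt.)
dates <- rep(seq(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day"), each = 3)
vrt_toy <- data.frame(
  date  = dates,
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  doy   = as.integer(format(dates, "%j"))
)

# Number of groups a year/month grouping would create.
n_groups_year_month <- nrow(unique(vrt_toy[c("year", "month")]))

# Number of groups once doy is added: one per calendar day.
n_groups_daily <- nrow(unique(vrt_toy[c("year", "month", "doy")]))
```

In this toy case the year/month grouping gives a couple of dozen groups, while the daily grouping gives one group per day, which is the regime where the errors above appear.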

zackarno commented 2 years ago

Something else I noticed: the issue seems to originate in this code (inside summarise_pixels):

tidyee_output <- .data |>
      group_split() |>
      purrr::map(
        ~ee_composite(
          .x |>
          group_by(!!!rlang::syms(group_vars_chr)),
          stat=stat)
        ) |>
      bind_ics()

It seems to occur in the bind_ics() function rather than in ee_composite(): if you remove bind_ics() from the above, you can query the resulting list of composite ICs with getInfo() without issues. For example:

tidyee_output <- .data |>
      group_split() |>
      purrr::map(
        ~ee_composite(
          .x |>
          group_by(!!!rlang::syms(group_vars_chr)),
          stat=stat)
        ) # removed bind_ics 

# no problem
 tidyee_output[[1]]$ee_ob$bandNames()$getInfo()

So it seems the issue arises when the ICs are merged. This is also suggested by the second error message in the reprex above ("merge() is too deeply nested").
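
A rough way to see why merging thousands of ICs could end up "too deeply nested" is to compare how deep the merge expression gets under two reduction strategies. The sketch below is a pure-R toy: `toy_merge`, `toy_depth`, `merge_sequential` and `merge_balanced` are hypothetical helpers, no Earth Engine calls are made, and it is only an assumption (not checked against the bind_ics() source) that bind_ics() folds `merge()` calls one after another.

```r
# Toy stand-in for an Earth Engine collection: a leaf is a string, and a merge
# is a list holding its two inputs. No Earth Engine calls are made here.
toy_merge <- function(a, b) list(left = a, right = b)

# Nesting depth of the merge expression tree (a leaf has depth 0).
toy_depth <- function(x) {
  if (!is.list(x)) return(0L)
  1L + max(toy_depth(x$left), toy_depth(x$right))
}

# Left fold: ((g1 + g2) + g3) + ... so depth grows linearly with group count.
merge_sequential <- function(xs) Reduce(toy_merge, xs)

# Pairwise (balanced) reduction: merge neighbours, then merge the results,
# so depth grows with log2 of the group count.
merge_balanced <- function(xs) {
  while (length(xs) > 1L) {
    n <- length(xs)
    idx <- seq(1L, n, by = 2L)
    xs <- lapply(idx, function(i) {
      if (i < n) toy_merge(xs[[i]], xs[[i + 1L]]) else xs[[i]]
    })
  }
  xs[[1L]]
}

groups <- as.list(paste0("group_", seq_len(64)))
depth_sequential <- toy_depth(merge_sequential(groups))
depth_balanced   <- toy_depth(merge_balanced(groups))
```

With 64 toy groups the sequential fold nests one level per additional group, while the pairwise version stays at log2 of the group count. If bind_ics() does merge sequentially, a daily grouping would produce a nesting depth in the thousands, which would be consistent with both the Python RecursionError and the server-side "merge() is too deeply nested" error.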