r-spatial / rgee

Google Earth Engine for R
https://r-spatial.github.io/rgee/

Request payload size exceeds the limit:41943304 bytes / Execution failed; out of memory #220

Closed · Tai-Rocha closed this issue 2 years ago

Tai-Rocha commented 2 years ago

Hi everyone, I followed this reproducible example to get some statistics (mean, stdDev, min, and max values) for each square polygon, based on NASADEM elevation data (30-meter resolution).


library(rgee)
library(raster)
library(purrr)

dat <- structure(list(ID = 758432:758443, 
                      lat = c(24.875, 24.875, 24.625, 24.625, 24.875, 24.875, 24.625, 24.625, 24.375, 24.375, 24.125, 24.125), 
                      lon = c(72.875, 72.625, 72.625, 72.875, 72.375, 72.125, 72.125, 72.375, 72.375, 72.125, 72.125, 72.375)), 
                 class = "data.frame", row.names = c(NA, -12L))

dat_rast <- rasterFromXYZ(dat[, c('lon', 'lat', 'ID')], crs = '+proj=longlat +datum=WGS84 +no_defs')
dat_poly <- rasterToPolygons(dat_rast, fun=NULL, na.rm=TRUE, dissolve=FALSE)

# Initialize Earth Engine
ee_Initialize()

# A few days for test
startDate = ee$Date('2020-01-01');
endDate = ee$Date('2020-01-10');

# Open dataset
ImageCollection = ee$ImageCollection('NASA/NEX-GDDP')$filter(ee$Filter$date(startDate, endDate))#$filterBounds(polygonsCollection)

# Polygons collection
coords <- as.data.frame(raster::geom(dat_poly))
polygonsFeatures <- coords %>% split(.$object) %>% purrr::map(~{  
  ee$Feature(ee$Geometry$Polygon(mapply( function(x,y){list(x,y)} ,.x$x,.x$y,SIMPLIFY=F)))
})

polygonsCollection = ee$FeatureCollection(unname(polygonsFeatures))
Map$addLayer(polygonsCollection)

# Get list of images (1 per day)
ListOfImages = ImageCollection$toList(ImageCollection$size());

# first image
image <- ee$Image(ListOfImages$get(0))

# Add the mean of each band as new properties of each polygon
Means = image$reduceRegions(collection = polygonsCollection,reducer= ee$Reducer$mean())
Means$getInfo()

# Export

task_vector <- ee_table_to_drive(
  collection = Means,
  fileFormat = "CSV",
  fileNamePrefix = "test"
)
task_vector$start()
ee_monitoring(task_vector)

My issue is: I have millions of squares and I need to get statistics (mean, stdDev, max, min values, Moran's I) for each square. Also, the NASADEM elevation raster is high resolution. When I try a polygon with only a few squares, it works as in this example; however, it does not work for polygons with many squares. I have tried many approaches (reduceRegion, map), but every time I get a memory-related error when I try to export, such as

ERROR in Earth Engine servers: Execution failed; out of memory.
Error in ee_monitoring(task_vector) : 
  ee_monitoring was forced to stop before getting results

or

Request payload size exceeds the limit:41943304 bytes"

Does anyone know how I can get these statistics for large data (polygons with many squares) using the GEE API and rgee?

Thanks in advance!
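For reference, the basic statistics (mean, stdDev, min, max) can at least be combined into a single reducer, so that one reduceRegions() call and one export cover them all. This is only a minimal sketch, assuming the image and polygonsCollection objects defined in the code above; the scale value is illustrative and should match the resolution of the data being reduced.

# Combine mean, stdDev and min/max into a single reducer
combined_reducer <- ee$Reducer$mean()$
  combine(reducer2 = ee$Reducer$stdDev(), sharedInputs = TRUE)$
  combine(reducer2 = ee$Reducer$minMax(), sharedInputs = TRUE)

# One reduceRegions() call returns all statistics per polygon
all_stats <- image$reduceRegions(
  collection = polygonsCollection,
  reducer = combined_reducer,
  scale = 30  # illustrative; match the pixel size of the reduced image
)

# Export to Drive rather than calling getInfo(), so the result is not
# limited by the interactive request payload size
task_stats <- ee_table_to_drive(
  collection = all_stats,
  fileFormat = "CSV",
  fileNamePrefix = "test_all_stats"
)
task_stats$start()
ee_monitoring(task_stats)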

csaybar commented 2 years ago

Hi @Tai-Rocha, you must extract the statistics in batches. https://r-spatial.github.io/rgee/articles/rgee03.html#reduceregion-vs--reduceregions-vs--for-loop

library(sf)
library(rgee)
library(raster)
library(foreach)
library(rgeeExtra)

ee_Initialize()

# Toy geometry ------------------------------------------------------------
dat <- structure(list(ID = 758432:758443, 
                      lat = c(24.875, 24.875, 24.625, 24.625, 24.875, 24.875, 24.625, 24.625, 24.375, 24.375, 24.125, 24.125), 
                      lon = c(72.875, 72.625, 72.625, 72.875, 72.375, 72.125, 72.125, 72.375, 72.375, 72.125, 72.125, 72.375)), 
                 class = "data.frame", row.names = c(NA, -12L))
dat_rast <- rasterFromXYZ(dat[, c('lon', 'lat', 'ID')], crs = '+proj=longlat +datum=WGS84 +no_defs')
dat_poly <- rasterToPolygons(dat_rast, fun=NULL, na.rm=TRUE, dissolve=FALSE)
startDate = ee$Date('2020-01-01');
endDate = ee$Date('2020-01-10');

# Load the dataset --------------------------------------------------------
ImageCollection = ee$ImageCollection('NASA/NEX-GDDP') %>% 
  ee$ImageCollection$filter(ee$Filter$date(startDate, endDate))
image <- ImageCollection[[1]]

# Statistics in batches ---------------------------------------------------
statistics <- foreach (index=seq_along(dat_poly), .combine = rbind) %do% {
  ee_poly <- dat_poly[index,] %>%
    st_as_sfc() %>% 
    sf_as_ee()
  ee_extract(image, ee_poly, ee$Reducer$mean(), sf = TRUE)
}
statistics
Tai-Rocha commented 2 years ago

@csaybar Thank you so much for the quick reply! I used the batch approach and, unfortunately, after 12 hours of running I got the same error...


ambarja commented 2 years ago

Hi @Tai-Rocha, here is a reproducible example. Remember that it is necessary to simplify the geometries when the dataset has many polygons: the request is sent to the Earth Engine API in JSON format, and if the polygons contain many vertices the request is blocked because the JSON payload becomes huge. The best practice is to simplify first with st_simplify (sf) or ms_simplify (rmapshaper).

library(rgee)
library(sf)
library(tidyverse)
library(rmapshaper)
library(pryr)
ee_Initialize()

# 1. Reading vector layer ---------------------------------------- 
peru <- st_read(
  "https://github.com/ambarja/gpkg-pe/raw/main/peru_level3.gpkg"
  )

peru_simply <-  peru %>% 
  ms_simplify(keep = 0.001,keep_shapes = TRUE)

object_size(peru)
object_size(peru_simply)

# 2. sf object to featurecollection -------------------------------
peru_ee <- peru_simply %>% 
  sf_as_ee()

# 3. ImageCollection of interest ----------------------------------
im <- ee$Image('WWF/HydroSHEDS/03VFDEM')$
  select('b1')$
  rename("elevation")

# 4. Extract information using a zonal statistic -------------------
ee_mean_data <- function(x, y, by = 1000, scale = 1000) {
  y_len <- y$size()$getInfo()
  for (i in seq(1, y_len, by)) {
    index <- i - 1
    print(sprintf("Extracting information [%s/%s]...", index, y_len))
    ee_value_layer <- ee$FeatureCollection(y) %>%
      ee$FeatureCollection$toList(by, index) %>%
      ee$FeatureCollection()
    if (i == 1) {
      dataset <- ee_extract(
        x = x,
        fun = ee$Reducer$mean(),
        y = ee_value_layer,
        scale = scale,
        sf = T
      )
    } else {
      db_local <- ee_extract(
        x = x,
        y = ee_value_layer,
        fun = ee$Reducer$mean(),
        scale = scale,
        sf = T
      ) 
      dataset <- rbind(dataset, db_local)
    }
  }
  return(dataset)
}

statistics <- ee_mean_data(x = im, y = peru_ee)
> statistics <-  ee_mean_data(x = im,y = peru_ee)
[1] "Extracting information [0/1874]..."
[1] "Extracting information [1000/1874]..."
> statistics
Simple feature collection with 1874 features and 5 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: -81.2978 ymin: -18.31278 xmax: -68.65446 ymax: -0.038606
Geodetic CRS:  WGS 84
First 10 features:
      NOMBDEP     NOMBDIST   NOMBPROV UBIGEO   elevation                       geometry
1   CAJAMARCA     GUZMANGO  CONTUMAZA 060504 2789.700467 MULTIPOLYGON (((-78.90743 -...
2   CAJAMARCA     EL PRADO SAN MIGUEL 061105 2846.581242 MULTIPOLYGON (((-79.00477 -...
3   CAJAMARCA       NIEPOS SAN MIGUEL 061109 2269.360544 MULTIPOLYGON (((-79.20812 -...
4   CAJAMARCA SAN GREGORIO SAN MIGUEL 061110  943.728828 MULTIPOLYGON (((-79.07362 -...
5   CAJAMARCA     SAN LUIS  SAN PABLO 061203 1329.366966 MULTIPOLYGON (((-78.87211 -...
6  LAMBAYEQUE         ETEN   CHICLAYO 140103   19.698623 MULTIPOLYGON (((-79.83994 -...
7  LAMBAYEQUE  LA VICTORIA   CHICLAYO 140106   19.517739 MULTIPOLYGON (((-79.82947 -...
8  LAMBAYEQUE      LAGUNAS   CHICLAYO 140107   44.651568 MULTIPOLYGON (((-79.64658 -...
9  LAMBAYEQUE      MONSEFU   CHICLAYO 140108   14.909555 MULTIPOLYGON (((-79.77548 -...
10 LAMBAYEQUE   SANTA ROSA   CHICLAYO 140114    5.920391 MULTIPOLYGON (((-79.88307 -...
csaybar commented 2 years ago

I will close this issue since it is not directly related to rgee. https://groups.google.com/g/google-earth-engine-developers would be a better place for issues related to the client-server model. Thanks @ambarja :rocket:

Tai-Rocha commented 2 years ago

Thank you @csaybar and @ambarja! I tried the suggestion of @ambarja but I'm still receiving the error. Now I am almost sure that the easiest way is to cut the geometry and compute the statistics for smaller polygons (with fewer squares)...
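A minimal sketch of that "cut the geometry" idea, assuming the dat_poly and image objects from the first post; the chunk size of 500 features, the scale of 30 m, and the file name prefix are only illustrative. Each chunk is reduced and exported to Drive separately, so no single request or reduction has to hold the whole dataset.

library(sf)
library(rgee)

ee_Initialize()

chunk_size <- 500  # illustrative; tune to stay under payload/memory limits
polys_sf <- st_as_sf(dat_poly)
idx_chunks <- split(
  seq_len(nrow(polys_sf)),
  ceiling(seq_len(nrow(polys_sf)) / chunk_size)
)

for (i in seq_along(idx_chunks)) {
  # Upload only the current chunk of polygons
  fc <- sf_as_ee(polys_sf[idx_chunks[[i]], ])
  # Reduce the image over this chunk only
  chunk_stats <- image$reduceRegions(
    collection = fc,
    reducer = ee$Reducer$mean(),
    scale = 30
  )
  # Export each chunk as its own CSV on Drive
  task <- ee_table_to_drive(
    collection = chunk_stats,
    fileFormat = "CSV",
    fileNamePrefix = sprintf("stats_chunk_%03d", i)
  )
  task$start()
}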