r-spatial / rgee

Google Earth Engine for R
https://r-spatial.github.io/rgee/
Other
677 stars 146 forks source link

Guidance on how to extract categorical raster data grouped by categories #199

Closed jordi-1 closed 2 years ago

jordi-1 commented 2 years ago

Description

Not sure if this is the right place to post this. I haven't been able to find this in the "examples" section. Apologies in advance if this is not the right place, please redirect me.

I am looking for a code template/tutorial allowing the grouped extraction of categorical raster data (example: Landuse-landcover classes) according to polygons in a shapefile. Specifically: I would like to calculate the surface occupied by each landcover class in a raster for elements of a polygon. say the raster has coded the terrain in 3 classes: class1, 2 and 3 i would like to produce a table polygon 1 : class1=X m2 , class2=Ym2, class3=Zm2 polygon 2 : class2 =X2m2, class2=...

how should i phrase the ee_extract() part of the code? and are there any specific operations to do before that ?

Thanks in advance,

Jordi

csaybar commented 2 years ago

hi @ambarja I think you were working on something similar can u share your code here. :pray:

jordi-1 commented 2 years ago

Initialize the Earth Engine module.

ee_Initialize()

-- rgee 1.1.2 ---------------------------------------------------------------------------------------------------------------------- earthengine-api 0.1.292 -- √ user: not_defined √ Google Drive credentials: FOUND √ Initializing Google Earth Engine: DONE! √ Earth Engine account: users/jordi-1

Print metadata for a DEM dataset.

print(ee$Image('USGS/SRTMGL1_003')$getInfo()) $type [1] "Image" [...et caetera ]


Attach your Python (reticulate) configuration:

  ```r
library(reticulate)
py_config()

python:         C:/Users/jordi1/miniconda3/envs/rgee_py/python.exe
libpython:      C:/Users/jordi1/miniconda3/envs/rgee_py/python39.dll
pythonhome:     C:/Users/jordi1/miniconda3/envs/rgee_py
version:        3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/Users/jordi1/miniconda3/envs/rgee_py/Lib/site-packages/numpy
numpy_version:  1.21.5
ee:             C:\Users\jordi1\miniconda3\envs\rgee_py\lib\site-packages\ee\__init__.p

NOTE: Python version was forced by RETICULATE_PYTHON

Description

I have been able to advance a bit on the issue raised above, using reduceRegions() and the reducer ee$Reducer$frequencyHistogram() translating this example : https://stackoverflow.com/questions/62273165/how-to-flatten-featurecollection-to-a-table-when-reducing-regions

Apparently, I manage to end up with a feature collection which includes all I need.

But I can't seem to make work the part about "mapping over the collection to convert the dictionary to a collection of features" which is supposed to transform it into a proper table.

My "data" output consists of 3 character variables : "system.index" which seems to be the line number "histogram" which contains what I am looking for as a character : {30=660.5254901960784, 90=29.87450980392157, 122=247.7764705882353, 80=3.0, 116=743.7568627450981, 50=14.0, 114=10155.823529411766, 112=90764.11372549002, 126=6858.823529411764, 124=5132.537254901959, 20=490.06666666666666} ".geo" which apparently contains the whole geo description of the polygon.

how can i transform the reduceRegions() output into a proper table ?

What I Did


# region of interest from sf shapefile to ee feature collection
ee_x <- sf_as_ee(location_data)
ee_x_bounds <- sf_as_ee(st_as_sfc(st_bbox(location_data)))

lulc<- ee$ImageCollection("COPERNICUS/Landcover/100m/Proba-V-C3/Global") %>%
  ee$ImageCollection$filterDate("2015-01-01","2019-12-31")) %>%
  ee$ImageCollection$select("discrete_classification")

# selecting the first image for simplicity
im_lulc=lulc$first()

frequency = im_lulc$reduceRegions(
  collection = ee_x,
  reducer=ee$Reducer$frequencyHistogram(),
  scale=100
)$ select('histogram')

### HERE COMES THE MISSING PART WHICH I AM UNABLE TO ADAPT. There seems to be an issue defining the keys...
#.map(function (lossInRegionFeature) {
#    // Get the histogram dictionary from the feature produced by reduceRegions.
#    var forestLossDict = ee.Dictionary(lossInRegionFeature.get('histogram'));
#    // Make a FeatureCollection out of the dictionary.
#    return ee.FeatureCollection(
#      forestLossDict
#        .map(function (key, value) {
#          // Construct a feature from the dictionary entry.
#          return ee.Feature(null, {
#            'system:index': key,
#            'WDPAID': lossInRegionFeature.get('WDPAID'),
#           'year': key,
#            'loss': value});
#        })
#        // dict.map() returns a dictionary with newly computed values;
#        // we just want the values in a list, to make a collection of.
#        .values());
#  })
#  // Flatten the collection of collections returned by the map().
#  .flatten();

# Using rgee I/O
task <- ee_table_to_drive(
  collection = frequency,
  description = paste0("test"),
  fileFormat = "CSV"
)
task$start()
ee_monitoring(task)
exported_stats <- ee_drive_to_local(task = task,dsn = "exported_stats.csv")
data=read.csv(exported_stats)
ambarja commented 2 years ago

Hi @jordi-1, the code below extracts values from categorical rasters for each polygon. I will close the issue but feel free to open it again if you consider it necessary.

library(rgee)
library(tidyverse)
library(sf)
ee_Initialize()

lima_prov <- st_read(
  "https://github.com/ambarja/gpkg-pe/raw/main/Lima_distritos.gpkg"
) %>%
  select(NOMBDIST) %>%
  head()
Simple feature collection with 6 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -77.08842 ymin: -12.14016 xmax: -76.64668 ymax: -11.87454
Geodetic CRS:  WGS 84
           NOMBDIST                           geom
1        LURIGANCHO MULTIPOLYGON (((-76.7268 -1...
2       JESUS MARIA MULTIPOLYGON (((-77.03762 -...
3              LIMA MULTIPOLYGON (((-77.00972 -...
4             LINCE MULTIPOLYGON (((-77.02807 -...
5        MIRAFLORES MULTIPOLYGON (((-77.02739 -...
6 MAGDALENA DEL MAR MULTIPOLYGON (((-77.07179 -...
lulc <- ee$Image("COPERNICUS/Landcover/100m/Proba-V-C3/Global/2019")$
  select("discrete_classification")

ee_area_lulc <- function(img, region, scale = 1000) {
  lista_histo <- list()
  for (i in 1:nrow(region)) {
    region_ee <- region[i, ] %>% sf_as_ee()
    ee_histo <- img$reduceRegion(
      reducer = ee$Reducer$frequencyHistogram(),
      geometry = region_ee,
      scale = scale
    )
    lista_histo[[i]] <- ee_histo$getInfo() %>%
      map_df(., .f = as.data.frame) %>%
      mutate(DISTRITO = region[[i, 1]] %>% as.vector())
  }
  histo_df <- map_df(lista_histo, .f = as.data.frame) %>%
    mutate_if(is.numeric, .funs = function(x) {
      x * 1000 * 1000 / 10000
    }) %>%
    replace(is.na(.), 0) %>%
    rename_with(~ paste0("Pi", sub("X", "", .), "_Ha"), -DISTRITO) %>%
    select(DISTRITO, ends_with("Ha")) %>%
    gather(Category, values, Pi20_Ha:Pi200_Ha)
  return(histo_df)
}
area_Ha <- ee_area_lulc(img = lulc, region = lima_prov)
area_Ha
            DISTRITO Category      values
1         LURIGANCHO  Pi20_Ha  1282.35294
2        JESUS MARIA  Pi20_Ha     0.00000
3               LIMA  Pi20_Ha     0.00000
4              LINCE  Pi20_Ha     0.00000
5         MIRAFLORES  Pi20_Ha     0.00000
6  MAGDALENA DEL MAR  Pi20_Ha     0.00000
7         LURIGANCHO  Pi30_Ha  1409.41176
8        JESUS MARIA  Pi30_Ha     0.00000
9               LIMA  Pi30_Ha     0.00000
10             LINCE  Pi30_Ha     0.00000
11        MIRAFLORES  Pi30_Ha     0.00000
12 MAGDALENA DEL MAR  Pi30_Ha     0.00000
13        LURIGANCHO  Pi50_Ha  2903.52941
14       JESUS MARIA  Pi50_Ha   456.07843
15              LIMA  Pi50_Ha  2270.98039
16             LINCE  Pi50_Ha   286.27451
17        MIRAFLORES  Pi50_Ha   855.29412
18 MAGDALENA DEL MAR  Pi50_Ha   294.90196
19        LURIGANCHO  Pi60_Ha 20227.84314
20       JESUS MARIA  Pi60_Ha     0.00000
21              LIMA  Pi60_Ha     0.00000
22             LINCE  Pi60_Ha     0.00000
23        MIRAFLORES  Pi60_Ha     0.00000
24 MAGDALENA DEL MAR  Pi60_Ha     0.00000
25        LURIGANCHO Pi200_Ha     0.00000
26       JESUS MARIA Pi200_Ha     0.00000
27              LIMA Pi200_Ha     0.00000
28             LINCE Pi200_Ha     0.00000
29        MIRAFLORES Pi200_Ha   106.66667
30 MAGDALENA DEL MAR Pi200_Ha    87.05882
ee_plot <- function(x) {
  p1 <- x %>%
    ggplot(aes(x = Category, y = values, fill = Category)) +
    geom_bar(stat = "identity") +
    scale_fill_viridis_d() +
    facet_wrap(~DISTRITO, scales = "free")
  return(p1)
}

ee_plot(area_Ha)

image