mapme-initiative / mapme.forest

https://mapme-initiative.github.io/mapme.forest
GNU General Public License v3.0
0 stars 1 forks source link

Substitute functions from raster package for terra #4

Open Jo-Schie opened 3 years ago

Jo-Schie commented 3 years ago

Terra package is the successor of raster and is supposedly better to handle large spatial datasets. I suggest that we substitute all raster functions with terra functions. This might enable us to also process large data without being reliant on GRASS and the rather complicated installation of all the dependencies. I created a small test where I benchmark the performances. You can see that terraseems to even outperform GRASSwhich is quite surprising to me, however I did just compare them for simple zonal statistics.

library(raster)
library(terra)
library(rgrass7)

set.seed(10)

## COMPARE RASTER TO RASTER ZONAL
# zonal statistics using the raster package
r <- raster::raster(ncols=4000, nrows=4000)
raster::values(r) <- runif(ncell(r)) * 1:ncell(r)
z <- r
raster::values(z) <- rep(1:5, each=raster::ncell(z)/5)
# for large files, use a character value rather than a function
system.time({ raster::zonal(r, z, 'sum') })

# zonal statistics using the terra package
x <- terra::rast(nrows=4000, ncols=4000)
terra::values(x) <- runif(ncell(x)) * 1:ncell(x)
zz <- x
terra::values(zz) <- rep(1:5, each=raster::ncell(z)/5)
# calculate zonal
system.time({ terra::zonal(x, zz, 'sum') })

terra::writeRaster(x,"test_raster.tif",overwrite=T)
terra::writeRaster(zz,"test_zonal.tif",overwrite=T,datatype="INT4U")

# zonal statistics using GRASS
initGRASS(gisBase = "/usr/lib/grass78/",
          home = "./",
          addon_base = "./data-raw/addons", 
          override = T)

# Import raster to GRASS
execGRASS(
  "r.in.gdal",
  flags = c("overwrite", "o"),
  parameters = list(input = "test_raster.tif",
                    output = "myraster")
)
# set grass region from raster
execGRASS("g.region",
          parameters = list(raster = "myraster"))

execGRASS(
  "r.in.gdal",
  flags = c("overwrite", "o"),
  parameters = list(input = "test_zonal.tif",
                    output = "myzones")
)

system.time({
execGRASS(
  "r.stats.zonal",
  flags = c("overwrite"),
  parameters = list(base = "myzones",
                    cover = "myraster",
                    method = "sum", output = "mystats")
)
})

@goergen95 : I will try to do a first draft converting some of the scripts when I find a suitable time-spot in the upcoming days. You might want to confirm, nevertheless, the benchmarks and then eventually help and check if I get stuck.

@Ohm-Np : This is just for your information. You can use the package as is in the meantime.

Jo-Schie commented 2 years ago

Just for your information @goergen95 . @Ohm-Np and me are currently working on this. Hope to get it ready soon to test it on big data. @Ohm-Np . Please let us know progress or eventual issues in this threat here...

goergen95 commented 2 years ago

Please consider #6 and #7 when reworking the routine for the CO2 layer calculation.

Jo-Schie commented 2 years ago

@Ohm-Np and @goergen95 : Hi Om, you do not need to consider #6 and #7 for reworking the Terra routines.

Ohm-Np commented 2 years ago

Hi @Jo-Schie,

AreaCalc is working fine now.

But I got some error with LossCalc while running this script:

# calculation of forest loss area
loss_area = LossCalc(inputForestMap = treeCover_yearly,
                     inputLossMap = lossYear,
                     studysite = result_latlon,
                     latlon = TRUE,
                     polyName = "WDPAID",
                     ncores = 4,
                     years = 2000:2020)
# the error
Error in names(x) <- value : 
  'names' attribute [21] must be the same length as the vector [1]

# error in this part of the script LossCalc.R
colnames(LossStats) = paste0("loss_",unis+2000)

What could be the case? If you want to test, you can log in to R with my account.

Jo-Schie commented 2 years ago

Hi @Ohm-Np . I was able to identify the error and fix it. LossCalc should work now. I already pushed the few changes from to the branch. CO2Calc currently does not work but that will require anyways changes from @goergen95 side on the download functions AFAIK.

Could you do two things for me

library(sf)
library(terra)
remotes::install_github("mapme-initiative/mapme.forest",ref = "terra")
library(mapme.forest)

# ---- test download function and prepare data -----
# read in polygons of interest
aoi = st_read(system.file("extdata", "aoi_polys.gpkg", package = "mapme.forest"))

# download GFW data for the area of interest
# raster_files = downloadfGFW(shape = aoi,
#                             basename = "pkgTest",
#                             dataset = "GFC-2018-v1.6",
#                             outdir = "../../datalake/tempdir/pkg_test",
#                             keepTmpFiles = T)
raster_files<-rast(paste0("../../datalake/tempdir/pkg_test/",list.files("../../datalake/tempdir/pkg_test/")))

# we need to reproject the aoi object
aoi = st_transform(aoi, crs = st_crs(raster_files))

# crop the rasters to the extent of the aoi
rasters=terra::crop(raster_files, aoi)

# assigning proper names to the layers and simply plot the results
names(rasters) = c("co2_emission_FAKE", "lossyear", "treecover")
plot(rasters)

#----- test prepTC -----
prep_treeCover = prepTC(inputForestMap = rasters$treecover,
                        thresholdCover = 75,
                        thresholdClump = 20)

# ----- test Yearly forest mask -----
yearlyForestMask = getTM(inputForestMap = prep_treeCover,
                         inputLossMap = rasters$lossyear,
                         years = 2001:2018)

# ---- test Area Calc -----
result_latlon = AreaCalc(yearlyForestMask,
                         studysite = aoi,
                         latlon=TRUE,
                         polyName = "id",
                         ncores = 1,
                         years = 2001:2018)

# ----- test LossCalc -----
loss_area = LossCalc(inputForestMap = yearlyForestMask,
                     inputLossMap = rasters$lossyear,
                     studysite = result_latlon,
                     latlon = TRUE,
                     polyName = "id",
                     ncores = 1,
                     years = 2001:2018)

# ----- test CO2Calc ----
# currently not working
# co2_emission = CO2Calc(inputForestMap = yearlyForestMask,
#                        inputLossMap = rasters$lossyear,
#                        inputCO2Map = rasters$co2_emission_FAKE,
#                        studysite = result_latlon,
#                        polyName = "id",
#                        ncores = 1,
#                        years = 2001:2018)
Jo-Schie commented 2 years ago

i just realized on my private pc that the download function did not work properly there. So if you cannot download and open the data you might want to put this on a hold for a while and tell us here. Maybe we need to merge this branch first to main to get the changes that darius made to the download functions...

goergen95 commented 2 years ago

Hi, you still have errrors with the download function on your PC? I think there is an is sie relativ to Windows so please let me know if it works fo you?

Jo-Schie commented 2 years ago

Hi Darius. no... I fixed it (see my pushes to this branch from Yesterday). However I am not able to fix an issue in the PrepTC function. It is somehow something with the reclassification of the raster in the very beginning of the script. It seems that the terra function does not produce the same output as the raster functions. You may wanna have a look at it. Otherwise all other functions seem to work. If the PrepTC issue can be fixed then I guess we can merge and substitute raster for terra and then do some benchmarks.

Jo-Schie commented 2 years ago

The problem seems to be in the preprocess script in line 64

 # create classification matrix to apply thresholdCover value
    f = function(x) ifelse(x>=thresholdCover, 1, 0)
    inputForestMap = app(inputForestMap, f, cores=ncores)
Jo-Schie commented 2 years ago

I already thought of using maybe the classify function from terra instead but I was not a 100% sure what this part of the script actually does so I didn't try it out yet.