mapme-initiative / mapme.biodiversity

Efficient analysis of spatial biodiversity datasets for global portfolios
https://mapme-initiative.github.io/mapme.biodiversity/dev
GNU General Public License v3.0
33 stars 7 forks source link

Error : [mask] source and target filename cannot be the same #105

Closed fBedecarrats closed 1 year ago

fBedecarrats commented 2 years ago

Hi there! I get this error each time I try to compute treecover related indicators. The process runs about 9%, and then I get this error "Error : [mask] source and target filename cannot be the same" (see below a reproducible example and screenshot of errors in console)

library(dplyr)
library(sf)
library(mapme.biodiversity)
library(wdpar)

# Downloading Protected areas from Madagascar
PA_mada <- wdpa_fetch("Madagascar", wait = TRUE,
                              download_dir = "data_s3/WDPA") %>%
  filter(STATUS != "Proposed") %>%
  filter(DESIG != "Locally Managed Marine Area", DESIG != "Marine Park") 

# Discard points and cast multipolygons as polygons
PA_poly <- PA_mada %>%
  filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
  st_cast("POLYGON")

# Create portfolio
PA_poly <- init_portfolio(x = PA_poly, 
                                  years = 2000:2020,
                                  outdir = "data_s3/mapme",
                                  cores = 1,
                                  add_resources = TRUE,
                                  verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly, 
                             resources = c("gfw_treecover", "gfw_lossyear", 
                                           "gfw_emissions"))
# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
                          indicators = "treecover_area_and_emissions", 
                          min_cover = 10, min_size = 1)

The output is as follows:

Screenshot from 2022-10-04 22-34-22 I get the same errors with "treecover_area" and "treecoverloss_emissions". The accessibility indicators from nelson_et_all and the soil indicators from soilgrids are computed correctly. Any idea where this might come from?

melvinhlwong commented 2 years ago

To make it reproducable one needs to add the following line at the beginning of the code. dir.create("data_s3")

I have run the code and I do not get the same error. The code takes long to run. It is at 71% at the moment and progress is slow at the moment. I will write an update once the code ran through.

melvinhlwong commented 2 years ago

The code jsut ran through. I do not have the same issues as you.

image

Is there something on your side? Maybe try a different machine?

fBedecarrats commented 2 years ago

It is really weird! I tried again with only my reprex on a new pod (a clean docker container on the kubernetes platform I use, with only standard parameters) and I have the same problem. But I tried with a different set of protected areas polygons (from a local organization) and it works...

Here is the version that doesn't work, only executing the reprex code at the begining of this issue: Screenshot from 2022-10-10 14-55-08 However, I did it with a different set of PAs (AP_Vahatra_shp.zip) and here it works. That really seems to be a problem with the pods I'm using, but I don't get what it could be. I would need to reproduce it again with the source code of the calc_indicators() function to clearly identify what is throwing the error. image

fBedecarrats commented 2 years ago

An idea: could this have something to do with the temp file the function is writing to?

melvinhlwong commented 2 years ago

@goergen95 Could you have a look into this? I checked the function calc_indicators, but the bug is not obvious, to me at least.

Jo-Schie commented 2 years ago

Are you using Linux @fBedecarrats ? Which rversion are you using and which version of the package?

fBedecarrats commented 2 years ago

I am indeed running linux (ubuntu 20.04.5), my R version is 4.2.1 and my {mapme.biodiversity} version is 0.2.1 (from CRAN).

fBedecarrats commented 2 years ago

The environment it is running into on Kubernetes is defined in a helm chart here: https://github.com/InseeFrLab/helm-charts-datascience/tree/master/charts/rstudio I will try again later this week on my personal computer that also runs Ubuntu 20.04.

Jo-Schie commented 2 years ago

I can confirm the bug on my Mac, however the system does not hang up but keeps on processing all polygons. I also tried it on Ubuntu but there had not been any apparent warning messages. I guess they are just not plotted in the terminal.

Not sure yet what the effect on the output will be since it is still processing. It seems that only some polygons are affected. I furthermore tried to process only a single polygon that seemed to be affected, and the somehow unexpected result was that the error did not appear and processing worked fine.

maybe we could try to use wdpa_clean function to see whether it has to do with the geometries.

Jo-Schie commented 2 years ago

did you try to user the other GFW functions e.g. calc_treecover_area? It would not solve the bug but might be interesting to see and find the source of the problem.

fBedecarrats commented 2 years ago

Sorry for the delay, I was traveling, with unfit conditions to run the tests. I cannot re-run the very same reprex anymore, because {wdpar} seems to have a bug (I see that @Jo-Schie had already raised this issue for this package, but it is still not working on some (my) config apparently: https://github.com/prioritizr/wdpar/issues/63). I am now running a slightly modified reprex:

library(dplyr)
library(sf)
library(mapme.biodiversity)

my_shp <- "https://github.com/mapme-initiative/mapme.biodiversity/files/9746104/AP_Vahatra_shp.zip"
download.file(my_shp, destfile = "Vahatra98AP.zip")
unzip("Vahatra98AP.zip", exdir = "Vahatra")
PA_mada <- st_read("Vahatra/AP_Vahatra.shp", quiet = TRUE) %>%
  # Il manque la projection (pas de fichier .prj), on la spécifie à la main
  st_set_crs("EPSG:4326")

# Discard points and cast multipolygons as polygons
PA_poly <- PA_mada %>%
  st_repair_geometry() %>%
  filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
  st_cast("POLYGON")

# Create portfolio
PA_poly <- init_portfolio(x = PA_poly, 
                          years = 2000:2020,
                          outdir = "data_s3/mapme",
                          cores = 16,
                          add_resources = TRUE,
                          verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly, 
                         resources = c("gfw_treecover", "gfw_lossyear", 
                                       "gfw_emissions"))
# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
                           indicators = "treecover_area_and_emissions", 
                           min_cover = 10, min_size = 1)

I'll report back when it is done

fBedecarrats commented 2 years ago

It works fine with these polygons (note that I added a 'st_repair_geometry()' in the process as there were some topology errors in the custom polygons I used this time). Screenshot from 2022-10-14 14-02-33 This suggests that the problem reported above is related to the specific polygons fetched by WDPA... I need to make a manual download from WDPA to confirm this.

fBedecarrats commented 2 years ago

I tried using the same WDPA polygons than mentionned at the beginning of this issue, but this time I use a file I had previously downloaded from portectedplanet.net from the same source, instead of using wdpar. And it works! The only explanation I see is that something was wrong in the way the {wdpar} package was downloading the file on a linux/unix platform !?? In any case, I suggest to close this issue and move on with this. The code I use for the testing :

library(dplyr)
library(sf)
library(mapme.biodiversity)
library(purrr)
library(stringr)

# Manual download of WDPA data (splitted zipped shapefile... yuck!)
unzip("data_s3/WDPA/WDPA_Jul2022_MDG-shapefile.zip", exdir = "data_s3/WDPA")
my_shps_zip <- list.files(path = "data_s3/WDPA", pattern = ".*shp_.\\.zip", 
                          full.names = TRUE)
exdirs <- paste("data_s3/WDPA", seq_along(my_shps_zip), sep = "/")
map2(my_shps_zip, exdirs, ~ unzip(.x, exdir = .y))
my_shps <- list.files(path = "data_s3/WDPA", pattern = "polygons\\.shp$",
                     full.names = TRUE, recursive = TRUE)
PA_mada <- map_df(my_shps, st_read) %>%
  filter(STATUS != "Proposed") %>%
  filter(DESIG != "Locally Managed Marine Area", DESIG != "Marine Park")

# Discard points and cast multipolygons as polygons
PA_poly <- PA_mada %>%
  st_repair_geometry() %>%
  filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
  st_cast("POLYGON")

# Create portfolio
PA_poly <- init_portfolio(x = PA_poly, 
                          years = 2000:2020,
                          outdir = "data_s3/mapme",
                          cores = 16,
                          add_resources = TRUE,
                          verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly, 
                         resources = c("gfw_treecover", "gfw_lossyear", 
                                       "gfw_emissions"))

# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
                           indicators = "treecover_area", 
                           min_cover = 10, min_size = 1)

# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
                           indicators = "treecover_area_and_emissions", 
                           min_cover = 10, min_size = 1)

And the successful output: Screenshot from 2022-10-14 19-00-26

Jo-Schie commented 2 years ago

Happy to be of help, although I did not contribute to solve the issue 😃 Leasson Learnt: always check input and output data

goergen95 commented 2 years ago

Re-Opening since I see some regressions concerning the usage of mask layers in the gfw indicators. The original error message is quite informative. For large rasters (N>1024x1024 pixels) we write to disk and using a mask's layers file location as output to the masking operations causes this error. What I currently do not understand is why it now is actually working. Maybe some changes in recent versions of {terra}... Could you all please report your versions of {terra}? @fBedecarrats, any chance you update {terra} between you first report of this issue and your latest try?

melvinhlwong commented 2 years ago

Thank yu for following up. I have terra version 1.6.7

fBedecarrats commented 2 years ago

Hi @goergen95 , it is very possible. It's hard for me to check now, as I'm traveling to Mafagascar and this happened on the kubernetes platform I can use when I have a reliable internet connexion. It's also hard to know because I launch each time a conteneurized pod which config evolves with successive updates. I will check and send you some more precise information (probably not next week, sorry).

fBedecarrats commented 2 years ago

Also, I don't know if it is related or if it would fit in a different issue, but now the code runs but the deforestation values are inconsistent or null for several PAs when I run it on (AP_Vahatra_shp.zip). When I run it on WDPA Madagascar, all returned values are null. This is only for the GFW indicators, the others seem OK. I am switching to another data for the workshop because I am too late in the preparation process, but when it's done I'll make a proper issue.

Jo-Schie commented 2 years ago

hummm. this question regarding terrais interesting. My Rstudio tells me that installed terraversion is 1.6-17. However, the current version in the terra repo is reported as 1.5-16 from january...

I am able to process the data mentioned by @fBedecarrats ... 42 PAs out of the 196 have zero values. Here is a repex:

dir.create("mapme.biodiversity.issues/")
dir.create("mapme.biodiversity.issues/105")

setwd("mapme.biodiversity.issues/105/")

library(dplyr)
library(sf)
library(mapme.biodiversity)
library(mapview)

download.file("https://github.com/mapme-initiative/mapme.biodiversity/files/9746104/AP_Vahatra_shp.zip",destfile = "AP_Vahatra_shp.zip")
unzip("AP_Vahatra_shp.zip",exdir = "AP_Vahatra_shp")

PA_poly<-read_sf("AP_Vahatra_shp/AP_Vahatra.shp")
mapView(PA_poly)

# There is no CRS in the data. We assume WGS 84 is correct
st_crs(PA_poly)<-"EPSG:4326"
mapView(PA_poly)

# fix data
PA_poly<-st_make_valid(PA_poly)

# Discard points and cast multipolygons as polygons
PA_poly <- PA_poly %>%
  filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
  st_cast("POLYGON")

# Create portfolio
PA_poly <- init_portfolio(x = PA_poly, 
                          years = 2000:2020,
                          outdir = "out",
                          cores = 6,
                          add_resources = TRUE,
                          verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly, 
                         resources = c("gfw_treecover", "gfw_lossyear", 
                                       "gfw_emissions"))
# Compute indicators
PA_poly_results <- calc_indicators(x = PA_poly,
                           indicators = "treecover_area_and_emissions", 
                           min_cover = 10, min_size = 1)

table(lapply(PA_poly_results$treecover_area_and_emissions, nrow) == 1)

note that your data did not have a CRS @fBedecarrats .

I'll try reinstalling / downgrading terra and report back.

Jo-Schie commented 2 years ago

ok. Funfact. on our linux server with terra 1.6-7 it worked .... however i justr tried calculating treecover_area and not the emissions. I'll keep on trying and report back. Maybe emissions is the problem and not the terra version.

Jo-Schie commented 2 years ago

ok so treecover_area works on my private pc as well. I guess the error must be somewhere in the combination of both indicators, i.e. in that script. I would recommend @fBedecarrats to try the area estimates if you do not necessarily need the emissions at this point.

Jo-Schie commented 2 years ago

@fBedecarrats : The problem seems to be fixed now (see linked issue above). Would you like to test this for your data and eventually close this issue as well?

fBedecarrats commented 2 years ago

Salama Mapme team, Sorry for the delay! I'm still traveling in Madagascar, with very scarce access to internet. I'll test and close when I'm back to France next week. Veloma, Florent -- Sent from /e/ Mail.

fBedecarrats commented 1 year ago

I have just tested: the problem persists with the CRAN version of {mapme.biodiversity}, but it is solved after installing from github with {remotes}. Many thanks for the solution!