Closed fBedecarrats closed 1 year ago
To make it reproducable one needs to add the following line at the beginning of the code. dir.create("data_s3")
I have run the code and I do not get the same error. The code takes long to run. It is at 71% at the moment and progress is slow at the moment. I will write an update once the code ran through.
The code jsut ran through. I do not have the same issues as you.
Is there something on your side? Maybe try a different machine?
It is really weird! I tried again with only my reprex on a new pod (a clean docker container on the kubernetes platform I use, with only standard parameters) and I have the same problem. But I tried with a different set of protected areas polygons (from a local organization) and it works...
Here is the version that doesn't work, only executing the reprex code at the begining of this issue: However, I did it with a different set of PAs (AP_Vahatra_shp.zip) and here it works. That really seems to be a problem with the pods I'm using, but I don't get what it could be. I would need to reproduce it again with the source code of the calc_indicators() function to clearly identify what is throwing the error.
An idea: could this have something to do with the temp file the function is writing to?
@goergen95 Could you have a look into this? I checked the function calc_indicators, but the bug is not obvious, to me at least.
Are you using Linux @fBedecarrats ? Which rversion are you using and which version of the package?
I am indeed running linux (ubuntu 20.04.5), my R version is 4.2.1 and my {mapme.biodiversity} version is 0.2.1 (from CRAN).
The environment it is running into on Kubernetes is defined in a helm chart here: https://github.com/InseeFrLab/helm-charts-datascience/tree/master/charts/rstudio I will try again later this week on my personal computer that also runs Ubuntu 20.04.
I can confirm the bug on my Mac, however the system does not hang up but keeps on processing all polygons. I also tried it on Ubuntu but there had not been any apparent warning messages. I guess they are just not plotted in the terminal.
Not sure yet what the effect on the output will be since it is still processing. It seems that only some polygons are affected. I furthermore tried to process only a single polygon that seemed to be affected, and the somehow unexpected result was that the error did not appear and processing worked fine.
maybe we could try to use wdpa_clean
function to see whether it has to do with the geometries.
did you try to user the other GFW functions e.g. calc_treecover_area
? It would not solve the bug but might be interesting to see and find the source of the problem.
Sorry for the delay, I was traveling, with unfit conditions to run the tests. I cannot re-run the very same reprex anymore, because {wdpar} seems to have a bug (I see that @Jo-Schie had already raised this issue for this package, but it is still not working on some (my) config apparently: https://github.com/prioritizr/wdpar/issues/63). I am now running a slightly modified reprex:
library(dplyr)
library(sf)
library(mapme.biodiversity)
my_shp <- "https://github.com/mapme-initiative/mapme.biodiversity/files/9746104/AP_Vahatra_shp.zip"
download.file(my_shp, destfile = "Vahatra98AP.zip")
unzip("Vahatra98AP.zip", exdir = "Vahatra")
PA_mada <- st_read("Vahatra/AP_Vahatra.shp", quiet = TRUE) %>%
# Il manque la projection (pas de fichier .prj), on la spécifie à la main
st_set_crs("EPSG:4326")
# Discard points and cast multipolygons as polygons
PA_poly <- PA_mada %>%
st_repair_geometry() %>%
filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
st_cast("POLYGON")
# Create portfolio
PA_poly <- init_portfolio(x = PA_poly,
years = 2000:2020,
outdir = "data_s3/mapme",
cores = 16,
add_resources = TRUE,
verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly,
resources = c("gfw_treecover", "gfw_lossyear",
"gfw_emissions"))
# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
indicators = "treecover_area_and_emissions",
min_cover = 10, min_size = 1)
I'll report back when it is done
It works fine with these polygons (note that I added a 'st_repair_geometry()' in the process as there were some topology errors in the custom polygons I used this time). This suggests that the problem reported above is related to the specific polygons fetched by WDPA... I need to make a manual download from WDPA to confirm this.
I tried using the same WDPA polygons than mentionned at the beginning of this issue, but this time I use a file I had previously downloaded from portectedplanet.net from the same source, instead of using wdpar. And it works! The only explanation I see is that something was wrong in the way the {wdpar} package was downloading the file on a linux/unix platform !?? In any case, I suggest to close this issue and move on with this. The code I use for the testing :
library(dplyr)
library(sf)
library(mapme.biodiversity)
library(purrr)
library(stringr)
# Manual download of WDPA data (splitted zipped shapefile... yuck!)
unzip("data_s3/WDPA/WDPA_Jul2022_MDG-shapefile.zip", exdir = "data_s3/WDPA")
my_shps_zip <- list.files(path = "data_s3/WDPA", pattern = ".*shp_.\\.zip",
full.names = TRUE)
exdirs <- paste("data_s3/WDPA", seq_along(my_shps_zip), sep = "/")
map2(my_shps_zip, exdirs, ~ unzip(.x, exdir = .y))
my_shps <- list.files(path = "data_s3/WDPA", pattern = "polygons\\.shp$",
full.names = TRUE, recursive = TRUE)
PA_mada <- map_df(my_shps, st_read) %>%
filter(STATUS != "Proposed") %>%
filter(DESIG != "Locally Managed Marine Area", DESIG != "Marine Park")
# Discard points and cast multipolygons as polygons
PA_poly <- PA_mada %>%
st_repair_geometry() %>%
filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
st_cast("POLYGON")
# Create portfolio
PA_poly <- init_portfolio(x = PA_poly,
years = 2000:2020,
outdir = "data_s3/mapme",
cores = 16,
add_resources = TRUE,
verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly,
resources = c("gfw_treecover", "gfw_lossyear",
"gfw_emissions"))
# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
indicators = "treecover_area",
min_cover = 10, min_size = 1)
# Compute indicators
PA_poly <- calc_indicators(x = PA_poly,
indicators = "treecover_area_and_emissions",
min_cover = 10, min_size = 1)
And the successful output:
Happy to be of help, although I did not contribute to solve the issue 😃 Leasson Learnt: always check input and output data
Re-Opening since I see some regressions concerning the usage of mask layers in the gfw indicators. The original error message is quite informative. For large rasters (N>1024x1024 pixels) we write to disk and using a mask's layers file location as output to the masking operations causes this error. What I currently do not understand is why it now is actually working. Maybe some changes in recent versions of {terra}
... Could you all please report your versions of {terra}
? @fBedecarrats, any chance you update {terra}
between you first report of this issue and your latest try?
Thank yu for following up. I have terra version 1.6.7
Hi @goergen95 , it is very possible. It's hard for me to check now, as I'm traveling to Mafagascar and this happened on the kubernetes platform I can use when I have a reliable internet connexion. It's also hard to know because I launch each time a conteneurized pod which config evolves with successive updates. I will check and send you some more precise information (probably not next week, sorry).
Also, I don't know if it is related or if it would fit in a different issue, but now the code runs but the deforestation values are inconsistent or null for several PAs when I run it on (AP_Vahatra_shp.zip). When I run it on WDPA Madagascar, all returned values are null. This is only for the GFW indicators, the others seem OK. I am switching to another data for the workshop because I am too late in the preparation process, but when it's done I'll make a proper issue.
hummm. this question regarding terra
is interesting. My Rstudio tells me that installed terra
version is 1.6-17
. However, the current version in the terra repo is reported as 1.5-16
from january...
I am able to process the data mentioned by @fBedecarrats ... 42 PAs out of the 196 have zero values. Here is a repex:
dir.create("mapme.biodiversity.issues/")
dir.create("mapme.biodiversity.issues/105")
setwd("mapme.biodiversity.issues/105/")
library(dplyr)
library(sf)
library(mapme.biodiversity)
library(mapview)
download.file("https://github.com/mapme-initiative/mapme.biodiversity/files/9746104/AP_Vahatra_shp.zip",destfile = "AP_Vahatra_shp.zip")
unzip("AP_Vahatra_shp.zip",exdir = "AP_Vahatra_shp")
PA_poly<-read_sf("AP_Vahatra_shp/AP_Vahatra.shp")
mapView(PA_poly)
# There is no CRS in the data. We assume WGS 84 is correct
st_crs(PA_poly)<-"EPSG:4326"
mapView(PA_poly)
# fix data
PA_poly<-st_make_valid(PA_poly)
# Discard points and cast multipolygons as polygons
PA_poly <- PA_poly %>%
filter(st_geometry_type(.) == "MULTIPOLYGON") %>%
st_cast("POLYGON")
# Create portfolio
PA_poly <- init_portfolio(x = PA_poly,
years = 2000:2020,
outdir = "out",
cores = 6,
add_resources = TRUE,
verbose = TRUE)
# Get GFW data
PA_poly <- get_resources(x = PA_poly,
resources = c("gfw_treecover", "gfw_lossyear",
"gfw_emissions"))
# Compute indicators
PA_poly_results <- calc_indicators(x = PA_poly,
indicators = "treecover_area_and_emissions",
min_cover = 10, min_size = 1)
table(lapply(PA_poly_results$treecover_area_and_emissions, nrow) == 1)
note that your data did not have a CRS @fBedecarrats .
I'll try reinstalling / downgrading terra
and report back.
ok. Funfact. on our linux server with terra 1.6-7
it worked .... however i justr tried calculating treecover_area
and not the emissions. I'll keep on trying and report back. Maybe emissions is the problem and not the terra version.
ok so treecover_area
works on my private pc as well. I guess the error must be somewhere in the combination of both indicators, i.e. in that script. I would recommend @fBedecarrats to try the area estimates if you do not necessarily need the emissions at this point.
@fBedecarrats : The problem seems to be fixed now (see linked issue above). Would you like to test this for your data and eventually close this issue as well?
Salama Mapme team, Sorry for the delay! I'm still traveling in Madagascar, with very scarce access to internet. I'll test and close when I'm back to France next week. Veloma, Florent -- Sent from /e/ Mail.
I have just tested: the problem persists with the CRAN version of {mapme.biodiversity}, but it is solved after installing from github with {remotes}. Many thanks for the solution!
Hi there! I get this error each time I try to compute treecover related indicators. The process runs about 9%, and then I get this error "Error : [mask] source and target filename cannot be the same" (see below a reproducible example and screenshot of errors in console)
The output is as follows:
I get the same errors with "treecover_area" and "treecoverloss_emissions". The accessibility indicators from nelson_et_all and the soil indicators from soilgrids are computed correctly. Any idea where this might come from?