Closed karpfen closed 1 year ago
Thanks for the repex! Potentially related to #105, I got a simillar error message (Error : [mask] source and target filename cannot be the same
) on a Linux machine with the same version of packages. The error seems to come from using existing filenames in terra::mask
and/or terra::classify
for rasters with N > 1024*1024
cells. I can't really get my head around why it is sometimes working and sometimes it is not... But changing the source code should solve the problem. Would you like to prepare a PR?
Also, currently the 1024*1024 cells threshold is hard-coded in the GFW routines and not tested against with the unit tests. We need a smarter way to maybe interactively set this value so that we can set it down during tests. But this should not be part of init_portfolio()
in order to not over-complicate things for users.
I can confirm the bug using both terra 1-6-7
and terra 1-6-17
. Can you elaborate a bit more @goergen95 what would need to be changed to prepare a PR? I guess the affected scripts are this and this and potentially this.
From my limited understanding about the inner workings of the package, I would guess that the problem is e.g. in this line here.
I would see potentially two options:
Could you confirm @goergen95 that this could be a feasible approach? Or am I getting something totally wrong here?
ifel()
is an exported function of the {ŧerra}
package, so no need to change that.
I suspect the general issue to occur when we use mask()
or classify()
, etc., on raster files that already exist on disk. I agree with your first option as a viable solution. The original intention here was to implement a memory safe but efficient processing of many assets in parallel. Depending on the available RAM on a given machine and the number of parallel threads used, our idea was to write rasters of a certain size to disk so that smaller rasters than this threshold get processed more efficiently in memory. This is currently set to a fairly random threshold of 1024x1024 pixels because we wanted to wait for the results of @Ohm-Np investigation on parallel processing to set sensible thresholds within the package. By removing the to_disk
logic, we would simply let {terra}
handle which files to write to temporary files and which are not. That is definitely a better option than the current solution where in some settings it causes bugs because the same filenames are used. However, it might not be the most efficient solution for the package's use case because {terra}
uses its own heuristics to determine whether or not to write a certain file to disk. So in my view there are two options:
to_disk
logic and let {terra}
handle the internals. Safe way to go to get rid of the bug, but maybe not the most efficient solution.Thanks for your comment @goergen95. My interpretation of how terra internals work would have been, that anyway raster are stored to disk but normally in the tempfolder. AFAIK this is the main difference to raster
where you would always run out of memory when operating with large rasters. So I am not really sure if this manual storage to disk is required anyway...(?). In that sense removing the line would probably not make any difference regarding memory and CPU usage.
As to your second suggestion. Also there I am unsure if it is really necessary to elaborate sophisticated memory and cpu handling routines. If we were to just change the file name to a clearly distinguishable file name then I guess there would be no differences to the current approach despite the filename...(?). In that sense all could stay the same.
Anyways we should probably just try it out, so maybe I'll just open up a branch for that.
okay. I created a quickfix please see PR #109
As mentioned above I am really not sure if we need the todisk
routine for memory handling. Maybe some tests with larger data would shed a new light on this @karpfen. If so, then we would need to go for the second option outlined above. I think it would be enough then to work on this if we face the problem. The quickfix should enable users to keep on working with the package and we can just wait and see what happens.
@karpfen : Do you want to try the quickfix on your windows machine with the following repex. If it works you may close the issue and @goergen95 might trigger a bugfix update on CRAN. The following works on linux:
remotes::install_github("https://github.com/mapme-initiative/mapme.biodiversity")
library(sf)
library(mapme.biodiversity)
# dir.create("~/mapme.biodiversity.issues/108/")
setwd("~/mapme.biodiversity.issues/108/")
# unlink("~/mapme.biodiversity.issues/108/gfw_lossyear/", recursive=TRUE)
# unlink("~/mapme.biodiversity.issues/108/gfw_treecover/", recursive=TRUE)
faulty_poly <- st_as_sf(st_as_sfc(
"POLYGON ((-59.6354 5.0372, -59.3598 5.0372, -59.3598 5.3175, -59.6354 5.3175, -59.6354 5.0372))"
, crs = 4326
))
tree_cover <- faulty_poly %>%
init_portfolio(2000:2022,
cores = 6,
verbose = TRUE) %>%
get_resources(
resources = c("gfw_treecover", "gfw_lossyear","gfw_emissions"),
vers_treecover = "GFC-2020-v1.8",
vers_lossyear = "GFC-2020-v1.8"
)
tc_area <- tree_cover %>%
calc_indicators("treecover_area",
min_size = 1,
min_cover = 1)
print(tc_area$treecover_area)
tc_area_emission <- tree_cover %>%
calc_indicators("treecover_area_and_emissions",
min_size = 1,
min_cover = 1)
print(tc_area_emission$treecover_area_and_emissions)
tc_emission <- tree_cover %>%
calc_indicators("treecoverloss_emissions",
min_size = 1,
min_cover = 1)
print(tc_emission$treecoverloss_emissions)
Thanks for the explanations and quick fix, @Jo-Schie and @goergen95! I can confirm that with the fix in place, it works as expected for all three indicators on my Windows setup.
I was getting some unexpected results when calculating GFW tree coverage and noticed that it only happens on my windows machine Here is a small example:
When running this code, it prints a warning "Error : [classify] cannot overwrite existing file", which is produced by
terra::mask
in https://github.com/mapme-initiative/mapme.biodiversity/blob/main/R/calc_treecover_area.R Runningterra::mask
by itself works as expected, so I don't think the bug lies there.The same code works as expected when I run it on a Linux machine. Even though I'm not 100% sure yet what causes this bug, maybe some other users can try and reproduce it on their machines to narrow it down.
Versions: R: 4.2.1, terra: 1.6-17, mapme.biodiversity: 0.2.1, sf: 1.0-8