prioritizr / wdpar

Interface to the World Database on Protected Areas
https://prioritizr.github.io/wdpar
GNU General Public License v3.0
37 stars 5 forks

Erase overlaps keeps failing #68

Closed Va99le closed 1 year ago

Va99le commented 1 year ago

Hello,

I genuinely appreciate the "wdpar" package for its usefulness and quality. Unfortunately, I continue to encounter the "topology exception, side location conflict" error while using the "erase overlaps" function. I have read from previous responses that it is necessary to increase the "geometry precision" and "snap tolerance" parameters, both of which I have tried. Even by raising the geometry precision to 10^7, I have been unable to overcome the error. However, considering the required level of precision, I cannot raise the snap tolerance beyond 25-30 meters. When attempting higher values, the computation time increases significantly, up to 50 days.

What I am trying to achieve is a file that contains all the terrestrial and partially marine protected areas of the European continent without any overlaps. Another issue I have noticed, which might be solely based on my own mistaken impression, is that the computation seems to slow down over time.

I would greatly appreciate any guidance on resolving these issues. Thank you for your work.

jeffreyhanson commented 1 year ago

Hi @Va99le,

Sorry for my slow response - I'm at a conference at the moment and things have been pretty full on.

Thank you very much for providing all these details! I'm assuming you're using wdpa_clean(), but please correct me if I've got that wrong? Given the size of your dataset, I would suggest seeing if there's any way you can simplify the process. For example, do you really need every border of every single PA? If not, maybe you could dissolve the data (e.g., using wdpa_dissolve()) to remove overlaps? If you still need to retain other criteria (e.g., IUCN category), you could run the dissolve step for the PAs belonging to each category separately and then combine them together at the end (e.g., using bind_rows()).

Also, yeah, the computation for erasing overlaps definitely slows down over time for larger datasets - this is because it works by starting with nothing and then iteratively adding in protected areas, and within each iteration, it erases any parts of the new protected area that overlap with any previously added protected areas (note that PAs are added in order such that the "best" and "oldest" PAs are at the start).
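To illustrate the idea, the iterative procedure could be sketched roughly like this (a simplified illustration only, not the actual internals of wdpar's overlap-erasing code, and assuming `x` is an sf object already sorted so the "best"/"oldest" PAs come first):

```r
# load packages
library(sf)

# illustrative sketch of iteratively erasing overlaps; mimics the idea
# behind the cleaning step, not wdpar's exact implementation
erase_overlaps_sketch <- function(x) {
  out <- x[1, ]                          # start with the first (best) PA
  for (i in seq_len(nrow(x))[-1]) {
    ## erase the parts of the new PA covered by previously added PAs
    new_geom <- sf::st_difference(
      sf::st_geometry(x[i, ]), sf::st_union(sf::st_geometry(out))
    )
    ## keep the new PA only if something is left after erasing
    if (length(new_geom) > 0 && !all(sf::st_is_empty(new_geom))) {
      y <- x[i, ]
      sf::st_geometry(y) <- new_geom
      out <- rbind(out, y)
    }
  }
  out
}
```

Because each new PA is erased against the union of everything added so far, each iteration gets more expensive, which is why progress slows down over time.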

How does that sound?

Va99le commented 1 year ago

Hi, thank you very much and take your time. Yes, I'm using the wdpa_clean() function. I do need the attributes of every protected area, at least the year of foundation of the area in addition to the IUCN category. I will definitely try what you proposed with wdpa_dissolve() for every IUCN category anyway, at least as a temporary solution. I have just one doubt about this procedure: it looks to me like many of the overlapping areas have different IUCN categories, so wouldn't following this procedure risk missing many overlapping areas? I guess there would still be some problem of double counting some areas, but I may be wrong or may not have understood clearly. Also, thanks for explaining the process of erasing overlaps. Just one last question: is there a way to parallelize the process? I could gain access to a powerful server in a few days (if I'm lucky), but I have no experience with parallelizing, and according to some pages I have read it's not always possible or useful to parallelize. My worry is that I could have many cores and lots of RAM available, but that only one core would be used without adequate procedures. Again, thank you very much for your time and patience.

jeffreyhanson commented 1 year ago

No worries!

Yeah, you're exactly correct. Sorry, I forgot to mention: after running wdpa_dissolve() separately for each IUCN category, you would then need to run an iterative procedure where you start with the best IUCN category, erase the overlapping bits from the second best IUCN category, merge (bind_rows() / rbind()) the best and second best IUCN category PAs together, and then continue on with the third best and so on. How does that sound?
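For concreteness, here's a rough, untested sketch of that iterative procedure, assuming `layers` is a hypothetical list of sf objects produced by wdpa_dissolve(), ordered from best to worst IUCN category:

```r
# load packages
library(sf)
library(dplyr)

# untested sketch: `layers` is assumed to be a list of dissolved sf objects,
# ordered from best to worst IUCN category (e.g., Ia, Ib, II, ...)
result <- layers[[1]]
for (i in seq_along(layers)[-1]) {
  ## erase the parts of the next category already covered by better ones
  remaining <- sf::st_difference(
    layers[[i]], sf::st_union(sf::st_geometry(result))
  )
  ## append whatever is left over
  result <- dplyr::bind_rows(result, remaining)
}
```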

Yeah, you could definitely parallelize the process of running wdpa_dissolve() on PAs for different IUCN categories, but parallelizing the iterative process for combining them all into a single dataset would be much trickier. I suppose/guess/wonder if it might be possible to use some cool graph theory stuff from igraph to partition the PAs into separate chunks for parallel processing.
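To give a sense of the igraph idea (purely speculative and untested): PAs that intersect, directly or transitively, form connected components of an intersection graph, and each component could then be processed on a separate core. Here `x` is assumed to be an sf object of protected areas:

```r
# load packages
library(sf)
library(igraph)

# speculative sketch: group PAs into independent chunks via graph components
## build an adjacency list from spatial intersections
adj <- sf::st_intersects(x, x)
g <- igraph::graph_from_adj_list(adj, mode = "all")

## PAs in the same connected component may overlap (directly or
## transitively), so each component is an independent chunk of work
x$chunk <- igraph::components(g)$membership
chunks <- split(seq_len(nrow(x)), x$chunk)
## each element of `chunks` could then be handled by, e.g., parLapply()
```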

Also, I suppose you could extend the procedure so that it also accounts for year as well as IUCN category (i.e., run wdpa_dissolve() on each combination of year and IUCN category separately, and then use an iterative procedure to handle overlaps).

Does that help?

jeffreyhanson commented 1 year ago

In case it helps, here's a quick attempt to prototype the code for parallelizing the first step of running the dissolves. Note that I haven't tested it, so it probably needs some debugging.

# load packages
library(wdpar)
library(sf)
library(dplyr)
library(parallel)

# set parameters
n_threads <- 2

# import data
path <- "my_protected_area_data.gpkg"
x <- read_sf(path)

# find unique IUCN categories
iucn_cat <- unique(x$IUCN_CAT)

# prepare cluster for parallel processing
## initalize variables for parallelizing
temp_dir <- tempdir()

## use PSOCK for windows, FORK for linux/mac
cl <- makeCluster(n_threads, "PSOCK") 
## note: clusterExport() and clusterEvalQ() are called for their side
## effects, so their results should not be assigned back to `cl`
clusterExport(cl, c("path", "temp_dir", "iucn_cat"))
clusterEvalQ(cl, {
  library(sf)
  library(wdpar)
  x <- read_sf(path)
})

# run dissolve in parallel, one IUCN category per task
dissolved_paths <- parLapply(cl, seq_along(iucn_cat), function(i) {
  ## subset data
  y <- x[x$IUCN_CAT == iucn_cat[i], , drop = FALSE]
  ## run dissolve
  y <- sf::st_set_precision(y, 1e5)
  y <- sf::st_make_valid(y)
  y <- wdpa_dissolve(y)
  ## save data to temporary directory
  out_path <- paste0(temp_dir, "/data_", i, ".gpkg")
  sf::write_sf(y, out_path)
  ## return path
  out_path
})

# clean up cluster
stopCluster(cl)

# import results from parallelization and merge together
result <- bind_rows(lapply(dissolved_paths, sf::read_sf))

Va99le commented 1 year ago

Thank you very much, I'll try it as soon as possible. It looks like a good solution. I was also wondering if there is a way to use the server's computing power just to make erase_overlaps faster, but I guess mclapply would just repeat the wdpa_clean() procedures on many cores without being of any help. Anyway, now I have something to work on. Thanks

Va99le commented 1 year ago

Hi, I'm sorry to bother you again. Unfortunately, although the suggested approach is useful for some of my analyses, it's not suitable for all of them. For parts of my study, I need every border of every protected area, without overlaps. I am following an idea, and I just wanted to know if there are, or might be, some issues with it. My first step is to divide the European continent across 6 scripts. I realized that some areas have multiple "iso parent" values, so I proceeded to put those countries in the same script, to make sure they did not end up being duplicated across different scripts. I do realize that there will still be some overlapping areas between adjacent countries, but I guess that it will be much better than no pre-processing at all. After this procedure, I'd merge the 6 resulting vectors together (or 2 at a time) and repeat the wdpa_clean() cleaning procedures. In my opinion, this should allow for much faster processing, since almost all the areas would already be cleaned and the overlapping parts would be limited to a tiny fraction. Am I missing something? Does wdpa_clean() risk creating problems if repeated multiple times on a dataset? Is there an easier way to do it? I wanted to thank you for your time and patience. I'm pretty new to R, and I'm at the beginning of my studies; I've been stuck with this file since the beginning of May, and I seriously hope that this solution may work.

jeffreyhanson commented 1 year ago

Sorry for the slow response, I'm travelling at the moment.

Yeah, that sounds like a good approach. Although many PAs would likely be restricted to the boundaries of a single country, I'm sure that there will be some exceptions (e.g., perhaps Biosphere reserves?). So while this approach might help speed things up, I think you'll probably still need to account for potentially overlapping PAs along country borders. How does that sound?
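One possible (untested) way to handle those border cases after merging the cleaned regional datasets might be to isolate just the features that still overlap something, and re-run the overlap-erasing step on that small subset. Here `merged` is a hypothetical sf object holding the combined per-region results:

```r
# load packages
library(sf)
library(wdpar)

# untested sketch: only features whose interiors overlap another feature
# need re-cleaning; everything else can be kept as-is
has_overlap <- lengths(sf::st_overlaps(merged, merged)) > 0
clean_part <- merged[!has_overlap, ]
dirty_part <- merged[has_overlap, ]

## re-run the overlap-erasing step on the small overlapping subset only
fixed <- wdpar::st_erase_overlaps(
  dirty_part[order(dirty_part$IUCN_CAT, dirty_part$STATUS_YR), ]
)
result <- rbind(clean_part, fixed)
```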

Va99le commented 1 year ago

I'm following this method right now, and it's working pretty well for the moment. It will take some time anyway. Now I'm stuck due to an error (always in wdpa_clean()) that I do not understand; should I post it here or open a new issue? The error I'm getting is: Error in CPL_geos_is_empty(st_geometry(x)) : CHAR() can only be applied to a 'CHARSXP', not a 'NULL'. The error arrives after a while of erasing overlaps of protected areas. I performed wdpa_clean() multiple times on the same dataset without this error, but now it happens every time I run it. I'm working on Ubuntu 20.04.06, R version 4.3.0.

jeffreyhanson commented 1 year ago

Sorry for my slow response - I'm travelling at the moment.

Yeah, let's keep it in this issue so I can keep track of it - thanks for asking.

Hmm, I haven't seen this error before, sorry. I wonder if this is related to the geometry precision? What if you tried increasing it, e.g., to 1e10 or something?

Va99le commented 1 year ago

I tried increasing the geometry precision to even higher values than 1e10, up to 1e12, and nothing changed. In my opinion the problem may happen earlier; maybe something goes wrong when fixing geometries, so that when it comes to erasing overlaps, it encounters a "null" geometry and the process stops. I also tried to uninstall the packages and redownload them with their dependencies, including dickoa/prepr, but still nothing. It's important to note that the problems started to occur after a switch of R version, from 4.2.3 to 4.3. I guess there might be a version problem with one of the required packages, rather than with R itself, since on another device I got the same error with the previous R version.

alemid commented 1 year ago

Hi, I appreciate the "wdpar" package for its usefulness and quality, too. Regarding the last comment from Va99le, I would like to report the same behaviour and error (Error in CPL_geos_is_empty(st_geometry(x)) : CHAR() can only be applied to a 'CHARSXP', not a 'NULL') that was not present before. The R version has not changed, but it is possible that some upgrade of an OS dependency library caused the problem. In fact, this error appeared after the system administrator launched apt-get update and rebooted the server (Ubuntu 20.04). So I tried to install all the packages on another server (RedHat 8). It was a little more complicated to install all the dependencies, but in the end I obtained the same error. Unfortunately, I'm now stuck with this error, which I have never met before. I really appreciate every suggestion you can give us. Thank you again.

Alex

jeffreyhanson commented 1 year ago

Sorry for the slow response.

Hmm, it seems like st_geometry() is returning a NULL for some reason. I'm not sure why or where this is happening in wdpar (e.g., which line number of which function). If you run the wdpa_clean() code again and then immediately run traceback() after it throws the error, this will help tell us exactly where it's going wrong. Could you please try that and post the output here? Also, @Va99le and @alemid, which countries are you processing? I can try running this on my computer too, to help understand what's going on.

Va99le commented 1 year ago

Hi @jeffreyhanson, this is what I get:

10: CPL_geos_is_empty(st_geometry(x))
9: st_is_empty(x)
8: st_cast.sfc(x, "GEOMETRYCOLLECTION")
7: st_cast(x, "GEOMETRYCOLLECTION")
6: st_collection_extract.sfc(d, "POLYGON")
5: sf::st_collection_extract(d, "POLYGON")
4: withCallingHandlers(expr, warning = function(w) if (inherits(w, classes)) tryInvokeRestart("muffleWarning"))
3: suppressWarnings(sf::st_collection_extract(d, "POLYGON"))
2: st_erase_overlaps(x[order(x$IUCN_CAT, x$STATUS_YR), ], verbose)
1: wdpa_clean(FRA, geometry_precision = 7e+11, snap_tolerance = 50, erase_overlaps = TRUE)

I'm processing all European countries. I get the error when processing every state that has a considerable amount of protected area. The example reported in the traceback is from France, but the same thing happens with Italy or Germany, for example.

jeffreyhanson commented 1 year ago

Thanks @Va99le - that's very helpful! I'll try and take a look at this later today or tomorrow.

jeffreyhanson commented 1 year ago

I think I've got it working now, could you please try installing the fixes branch and see if it works for you? This can be done using the following code:

remotes::install_github("prioritizr/wdpar@fixes")

Va99le commented 1 year ago

It worked! Thank you very much, both for your patience and for your work

jeffreyhanson commented 1 year ago

Awesome! Glad to hear it's finally working - thanks for bearing with me. This PR also contains a fix for downloading data from Protected Planet, so once other people have verified that that works too, I'll merge the PR and prepare to submit to CRAN.

alemid commented 1 year ago

You're the man! It worked for me too! Thank you very very much!

jeffreyhanson commented 1 year ago

Brilliant - no worries!