Open jfulponi opened 1 year ago
Thanks for your advice. Actually, in my experience spark_apply is pretty fast when the helper functions are passed in through the context argument. Did you use spark_apply in package-distributed mode?
This is my code example showing how to use spark_apply:
st_dump <- function(x){
  tryCatch({
    # Split the ";"-separated WKT string into individual geometries
    wkt_list_ <- stringr::str_split(x, ";")
    wkt_list_[[1]] %>%
      sf::st_as_sfc() %>%
      sf::st_sf() %>%
      sf::st_union() %>%
      sf::st_cast("POLYGON") %>%
      lwgeom::st_astext() %>%
      paste0(collapse=";")
  },error=function(cond){
    message("Here's the original error message st_dump:")
    message(cond)
    # On failure, return the input unchanged
    x
  },finally={
    # message("Some other message at the end")
  })
}
gh_aggr_fun <- function(e, context){
  library(dplyr);
  # Make every helper passed through `context` visible on the worker
  for (name in names(context)) assign(name, context[[name]], envir = .GlobalEnv);
  e %>%
    select(wkt_list_str=wkt_list, g7_list,g6) %>%
    mutate(g6_wkt = purrr::map_chr(.x = wkt_list_str,
                                   .f = st_dump))
}
context_f = list(st_dump = st_dump)
sdf_shd_ <- sdf_g6_ %>%
  sparklyr::spark_apply(gh_aggr_fun
                        ,columns = c("wkt_list","g7_list","g6","g6_wkt")
                        ,name = 'g6_well_tbl' # cache table name
                        ,memory = TRUE
                        ,context = context_f)
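To show the context pattern on its own, here is a minimal sketch in plain R with no Spark involved. The names `shout`, `worker`, `helper`, and `context_demo` are made up for this illustration and are not part of sparklyr. The only thing it assumes is the `(e, context)` signature used in the example above. It also binds the helpers in the function's own environment with `list2env()` rather than assigning them into `.GlobalEnv`, which is one possible alternative and not necessarily better in every setup.

```r
# Hypothetical helper standing in for st_dump(): upper-cases each piece
# of a ";"-separated string and joins the pieces back together.
shout <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  paste0(toupper(parts), collapse = ";")
}

# Worker with the same (e, context) signature as the functions above.
worker <- function(e, context) {
  # Bind every entry of `context` as a local variable of this call
  list2env(context, envir = environment())
  # `helper` exists only because it was delivered through `context`
  e$out <- vapply(e$txt, helper, character(1), USE.NAMES = FALSE)
  e
}

# The helper is shipped under the name "helper", which is not defined globally
context_demo <- list(helper = shout)
input <- data.frame(txt = c("a;b", "c"), stringsAsFactors = FALSE)
result <- worker(input, context_demo)
print(result)
```

Calling `worker(input, list())` instead should fail with an "object 'helper' not found" error, which is roughly what happens on a Spark worker when a helper is not passed through `context`.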
In my example, I convert the raw data to geohash, which is quite similar to H3. I hope this helps.
Here is another way to remove the holes, working on the sf data.frame object directly:
st_rm_holes <- function(e, context){
  library(dplyr);
  library(sf);
  library(sfheaders);
  # Make every helper passed through `context` visible on the worker
  for (name in names(context)) assign(name, context[[name]], envir = .GlobalEnv);
  e %>%
    sf::st_as_sf(wkt="wkt_val") %>%
    sfheaders::sf_remove_holes() %>%
    as.data.frame() %>%
    mutate(wkt_val = lwgeom::st_astext(wkt_val))
}
Hi. I'm working with the geospark sparklyr extension on huge spatial datasets (mostly point datasets). When I need to compute a geospatial index like H3, I have to use spark_apply() with the R h3 package, and it usually takes hours. The same task with the h3 extension for pyspark is a lot faster, apparently because all the cores are working at the same time. Is there a plan to add H3 index functionality? I could help with some of the coding if you want; I think I can be helpful. Thanks.