jlacko opened this issue 3 years ago.
Hello,
I have the same issue with the {fable} package. I can fit the models because the data is a tsibble, which inherits from tibble. But when I go to forecast, the data frame is a mable (a model table, i.e. a table of models), and it gets converted to a tibble, so fable's forecast() function doesn't know what to do with it.
It would be nice if there were some way to preserve the class of the object that is passed to the cluster.
This is a great package, by the way. The {future} package is much more complicated than this one, touchy, and inconsistent when trying to take advantage of nearly all cores. Please do not let this project slide.
Thanks!!
I definitely second the {sf} class inheritance request. I may be wrong, but {multidplyr} is an incredible opportunity to make massive computations more efficient.
I would also like to see support for {sf} (or, more generally, other "specialised" tibble classes).
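As an aside, to work with {sf} in a parallel pipe, restore the class after collect():

```r
library(dplyr)
library(multidplyr)

cluster <- new_cluster(4)            # assumed worker count; adjust to your machine
cluster_library(cluster, "sf")       # load {sf} on every worker
cluster_copy(cluster, "coast")       # `coast` must also exist on every worker

grid_sf3 <- grid_sf2 %>%
  multidplyr::partition(cluster) %>%
  dplyr::mutate(
    dist = as.numeric(sf::st_distance(geometry, coast))
  ) %>%
  dplyr::collect() %>%
  sf::st_sf()                        # turn the collected tibble back into an sf object
```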
Hello! Does anyone have a better approach? Unfortunately, the only thing I can think of is to process the non-geometry parts of the data frame with {multidplyr}, do the {sf} operations separately, and then join the two tables. I also tried some things with {furrr}, but it feels like endless, unreadable lines of code to do something relatively simple. The worst case is setting up a Spark cluster just for a summarize().
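One alternative, which swaps {multidplyr} for base R's {parallel} package, is to split the rows into chunks yourself so that each chunk stays an object of the input class, run the function on a socket cluster, and bind the results with rbind(). The sketch below assumes this approach; `par_chunks()` is a hypothetical helper, not part of either package, and it only returns the original class if that class has its own `[` and rbind() methods (sf provides `[.sf` and rbind.sf).

```r
library(parallel)

# Hypothetical helper (not part of multidplyr or parallel):
# split `df` into row chunks, apply `f` to each chunk on a PSOCK cluster,
# and bind the results back together in the original row order.
par_chunks <- function(df, f, ..., n_workers = 2) {
  rows <- seq_len(nrow(df))
  # Assign consecutive rows to up to `n_workers` groups
  idx <- split(rows, cut(rows, n_workers, labels = FALSE))
  # `[` keeps the class for data frame subclasses that define a `[` method
  chunks <- lapply(idx, function(i) df[i, , drop = FALSE])

  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))

  # Extra arguments in `...` are forwarded to `f` on the workers
  res <- parLapply(cl, chunks, f, ...)
  # rbind() dispatches on the class of the chunks (e.g. rbind.sf for sf)
  do.call(rbind, unname(res))
}

# Usage with a plain data frame
df <- data.frame(x = 1:10)
out <- par_chunks(df, function(d) { d$y <- d$x * 2; d }, n_workers = 2)
```

For the sf case above, something like `par_chunks(grid_sf2, function(d, coast) { d$dist <- as.numeric(sf::st_distance(d, coast)); d }, coast = coast)` should come back as an sf object, though this usage is untested against the reprex. Passing `coast` as an argument avoids relying on it existing in the workers' global environment.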
The {multidplyr} package changes the class of the object distributed to workers to multidplyr_party_df. This causes a loss of the "special sauce" that the {sf} package provides for spatial datasets (special interpretation of the geometry column, and information about the coordinate reference system). It would be advantageous for spatial data processing to allow parallelization of some tasks, such as the point-in-polygon operation demonstrated in the reprex below.
Doing so would likely require keeping the class of the distributed object unchanged (or perhaps re-implementing the sf methods, in which case the issue would likely fall outside the scope of the {multidplyr} package).
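As a quick feasibility check: in plain R, a class and its attributes survive the trip to a worker process, because serialize()/unserialize(), which the {parallel} package's socket clusters use to ship objects, keep them intact. The sketch below round-trips a data frame with a hypothetical subclass `my_spatial` and a hypothetical `crs_note` attribute, standing in for sf's metadata. This suggests that keeping the class is not blocked by the transport itself, but it is an inference and has not been checked against multidplyr's internals.

```r
# A data frame with an extra class and a custom attribute, standing in
# for an sf object ("my_spatial" and "crs_note" are hypothetical names)
x <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))
class(x) <- c("my_spatial", class(x))
attr(x, "crs_note") <- "EPSG:4326"

# Round-trip through serialize()/unserialize(), as a socket cluster would
y <- unserialize(serialize(x, connection = NULL))

# Class and attribute come back unchanged
stopifnot(identical(class(y), c("my_spatial", "data.frame")))
stopifnot(identical(attr(y, "crs_note"), "EPSG:4326"))
```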