Open bfakos opened 1 year ago
Sorry, there were some mistakes, now corrected:
blocks_intersection <- sf::st_intersects(sf::st_geometry(x), sf::st_geometry(blocks))
mask <- lengths(blocks_intersection) != 0
sub_blocks <- blocks[mask, ]
blocks_len <- nrow(sub_blocks)
blocks_df <- as.data.frame(blocks_intersection[mask])
Thanks for the report @bfakos I'll check it and get back to you soon
In the meanwhile, I realized that the two st_intersects() has different direction (i.e., x and y are swapped), but sgbp objects can be transposed by t(), which is really fast (implemented in C).
Dear Roozbeh, I tried to run cv_spatial(x), where nrow(x) == 37000000, and it does not finish within a day, even if iteration == 1 or selection == "systematic". I wanted to understand the source code, find the bottleneck, and modify the code in a way it run parallel in multiple cores, when I found this issue. The code of cv_spatial() calls sf::st_intersects() twice, even though st_intersects() is a really time-consuming function (and can be easily parallelized).
Here, you run "[.sf", which call st_intersects() and create a logical mask from the results (please refer to L325 in sf.R in package "sf"), but drop the results of the predicate function
You call again the time-consuming st_intersects(), with almost the same inputs. I recommend calling st_intersects() once, then calculating the logical mask, and finally subsetting the blocks. E.g. something like this:
Also I suggest parallelizing st_intersects() that could make cv_spatial() much faster. Please let me know if I could collaborate in this development. Have a nice week, Ákos