ipeaGIT / gtfs2emis

R package to estimate public transport emissions based on GTFS data
https://ipeagit.github.io/gtfs2emis/

emis_grid function is slow #46

Closed rafapereirabr closed 3 years ago

rafapereirabr commented 3 years ago

The emis_grid() function seems to be quite slow. This seems to be largely due to the performance of sf::st_intersection() when there are too many polygons.

Running the vignette example on my notebook, sf::st_intersection() takes more than 35 minutes (I stopped the process because it was taking too long and using 17 GB of RAM).

I could add this piece of code right before the intersection operation to improve code performance by reducing the number of grid cells to intersect. With this addition, the sf::st_intersection() operation takes less than a minute.

  # Keep only polygons that intersect with lines.
  # st_intersects() returns row positions in `grid`, so the subset below
  # assumes that grid$id matches the row number.
  intersect_index <- sf::st_intersects(net, grid)
  intersect_index <- unlist(intersect_index)
  intersect_index <- unique(intersect_index)
  grid <- subset(grid, id %in% intersect_index)

ibarraespinosa commented 3 years ago

I think it is slower because sf now imports S2. I'm using projected data, such as EPSG:3857.
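If S2 is indeed the cause, one way to test the idea is to project both layers before intersecting them. The following is a minimal sketch on made-up toy data, not code from the package: the objects `net`, `grid`, `net_m`, `grid_m` and `pieces` are hypothetical names, and EPSG:3857 is used only because it is the CRS mentioned above.

```r
library(sf)

# Hypothetical toy data in lon/lat (EPSG:4326): one line and a 4 x 1 grid over it
net <- st_sfc(
  st_linestring(rbind(c(-46.64, -23.55), c(-46.62, -23.545))),
  crs = 4326
)
grid <- st_make_grid(net, n = c(4, 1))

# Project both layers to a planar CRS so the intersection is computed
# with planar geometry instead of spherical (s2) geometry
net_m  <- st_transform(net, 3857)
grid_m <- st_transform(grid, 3857)
pieces <- st_intersection(net_m, grid_m)
```

Alternatively, `sf::sf_use_s2(FALSE)` switches sf back to planar operations on unprojected coordinates, which may be enough to compare timings with and without S2.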

On Sun, Jul 4, 2021 at 12:00, Rafael H M Pereira @.***> wrote:

Closed #46 https://github.com/rafapereirabr/gtfs2emis/issues/46 via 6740af7 https://github.com/rafapereirabr/gtfs2emis/commit/6740af701a1b22b01f8a0917fc19257b06479505 .

rafapereirabr commented 3 years ago

The function can still be made faster by using a data.table operation to sum the emissions of multiple trips by road segment. @Joaobazzo, you have already written the code to do this, right?

Joaobazzo commented 3 years ago

So I implemented a function that sums emissions over the linestring through a data.table operation.
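For readers unfamiliar with this kind of grouped sum, here is a minimal sketch of the idea on made-up data. It is not the code that went into the package: the table `emi_dt` and the columns `segment_id`, `trip_id`, `CO2` and `NOx` are hypothetical.

```r
library(data.table)

# Hypothetical input: one row per trip and road segment, emissions in grams
emi_dt <- data.table(
  segment_id = c("a", "a", "b", "b", "b"),
  trip_id    = c(1, 2, 1, 2, 3),
  CO2        = c(10, 12, 5, 6, 7),
  NOx        = c(1, 2, 1, 1, 2)
)

emi_cols <- c("CO2", "NOx")

# Sum each pollutant column within each road segment, across all trips
emi_by_segment <- emi_dt[, lapply(.SD, sum), by = segment_id, .SDcols = emi_cols]
```

Collapsing repeated trips to one row per segment before the spatial step means each road segment only has to be intersected with the grid once.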

library(gtfs2emis)
library(magrittr) # provides the %>% pipe

gps <- gtfs2gps::read_gtfs(system.file("extdata/saopaulo.zip", package = "gtfs2gps")) %>%
  gtfs2gps::filter_by_shape_id(c("51982")) %>%
  gtfs2gps::gtfs2gps() %>%
  gtfs2gps::gps_as_sflinestring() 

ef <- ef_europe(speed = gps$speed,
                veh_type = c("Ubus Std 15 - 18 t","Ubus Artic >18 t"),
                euro = c("IV","V"),
                pollutant = c("CO2","NOx"),
                fuel = "D" ,
                tech =  c("SCR","EGR"),
                slope = 0.0,
                load = 0.5,
                fcorr = 1,
                as_list = TRUE)

emi_df <- emis(fleet_composition =  c(0.7,0.3),
            dist = units::set_units(gps$dist,"km"),
            ef = ef,
            aggregate = FALSE,
            as_list = FALSE)

data1 <- cbind(emi_df,gps) %>% sf::st_as_sf()
grid <- vein::make_grid(spobj = data1, width =  0.25 / 102.47) # 500 meters

library(microbenchmark)

mbm <- microbenchmark::microbenchmark("new" = {

  emis_grid(data = sf::st_as_sf(data1),
            grid = grid,
            emi = c("CO2_Euro_IV","CO2_Euro_V","NOx_Euro_IV","NOx_Euro_V"))

},
"old" = {

  emis_grid_old(data =  sf::st_as_sf(data1),
            grid = grid,
            emi = c("CO2_Euro_IV","CO2_Euro_V","NOx_Euro_IV","NOx_Euro_V"))

},times = 5)

mbm

# Unit: seconds
# expr      min       lq     mean   median       uq      max neval
# new 1.206147 1.221862 1.226598 1.228865 1.229715 1.246401     5
# old 7.632974 7.703001 7.974765 7.918362 7.968041 8.651446     5

rafapereirabr commented 3 years ago

That's a big improvement! Thanks, @Joaobazzo. Please feel free to push these changes to the package and close this issue.

Joaobazzo commented 3 years ago

Last update: commit 60f62794726926ba070ecb062b1d5def229732c9