r-transit / tidytransit

R package for working with GTFS data
https://r-transit.github.io/tidytransit/

Error using raptor for batch travel time matrix calculation #189

Closed: MSchroembges closed this issue 1 year ago

MSchroembges commented 2 years ago

I am using raptor to calculate a travel time matrix for a large GTFS dataset. When trying it out, I noticed a difference in the output between calculating a single start id

ttm1 <- raptor(stop_times = stop_times,
               transfers = transfers,
               stop_ids = c("de:01001:27380"),
               arrival = FALSE,
               time_range = 1*3600,
               max_transfers = 2,
               keep = "shortest")

ttm1 <- ttm1 %>% filter(travel_time <= 3600)

and calculating a batch of 100 stops and then filtering for the id of that stop

ttm2 <- raptor(stop_times = stop_times,
               transfers = transfers,
               stop_ids = [batch of 100 stop ids],
               arrival = FALSE,
               time_range = 1*3600,
               max_transfers = 2,
               keep = "shortest")

ttm2 <- ttm2 %>% filter(travel_time <= 3600)

ttm2 <- ttm2 %>% filter(from_stop_id=="de:01001:27380")

ttm1 contains 3439 rows, whereas ttm2 contains only 2972 rows.

I am not sure whether I am missing something when passing the batch of stop ids. Is there a limit on the calculation?

polettif commented 2 years ago

The documentation on this might be a bit misleading: stop_ids doesn't really work as a "batch" parameter if you use keep="shortest". When you provide multiple stop_ids, raptor() does calculate journeys starting from all stop_ids. However, the intended use case is providing multiple stop_ids that belong to the same stop name or that are close to a location (like the example in the raptor doc).
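To illustrate that intended use case, here is a minimal sketch of collecting all stop_ids that share one stop name. The `stops` data frame and its values are made up for illustration; in practice this would presumably be the stops table of your GTFS feed.

```r
library(dplyr)

# Hypothetical stops table (made-up ids and names, for illustration only)
stops <- data.frame(
  stop_id   = c("s1a", "s1b", "s2"),
  stop_name = c("Main Station", "Main Station", "Harbour"),
  stringsAsFactors = FALSE
)

# All stop_ids that belong to the same stop name; a vector like this
# is the kind of input stop_ids is meant for
main_station_ids <- stops |>
  filter(stop_name == "Main Station") |>
  pull(stop_id)
```
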

Now, if you use raptor() with keep="shortest", only one journey is kept for every to_stop_id, not one for every pair of from_stop_id and to_stop_id. So some to_stop_ids are reached faster from another stop_id in your batch than from "de:01001:27380", and those rows no longer have "de:01001:27380" as their from_stop_id when you filter. So I'd suggest you use keep="all" and then filter the result with something like this:

ttm |>
  arrange(travel_time) |>
  group_by(from_stop_id, to_stop_id) |>
  slice_head(n = 1)
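
The effect can be reproduced on a toy table without running raptor() at all. The sketch below is only an illustration: `ttm_all` is a made-up stand-in for a keep="all" result, and the per-`to_stop_id` grouping mimics the behaviour described above for keep="shortest" rather than calling the package itself.

```r
library(dplyr)

# Made-up stand-in for a raptor(..., keep = "all") result
ttm_all <- data.frame(
  from_stop_id = c("A", "A", "B", "B", "A"),
  to_stop_id   = c("X", "Y", "X", "Y", "X"),
  travel_time  = c(600, 900, 300, 1200, 700),
  stringsAsFactors = FALSE
)

# One journey per to_stop_id across the whole batch
# (mimics what is described for keep = "shortest")
shortest_overall <- ttm_all |>
  arrange(travel_time) |>
  group_by(to_stop_id) |>
  slice_head(n = 1) |>
  ungroup()

# One journey per (from_stop_id, to_stop_id) pair, as suggested above
shortest_per_pair <- ttm_all |>
  arrange(travel_time) |>
  group_by(from_stop_id, to_stop_id) |>
  slice_head(n = 1) |>
  ungroup()

# Rows that survive a filter on from_stop_id == "A"
n_overall  <- nrow(filter(shortest_overall, from_stop_id == "A"))
n_per_pair <- nrow(filter(shortest_per_pair, from_stop_id == "A"))
```

Here stop "X" is reached faster from "B" than from "A", so the per-`to_stop_id` result keeps only one row starting at "A", while the per-pair result keeps both. That is the same kind of gap as between ttm2 and ttm1.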