ipeaGIT / r5r

https://ipeagit.github.io/r5r/

Error in transit routing after graph builds #284

Closed alenastern closed 1 year ago

alenastern commented 2 years ago

Hello! I'm using r5r to create origin-destination travel time matrices using r5r::travel_time_matrix(). The problem I'm running into is that my graph is building successfully (returning network.dat and not producing any obvious error messages when I inspect the verbose console output), yet when I try to find transit routes using travel_time_matrix() as shown below, the code runs but doesn't return anything.

I suspect the problem is with some of the input GTFS files, as when I delete some of the files and try rebuilding the graph, I'm able to successfully obtain transit routes using the identical code below. I identified the GTFS files that I suspect are causing the issue using tidytransit::read_gtfs() to validate the GTFS files (see full function below), and identified that some files had non .txt files in the zip file (generally a pdf developer agreement). I cleaned the GTFS files to remove the non .txt files, and then was able to run tidytransit::read_gtfs() successfully on all my GTFS files without error. However, I continued to have the same issue with travel_time_matrix() for transit trips until I removed the cleaned GTFS files for the agencies that were having issues. This makes me think that there is some other issue in one or more of these files affecting the graph build, but I'm not able to figure out what it is. I inspected the setup_r5() output for those agencies specifically and also didn't identify any clear errors.
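
The check for stray non-.txt files described above could be scripted roughly as follows. This is a minimal base R sketch, not part of r5r or tidytransit: `non_txt_entries()` and `list_non_txt_files()` are hypothetical helper names, and the sketch assumes every feed is a .zip file sitting directly in one folder.

```r
# Return the zip entry names that do not end in .txt (case-insensitive).
# `entry_names` is a character vector such as unzip(zip_path, list = TRUE)$Name.
non_txt_entries <- function(entry_names) {
  entry_names[!grepl("\\.txt$", entry_names, ignore.case = TRUE)]
}

# Hypothetical wrapper: list the non-.txt entries of every zipped feed in a folder.
list_non_txt_files <- function(gtfs_dir) {
  zips <- list.files(gtfs_dir, pattern = "\\.zip$", full.names = TRUE)
  found <- lapply(zips, function(z) non_txt_entries(unzip(z, list = TRUE)$Name))
  names(found) <- basename(zips)
  Filter(length, found)  # keep only the feeds that contain extra files
}
```

Calling `list_non_txt_files(data_path)` would then return a named list with one element per feed that needs cleaning. Subfolders inside a zip are also reported, since their entry names do not end in .txt.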

I've included two travel_time_matrix() calls in the code example below: the first runs on the full set of start and end points and the second on a subset. The first call runs and returns similar console output as the second (notably the "Skipping transit search. No transit stops were reached or no transit modes were selected" messages), but only the second call returns the "java.util.concurrent.ExecutionException".

I also tried inspecting the transit elements of the graph using transit_network_to_sf(r5r_core) and did find that a transit network was present.

My questions are:

  1. Is this a bug in the r5r package?
  2. Is it possible to add error messages that make it easier to spot GTFS data that would cause this issue with transit routing? I need to do this work for 25 different areas so I'm hoping for a way to programmatically identify and fix these issues.

Thanks so much for your time and for creating such a great package!

Example

options(java.parameters = "-Xmx24G")
library(r5r)
library(sf)
library(data.table)
library(here)
library(tidyverse)
library(dotenv)
library(aws.s3)

# build graph
r5r_core <- setup_r5(data_path = data_path, verbose = TRUE)

# load origin/destination points
all_end_points_transit <- read_csv(here(data_dir, 
                                        str_glue("{graph_name}_unique_end_transit.csv")),
                                   col_types = c("geoid" = "c")) %>%
  select(geoid) %>%
  left_join(all_end_points, by =c("geoid" = "id")) %>%
  rename(id = geoid) 

all_start_points_transit <- read_csv(here(data_dir, 
                                          str_glue("{graph_name}_unique_start_transit.csv")),
                                     col_types = c("geoid" = "c")) %>%
  select(geoid) %>%
  left_join(all_start_points, by =c("geoid" = "id")) %>%
  rename(id = geoid) 

# select points in Miami Dade County (likelier to have valid transit routes)
md_start_points_transit <- all_start_points_transit %>%
  filter(substr(id, 1, 5) == "12086")

md_end_points_transit <- all_end_points_transit %>%
  filter(substr(id, 1, 5) == "12086")

# sample 10 start and end points in Miami Dade County
md_start_points_transit_n <- md_start_points_transit %>%
  sample_n(10)

md_end_points_transit_n <- md_end_points_transit  %>%
  sample_n(10)

# set inputs
max_walk_dist <- 804.672 # half a mile in meters
max_trip_duration <- 60
departure_datetime <- as.POSIXct("07-10-2019 08:00:00",
                                 format = "%d-%m-%Y %H:%M:%S")

# conduct routing
ttm_transit <- travel_time_matrix(r5r_core = r5r_core,
                          origins = all_start_points_transit,
                          destinations = all_end_points_transit,
                          mode = c("WALK", "TRANSIT"),
                          departure_datetime = departure_datetime,
                          max_walk_dist = max_walk_dist,
                          max_trip_duration = max_trip_duration,
                          verbose = TRUE,
                          breakdown = TRUE)

ttm_transit_md <- travel_time_matrix(r5r_core = r5r_core,
                                  origins = md_start_points_transit_n,
                                  destinations = md_end_points_transit_n,
                                  mode = c("WALK", "TRANSIT"),
                                  departure_datetime = departure_datetime,
                                  max_walk_dist = max_walk_dist,
                                  max_trip_duration = max_trip_duration,
                                  verbose = TRUE,
                                  breakdown = TRUE)

Function to validate gtfs files:

library(tidytransit)
library(tidyverse)

validate_gtfs <- function(file_name) {

  file_vec <- unlist(str_split(file_name, '/'))
  feed_name <- tail(file_vec, n = 1)

  gtfs <- read_gtfs(file_name)

  validation_result <- attr(gtfs, "validation_result") %>%
    mutate(feed_name = feed_name)

  return(validation_result)
}

Selected console output for second travel_time_matrix() call above:

14:39:50.462 [main] INFO  com.conveyal.r5.analyst.LinkageCache - Seeking linkage for (com.conveyal.r5.analyst.FreeFormPointSet@55237207, StreetLayer(base), WALK) in cache...
14:39:50.462 [main] INFO  com.conveyal.r5.analyst.LinkageCache - Building Linkage for (com.conveyal.r5.analyst.FreeFormPointSet@55237207, StreetLayer(base), WALK) because it was not found in cache.
14:39:50.462 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Linking pointset to street network...
14:39:50.463 [main] INFO  c.conveyal.r5.streets.LinkedPointSet - Done. Linked 10 of 10 PointSet points to streets.
14:39:50.463 [main] INFO  c.conveyal.r5.streets.LinkedPointSet -       0 of 10 point linkages were copied directly from a source linkage;
14:39:50.463 [main] INFO  c.conveyal.r5.streets.LinkedPointSet -       the remaining 10 linkages were refreshed, of which 0 changed;
14:39:50.463 [main] INFO  c.conveyal.r5.streets.LinkedPointSet -       of which 0 changed to added edges, 0 to baseline edges, and 0 became unlinked.
14:39:50.464 [ForkJoinPool.commonPool-worker-31] INFO  c.c.r5.analyst.TravelTimeComputer - Performing street search for mode: WALK
14:39:50.467 [ForkJoinPool.commonPool-worker-7] INFO  c.c.r5.analyst.TravelTimeComputer - Skipping transit search. No transit stops were reached or no transit modes were selected.
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.util.concurrent.ExecutionException: org.apache.commons.math3.exception.NotStrictlyPositiveException: 0 is smaller than, or equal to, the minimum (0)

dhersz commented 2 years ago

Hi @alenastern, thanks for opening the issue.

Some clarification on the GTFS side of things: R5 uses its own GTFS-handling libraries and we've found it to be quite "picky" with some GTFS inputs. So your suspicion that some problem with the GTFS might be causing all of this is quite correct, but we can't easily spot problems in the GTFS and raise errors on the r5r side of things.

I recommend validating your GTFS files using the dev version of {gtfstools}, which includes a wrapper to the MobilityData Canonical GTFS validator. It not only checks whether tables/fields exist and are correctly named, but also performs some "business logic" validation, such as checking whether trips are too fast, whether stops are too far from the trip shapes, etc. Most of these problems shouldn't affect the routing process, but perhaps we can identify something relevant to this issue.

Since you also mention that you'd like to programmatically check for problems in the feed, the validation output includes a JSON file that can be parsed to find potentially breaking problems. On sample data, this is how you'd use it:

remotes::install_github("ipeaGIT/gtfstools")
library(gtfstools)
data_path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
output_path <- tempfile("validation_result")
validator_path <- download_validator(tempdir())
validate_gtfs(data_path, output_path, validator_path)

You could then inspect the validation result as HTML or JSON to see if some issue relates to this problem.
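
For the programmatic route, a rough sketch of pulling only the error-level notices out of the JSON report might look like this. It rests on assumptions that should be checked against your own validator output: that the output folder contains a `report.json` file, and that this file has a top-level `notices` array whose entries carry `code`, `severity` and `totalNotices` fields. `filter_notices()` and `read_error_notices()` are hypothetical helper names, and {jsonlite} is assumed to be installed.

```r
# Keep only the notices with the requested severity.
# `notices` is assumed to be a data.frame with the columns
# `code`, `severity` and `totalNotices` (an assumption about the report layout).
filter_notices <- function(notices, severity = "ERROR") {
  keep <- notices$severity %in% severity
  notices[keep, c("code", "severity", "totalNotices"), drop = FALSE]
}

# Hypothetical wrapper: read report.json from a validation output folder
# and return its error-level notices.
read_error_notices <- function(output_path) {
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("The jsonlite package is required to parse the report.")
  }
  report <- jsonlite::fromJSON(file.path(output_path, "report.json"))
  filter_notices(report$notices)
}
```

With the sample run above, `read_error_notices(output_path)` would give one row per error code, which can be stacked across feeds. Note that the validator's severity levels say nothing about whether R5 will fail on a feed, so the result is only a starting point for digging.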

rafapereirabr commented 2 years ago

On top of Daniel's suggestion, please note that the output message of r5r says "No transit stops were reached or no transit modes were selected."

I would suspect that, perhaps, your origin/destination points are not in the same region as the GTFS.zip and OSM.pbf data used to create the network. I would suggest plotting the network data and the origin/destination points to check if that's the case. You could do that with little effort with code along these lines:

library(r5r)
library(ggplot2)

# extract transit network from r5r_core
transit_net <- transit_network_to_sf(r5r_core)

# plot
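ggplot() +
  geom_sf(data = transit_net$routes) +
  # pass the points via `data =`; the first positional argument of geom_point() is `mapping`
  geom_point(data = md_start_points_transit_n, aes(x = lon, y = lat))

ansoncfit commented 2 years ago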

Hi @alenastern -- much of the GTFS logging/validation in R5 is tailored to the Conveyal cloud platform for regional accessibility mapping. For example, we added a set of validation messages to our user interface last year (https://docs.conveyal.com/changelog/2021/05/15/). I'm not sure how well suited this validation is for logging in third-party tools like r5r, but I would be happy to chat more with you if it might be helpful.

alenastern commented 2 years ago

Thank you all so much for your help and I apologize for the delay in following up.

@dhersz - as you predicted, an issue with some GTFS files was causing the issue and the dev version of gtfstools was very helpful in getting to the bottom of things! One challenge I found using the tool for the purposes of validating gtfs inputs for routing in r5r is that validate_gtfs() identifies a large number of errors, only some of which cause transit routing to break in r5r. As a user, it would be amazing to have some field that indicates which errors would be "critical errors" in that way though I realize that may not be feasible! I was able to identify the specific errors in my case through a bit of digging, which I've outlined here in case it's helpful.

@ansoncfit - one challenge I found with debugging these errors is that it seems that r5 deviates a bit from the gtfs static reference in what is valid in both of these cases. I'm not sure if this is more clear in the user interface, but it would be amazing to give users a heads up on where r5's data requirements deviate from the reference (or potentially align r5 with the gtfs static reference in these cases).

Issue 1: the monorail route type

One of the transit agencies in my focus area is a monorail, and includes routes with route_type = 12 in routes.txt, which is the valid gtfs code for monorail but caused the r5r transit routing to return no results as described above.
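
A check for this case could be sketched in base R as below. The helper names are hypothetical, and the default set of problem values (just 12) is taken from this thread only; it is not a list of what R5 rejects, so adjust `problem_types` to your own findings.

```r
# Return the rows of routes.txt whose route_type is in a user-supplied set.
# The default (12, monorail) reflects the case described in this thread only.
flag_route_types <- function(routes, problem_types = 12) {
  routes[routes$route_type %in% problem_types, , drop = FALSE]
}

# Hypothetical wrapper: read routes.txt straight from a zipped feed and flag it.
flag_route_types_in_feed <- function(zip_path, problem_types = 12) {
  routes <- read.csv(unz(zip_path, "routes.txt"), stringsAsFactors = FALSE)
  flag_route_types(routes, problem_types)
}
```

Running `flag_route_types_in_feed()` over every feed in a region returns the affected routes, which could then be dropped or recoded before building the graph.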

Issue 2: headway_secs equal to 0

Values of 0 for headway_secs in the frequencies.txt file cause the transit routing to return no results. This case is a bit less clear because the gtfs reference indicates "non-negative integers" are valid values for headway_secs, which I'd interpret to include 0 as a valid value, but I realize that might just be my interpretation!
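
Feeds with this problem could be spotted with a small base R sketch like the one below. The helper names are hypothetical, and treating missing headways as problematic is an extra assumption on top of the zero-headway case described here.

```r
# Return the rows of frequencies.txt with a zero, negative or missing headway.
flag_bad_headways <- function(frequencies) {
  bad <- is.na(frequencies$headway_secs) | frequencies$headway_secs <= 0
  frequencies[bad, , drop = FALSE]
}

# Hypothetical wrapper: read frequencies.txt from a zipped feed, if present.
flag_bad_headways_in_feed <- function(zip_path) {
  entries <- unzip(zip_path, list = TRUE)$Name
  if (!"frequencies.txt" %in% entries) return(NULL)  # table is optional in GTFS
  freq <- read.csv(unz(zip_path, "frequencies.txt"), stringsAsFactors = FALSE)
  flag_bad_headways(freq)
}
```

An empty result (or `NULL` for feeds without a frequencies.txt) means this particular problem is not present in the feed.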

I hope this is useful to you all in your work creating such tremendously useful software (truly, thank you!). Please let me know if you have any questions and thank you again for your help!

ansoncfit commented 2 years ago

Hi @alenastern, Regarding monorails, starting with v6.6 r5 accepts GTFS route_type = 12 and maps it to the r5 TRAM mode. Further details are available at https://docs.conveyal.com/prepare-inputs#uploading-gtfs-feeds

alenastern commented 2 years ago

Awesome news @ansoncfit! Thank you!

rafapereirabr commented 1 year ago

Hi @alenastern. I guess we can close this issue, right? Or are there any pending problems?

rafapereirabr commented 1 year ago

Closing this issue due to inactivity. Happy to reopen it if the issue persists.